With ChatCPT 3.5 I’m getting unexpected results when I provide the full schema. For example, it keeps trying to find the topic author in the topic_users
table.
I get successful results by limiting the information in the prompt to the tables and columns that are required to write the query. Also by adding some additional information about how the columns are used. For example:
# Table name: user_actions
#
# id :integer not null, primary key
# action_type :integer not null (:like=>1,:was_liked=>2,:new_topic=>4,:reply=>5,:response=>6,:mention=>7,:quote=>9,:edit=>11,:new_private_message=>12,:got_private_message=>13,:solved=>15,:assigned=>16)
# user_id :integer not null (the user who is to be credited with the action)
# target_topic_id :integer
# acting_user_id :integer (the user who performed the action, for example, a staff user can perform an action on behalf of a regular user)
# created_at :datetime not null
Another way to look at getting this to work with the Data Explorer plugin would be to have users fill out a dynamically generate form that lists the data they are looking for, and what conditions they want to apply to the data. The prompt could be generated programmatically by Discourse, then sent to the LLM to have the query written.
Edit: @jordan-violet, this might be a useful starting point. Here’s an annotated partial schema that’s working for me. It works well as long as I limit my queries so that they can be answered by the provided schema. I’ve added additional details to the schema to clarify details that seem to confuse ChatGPT. The obvious downside of this approach is that a fully annotated schema will exceed ChatGPT 3.5’s token limit. I’ve included some information in the schema that could be omitted to reduce the number of tokens that are used.
Summary
# == Schema Information
#
# Table name: application_requests
#
# id :integer not null, primary key
# date :date not null
# req_type :integer not null ("http_total"=>0,"http_2xx"=>1,"http_background"=>2,"http_3xx"=>3,"http_4xx"=>4,"http_5xx"=>5,"page_view_crawler"=>6,"page_view_logged_in"=>7,"page_view_anon"=>8,"page_view_logged_in_mobile"=>9,"page_view_anon_mobile"=>10,"api"=>11,"user_api"=>12)
# count :integer default(0), not null
#
# Table name: users
#
# id :integer not null, primary key
# username :string(60) not null
# created_at :datetime not null
# updated_at :datetime not null
# name :string (the user's real name)
# last_posted_at :datetime
# active :boolean default(FALSE), not null
# username_lower :string(60) not null
# last_seen_at :datetime
# admin :boolean default(FALSE), not null
# trust_level :integer not null
# approved :boolean default(FALSE), not null
# approved_by_id :integer
# approved_at :datetime
# previous_visit_at :datetime
# suspended_at :datetime
# suspended_till :datetime
# date_of_birth :date
# ip_address :inet
# moderator :boolean default(FALSE)
# title :string
# locale :string(10)
# primary_group_id :integer
# registration_ip_address :inet
# staged :boolean default(FALSE), not null
# first_seen_at :datetime
# silenced_till :datetime
#
# Table name: topics
#
# id :integer not null, primary key
# title :string not null
# last_posted_at :datetime
# created_at :datetime not null
# updated_at :datetime not null
# views :integer default(0), not null
# posts_count :integer default(0), not null
# user_id :integer (the id of the user who created the topic)
# last_post_user_id :integer not null (the id of the user who created the last post in the topic)
# reply_count :integer default(0), not null
# deleted_at :datetime
# highest_post_number :integer default(0), not null
# like_count :integer default(0), not null
# category_id :integer
# visible :boolean default(TRUE), not null
# moderator_posts_count :integer default(0), not null
# closed :boolean default(FALSE), not null
# archived :boolean default(FALSE), not null
# bumped_at :datetime not null
# archetype :string default("regular"), not null (can be set to either "regular" or "private_message")
# slug :string
# deleted_by_id :integer (the id of the user who deleted the topic)
# participant_count :integer default(1)
# word_count :integer
# excerpt :string
# highest_staff_post_number :integer default(0), not null
#
# Table name: posts
#
# id :integer not null, primary key
# user_id :integer (the id of the user who created the post)
# topic_id :integer not null
# post_number :integer not null (indicates the post's order in its topic)
# raw :text not null (the post's content)
# created_at :datetime not null
# updated_at :datetime not null
# reply_to_post_number :integer (the post_number that the post is a reply to)
# reply_count :integer default(0), not null
# deleted_at :datetime
# like_count :integer default(0), not null
# bookmark_count :integer default(0), not null
# reads :integer default(0), not null (the number of times the post has been read)
# post_type :integer default(1), not null (:regular=>1, :moderator_action=>2, :small_action=>3, :whisper=>4)
# last_editor_id :integer (the id of the user who last edited the post)
# hidden :boolean default(FALSE), not null
# hidden_reason_id :integer (:flag_threshold_reached=>1,:flag_threshold_reached_again=>2,:new_user_spam_threshold_reached=>3,:flagged_by_tl3_user=>4,:email_spam_header_found=>5,:flagged_by_tl4_user=>6,:email_authentication_result_header=>7,:imported_as_unlisted=>8)
# edit_reason :string
# word_count :integer
# wiki :boolean default(FALSE), not null
#
# Table name: categories
#
# id :integer not null, primary key
# name :string(50) not null
# topic_id :integer (the id of the topic that is used for the category's description)
# topic_count :integer default(0), not null
# created_at :datetime not null
# updated_at :datetime not null
# user_id :integer not null (the id of the user who created the topic)
# topics_year :integer default(0)
# topics_month :integer default(0)
# topics_week :integer default(0)
# slug :string not null
# description :text
# text_color :string(6) default("FFFFFF"), not null
# read_restricted :boolean default(FALSE), not null
# auto_close_hours :float
# post_count :integer default(0), not null
# latest_post_id :integer
# latest_topic_id :integer
# position :integer
# parent_category_id :integer
# posts_year :integer default(0)
# posts_month :integer default(0)
# posts_week :integer default(0)
# topics_day :integer default(0)
# posts_day :integer default(0)
#
# Table name: groups
#
# id :integer not null, primary key
# name :string not null
# created_at :datetime not null
# automatic :boolean default(FALSE), not null
# user_count :integer default(0), not null
# title :string
# bio_raw :text (the group's description)
# allow_membership_requests :boolean default(FALSE), not null
# full_name :string
# visibility_level :integer default(0), not null (who can see the group :public=>0, :logged_on_users=>1, :members=>2, :staff=>3, :owners=>4)
# messageable_level :integer default(0) (who can message the group :public=>0, :logged_on_users=>1, :members=>2, :staff=>3, :owners=>4)
# mentionable_level :integer default(0) (who can mention the group :public=>0, :logged_on_users=>1, :members=>2, :staff=>3, :owners=>4)
# members_visibility_level :integer default(0), not null (who see the group's members :public=>0, :logged_on_users=>1, :members=>2, :staff=>3, :owners=>4)
#
# Table name: group_users
#
# id :integer not null, primary key
# group_id :integer not null
# user_id :integer not null
# created_at :datetime not null
#
# Table name: user_actions
#
# id :integer not null, primary key
# action_type :integer not null (:like=>1,:was_liked=>2,:new_topic=>4,:reply=>5,:response=>6,:mention=>7,:quote=>9,:edit=>11,:new_private_message=>12,:got_private_message=>13,:solved=>15,:assigned=>16)
# user_id :integer not null (the user who is to be credited with the action)
# target_topic_id :integer
# target_post_id :integer
# target_user_id :integer (for example, the id of the user whose post was liked)
# acting_user_id :integer (the user who performed the action, for example, a staff user can perform an action on behalf of a regular user)
# created_at :datetime not null
# updated_at :datetime not null
#
# Table name: topic_views
#
# topic_id :integer not null
# viewed_at :date not null
# user_id :integer (will be set if the topic was viewed by a logged in user)
# ip_address :inet (will be set if the topic was viewed by an anonymous user)
#
# Table name: user_visits
#
# id :integer not null, primary key
# user_id :integer not null
# visited_at :date not null
# posts_read :integer default(0)
# mobile :boolean default(FALSE) (will be set to TRUE if the user visited on a mobile device)
# time_read :integer default(0), not null (the value returned is the number of seconds of the visit)
Here’s a link to a chat session where I tried it out: https://chat.openai.com/share/d108c104-3aa3-45d9-9161-6da21d5b3a77
If you want to try expanding on this list, the easiest way to do it is to copy the schema from the bottom of the model you’re interested in, directly from the Discourse code: discourse/app/models at main · discourse/discourse · GitHub.