The problem is that just providing the schema isn’t enough information for ChatGPT. You need to provide it with details about things like:
- what the `application_requests` table's `req_type` integer codes stand for
- what the `topics` table's `user_id` column is used for
- what the `user_actions` table's `action_type` codes stand for, and what the difference is between that table's `user_id`, `target_user_id`, and `acting_user_id` columns
With those types of details, GPT-3.5 seems to do a good job without any additional training. The problem then becomes that providing this level of detail about the entire database would make the prompt exceed the ChatGPT token limit (4,096 tokens, including both the prompt text and the generated output). If this type of approach were used, there would need to be a way to limit what gets sent in the prompt based on what information the user wanted to get from the Data Explorer query.
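One way to sketch that filtering idea: only include a table's documentation in the prompt when the user's question appears to reference that table, and stop adding documentation once a token budget is reached. This is a hypothetical illustration, not part of Data Explorer; the table descriptions, the keyword-matching heuristic, and the rough 4-characters-per-token estimate are all assumptions.

```python
# Hypothetical sketch: include only the table documentation that seems
# relevant to the user's question, so the assembled prompt stays under
# the model's token limit. Names and docs below are illustrative.

TABLE_DOCS = {
    "application_requests": "application_requests: req_type integer codes "
                            "(documented here) indicate the kind of request.",
    "topics": "topics: user_id is the user who created the topic.",
    "user_actions": "user_actions: action_type codes (documented here); "
                    "user_id vs target_user_id vs acting_user_id explained.",
}

def estimate_tokens(text: str) -> int:
    # Rough heuristic: roughly 4 characters per token for English text.
    return len(text) // 4

def build_prompt(question: str, budget_tokens: int = 3000) -> str:
    parts = ["Translate this question into SQL.", f"Question: {question}"]
    for name, doc in TABLE_DOCS.items():
        # Crude relevance check: the table name (singular or plural)
        # appears somewhere in the question.
        if name in question.lower() or name.rstrip("s") in question.lower():
            candidate = parts + [doc]
            # Only add the doc if it still fits in the budget.
            if estimate_tokens("\n".join(candidate)) <= budget_tokens:
                parts = candidate
    return "\n".join(parts)

print(build_prompt("How many topics did each user create?"))
```

A real implementation would likely use embeddings or the actual tokenizer rather than substring matching and a character count, but the shape of the solution is the same: rank the schema documentation by relevance and pack it greedily until the budget is spent.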