The problem is that just providing the schema isn’t enough information for ChatGPT. You need to provide it with details about things like:
- what the `application_requests` table's `req_type` integer codes stand for
- what the `topics` table's `user_id` column is used for
- what the `user_actions` table's `action_type` codes stand for, and what the difference is between that table's `user_id`, `target_user_id`, and `acting_user_id` columns
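As a rough sketch, the annotations might be kept as per-table notes and rendered into the prompt before the user's question. Everything below is illustrative: the dictionary name, the helper function, and especially the integer code meanings are placeholders, not values taken from the actual Discourse schema.

```python
# Illustrative per-table annotations of the kind the prompt would need.
# The integer code meanings are placeholders, not verified Discourse values.
SCHEMA_NOTES = {
    "application_requests": (
        "req_type is an integer code identifying the kind of request "
        "(placeholder values, e.g. 0 = anonymous page view)."
    ),
    "topics": "user_id is the user who created the topic.",
    "user_actions": (
        "action_type is an integer code for the activity (e.g. 1 = liked). "
        "user_id owns the row, acting_user_id performed the action, "
        "and target_user_id is who it was directed at."
    ),
}

def schema_preamble(tables):
    """Render the annotations for the given tables as prompt text."""
    return "\n".join(f"Table {t}: {SCHEMA_NOTES[t]}" for t in tables)

preamble = schema_preamble(["topics", "user_actions"])
```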
With those types of details, GPT-3.5 seems to do a good job without any additional training. The problem then becomes that providing this level of detail about the entire database would make the prompt exceed the ChatGPT token limit (4,096 tokens, which includes both the prompt text and the generated output). If this type of approach were used, there would need to be a way to limit what gets sent in the prompt based on what information the user wanted to get from the Data Explorer query.
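One way to limit what gets sent is a greedy selection: include notes only for the tables the query appears to touch, and stop before a running token estimate exceeds the budget. The sketch below is a hypothetical implementation; the table notes are made up, and the token count is a crude characters-per-token heuristic rather than a real tokenizer.

```python
# Hedged sketch: fit per-table schema notes into a token budget.
# Notes and the 4-chars-per-token estimate are illustrative assumptions.

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return max(1, len(text) // 4)

def build_prompt(question: str, table_notes: dict, relevant_tables: list,
                 budget: int = 4096) -> str:
    """Include only the notes for tables the question touches, stopping
    before the running token estimate exceeds the budget."""
    parts = [f"Write a SQL query to answer: {question}"]
    used = estimate_tokens(parts[0])
    for table in relevant_tables:
        note = table_notes.get(table)
        if note is None:
            continue
        cost = estimate_tokens(note)
        if used + cost > budget:
            break  # leave headroom: the same limit covers the generated SQL
        parts.append(note)
        used += cost
    return "\n\n".join(parts)

notes = {
    "user_actions": (
        "Table user_actions: action_type is an integer activity code "
        "(placeholder meanings). user_id owns the row, acting_user_id "
        "performed the action, target_user_id received it."
    ),
    "topics": "Table topics: user_id is the user who created the topic.",
}

prompt = build_prompt("Who liked the most posts last month?",
                     notes, ["user_actions"])
```

In a real system the `relevant_tables` list would itself have to come from somewhere, e.g. matching the user's question against table and column names, which is exactly the open problem the paragraph above describes.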