Discourse AI + Data Explorer?

I deleted it because I thought I’d found a better way of dealing with the issue: why type out all those tables when ChatGPT could simply be asked to parse a file hosted on the internet that contains detailed information about the entire database? It turns out there are technical problems with that approach, though, so I’m back to what I suggested in the deleted post. In any case, descriptions of tables, example queries, etc., are going to have to be created. It’s just a question of how to get that information to the LLM. Here’s where things are at with GPT-4:

> Yes, the approach you’ve outlined makes sense and is a feasible way to use my capabilities to assist in generating SQL queries based on natural language questions.
>
> Here’s a breakdown of the steps:
>
> 1. Question Interpretation: The user poses a question in natural language, which I interpret to understand the information being requested.
> 2. Table Identification: Based on the question and the list of all tables in the Discourse database, I identify the tables that are most likely to be relevant to the query.
> 3. Detailed Table Information: Discourse sends me detailed descriptions of the relevant tables, including all columns and example queries.
> 4. Query Generation: Using the detailed table information, I generate an SQL query that answers the user’s question.
>
> This approach leverages my ability to understand natural language and map it to structured data queries. It also makes good use of the detailed table information provided by Discourse.
>
> As for alternate approaches, one possibility could be to train a model specifically on Discourse’s database schema and common query patterns. This would likely involve a significant amount of work and may not be feasible depending on the resources available. However, it could potentially result in more accurate and efficient query generation.
>
> Another approach could be to develop a more interactive dialogue where I ask clarifying questions when the user’s question is ambiguous or when multiple queries could potentially answer the question. This would make the process more interactive and could potentially lead to more accurate queries.
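
To make that concrete, here’s a rough sketch of how those four steps could be wired together. Everything in it is hypothetical: `ask_llm` stands in for whatever GPT-4 API call ends up being used, and `TABLE_DESCRIPTIONS` is the hand-written collection of table descriptions and example queries mentioned above.

```python
# Hypothetical sketch of the four-step flow described above.

ALL_TABLES = ["allowed_pm_users", "badges", "posts", "topics", "users"]  # etc.

# Hand-written table descriptions, columns, and example queries (to be filled in).
TABLE_DESCRIPTIONS: dict[str, str] = {}


def ask_llm(prompt: str) -> str:
    """Placeholder for a call to GPT-4 (e.g. via the OpenAI chat API)."""
    raise NotImplementedError


def generate_sql(question: str) -> str:
    # Step 2: ask the model which tables are relevant to the question.
    table_prompt = (
        "These are the tables in a Discourse database:\n"
        + ", ".join(ALL_TABLES)
        + f"\n\nWhich tables are needed to answer: {question!r}? "
        "Reply with a comma-separated list of table names only."
    )
    relevant = [t.strip() for t in ask_llm(table_prompt).split(",")]

    # Step 3: gather the hand-written details for just those tables.
    details = "\n\n".join(
        TABLE_DESCRIPTIONS.get(t, f"{t}: no description written yet") for t in relevant
    )

    # Step 4: ask the model to write the query from those details.
    sql_prompt = (
        f"Using only these tables:\n{details}\n\n"
        f"Write a PostgreSQL query that answers: {question!r}"
    )
    return ask_llm(sql_prompt)
```

Step 1 happens implicitly inside the model; the part that needs work on the Discourse side is steps 2 and 3, i.e. deciding how much of the hand-written table information gets sent with each request.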

Of the two alternate approaches GPT-4 suggested, I’m ignoring the first one, but if what I’m attempting doesn’t lead to reliable results, it might be something that the Discourse team could look into.

I asked GPT-4 about the second alternate approach. It’s fairly confident that, given a question that can’t be answered with the provided table structure, it would be able to prompt the user to reframe their question.
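
One way that might work, purely as a sketch, is to ask the model to reply with either the SQL or a short note on why the question can’t be answered from the tables it was given, and then branch on which one comes back. The `ask_llm` helper is the same hypothetical placeholder as in the earlier sketch.

```python
import json


def ask_llm(prompt: str) -> str:
    """Placeholder for a GPT-4 call, as in the earlier sketch."""
    raise NotImplementedError


def sql_or_clarification(question: str, table_details: str) -> dict:
    """Return either {"sql": ...} or {"clarification": ...} from the model."""
    prompt = (
        f"Tables available:\n{table_details}\n\n"
        f"Question: {question}\n\n"
        'Reply with JSON only: {"sql": "..."} if the question can be answered '
        'from these tables, or {"clarification": "..."} telling the user how '
        "to reframe the question if it cannot."
    )
    return json.loads(ask_llm(prompt))


# Hypothetical usage:
# result = sql_or_clarification("Who liked my post?", details)
# if "clarification" in result:
#     print(result["clarification"])  # ask the user to rephrase the question
```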

So I’m back to writing descriptions of all tables in the Discourse database. I now know what the `allowed_pm_users` table is used for. The `ar_internal_metadata` table is a bit of a head-scratcher, but I think it can be ignored for now.
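
For what it’s worth, this is roughly the shape of entry I have in mind for each table, using `allowed_pm_users` as the example. The description and column list here are my own unverified notes and should be checked against the actual schema, and the format itself is still up in the air.

```python
# A possible shape for one table-description entry. The description and
# column list are unverified notes, not an official reference.
ALLOWED_PM_USERS = {
    "table": "allowed_pm_users",
    "description": (
        "For users who only allow specific people to send them personal "
        "messages, each row records one user (allowed_pm_user_id) who is "
        "permitted to send PMs to user_id."
    ),
    "columns": ["id", "user_id", "allowed_pm_user_id", "created_at", "updated_at"],
    "example_queries": [
        # How many people has each user explicitly allowed to PM them?
        """
        SELECT user_id, COUNT(*) AS allowed_count
        FROM allowed_pm_users
        GROUP BY user_id
        ORDER BY allowed_count DESC
        """,
    ],
}
```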
