That is my favorite option so far, albeit a very hard one to implement. Thus, you guessed it, another paper. Again this is not an authoritative paper as there are many similar approaches.
"DiagGPT: An LLM-based Chatbot with Automatic Topic Management for Task-Oriented Dialogue" by Lang Cao (pdf)
Just replace the medical information with PostgreSQL information as needed. The nice part is that the paper gives the prompts and suggests which tools and agents to use.
Going down the rabbit hole. (Click triangle to enter)
Since Task-Oriented Dialogue is what appears to be needed:
Google search: task oriented dialogue
Search result includes: Papers With Code - Task-Oriented Dialogue Systems
The leaderboard entry is T5-3b (UnifiedSKG), which includes a link to the paper
"UnifiedSKG: Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models" by Tianbao Xie, Chen Henry Wu, Peng Shi, Ruiqi Zhong, Torsten Scholak, Michihiro Yasunaga, Chien-Sheng Wu, Ming Zhong, Pengcheng Yin, Sida I. Wang, Victor Zhong, Bailin Wang, Chengzu Li, Connor Boyle, Ansong Ni, Ziyu Yao, Dragomir Radev, Caiming Xiong, Lingpeng Kong, Rui Zhang, Noah A. Smith, Luke Zettlemoyer and Tao Yu (pdf)
Notice this
EDIT
From https://python.langchain.com/
Construct an SQL agent from an LLM and tools. (ref)
It lists a few at the bottom of the page; this one should be looked at:
Use case
Enterprise data is often stored in SQL databases.
LLMs make it possible to interact with SQL databases using natural language.
LangChain offers SQL Chains and Agents to build and run SQL queries based on natural language prompts.
These are compatible with any SQL dialect supported by SQLAlchemy (e.g., MySQL, PostgreSQL, Oracle SQL, Databricks, SQLite).
They enable use cases such as:
- Generating queries that will be run based on natural language questions
- Creating chatbots that can answer questions based on database data
- Building custom dashboards based on insights a user wants to analyze
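To make the idea concrete, here is a minimal sketch of the question-to-SQL loop that such chains automate. It does not use LangChain at all: it runs against an in-memory SQLite database, and `fake_llm`, `get_schema`, `answer` and the `customers` table are all hypothetical names made up for illustration. In a real setup the canned lookup would be replaced by an actual model call that receives the question and the schema.

```python
import sqlite3


def fake_llm(question: str, schema: str) -> str:
    """Hypothetical stand-in for an LLM call that turns a question into SQL.

    A real system would send the question and the schema text to a model;
    here the answers are canned so the sketch runs offline.
    """
    canned = {
        "how many customers are there?": "SELECT COUNT(*) FROM customers",
        "which customers are in berlin?": (
            "SELECT name FROM customers WHERE city = 'Berlin' ORDER BY name"
        ),
    }
    return canned[question.strip().lower()]


def get_schema(conn: sqlite3.Connection) -> str:
    """Collect the CREATE TABLE statements so they can be put in a prompt."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    return "\n".join(row[0] for row in rows)


def answer(conn: sqlite3.Connection, question: str) -> list:
    """Question -> SQL -> rows, refusing anything that is not a SELECT."""
    sql = fake_llm(question, get_schema(conn))
    if not sql.lstrip().upper().startswith("SELECT"):
        raise ValueError("only SELECT statements are allowed in this sketch")
    return conn.execute(sql).fetchall()


# Tiny demo database (made-up data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [("Ada", "Berlin"), ("Grace", "New York"), ("Linus", "Berlin")],
)

count = answer(conn, "How many customers are there?")
berliners = answer(conn, "Which customers are in Berlin?")
print(count, berliners)
```

The SELECT-only check is a deliberately crude guard; for PostgreSQL a read-only database role would be the safer choice than string inspection.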
EDIT (08/23/2023)
Spider is a large-scale complex and cross-domain semantic parsing and text-to-SQL dataset annotated by 11 Yale students. The goal of the Spider challenge is to develop natural language interfaces to cross-domain databases. It consists of 10,181 questions and 5,693 unique complex SQL queries on 200 databases with multiple tables covering 138 different domains. In Spider 1.0, different complex SQL queries and databases appear in train and test sets. To do well on it, systems must generalize well to not only new SQL queries but also new database schemas.
EDIT (08/24/2023)
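One simple way to check a generated query against a reference query on a dataset like this is to run both and compare the rows that come back. The sketch below is only that simplified idea, not Spider's official evaluation script; `execution_match`, the `singer` table and its rows are hypothetical and made up for illustration.

```python
import sqlite3
from collections import Counter


def execution_match(conn: sqlite3.Connection, predicted_sql: str, gold_sql: str) -> bool:
    """Return True if both queries yield the same rows, ignoring row order."""
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        # A query that does not even run counts as a miss.
        return False
    gold_rows = conn.execute(gold_sql).fetchall()
    # Counter compares the rows as multisets, so duplicates still matter.
    return Counter(predicted_rows) == Counter(gold_rows)


# Tiny demo database (made-up data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO singer (name, age) VALUES (?, ?)",
    [("A", 25), ("B", 35), ("C", 40)],
)

gold = "SELECT name FROM singer WHERE age > 30"

same_rows = execution_match(conn, "SELECT name FROM singer WHERE NOT age <= 30", gold)
extra_rows = execution_match(conn, "SELECT name FROM singer", gold)
broken_sql = execution_match(conn, "SELEC name FROM singer", gold)
print(same_rows, extra_rows, broken_sql)
```

Comparing results rather than SQL text means two differently written queries can both count as correct, which matters when a model phrases the same filter in another way.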
Dataherald is a natural language-to-SQL engine built for enterprise-level question answering over structured data. It allows you to set up an API from your database that can answer questions in plain English.
As I often tell others, for some things in life you just have to wait and someone will do the work for you for free.