I did not expect so many and such varied replies to a feature request; usually they just sit there and collect dust after being posted. The replies are appreciated.
Here is a walkthrough of two related scenarios that will hopefully give a better understanding of the feature request.
- User identifies hallucinations
A user uses an LLM, e.g. ChatGPT, for information about Gregorian chants. They paste the ChatGPT completion into a Discourse reply. For the parts of the reply/completion that contain hallucinations, the user selects the text and clicks an icon for hallucinations; the metadata for that section, think HTML span or similar, is then updated to show that the span contains a hallucination.
The spans could be as small as a single option on a command line. For this command line generated by ChatGPT
gregorio -i --gregfont="Caeciliae" myfile.gabc
it seems that the gregfont option is a hallucination, so the section --gregfont="Caeciliae" should be marked as a hallucination.
If one were to inspect the HTML before and after annotating, something like this would be seen:
Before
<pre>
<code class="hljs language-bash">gregorio -i --gregfont=<span class="hljs-string">"Caeciliae"</span> myfile.gabc
</code>
</pre>
After
<pre>
<code class="hljs language-bash">gregorio -i <span class="hallucination">--gregfont=<span class="hljs-string">"Caeciliae"</span></span> myfile.gabc
</code>
</pre>
- API consumes data with hallucinations
A user is searching for a command line to create Gregorian chant sheet music and adjusts the query to exclude hallucinations. As the search engine generates results, it finds a hit for a page with the command
gregorio -i --gregfont="Caeciliae" myfile.gabc
The search engine then checks the command lines on the page and finds the one of interest. It checks whether that command line contains a hallucination, finds the span element marking the hallucination, and does not include that command line in the search results.
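To make the consumer side concrete, here is a minimal Python sketch of how a crawler or API might detect the marked span. It assumes the annotation is exactly the `class="hallucination"` span shown in the "After" HTML above; the names `HallucinationFinder` and `find_hallucinations` are made up for illustration and are not part of any existing tool.

```python
from html.parser import HTMLParser


class HallucinationFinder(HTMLParser):
    """Collect page text and the text inside <span class="hallucination">."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # span nesting depth inside a flagged span (0 = outside)
        self.flagged = []   # text of each flagged span, in page order
        self.text = []      # all text seen, in page order

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1             # nested span inside a flagged span
        elif "hallucination" in classes:
            self.depth = 1              # entering a flagged span
            self.flagged.append("")

    def handle_endtag(self, tag):
        if tag == "span" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        self.text.append(data)
        if self.depth:
            self.flagged[-1] += data


def find_hallucinations(html):
    """Return (plain text of the fragment, list of flagged span texts)."""
    parser = HallucinationFinder()
    parser.feed(html)
    parser.close()
    return "".join(parser.text).strip(), parser.flagged


# The "After" markup from the first scenario.
page = (
    '<pre><code class="hljs language-bash">gregorio -i '
    '<span class="hallucination">--gregfont='
    '<span class="hljs-string">"Caeciliae"</span></span> myfile.gabc\n'
    '</code></pre>'
)
command, flagged = find_hallucinations(page)
include_in_results = not flagged   # skip hits that contain a flagged span
```

Tracking the nesting depth matters because the highlighter's own `hljs-string` span sits inside the hallucination span, so the first closing `</span>` does not end the flagged region.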
Obviously one could create a plugin for tools such as Chrome to add the needed spans, but there also needs to be a standard, think RFC, for the metadata so that it is parseable by APIs.
The scenarios above were tailored to web pages, but something similar should apply to LaTeX, etc.
While the scenarios above only used a scalar to identify a hallucination, the metadata could be more complex, think JSON or an algebraic data type.
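As a rough sketch of what richer metadata might look like, the Python below attaches a small JSON record to the span instead of a bare class. Everything here is an assumption for illustration only: the `data-hallucination` attribute name, the `HallucinationNote` fields, and the `annotate` and `decode_payload` helpers are hypothetical, not an existing or proposed standard.

```python
import json
from dataclasses import dataclass, asdict
from html import escape, unescape


@dataclass
class HallucinationNote:
    """Hypothetical metadata record; the field names are illustrative only."""
    kind: str        # e.g. "nonexistent-option"
    source: str      # e.g. "ChatGPT"
    note: str = ""   # optional free-text comment from the annotator


def annotate(fragment_html, meta):
    """Wrap an HTML fragment in a span that carries the metadata as JSON."""
    payload = escape(json.dumps(asdict(meta), sort_keys=True), quote=True)
    return (
        f'<span class="hallucination" data-hallucination="{payload}">'
        f"{fragment_html}</span>"
    )


def decode_payload(attribute_value):
    """Turn the raw (still escaped) attribute value back into a dict."""
    return json.loads(unescape(attribute_value))


meta = HallucinationNote(kind="nonexistent-option", source="ChatGPT")
span = annotate('--gregfont=<span class="hljs-string">"Caeciliae"</span>', meta)
```

Keeping the `hallucination` class alongside the JSON payload means a consumer that only understands the scalar form can still detect the span.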
Gregorio references