It might be useful to define some human-generated benchmarks for the summaries, then keep iterating on the prompts until the results meet or exceed those benchmarks.
A couple of examples:
Gist for “Understanding and managing bootstrap mode”:

> Discourse’s bootstrap mode is a special state facilitating community growth, automatically adjusting user trust, communication frequency, and directory updates, and can be identified through the “Getting started” button or staff action logs.

Gist for a bug topic:

> Forum theme is causing the user menu visibility issue, which can be tracked and solved, making it a manageable problem to fix despite being a little challenging.
I don’t think either of those excerpts approaches human-level performance, with human-level performance defined as what a good writer can achieve. My guess is that the LLM is being given an impossible task: it’s being asked to fit too much information into a single sentence.
The primary goal of the excerpt should be to give users some idea of what to expect in the topic. It doesn’t need to do much beyond that.
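To make the benchmark idea concrete, here’s a rough sketch of what the iteration loop could look like. None of this is the plugin’s actual code: `generate_gist` is a hypothetical stand-in for whatever model call produces the excerpt, and the word-overlap score is a crude placeholder for what would really need to be human judgement of the gists.

```python
# Rough sketch of "iterate prompts until they meet the human benchmark".
# Not a real Discourse or model API: generate_gist is a hypothetical
# stand-in for the plugin's LLM call, and overlap_score is a crude
# placeholder for a human rating or pairwise judgement.

BENCHMARKS = {
    # topic title -> a gist written by a good human writer (invented here)
    "Understanding and managing bootstrap mode":
        "A plain-language definition of bootstrap mode, what settings it "
        "changes, and how to tell whether it's enabled on your site.",
}


def generate_gist(topic_title: str, prompt_template: str) -> str:
    """Hypothetical stand-in for the model call under test."""
    raise NotImplementedError("wire this up to the model being tested")


def overlap_score(candidate: str, benchmark: str) -> float:
    """Crude placeholder metric: fraction of the benchmark's words that
    the candidate shares. A real pipeline would use human judges."""
    cand = set(candidate.lower().split())
    bench = set(benchmark.lower().split())
    return len(cand & bench) / len(bench) if bench else 0.0


def first_passing_prompt(prompts: list[str], threshold: float = 0.6) -> str | None:
    """Return the first candidate prompt whose gists meet the benchmark
    on every topic, or None if no candidate passes."""
    for prompt in prompts:
        scores = [
            overlap_score(generate_gist(title, prompt), human_gist)
            for title, human_gist in BENCHMARKS.items()
        ]
        if scores and min(scores) >= threshold:
            return prompt
    return None
```

The important part is the shape, not the scoring: a fixed set of human-written gists, and candidate prompts that only pass once they meet the bar on every benchmark topic.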
It’s likely that the LLM generating the excerpts needs to be aware of the context of the topic. For example, for the bootstrap documentation topic, I’d expect a simple definition of bootstrap mode. For a topic where the OP is a user asking a question, the excerpt might just restate the question in terms that are likely to be understood by the site’s users. A topic initiated by a highly technical user might have an excerpt that uses a few technical terms, in order to attract the right audience to the topic.
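As a sketch of what that context-awareness might look like, here’s one way the prompt could be assembled from topic metadata. The `TopicContext` fields and the instruction strings are made up for illustration; the real plugin would pull equivalents from the topic and its category.

```python
# Minimal sketch of context-aware gist prompting. Field names and
# instruction text are invented for illustration, not the plugin's API.

from dataclasses import dataclass


@dataclass
class TopicContext:
    title: str
    topic_type: str   # e.g. "documentation", "question", "bug"
    first_post: str


INSTRUCTIONS = {
    "documentation": (
        "Open with a one-sentence plain-language definition of the "
        "feature, then say what the topic covers."
    ),
    "question": (
        "Restate the question in terms a typical member of this site "
        "would understand. Don't try to answer it."
    ),
    "bug": (
        "Name the affected component and the symptom. A few technical "
        "terms are fine if they help the right readers find the topic."
    ),
}


def build_gist_prompt(ctx: TopicContext) -> str:
    """Assemble a prompt whose instructions vary with the topic type."""
    instruction = INSTRUCTIONS.get(
        ctx.topic_type,
        "Give readers a plain idea of what to expect in this topic.",
    )
    return (
        f"Write a one- or two-sentence excerpt for the topic "
        f"'{ctx.title}'.\n{instruction}\n\nFirst post:\n{ctx.first_post}"
    )
```

For the bootstrap topic, `build_gist_prompt(TopicContext(title="Understanding and managing bootstrap mode", topic_type="documentation", first_post="..."))` would steer the model toward the simple definition rather than the everything-in-one-sentence summaries quoted above.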