Discourse AI + Data Explorer?

您能分享一些边缘情况的例子吗?

1 个赞

当然,我计划下周再进行一轮,你可以试试我在仓库里分享的 ruby 脚本。

@simon 我认为真正的价值在于理解,至少在可预见的未来,这永远不会一次性做好。但如果你知道你想要什么,你就可以像一个不知疲倦的实习生一样引导它,让它去做繁重的工作。

所以,除了记得select from where之外,我没有任何关于SQL的现有知识,并且知道我想要最终结果是什么,我通过与它进行对话就得到了它构建了我想要的查询,而且并没有真正影响我做我的日常工作。这真的就像拥有一个免费的个人助理/实习生,我只需要不断地把它引导到正确的方向。

首先,这是我的最终查询。我想要一个查询,它能返回点赞最多的前100名用户,并给出他们的用户ID、用户名、总点赞数,以及他们获得最多点赞的帖子的链接。我不知道/不知道如何获得这个,坦白说甚至不知道从哪里开始。当我想要这样的东西时,我会去打扰我的一个工程师。我能够不打扰他们,不减缓我自己的工作速度,仍然指导/指示ChatGPT完成我需要完成的事情。

最终查询:

WITH Most_Liked_Posts AS (
  SELECT
    p.user_id,
    p.topic_id,
    p.post_number,
    ROW_NUMBER() OVER (PARTITION BY p.user_id ORDER BY likes_count DESC) AS row_number
  FROM (
    SELECT
      p.user_id,
      p.topic_id,
      p.post_number,
      COUNT(l.id) AS likes_count
    FROM
      posts p
      LEFT JOIN post_actions l ON p.id = l.post_id AND l.post_action_type_id = 2
    WHERE
      p.user_id NOT IN (SELECT id FROM users WHERE username = 'codey')
    GROUP BY
      p.user_id,
      p.topic_id,
      p.post_number
  ) p
),
User_Likes AS (
  SELECT
    u.id AS user_id,
    COUNT(pa.id) AS total_likes
  FROM
    users u
    LEFT JOIN posts p ON u.id = p.user_id
    LEFT JOIN post_actions pa ON p.id = pa.post_id AND pa.post_action_type_id = 2
  WHERE
    u.username != 'codey'
  GROUP BY
    u.id
)
SELECT
  ul.user_id,
  u.username AS username,
  ul.total_likes,
  '\u003ca href=\"/discuss/t/' || mlp.topic_id || '/' || mlp.post_number || '\"\u003e' || 'Link to most-liked post' || '\u003c/a\u003e' AS html$post
FROM
  User_Likes ul
  JOIN users u ON ul.user_id = u.id
  LEFT JOIN Most_Liked_Posts mlp ON ul.user_id = mlp.user_id AND mlp.row_number = 1
ORDER BY
  ul.total_likes DESC
LIMIT 100

在开始提示之前,我在ChatGPT账户中的说明:

我只会问你与PostgreSQL相关的查询,并且我只希望你回复相关的SQL查询。

这些查询都与我的社区平台Discourse中的数据库有关。
回复SQL查询和查询说明。不要在SQL语句末尾使用任何分号,因为它们是不需要的。

请在此处查看我在ChatGPT上构建此查询的完整对话:

3 个赞

@sam 使用上述完整的 schema(再次抱歉,我不知道在哪里可以获取它以及/或者以这种格式获取它),并使用 langchain 和向量数据库在将文档发送到 ChatGPT 之前 正确处理它们,再加上您可能拥有的关于使用 Data Explorer 的任何文档……我相当确信这个过程将非常接近魔法。

我尝试了一个向量数据库,并输入了相似的示例,但它引导得太用力了,我们可能需要 1000 个非常接近的示例才能达到神奇的效果。

1 个赞

我也会让我的工程师试试,因为我们已经建立了一个“工厂”来快速生产这些。\n\n有没有什么地方可以提供所有数据库模式的完整详尽的文档,格式如下?

# == Schema Information
#
# Table name: application_requests
#
#  id       :integer          not null, primary key
#  date     :date             not null
#  req_type :integer          not null ("http_total"=>0,"http_2xx"=>1,"http_background"=>2,"http_3xx"=>3,"http_4xx"=>4,"http_5xx"=>5,"page_view_crawler"=>6,"page_view_logged_in"=>7,"page_view_anon"=>8,"page_view_logged_in_mobile"=>9,"page_view_anon_mobile"=>10,"api"=>11,"user_api"=>12)
#  count    :integer          default(0), not null
#
# Table name: users
#
#  id                        :integer          not null, primary key
#  username                  :string(60)       not null
#  created_at                :datetime         not null
#  updated_at                :datetime         not null
#  name                      :string           (the user's real name)
#  last_posted_at            :datetime
#  active                    :boolean          default(FALSE), not null
#  username_lower            :string(60)       not null
#  last_seen_at              :datetime
#  admin                     :boolean          default(FALSE), not null
#  trust_level               :integer          not null
#  approved                  :boolean          default(FALSE), not null
#  approved_by_id            :integer
#  approved_at               :datetime
#  previous_visit_at         :datetime
#  suspended_at              :datetime
#  suspended_till            :datetime
#  date_of_birth             :date
#  ip_address                :inet
#  moderator                 :boolean          default(FALSE)
#  title                     :string
#  locale                    :string(10)
#  primary_group_id          :integer
#  registration_ip_address   :inet
#  staged                    :boolean          default(FALSE), not null
#  first_seen_at             :datetime
#  silenced_till             :datetime
1 个赞

这种格式在令牌方面非常浪费,但您可以在数据浏览器中直接从 pg schema 表中查询它 :slight_smile:

2 个赞

哈哈,好的,我会让他去检查一下。就像我说的,我不会再深入追究了。如果是我,我可能就不会问这些问题了!

1 个赞

我真的很想让某件事成功,但这太难了。

3 个赞

这只是一个想法,我还没有尝试过。
我认为所需的查询可以根据职责进行细分:

  • 版主
  • 管理员
  • 开发人员

因此,所需的表可以分组为不断增长的集合,其中最小的集合是版主所需的表。

现在,版主所需的大部分数据将基于具有特定列的表的常见连接,因此需要一个视图

如果使用常用视图而不是整个模式,那么希望许多提示只需要传递视图而不是整个模式,从而使 LLM 更容易生成可能的解决方案。

希望有所帮助


对于更难的查询,如果你知道得足够多以至于需要这样的查询,那么你很可能也知道如何构建查询。

1 个赞

我试过了。结果参差不齐。一个有趣的实验是为 GPT-3.5 提供一个最小化的、带注释的 Discourse 数据库模式,以测试其 SQL 能力。我知道这在 token 方面效率不高,但它可读性强:

最小模式
# == Schema Information
#
# Table name: users
#
#  id                        :integer          not null, primary key
#  username                  :string(60)       not null
#  created_at                :datetime         not null
#
# Table name: groups
#
#  id                                 :integer          not null, primary key
#  name                               :string           not null
#  created_at                         :datetime         not null
#
# Table name: group_users
#
#  id                 :integer          not null, primary key
#  group_id           :integer          not null
#  user_id            :integer          not null
#
# Table name: posts
#
#  id                      :integer          not null, primary key
#  user_id                 :integer
#  topic_id                :integer          not null
#  deleted_at              :datetime         (应用程序“软删除”帖子。当帖子被删除时,其 `deleted_at` 属性被设置为 :datetime。除非明确要求返回已删除的帖子,否则在编写与帖子相关数据的查询时,请确保 `deleted_at` 列为 `NOT NULL`。)
#
# Table name: topics
#
#  id                        :integer          not null, primary key
#  title                     :string           not null
#  category_id               :integer
#  created_at                :datetime         not null
#  user_id                   :integer          (创建主题的用户 ID)
#  deleted_at                :datetime         (应用程序“软删除”主题。当主题被删除时,其 `deleted_at` 属性被设置为 :datetime。除非明确要求返回已删除的主题,否则在编写与主题相关数据的查询时,请确保 `deleted_at` 列为 `NOT NULL`。)
#
# Table name: categories
#
#  id                                        :integer          not null, primary key
#  name                                      :string(50)       not null

到目前为止,我得到的结果令人印象深刻。这让我认为,只需在提示中提供一个带注释的模式就可以解决问题。

为了好玩,我问它可能会在哪些查询上遇到困难:

GPT-3.5 限制
  1. 分层查询: 涉及分层数据结构(如递归关系)的查询可能很复杂。例如,在类别层次结构中查询父子关系。
  2. 时间查询: 涉及时间数据的查询,例如计算历史趋势或分析特定时间间隔内的数据,可能需要更复杂的 SQL 结构。
  3. 窗口函数: 虽然我在之前的查询中使用了窗口函数,但窗口函数的更高级用法,用于复杂的分析任务,可能会有挑战性。
  4. 地理空间查询: 空间数据库和基于地理数据(如查找附近位置或复杂空间关系)的查询可能很复杂。
  5. 复杂聚合: 涉及多个分组级别和计算的复杂聚合可能需要更复杂的 SQL 知识。
  6. 优化查询: 编写涉及索引、查询调优和性能改进的高度优化的查询可能很复杂。
  7. 自定义算法: 在 SQL 查询中实现自定义算法或逻辑可能由于所涉及逻辑的复杂性而具有挑战性。
  8. 高级 JOIN: 复杂的连接条件或组合多个连接可能需要仔细考虑,并可能需要更复杂的 SQL 结构。
  9. 子查询和相关子查询: 虽然我可以处理子查询,但极其复杂或深度嵌套的子查询可能更具挑战性。
  10. 透视和反透视: 在某些情况下,在宽格式和高格式之间转换数据(透视和反透视)可能会变得复杂。

我发现问题在于尝试区分完整的数据库模式。例如,找到一种注释 user_actions 表的方法。仅提供其 action_type 代码的定义是不够的。它开始猜测 user_idtarget_user_idacting_user_id

最常请求的查询不使用数据库中的大部分表和列。如果将 AI 添加到 Data Explorer 中,可能值得考虑设置“基本”和“高级”模式。“基本”模式可以提供涵盖大多数用例的提示。“高级”模式允许用户选择在提示中设置哪些信息。

从一些关于元数据的查询请求反向工作,看看需要向提示提供什么才能让 GPT-3.5 成功创建查询,这可能会很有趣。

3 个赞

一种潜在的 Langchain 方法,我们首先要求 gpt 确定哪些表是相关的,然后进入第二个阶段生成 sql,可能会有帮助。

3 个赞

我们自己的产品实现目前使用了 langchain。我们实际上构建了一个可重用的工厂,我将让我的首席工程师尽快试用一下。

不过正如我所说,我对目前的结果非常满意。这就像有一个助手帮我跑腿一样——虽然还需要跑几趟,但目前已经为我节省了大量的时间和金钱。

2 个赞

供参考

2 个赞

This is super addictive. Based on the “LLMs and SQL” blog post, and a bit of trial and error, I created this prompt that contains a partial description of the Discourse database:

Discourse database prompt
The text between the /* Discourse database documentation start */ and /* Discourse database documentation end */ comments contains details about the Discourse forum application's PostgreSQL database.
All tables and columns are outlined in the `CREATE TABLE` statements. Take note of the sample queries that follow each of the `CREATE TABLE` statements. Some additional important details are contained in
inline (`-- --`) and multi-line (`/* */`) comments. After having sent you this information, I will ask you to write some queries that are to be run by the Discourse Data Explorer plugin. All of the tables and columns
required to write these queries are in the `CREATE TABLE` statements that I have sent you.

/* Discourse database documentation start */

CREATE TABLE users (
    id integer NOT NULL, -- the application has the concept of 'real' users. A 'real' user is a user with an id > 0 --
    username character varying(60) NOT NULL,
    created_at timestamp without time zone NOT NULL,
    updated_at timestamp without time zone NOT NULL,
    name character varying,
    seen_notification_id integer DEFAULT 0 NOT NULL,
    last_posted_at timestamp without time zone,
    password_hash character varying(64),
    salt character varying(32),
    active boolean DEFAULT false NOT NULL,
    username_lower character varying(60) NOT NULL,
    last_seen_at timestamp without time zone,
    admin boolean DEFAULT false NOT NULL,
    last_emailed_at timestamp without time zone,
    trust_level integer NOT NULL,
    approved boolean DEFAULT false NOT NULL,
    approved_by_id integer,
    approved_at timestamp without time zone,
    previous_visit_at timestamp without time zone,
    suspended_at timestamp without time zone,
    suspended_till timestamp without time zone,
    date_of_birth date,
    views integer DEFAULT 0 NOT NULL,
    flag_level integer DEFAULT 0 NOT NULL,
    ip_address inet,
    moderator boolean DEFAULT false,
    title character varying,
    uploaded_avatar_id integer,
    locale character varying(10),
    primary_group_id integer,
    registration_ip_address inet,
    staged boolean DEFAULT false NOT NULL,
    first_seen_at timestamp without time zone,
    silenced_till timestamp without time zone,
    group_locked_trust_level integer,
    manual_locked_trust_level integer,
    secure_identifier character varying,
    flair_group_id integer,
    last_seen_reviewable_id integer
);
SELECT * FROM users WHERE id = 1 OR id = 2 OR id = 121;
id  | username |         created_at         |         updated_at         |     name     | seen_notification_id |       last_posted_at       |                          password_hash                           |               salt               | active | username_lower |        last_seen_at        | admin |      last_emailed_at       | trust_level | approved | approved_by_id |        approved_at         |     previous_visit_at      |        suspended_at        |     suspended_till      | date_of_birth | views | flag_level | ip_address | moderator |   title    | uploaded_avatar_id | locale | primary_group_id | registration_ip_address | staged |       first_seen_at        | silenced_till | group_locked_trust_level | manual_locked_trust_level |            secure_identifier             | flair_group_id | last_seen_reviewable_id |      password_algorithm
-----+----------+----------------------------+----------------------------+--------------+----------------------+----------------------------+------------------------------------------------------------------+----------------------------------+--------+----------------+----------------------------+-------+----------------------------+-------------+----------+----------------+----------------------------+----------------------------+----------------------------+-------------------------+---------------+-------+------------+------------+-----------+------------+--------------------+--------+------------------+-------------------------+--------+----------------------------+---------------+--------------------------+---------------------------+------------------------------------------+----------------+-------------------------+------------------------------
   1 | scossar  | 2019-04-26 22:59:44.685893 | 2023-08-14 04:40:20.823438 | Simon Cossar |                56395 | 2023-08-14 04:08:43.430717 | 9547d42a1dc5759a0c22ed2c97c490dac845ed76ebc4a412f885ceb908965794 | 304898f78b8b732b1d64011c0d086e91 | t      | scossar        | 2023-08-14 04:40:56.769353 | t     | 2023-08-14 04:33:46.44485  |           3 | t        |             -1 | 2020-09-22 19:54:41.05418  | 2023-08-13 22:35:00.020816 |                            |                         | 1904-02-14    |     0 |          0 | ::1        | t         | Member     |                747 |        |                  |                         | f      | 2019-04-26 23:10:43.250255 |               |                          |                         3 |                                          |                |                     432 | $pbkdf2-sha256$i=64000,l=32$
   2 | sally    | 2019-04-26 23:15:47.859691 | 2023-08-14 04:40:56.831344 |              |                56396 | 2023-08-14 04:11:37.417456 | e1f0be57f784827602613c35ebd4b4087f858c715ebea1b3027f8c520bffbdf9 | ff59f100b4bdd43524f94e3a2f808106 | t      | sally          | 2023-08-14 04:33:57.054779 | f     | 2023-08-14 04:33:06.727322 |           2 | t        |             -1 | 2020-05-19 19:35:15.79381  | 2023-08-13 22:08:05.099486 |                            |                         |               |     0 |          0 | 127.0.0.1  | t         | Regular    |                 22 | en     |               49 | 127.0.0.1               | f      | 2019-04-26 23:16:58.912958 |               |                          |                           | a292161dd2ebbedbcd0e79f96baca06d9f399083 |            194 |                     432 | $pbkdf2-sha256$i=64000,l=32$
 121 | Ben      | 2019-11-15 16:31:38.216013 | 2023-08-14 04:40:41.907605 |              |                56314 | 2023-07-07 20:48:33.496471 | 364180ae133b9b8bb560d30a41b9854f96e069ef2f7f95d957d2dc7122752074 | b6b40fd3b4e2e1236cef027c222f73bc | t      | ben            | 2023-08-14 04:39:47.42192  | f     | 2023-08-14 04:30:26.764459 |           2 | t        |              1 | 2019-11-15 16:31:38.089553 | 2023-07-22 02:14:01.540478 | 2022-05-05 17:39:02.632952 | 2022-05-06 17:38:55.054 |               |     0 |          0 | 127.0.0.1  | f         | Prime Four |                    | en     |              196 | 127.0.0.1               | f      | 2019-11-15 16:31:38.714185 |               |                          |                           |                                          |            196 |                         | $pbkdf2-sha256$i=64000,l=32$

CREATE TABLE groups (
    id integer NOT NULL,
    name character varying NOT NULL,
    created_at timestamp without time zone NOT NULL,
    updated_at timestamp without time zone NOT NULL,
    automatic boolean DEFAULT false NOT NULL,
    user_count integer DEFAULT 0 NOT NULL,
    automatic_membership_email_domains text,
    primary_group boolean DEFAULT false NOT NULL,
    title character varying,
    grant_trust_level integer,
    incoming_email character varying,
    has_messages boolean DEFAULT false NOT NULL,
    flair_url character varying,
    flair_bg_color character varying,
    flair_color character varying,
    bio_raw text,
    bio_cooked text,
    allow_membership_requests boolean DEFAULT false NOT NULL,
    full_name character varying,
    default_notification_level integer DEFAULT 3 NOT NULL,
    visibility_level integer DEFAULT 0 NOT NULL,
    public_exit boolean DEFAULT false NOT NULL,
    public_admission boolean DEFAULT false NOT NULL,
    membership_request_template text,
    messageable_level integer DEFAULT 0,
    mentionable_level integer DEFAULT 0,
    members_visibility_level integer DEFAULT 0 NOT NULL,
    publish_read_state boolean DEFAULT false NOT NULL,
    flair_icon character varying,
    flair_upload_id integer,
    smtp_server character varying,
    smtp_port integer,
    smtp_ssl boolean,
    imap_server character varying,
    imap_port integer,
    imap_ssl boolean,
    imap_mailbox_name character varying DEFAULT ''::character varying NOT NULL,
    imap_uid_validity integer DEFAULT 0 NOT NULL,
    imap_last_uid integer DEFAULT 0 NOT NULL,
    email_username character varying,
    email_password character varying,
    imap_last_error text,
    imap_old_emails integer,
    imap_new_emails integer,
    allow_unknown_sender_topic_replies boolean DEFAULT false NOT NULL,
    smtp_enabled boolean DEFAULT false,
    smtp_updated_at timestamp without time zone,
    smtp_updated_by_id integer,
    imap_enabled boolean DEFAULT false,
    imap_updated_at timestamp without time zone,
    imap_updated_by_id integer,
    assignable_level integer DEFAULT 0 NOT NULL,
    email_from_alias character varying
); -- users who have either 'admin' or 'moderator' status are added to the automatic "staff" group --

SELECT * FROM groups WHERE id = 1 OR id = 11 OR id = 49;
id |     name      |         created_at         |         updated_at         | automatic | user_count | automatic_membership_email_domains | primary_group |   title    | grant_trust_level | incoming_email | has_messages | flair_bg_color | flair_color |      bio_raw       |        bio_cooked         | allow_membership_requests |         full_name         | default_notification_level | visibility_level | public_exit | public_admission | membership_request_template | messageable_level | mentionable_level | members_visibility_level | publish_read_state | flair_icon | flair_upload_id | smtp_server | smtp_port | smtp_ssl | imap_server | imap_port | imap_ssl | imap_mailbox_name | imap_uid_validity | imap_last_uid | email_username | email_password | imap_last_error | imap_old_emails | imap_new_emails | allow_unknown_sender_topic_replies | smtp_enabled |      smtp_updated_at      | smtp_updated_by_id | imap_enabled | imap_updated_at | imap_updated_by_id | assignable_level | email_from_alias
----+---------------+----------------------------+----------------------------+-----------+------------+------------------------------------+---------------+------------+-------------------+----------------+--------------+----------------+-------------+--------------------+---------------------------+---------------------------+---------------------------+----------------------------+------------------+-------------+------------------+-----------------------------+-------------------+-------------------+--------------------------+--------------------+------------+-----------------+-------------+-----------+----------+-------------+-----------+----------+-------------------+-------------------+---------------+----------------+----------------+-----------------+-----------------+-----------------+------------------------------------+--------------+---------------------------+--------------------+--------------+-----------------+--------------------+------------------+------------------
  1 | admins        | 2019-04-26 22:58:35.997964 | 2021-08-05 19:11:22.699825 | t         |          1 |                                    | f             |            |                   |                | t            |                |             |                    |                           | f                         |                           |                          3 |                1 | f           | f                |                             |                99 |                 0 |                        0 | f                  |            |                 |             |           |          |             |           |          |                   |                 0 |             0 |                |                |                 |                 |                 | f                                  | f            |                           |                    | f            |                 |                    |                0 |
 11 | trust_level_1 | 2019-04-26 22:58:36.033238 | 2021-10-05 19:54:51.043121 | t         |        116 |                                    | f             |            |                   |                | t            |                |             |                    |                           | f                         |                           |                          3 |                1 | f           | f                |                             |                 0 |                 0 |                        0 | f                  |            |                 |             |           |          |             |           |          |                   |                 0 |             0 |                |                |                 |                 |                 | f                                  | f            |                           |                    | f            |                 |                    |                0 |
 49 | eurorack      | 2019-10-03 17:28:42.323203 | 2022-08-16 19:54:09.223307 | f         |         84 | example.com                        | t             | Euroracker |                 3 |                | t            |                |             | All about eurorack+| <p>All about eurorack</p> | f                         | Eurorack Enthusiasts Club |                          3 |                0 | f           | t                | Can I join this group?      |                99 |                99 |                        0 | t                  |            |                 |             |           |          |             |           |          |                   |                 0 |             0 |                |                |                 |                 |                 | f                                  | f            | 2022-02-11 23:24:28.76631 |                  1 | f            |                 |                    |                0 |

/* group_users joins the groups and users tables */
CREATE TABLE group_users (
    id integer NOT NULL,
    group_id integer NOT NULL,
    user_id integer NOT NULL,
    created_at timestamp without time zone NOT NULL,
    updated_at timestamp without time zone NOT NULL,
    owner boolean DEFAULT false NOT NULL,
    notification_level integer DEFAULT 2 NOT NULL,
    first_unread_pm_at timestamp without time zone DEFAULT CURRENT_TIMESTAMP NOT NULL
);
SELECT * FROM group_users WHERE id = 13 OR id = 8219 OR id = 9137;
  id  | group_id | user_id |         created_at         |         updated_at         | owner | notification_level |     first_unread_pm_at
------+----------+---------+----------------------------+----------------------------+-------+--------------------+----------------------------
   13 |        3 |       1 | 2019-04-26 22:59:47.828533 | 2019-04-26 22:59:47.828533 | f     |                  2 | 2023-08-14 01:14:54.229593
 8219 |       13 |     121 | 2022-04-21 08:15:47.946036 | 2022-04-21 08:15:47.946036 | f     |                  2 | 2023-07-05 06:49:04.48265
 9137 |       49 |       2 | 2022-09-08 17:34:39.290504 | 2022-09-08 17:34:39.290504 | t     |                  3 | 2020-11-21 02:40:15.868728

CREATE TABLE posts (
    id integer NOT NULL,
    user_id integer,
    topic_id integer NOT NULL,
    post_number integer NOT NULL,
    raw text NOT NULL,
    cooked text NOT NULL,
    created_at timestamp without time zone NOT NULL,
    updated_at timestamp without time zone NOT NULL,
    reply_to_post_number integer,
    reply_count integer DEFAULT 0 NOT NULL,
    quote_count integer DEFAULT 0 NOT NULL,
    deleted_at timestamp without time zone, -- the application only "soft deletes" posts and topics. Unless explicitly asked to return details about deleted posts or topics, always check that `deleted_at IS NULL` when writing queries related to posts or topics. --
    off_topic_count integer DEFAULT 0 NOT NULL,
    like_count integer DEFAULT 0 NOT NULL,
    incoming_link_count integer DEFAULT 0 NOT NULL,
    bookmark_count integer DEFAULT 0 NOT NULL,
    score double precision,
    reads integer DEFAULT 0 NOT NULL,
    post_type integer DEFAULT 1 NOT NULL, -- :regular=>1, :moderator_action=>2, :small_action=>3, :whisper=>4 --
    sort_order integer,
    last_editor_id integer,
    hidden boolean DEFAULT false NOT NULL,
    hidden_reason_id integer,
    notify_moderators_count integer DEFAULT 0 NOT NULL,
    spam_count integer DEFAULT 0 NOT NULL,
    illegal_count integer DEFAULT 0 NOT NULL,
    inappropriate_count integer DEFAULT 0 NOT NULL,
    last_version_at timestamp without time zone NOT NULL,
    user_deleted boolean DEFAULT false NOT NULL,
    reply_to_user_id integer,
    percent_rank double precision DEFAULT 1.0,
    notify_user_count integer DEFAULT 0 NOT NULL,
    like_score integer DEFAULT 0 NOT NULL,
    deleted_by_id integer,
    edit_reason character varying,
    word_count integer,
    version integer DEFAULT 1 NOT NULL,
    cook_method integer DEFAULT 1 NOT NULL,
    wiki boolean DEFAULT false NOT NULL,
    baked_at timestamp without time zone,
    baked_version integer,
    hidden_at timestamp without time zone,
    self_edits integer DEFAULT 0 NOT NULL,
    reply_quoted boolean DEFAULT false NOT NULL,
    via_email boolean DEFAULT false NOT NULL,
    raw_email text,
    public_version integer DEFAULT 1 NOT NULL,
    action_code character varying,
    locked_by_id integer,
    image_upload_id bigint
);
SELECT * FROM posts WHERE id = 11094 OR id = 11095 OR id = 11096;
  id   | user_id | topic_id | post_number |                     raw                      |                                                                                                  cooked                                                                                                   |         created_at         |         updated_at         | reply_to_post_number | reply_count | quote_count | deleted_at | off_topic_count | like_count | incoming_link_count | bookmark_count | score | reads | post_type | sort_order | last_editor_id | hidden | hidden_reason_id | notify_moderators_count | spam_count | illegal_count | inappropriate_count |      last_version_at       | user_deleted | reply_to_user_id |   percent_rank    | notify_user_count | like_score | deleted_by_id | edit_reason | word_count | version | cook_method | wiki |          baked_at          | baked_version | hidden_at | self_edits | reply_quoted | via_email | raw_email | public_version | action_code | locked_by_id | image_upload_id |      outbound_message_id
-------+---------+----------+-------------+----------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+----------------------------+----------------------+-------------+-------------+------------+-----------------+------------+---------------------+----------------+-------+-------+-----------+------------+----------------+--------+------------------+-------------------------+------------+---------------+---------------------+----------------------------+--------------+------------------+-------------------+-------------------+------------+---------------+-------------+------------+---------+-------------+------+----------------------------+---------------+-----------+------------+--------------+-----------+-----------+----------------+-------------+--------------+-----------------+--------------------------------
 11094 |       1 |     1852 |           7 | This is an example of a Discourse post.      | <p>This is an example of a Discourse post.</p>                                                                                                                                                            | 2023-08-14 04:08:43.430717 | 2023-08-14 04:08:43.430717 |                      |           0 |           0 |            |               0 |          0 |                   0 |              0 |   0.2 |     1 |         1 |          7 |              1 | f      |                  |                       0 |          0 |             0 |                   0 | 2023-08-14 04:08:43.441371 | f            |                  | 0.166666666666667 |                 0 |          0 |               |             |          8 |       1 |           1 | f    | 2023-08-14 04:08:43.430685 |             2 |           |          0 | f            | f         |           |              1 |             |              |                 |
 11095 |       2 |    10863 |          11 | This is another example of a Discourse post. | <p>This is another example of a Discourse post.</p>                                                                                                                                                       | 2023-08-14 04:11:37.417456 | 2023-08-14 04:11:37.417456 |                    5 |           0 |           0 |            |               0 |          0 |                   0 |              0 |   0.4 |     2 |         1 |         11 |              2 | f      |                  |                       0 |          0 |             0 |                   0 | 2023-08-14 04:11:37.430459 | f            |              121 |               0.8 |                 0 |          0 |               |             |          8 |       1 |           1 | f    | 2023-08-14 04:11:37.417417 |             2 |           |          0 | f            | f         |           |              1 |             |              |                 | discourse/post/11095@127.0.0.1
 11096 |     121 |    11047 |           2 | Thanks! I hadn't seen that :slight_smile:    | <p>Thanks! I hadn’t seen that <img src="//127.0.0.1:4200/images/emoji/twitter/slight_smile.png?v=12" title=":slight_smile:" class="emoji" alt=":slight_smile:" loading="lazy" width="20" height="20"></p> | 2023-08-14 04:13:09.346146 | 2023-08-14 05:17:10.659726 |                      |           0 |           0 |            |               0 |          0 |                   0 |              0 |   0.4 |     2 |         1 |          2 |              1 | f      |                  |                       0 |          0 |             0 |                   0 | 2023-08-14 05:17:10.589855 | f            |                  |                 0 |                 0 |          0 |               |             |          7 |       2 |           1 | f    | 2023-08-14 05:17:10.659607 |             2 |           |          0 | f            | f         |           |              2 |             |              |                 | discourse/post/11096@127.0.0.1

CREATE TABLE topics (
    id integer NOT NULL,
    title character varying NOT NULL,
    last_posted_at timestamp without time zone,
    created_at timestamp without time zone NOT NULL,
    updated_at timestamp without time zone NOT NULL,
    views integer DEFAULT 0 NOT NULL,
    posts_count integer DEFAULT 0 NOT NULL,
    user_id integer, -- user id the id of the user who created the topic --
    last_post_user_id integer NOT NULL,
    reply_count integer DEFAULT 0 NOT NULL,
    featured_user1_id integer,
    featured_user2_id integer,
    featured_user3_id integer,
    deleted_at timestamp without time zone, -- the application only "soft deletes" posts and topics. Unless explicitly asked to return details about deleted posts or topics, always check that `deleted_at IS NULL` when writing queries related to posts or topics. --
    highest_post_number integer DEFAULT 0 NOT NULL,
    like_count integer DEFAULT 0 NOT NULL,
    incoming_link_count integer DEFAULT 0 NOT NULL,
    category_id integer,
    visible boolean DEFAULT true NOT NULL,
    moderator_posts_count integer DEFAULT 0 NOT NULL,
    closed boolean DEFAULT false NOT NULL,
    archived boolean DEFAULT false NOT NULL,
    bumped_at timestamp without time zone NOT NULL,
    has_summary boolean DEFAULT false NOT NULL,
    archetype character varying DEFAULT 'regular'::character varying NOT NULL, -- archetype can be set to either 'regular' or 'private_message' --
    featured_user4_id integer,
    notify_moderators_count integer DEFAULT 0 NOT NULL,
    spam_count integer DEFAULT 0 NOT NULL,
    pinned_at timestamp without time zone,
    score double precision,
    percent_rank double precision DEFAULT 1.0 NOT NULL,
    subtype character varying,
    slug character varying,
    deleted_by_id integer,
    participant_count integer DEFAULT 1,
    word_count integer,
    excerpt character varying,
    pinned_globally boolean DEFAULT false NOT NULL,
    pinned_until timestamp without time zone,
    fancy_title character varying,
    highest_staff_post_number integer DEFAULT 0 NOT NULL,
    featured_link character varying,
    reviewable_score double precision DEFAULT 0.0 NOT NULL,
    image_upload_id bigint,
    slow_mode_seconds integer DEFAULT 0 NOT NULL,
    bannered_until timestamp without time zone,
    external_id character varying,
    CONSTRAINT has_category_id CHECK (((category_id IS NOT NULL) OR ((archetype)::text <> 'regular'::text))),
    CONSTRAINT pm_has_no_category CHECK (((category_id IS NULL) OR ((archetype)::text <> 'private_message'::text)))
);
SELECT * FROM topics WHERE id = 1852 OR id = 10863 OR id = 11047;
  id   |                title                |       last_posted_at       |         created_at         |         updated_at         | views | posts_count | user_id | last_post_user_id | reply_count | featured_user1_id | featured_user2_id | featured_user3_id | deleted_at | highest_post_number | like_count | incoming_link_count | category_id | visible | moderator_posts_count | closed | archived |         bumped_at          | has_summary |    archetype    | featured_user4_id | notify_moderators_count | spam_count | pinned_at |       score       | percent_rank |   subtype    |                slug                 | deleted_by_id | participant_count | word_count |                                      excerpt                                      | pinned_globally | pinned_until |             fancy_title             | highest_staff_post_number | featured_link | reviewable_score | image_upload_id | slow_mode_seconds | bannered_until | external_id
-------+-------------------------------------+----------------------------+----------------------------+----------------------------+-------+-------------+---------+-------------------+-------------+-------------------+-------------------+-------------------+------------+---------------------+------------+---------------------+-------------+---------+-----------------------+--------+----------+----------------------------+-------------+-----------------+-------------------+-------------------------+------------+-----------+-------------------+--------------+--------------+-------------------------------------+---------------+-------------------+------------+-----------------------------------------------------------------------------------+-----------------+--------------+-------------------------------------+---------------------------+---------------+------------------+-----------------+-------------------+----------------+-------------
  1852 | Ask Me Anything                     | 2023-08-14 04:08:43.430717 | 2020-05-04 22:11:50.813596 | 2023-08-14 05:25:00.249902 |     2 |           3 |       1 |                 1 |           0 |                   |                   |                   |            |                   7 |          0 |                   2 |           3 | t       |                     4 | f      | f        | 2023-08-14 04:08:43.430717 | f           | regular         |                   |                       0 |          0 |           | 0.914285714285714 |            1 |              | ask-me-anything                     |               |                 1 |        157 | The AMA with @sally will be held on May 8th. Start sending in your questions now. | f               |              | Ask Me Anything                     |                         7 |               |                0 |                 |                 0 |                |
 10863 | Post the last picture on your phone | 2023-08-14 04:11:37.417456 | 2022-01-25 21:42:52.514124 | 2023-08-14 05:22:30.373683 |    18 |           5 |       2 |                 2 |           3 |               121 |                 1 |                   |            |                  11 |          4 |                   7 |          14 | t       |                     0 | f      | f        | 2023-08-14 04:11:37.417456 | f           | regular         |                   |                       0 |          0 |           |  6.54545454545455 |            1 |              | post-the-last-picture-on-your-phone |               |                 3 |        150 | This is a test. This should go to the review queue                                | f               |              | Post the last picture on your phone |                        11 |               |            70.75 |             445 |                 0 |                |
 11047 | Have you seen this post?            | 2023-08-14 04:13:09.346146 | 2022-05-12 22:26:39.280698 | 2023-08-14 05:16:41.106722 |     3 |           2 |       1 |               121 |           0 |                   |                   |                   |            |                   2 |          0 |                   0 |             | t       |                     0 | f      | f        | 2023-08-14 05:17:10.695009 | f           | private_message |                   |                       0 |          0 |           |               0.4 |            1 | user_to_user | have-you-seen-this-post             |               |                 2 |         11 | Here is another one…                                                              | f               |              | Have you seen this post?            |                         2 |               |                0 |                 |                 0 |                |


CREATE TABLE categories (
    id integer NOT NULL,
    name character varying(50) NOT NULL,
    color character varying(6) DEFAULT '0088CC'::character varying NOT NULL,
    topic_id integer,
    topic_count integer DEFAULT 0 NOT NULL,
    created_at timestamp without time zone NOT NULL,
    updated_at timestamp without time zone NOT NULL,
    user_id integer NOT NULL,
    topics_year integer DEFAULT 0,
    topics_month integer DEFAULT 0,
    topics_week integer DEFAULT 0,
    slug character varying NOT NULL,
    description text,
    text_color character varying(6) DEFAULT 'FFFFFF'::character varying NOT NULL,
    read_restricted boolean DEFAULT false NOT NULL,
    auto_close_hours double precision,
    post_count integer DEFAULT 0 NOT NULL,
    latest_post_id integer,
    latest_topic_id integer,
    "position" integer,
    parent_category_id integer,
    posts_year integer DEFAULT 0,
    posts_month integer DEFAULT 0,
    posts_week integer DEFAULT 0,
    email_in character varying,
    email_in_allow_strangers boolean DEFAULT false,
    topics_day integer DEFAULT 0,
    posts_day integer DEFAULT 0,
    allow_badges boolean DEFAULT true NOT NULL,
    name_lower character varying(50) NOT NULL,
    auto_close_based_on_last_post boolean DEFAULT false,
    topic_template text,
    contains_messages boolean,
    sort_order character varying,
    sort_ascending boolean,
    uploaded_logo_id integer,
    uploaded_background_id integer,
    topic_featured_link_allowed boolean DEFAULT true,
    all_topics_wiki boolean DEFAULT false NOT NULL,
    show_subcategory_list boolean DEFAULT false,
    num_featured_topics integer DEFAULT 3,
    default_view character varying(50),
    subcategory_list_style character varying(50) DEFAULT 'rows_with_featured_topics'::character varying,
    default_top_period character varying(20) DEFAULT 'all'::character varying,
    mailinglist_mirror boolean DEFAULT false NOT NULL,
    minimum_required_tags integer DEFAULT 0 NOT NULL,
    navigate_to_first_post_after_read boolean DEFAULT false NOT NULL,
    search_priority integer DEFAULT 0,
    allow_global_tags boolean DEFAULT false NOT NULL,
    reviewable_by_group_id integer,
    read_only_banner character varying,
    default_list_filter character varying(20) DEFAULT 'all'::character varying,
    allow_unlimited_owner_edits_on_first_post boolean DEFAULT false NOT NULL,
    default_slow_mode_seconds integer
);
SELECT * FROM categories WHERE id = 3 OR id = 16 OR id = 42;
 id |       name       | color  | topic_id | topic_count |         created_at         |         updated_at         | user_id | topics_year | topics_month | topics_week |       slug       |                                        description                                        | text_color | read_restricted | auto_close_hours | post_count | latest_post_id | latest_topic_id | position | parent_category_id | posts_year | posts_month | posts_week | email_in | email_in_allow_strangers | topics_day | posts_day | allow_badges |    name_lower    | auto_close_based_on_last_post | topic_template | contains_messages | sort_order | sort_ascending | uploaded_logo_id | uploaded_background_id | topic_featured_link_allowed | all_topics_wiki | show_subcategory_list | num_featured_topics | default_view |  subcategory_list_style   | default_top_period | mailinglist_mirror | minimum_required_tags | navigate_to_first_post_after_read | search_priority | allow_global_tags | reviewable_by_group_id | read_only_banner | default_list_filter | allow_unlimited_owner_edits_on_first_post | default_slow_mode_seconds | uploaded_logo_dark_id
----+------------------+--------+----------+-------------+----------------------------+----------------------------+---------+-------------+--------------+-------------+------------------+-------------------------------------------------------------------------------------------+------------+-----------------+------------------+------------+----------------+-----------------+----------+--------------------+------------+-------------+------------+----------+--------------------------+------------+-----------+--------------+------------------+-------------------------------+----------------+-------------------+------------+----------------+------------------+------------------------+-----------------------------+-----------------+-----------------------+---------------------+--------------+---------------------------+--------------------+--------------------+-----------------------+-----------------------------------+-----------------+-------------------+------------------------+------------------+---------------------+-------------------------------------------+---------------------------+-----------------------
  3 | Staff            | E45735 |        2 |          16 | 2019-04-26 22:58:38.813759 | 2023-02-16 08:35:51.424024 |      -1 |           0 |            0 |           0 | staff            | Private category for staff discussions. Topics are only visible to admins and moderators. | FFFFFF     | t               |                  |         23 |          11094 |           11250 |       41 |                    |          1 |           0 |          0 |          | f                        |          0 |         0 | t            | staff            | f                             |                |                   |            |                |                  |                        | t                           | f               | f                     |                   3 |              | rows_with_featured_topics | all                | f                  |                     0 | f                                 |               0 | f                 |                        |                  | all                 | f                                         |                           |
 16 | customer support | F7941D |      277 |          13 | 2019-07-17 20:18:13.584715 | 2023-08-11 20:50:35.548498 |       1 |           1 |            0 |           0 | customer-support | This description will appear on the categories page.                                      | FFFFFF     | f               |              720 |         68 |          11087 |           11236 |        0 |                    |         10 |           7 |          0 |          | f                        |          0 |         0 | f            | customer support | f                             |                |                   |            |                |              689 |                        | t                           | f               | f                     |                   4 |              | rows_with_featured_topics | all                | f                  |                     0 | f                                 |               0 | f                 |                        |                  | all                 | f                                         |                           |
 42 | eurorack         | 0088CC |      584 |          30 | 2019-10-03 17:29:33.782372 | 2023-08-13 21:35:09.514998 |       1 |           1 |            0 |           0 | eurorack         | All about Eurorack synths.                                                                | FFFFFF     | f               |                  |        142 |          11091 |           11183 |        1 |                    |         15 |           3 |          3 |          | f                        |          0 |         3 | t            | eurorack         | f                             |                |                   | posts      |                |                  |                        | t                           | f               | f                     |                   4 | latest       | rows_with_featured_topics | all                | f                  |                     2 | t                                 |               0 | f                 |                        |                  | all                 | f                                         |                           |

CREATE TABLE tags (
    id integer NOT NULL,
    name character varying NOT NULL,
    topic_count integer DEFAULT 0 NOT NULL,
    created_at timestamp without time zone NOT NULL,
    updated_at timestamp without time zone NOT NULL,
    pm_topic_count integer DEFAULT 0 NOT NULL,
    target_tag_id integer,
    description character varying
);
SELECT * FROM tags WHERE id = 183 OR id = 184 OR id = 185;
 id  |    name     |         created_at         |         updated_at         | pm_topic_count | target_tag_id | description | public_topic_count | staff_topic_count
-----+-------------+----------------------------+----------------------------+----------------+---------------+-------------+--------------------+-------------------
 183 | photos      | 2023-08-14 05:22:30.326838 | 2023-08-14 05:22:30.326838 |              0 |               |             |                  1 |                 1
 184 | meetup      | 2023-08-14 05:25:00.227547 | 2023-08-14 05:25:00.227547 |              0 |               |             |                  0 |                 1
 185 | icebreakers | 2023-08-14 06:18:39.214459 | 2023-08-14 06:18:39.214459 |              0 |               |             |                  1 |                 1

CREATE TABLE topic_tags (
    id integer NOT NULL,
    topic_id integer NOT NULL,
    tag_id integer NOT NULL,
    created_at timestamp without time zone NOT NULL,
    updated_at timestamp without time zone NOT NULL
);
SELECT * FROM topic_tags WHERE id = 1005 OR id = 1006 OR id = 1007;
  id  | topic_id | tag_id |         created_at         |         updated_at
------+----------+--------+----------------------------+----------------------------
 1005 |    10863 |    183 | 2023-08-14 05:22:30.360668 | 2023-08-14 05:22:30.360668
 1006 |     1852 |    184 | 2023-08-14 05:25:00.237298 | 2023-08-14 05:25:00.237298
 1007 |    10863 |    185 | 2023-08-14 06:18:39.240753 | 2023-08-14 06:18:39.240753

CREATE TABLE user_actions (
    id integer NOT NULL,
    action_type integer NOT NULL, -- :like=>1 (when a user likes a post), :was_liked=>2 (when a user's post is liked), :new_topic=>4, :reply=>5, :response=>6, :mention=>7, :quote=>9, :edit=>11, :new_private_message=>12, :got_private_message=>13, :solved=>15, :assigned=>16 --
    user_id integer NOT NULL,
    target_topic_id integer,
    target_post_id integer,
    target_user_id integer,
    acting_user_id integer,
    created_at timestamp without time zone NOT NULL,
    updated_at timestamp without time zone NOT NULL
);
SELECT * FROM user_actions WHERE id = 19928 OR id = 19929 OR id = 19931;
/*
In the first example below, the user with the id 1 has liked (action_type: 1) the post with the id 11100. Because the user with the id 1 took the action, 1 is set as the value of the acting_user_id column. Because the entry is recording that this user has performed a "like" action, their user id (1) is also set in the entry's user_id column.
Here's an example query that would return the number of times the user with id: 1 has liked other user's posts: `SELECT COUNT(user_id) AS number_of_likes_given FROM user_actions WHERE action_type = 1 AND user_id = 1;`

In the second example below, the user with the id 121 has had their post (id 11100) liked by another user (action_type: 2). The entry's user_id is set to 121 because that's the id of the user who has had their post liked. The entry's acting_user_id column is set to 1 because that is the user who liked the post.
Here's an example query that returns the number of times the user with id 121 has had their posts liked by other users: `SELECT COUNT(user_id) AS number_of_likes_received FROM user_actions WHERE action_type = 2 AND user_id = 121;`
Here's an example query that returns the number of times the user with id 121 has had their posts liked by the user with id: 1: `SELECT COUNT(user_id) AS number_of_likes_received_from_user_1 FROM user_actions WHERE action_type = 2 AND user_id = 121 AND acting_user_id = 1;`

In the third example, the user with the id 121 has been mentioned (action_type: 7) in the post (id 11101) by the user with the id 2. The entry's user_id is set to 121 because that is the id of the user who was mentioned. The entry's acting_user_id is set to 2 because that is the id of the user who created the mention.
Here's an example that returns the number of times the user with id 121 has been mentioned by any users: `SELECT COUNT(user_id) AS number_of_mentions_received FROM user_actions WHERE action_type = 7 AND user_id = 121;`
Here's an example query that returns the number of times the user with id 121 has been mentioned by the user with id 2: `SELECT COUNT(user_id) AS number_of_mentions_received_from_user_2 FROM user_actions WHERE action_type = 7 AND user_id = 121 AND acting_user_id = 2;`
Here's an example query that returns the number of times the user with id 121 has been mentioned in the topic with the id 7282: `SELECT COUNT(user_id) AS number_of_mentions_received_in_topic FROM user_actions WHERE action_type = 7 AND user_id = 121 AND target_topic_id = 7282;`
*/
  id   | action_type | user_id | target_topic_id | target_post_id | target_user_id | acting_user_id |         created_at         |         updated_at
-------+-------------+---------+-----------------+----------------+----------------+----------------+----------------------------+----------------------------
 19928 |           1 |       1 |            7282 |          11100 |                |              1 | 2023-08-14 07:47:49.520171 | 2023-08-14 07:47:49.651056
 19929 |           2 |     121 |            7282 |          11100 |                |              1 | 2023-08-14 07:47:49.520171 | 2023-08-14 07:47:49.671757
 19931 |           7 |     121 |            7282 |          11101 |                |              2 | 2023-08-14 08:00:05.877537 | 2023-08-14 08:00:05.877537

CREATE TABLE polls (
    id bigint NOT NULL,
    post_id bigint,
    name character varying DEFAULT 'poll'::character varying NOT NULL,
    close_at timestamp without time zone,
    type integer DEFAULT 0 NOT NULL,
    status integer DEFAULT 0 NOT NULL,
    results integer DEFAULT 0 NOT NULL,
    visibility integer DEFAULT 0 NOT NULL,
    min integer,
    max integer,
    step integer,
    anonymous_voters integer,
    created_at timestamp without time zone NOT NULL,
    updated_at timestamp without time zone NOT NULL,
    chart_type integer DEFAULT 0 NOT NULL, -- {"bar"=>0, "pie"=>1} --
    groups character varying,
    title character varying
);
SELECT * FROM polls WHERE id = 71 OR id = 72 OR id = 74;
 id | post_id | name |      close_at       | type | status | results | visibility | min | max | step | anonymous_voters |         created_at         |         updated_at         | chart_type |    groups     |                   title
----+---------+------+---------------------+------+--------+---------+------------+-----+-----+------+------------------+----------------------------+----------------------------+------------+---------------+-------------------------------------------
 71 |   11097 | poll | 2023-08-20 07:00:00 |    0 |      0 |       0 |          1 |     |     |      |                  | 2023-08-14 06:38:10.96388  | 2023-08-14 06:38:10.96388  |          0 | trust_level_2 | Who took the best picture?
 72 |   11098 | poll | 2023-09-10 07:00:00 |    0 |      0 |       0 |          1 |     |     |      |                  | 2023-08-14 06:40:36.925762 | 2023-08-14 06:40:36.925762 |          1 | staff         | Who should we invite to our next AMA?
 74 |   11100 | poll | 2023-08-27 07:00:00 |    1 |      0 |       0 |          0 |   1 |   2 |      |                  | 2023-08-14 06:48:27.498764 | 2023-08-14 06:48:27.498764 |          0 | eurorack      | What are your favourite Eurorack modules?

CREATE TABLE poll_options (
    id bigint NOT NULL,
    poll_id bigint,
    digest character varying NOT NULL,
    html text NOT NULL,
    anonymous_votes integer,
    created_at timestamp without time zone NOT NULL,
    updated_at timestamp without time zone NOT NULL
);
SELECT * FROM poll_options WHERE id = 271 OR id = 274 OR id = 280;
 id  | poll_id |              digest              |      html      | anonymous_votes |         created_at         |         updated_at
-----+---------+----------------------------------+----------------+-----------------+----------------------------+----------------------------
 271 |      71 | 5c6c2c880d86e5a0d9f0924a7ffd9629 | Sally          |                 | 2023-08-14 06:38:10.975263 | 2023-08-14 06:38:10.975263
 274 |      72 | 2654933188fb6bd444b6df4cc39fd908 | Dangerous Dave |                 | 2023-08-14 06:40:36.930354 | 2023-08-14 06:40:36.930354
 280 |      74 | f2e5462d97476629719e607dc80a9619 | Maths          |                 | 2023-08-14 06:48:27.502673 | 2023-08-14 06:48:27.502673

CREATE TABLE poll_votes (
    poll_id bigint,
    poll_option_id bigint,
    user_id bigint,
    created_at timestamp without time zone NOT NULL,
    updated_at timestamp without time zone NOT NULL
);
SELECT * FROM poll_votes WHERE poll_id = 71;
 poll_id | poll_option_id | user_id |         created_at         |         updated_at
---------+----------------+---------+----------------------------+----------------------------
      71 |            270 |       2 | 2023-08-14 06:59:00.382873 | 2023-08-14 06:59:00.382873
      71 |            271 |     121 | 2023-08-14 07:02:57.364892 | 2023-08-14 07:02:57.364892
      71 |            271 |       1 | 2023-08-14 07:03:36.464304 | 2023-08-14 07:03:36.464304

/* Discourse database documentation end */

It covers the users, groups, group_users, posts, topics, categories, tags, topic_tags, user_actions, polls, poll_options, and poll_votes tables. It’s currently at 1150 tokens, so it should be able to double its size without causing issues. Note that if you copy it into a ChatGPT chat input, you’ll need to paste it into two separate inputs - the character limit on a chat input is less than the token limit for a chat session.

I added fairly detailed comments above the example user_actions queries. With those examples, ChatGPT is doing a good job in answering questions about likes given, likes received, etc. Previously it had been struggling with that. I suspect there are a few tables that would need a similar approach.

After having sent the documentation, the following prompts are helpful:

Example prompts

When I ask you start a query with a ‘query period CTE’, I want the query to start with exactly this SQL (including the comment):

--[params]
-- string :query_interval = 1 week
-- date :start_date
-- date :end_date

WITH query_periods AS (
  SELECT
      generate_series(:start_date, :end_date, :query_interval::interval)::date AS period_start,
      (generate_series(:start_date, :end_date, :query_interval::interval)::date + :query_interval::interval - INTERVAL '1 DAY')::date AS period_end
)

The Data Explorer plugin allows paramters to be added to its queries. Parameters that are used in queries need to appear in a comment at the top of the query in this form:

--[params]
--param_type :param_name

The available param types are: int, bigint, boolean, string, date, time, datetime, double, user_id, post_id, topic_id, category_id, group_id, badge_id, int_list, string_list, user_list
Here is an example:

--[params]
-- string :action_type

An optional default value can be supplied for a parameter. For example:

--[params]
--string :action_type = like

I created the “query_period CTE” prompt because if I didn’t specifically tell ChatGPT how to create the query_period CTE, it was coming up will all sorts of solutions, some better than others. With the prompt, it adds the exact code I give it, then builds queries on top of it without any problems. Interestingly, when I tried to add details about how to create a “query period CTE” to the initial documentation that I sent ChatGPT, it would just ignore the instructions. For some reason sending it as a separate prompt makes a difference.

Prompts describing how Discourse “soft deletes” topics and posts are also helpful. After letting ChatGPT know about the need to check that deleted_at IS NOT NULL for queries related to topics and posts, it consistently applies that to all queries.

Telling ChatGPT to leave off the semicolon at the end of the queries is kind of hopeless. It remembers for a couple of queries, then goes back to adding the semicolon. That seems like a minor detail.

My initial hope of having a perfect query returned from a natural language question was a bit ambitious. It makes mistakes, and so do I. At least in the short term, I think the best way to integrate ChatGPT with the Data Explorer plugin would be to initiate a PM with ChatGPT. A basic description of the database could be sent when the PM is created. Then a selection of prompts could be made available via the UI. For example, a prompt for parameterizing a query, or a prompt for adding details about a seldom used table.

My suspicion is that this could be very helpful to someone who knows a bit of SQL, but possibly it could also be implemented in a way that would help people who are new to SQL get up to speed a lot faster than they would on their own. I learned this wonderful trick today:

WITH post_type_mapping AS (
  SELECT 'regular' AS type, 1 AS post_type
  UNION ALL
  SELECT 'moderator_action' AS type, 2 AS post_type
  UNION ALL
  SELECT 'small_action' AS type, 3 AS post_type
  UNION ALL
  SELECT 'whisper' AS type, 4 AS post_type
)
...
JOIN post_type_mapping m ON :post_type = m.type
WHERE p.post_type = m.post_type
4 个赞

Hopefully I am not feeding your addiction.

I don’t know if you read research papers but today ran across another one reinforcing that you are on a valid path and that the paper can hopefully point out new avenues for achieving the goal of generating valid SQL and results starting with a natural language query.

The paper is for math problems but since math is just an expression as is SQL, just change out one form of expression for another and it should make sense. There are many such related papers all with similar ideas so don’t consider this one as authoritative.

“Solving Challenging Math Word Problems Using Gpt-4 Code Interpreter With Code-Based Self-Verification” by Aojun Zhoun, Ke Wang, Zimu Lu, Weikang Shi, Sichun Luo, Zipeng Qin, Shaoqing Lu, Anya Jia, Linqi Song, Mingjie Zhan and Hongsheng Li (pdf)

1 个赞

@simon

另一个可能对你有帮助的想法,这个想法听起来可能有点异想天开,但我曾用它来生成 Prolog 代码,特别是网页。

问题可能在于生成式人工智能更难理解 SQL,因为它是一种声明式语言,而生成式人工智能在编程语言方面取得成功很大程度上是因为在少数命令式编程语言(如 JavaScript、Python、Java 等)上有大量的训练集。但是,基于 Transformer 的生成式人工智能(据我回忆,Transformer 最初是为了 IIRC 将英语翻译成德语而创建的)非常擅长与训练良好的编程语言进行相互翻译。所以,如果你一开始不直接要求 SQL,而是先要求生成式人工智能用 Python 或其他类似的训练良好的编程语言来编写解决问题的代码,然后再让生成式人工智能将 Python 翻译成 SQL,看看是否有效。我个人不打算尝试这个方法,但既然你似乎很喜欢这个,那就请便吧。另外,如果你尝试了,请提供反馈,我非常想知道你的发现。:slightly_smiling_face:

1 个赞

这是拼写错误吗?

WHERE poll_id = 74 应该改为 WHERE poll_id = 71 吗?


我没有检查整个提示,只是想看看你是否包含了查询结果示例。换句话说,你提供了表值的示例,但我没有看到提示的预期结果示例,也许我错过了。预期结果可用于验证 SQL 是否正确。


建议:

Discourse 数据库提示 示例中,id 使用 1 在多个表中都有使用。虽然我们人类知道 1 仅在字段和表的上下文中才有意义,但 AI 不会知道这一点,这可能会给 AI 一个犯错的机会。因此,建议修改 Discourse 数据库提示 以为每个 id 使用不同的数字。

更进一步,使用 OpenAI Tokenizer 页面 检查每个示例表中的所有值是否为单个 token。我知道你可能想保留其中一些作为单词,甚至更糟的是字符串,但 AI 真的在乎吗?多个 token 的值会导致更多变化和可能的幻觉吗?

1 个赞

是个错误。我会修复它。我想我用 71 重新运行了查询,以尝试引入与另一个与投票相关的表相关联的结果。

我偏离了博客文章中关于运行所有 SELECT 查询的建议,该建议的形式是:
SELECT * FROM polls LIMIT 3;
我之所以这样做,是因为我的开发数据库中有很多匿名用户和已删除的帖子。我认为这样可以提供更一致的结果,但我将尝试使用一个全新的数据库重新进行提示,并简化 SELECT 语句。

是的,我在所有示例中都使用了三个用户。他们的 id 是 12121,所以这些值会重复很多次。我以为最好是显示一致的数据。我会尝试几种不同的方法,看看哪种效果最好。

博客文章中建议的另一种方法是限制提示中设置的列。这很诱人,但有引入大量错误的风险,并且难以维护等。

认为 我看到的模式是,如果我通过请求一个复杂查询来开始一个会话,ChatGPT 就会感到困惑。我让它克服了困惑,然后在那之后会话的结果就相当不错了。另一种似乎有效的方法是先从一个简单的查询开始,然后在此基础上进行更复杂的查询。我不确定这是否是一个真正的模式,或者成功率是否真的比看起来更随机。

目前来看,它似乎对那些已经熟悉 SQL 和 Discourse 数据库的人很有用。我想让它达到对那些对两者都不太了解的人也有用的程度。

我还在用 ChatGPT-4 测试这个。它很可能会产生更好的结果,但也许使用起来不那么有趣。ChatGPT-3.5 速度快很多。

2 个赞

几十年前,当我学习数据库时,仅仅学习 SQL 就很令人困惑,然后我使用了 Microsoft Access 的查询生成器,它可以拖放表,然后用线条将表字段连接起来,这与 Visio 的工作方式很相似,它会生成 SQL。

类似的工具,图片来自 这里

不指望 Discourse 创建这样的 SQL 构建工具,但或许可以利用 AI 生成连接表的此类图像作为反馈。


我记得在这个视频中提到,首先用 GPT-4 创建以获得正确结果,然后为 GPT-3.5 定制以提高速度。

2 个赞