How to fetch topics ordered by custom field iso8601 value?

Hi,

I’ve come across a weird query which i don’t really know how to wrap my head around doing only with a sql/active record query.

I have topics with custom fields, and i need to sort data using the value of ‘importedSortDate’ custom field.

Initial code
Topic.where(‘category_id in (?)’, [cat_id])
.where(‘closed = ?’, false)
.order(‘created_at desc’)
.limit(10)

How can I modify or add to this query so I load into memory only those 10 topics, and not more.

Since Custom Fields Table template is a bit weird, in the sense you have a name column and a value column, i’m a bit stuck on how exactly do i manage to do this all using the database query.

My issue with loading more in memory is the fact that for every topic i load into memory a category, user and post fetch are also performed.

I’m working with 1000-2000 topics per category, and i have 4 categories to fetch from.
The fetch is done by all users, it is not a one time thing.

As I see it I have 2 ways of doing it so far (first one doesn’t seem fast enough for me):

  1. Get only ids for all topics that i’m interested in 1 fetch and then use that info somehow join it with the value of ‘importedSortDate’ and then do a sort in memory and then again do a fetch for the ids in the same order i have them (again don’t really know exactly how that would work)
  2. somehow make a query that joins with the custom fields table (i’ve seen some examples but they were for fixed values on name and value columns) and sort this all based on the columns ‘name’ === ‘importedSortDate’ and the value column (iso8601 string)

How can I achieve this?

PS some topics may not have a value in the importedSortDate custom field, so I could just fetch those and put them wherever I want, but it’s an edge-case i can deal with. If they don’t have a value, they should be the first entries.

PPS I want to afterwards use this query as my pagination query, logic already implemented on that side, I just need to change the main query for fetching topics.

The limit(10) should do that already :slight_smile:

This kind of thing should do it: (but not tested)

Topic.where("category_id in (?)", [cat_id])
         .where("closed = ?", false)
         .joins("LEFT JOIN topic_custom_fields import_tcf ON import_tcf.topic_id = topic.id AND topic_custom_fields.name = 'importedSortDate'")
         .order("TIMESTAMP import_tcf.value")
         .order("created_at desc")
         .limit(10)

The problem is, asking postgres to cast the date string into a timestamp will be very inefficient. It will have to do the conversion for every single topic first, then pick the first 10.

There are a couple of options I can think of here - you could store the dates as “seconds since epoch”, rather than ISO8601. That will make them much easier for postgres to sort. Or you might be able to use a database migration to add an index to the topic_custom_fields table for TIMESTAMP value.

4 Likes

Oh…

Hmm, ok so this basically means instead of the iso8601, I need to save a seconds since epoch, so that the string compare is correct and sorts properly.

I was so concerned about not being able to get the value of import_tcf that i totally forgot to go with raw sql…

1 Like

Well … I got here:

 topics = Topic.where("topics.category_id in (?)", [7])
    .where("topics.closed = ?", false)
    .joins("LEFT JOIN topic_custom_fields custom_fields ON custom_fields.topic_id = topics.id AND custom_fields.name = '#{Constants::TOPIC_SORT_DATE}'")
    .order("coalesce(cast(custom_fields.value as timestamp), topics.created_at) desc") # Thanks @falco
  # .limit(10)

  array = topics.to_a.map do |t|
    next { id: t.id, createdAt: t.created_at, sortDate: t.custom_fields[Constants::TOPIC_SORT_DATE] }
  end
  puts array

With the output:

{:id=>25, :createdAt=>Thu, 14 May 2020 09:26:47 UTC +00:00, :sortDate=>nil}
{:id=>7017, :createdAt=>Tue, 06 Oct 2020 07:49:10 UTC +00:00, :sortDate=>"2011-01-02T22:00:00.000Z"}
{:id=>7016, :createdAt=>Tue, 06 Oct 2020 07:49:10 UTC +00:00, :sortDate=>"2011-01-02T22:00:00.000Z"}
{:id=>7058, :createdAt=>Tue, 06 Oct 2020 15:39:49 UTC +00:00, :sortDate=>"2010-02-02T00:00:00Z"}
{:id=>7008, :createdAt=>Tue, 06 Oct 2020 07:49:05 UTC +00:00, :sortDate=>"2010-01-02T22:00:00.000Z"}
{:id=>7010, :createdAt=>Tue, 06 Oct 2020 07:49:06 UTC +00:00, :sortDate=>"2010-01-02T22:00:00.000Z"}
{:id=>7011, :createdAt=>Tue, 06 Oct 2020 07:49:06 UTC +00:00, :sortDate=>"2010-01-02T22:00:00.000Z"}
{:id=>7012, :createdAt=>Tue, 06 Oct 2020 07:49:07 UTC +00:00, :sortDate=>"2010-01-02T22:00:00.000Z"}
{:id=>7013, :createdAt=>Tue, 06 Oct 2020 07:49:08 UTC +00:00, :sortDate=>"2010-01-02T22:00:00.000Z"}
{:id=>7014, :createdAt=>Tue, 06 Oct 2020 07:49:08 UTC +00:00, :sortDate=>"2010-01-02T22:00:00.000Z"}
{:id=>7015, :createdAt=>Tue, 06 Oct 2020 07:49:09 UTC +00:00, :sortDate=>"2010-01-02T22:00:00.000Z"}
{:id=>7003, :createdAt=>Tue, 06 Oct 2020 07:49:01 UTC +00:00, :sortDate=>"2010-01-02T22:00:00.000Z"}
{:id=>7004, :createdAt=>Tue, 06 Oct 2020 07:49:02 UTC +00:00, :sortDate=>"2010-01-02T22:00:00.000Z"}
{:id=>7005, :createdAt=>Tue, 06 Oct 2020 07:49:03 UTC +00:00, :sortDate=>"2010-01-02T22:00:00.000Z"}
{:id=>7006, :createdAt=>Tue, 06 Oct 2020 07:49:03 UTC +00:00, :sortDate=>"2010-01-02T22:00:00.000Z"}
{:id=>7007, :createdAt=>Tue, 06 Oct 2020 07:49:04 UTC +00:00, :sortDate=>"2010-01-02T22:00:00.000Z"}
{:id=>7009, :createdAt=>Tue, 06 Oct 2020 07:49:05 UTC +00:00, :sortDate=>"2010-01-02T22:00:00.000Z"}
{:id=>7059, :createdAt=>Tue, 06 Oct 2020 15:49:16 UTC +00:00, :sortDate=>"2009-02-02T00:00:00Z"}
{:id=>7002, :createdAt=>Tue, 06 Oct 2020 07:49:01 UTC +00:00, :sortDate=>"2009-01-02T22:00:00.000Z"}
{:id=>7001, :createdAt=>Tue, 06 Oct 2020 07:49:00 UTC +00:00, :sortDate=>"2009-01-02T22:00:00.000Z"}
{:id=>6999, :createdAt=>Tue, 06 Oct 2020 07:48:59 UTC +00:00, :sortDate=>"2009-01-02T22:00:00.000Z"}
{:id=>6998, :createdAt=>Tue, 06 Oct 2020 07:48:58 UTC +00:00, :sortDate=>"2009-01-02T22:00:00.000Z"}
{:id=>7000, :createdAt=>Tue, 06 Oct 2020 07:49:00 UTC +00:00, :sortDate=>"2009-01-02T22:00:00.000Z"}

I probably need to run a script to convert everything to timestamps

You will want to change this into:

.order("COALESCE(CAST(custom_fields.value AS timestamp), custom_fields.value) DESC")

for readability.

You sort by sortDate and when that is missing you use createdAt. All in descending order. The query is returning the data exactly as asked as 2020 is “greater” than the 2011 second place.

If you want nulls last, you need:

.order("CAST(custom_fields.value AS timestamp DESC NULLS LAST")
1 Like

Sorry @Falco I figured out that the code was correct and edited as fast as I could.

Edit:
I’ve simplified my code to:
.order(“coalesce(cast(custom_fields.value as timestamp), topics.created_at) desc”)

Just to be safe…

@david Regarding the convert to timestamps idea. My problem with that is not all categories have the same topic custom fields model. This would mean that i have to either create a sort_date for every topic, or keep it as is and just sort by sort_date || t.created_at.

With timestamps it needs to be all or nothing. :frowning:

@Falco @david any idea what the performance implications are for casting in ORDER BY?

in this thread, the author says that the performance will degrade with the returned items number. I’m not a SQL guru so I don’t know if this is true. (https://stackoverflow.com/a/491240/4020131)

The reason for this is that with a char data type, you are sorting the rows as a string.

The idea to ORDER BY CAST() is correct, however performance of this will go down as the number of returned results increases.

If it's only numerical data in this column, the best practice would be to find a suitable numerical data type and change it.

If you really can't change the column and you find yourself having performance issues, I suggest having a sort order column which contains the value cast to an integer (will nulls converted to an appropriate value).

Index the sort order column and ideally, add a trigger to the CHAR column so that inserts or updates to the char value trigger an update to the integer value.

From what he says, I understand that if i only want 10-20 topics, then the performance will stay the same, no matter how many topics I have in the database.

For me this is counter intuitive because how will it know to order all topics and give me 10-20 if it doesn’t cast it beforehand …

I also found this thread MSDN thread, but I do not understand exactly how it applies to my case - CAST in ORDER BY

Bad. If you plan to query on this in a hot path, you may as well add a new table with a proper column type and index in a migration on your plugin.

What about the joining of the custom fields table, where i search for only the entries that have topic_id? @Falco doesn’t that affect performance as well?

I was just thinking… the aim here is to get things in chronological order, right? Looking at the YYYY-MM-DDTHH:MM:SS format of ISO8601 dates, I think you can actually get away with sorting them ‘alphabetically’, and it will still be chronological.

Without the cast, I think postgres should be able to use the index we have on (name, value), and it should be much more efficient.

2 Likes

But what about the create_at part? I would like it to use sortDate when it has it, and created_at when not.

1 Like

Wait, doesn’t ISO8601 already string compare correctly? Isn’t that half the point?

Yeah.

1 Like

The problem is i want to do a cross sort using sortDate (where i have it) and created_at where i dont, without performance loss.

My problem is a value for custom field is a STRING and created_at is a DATE type

1 Like

You can try this probably.
Postgres String to Date and Timestamp | Blendo.

I found a way of doing it, but it involves casting, either to date from string (sortDate being custom field value string) or turning created_at into string from date.

I was looking for a solution that was less demanding and didnt require extra work like db migration scripts.

Im not sure how discourse handles custom tables, when it comes to rebasing and upgrading the database.

1 Like

I will revisit this in the future by filling the sortDate for all items most probably. It world be a nicer solution if i could set it on the Topic itself but i don’t know again how that would affect the discourse upgrades.

Any helpful links or tips for custom tables and how to avoid issues when having them and updating discourse?

Custom tables are far safer than adding/modifying fields in the existing ones. Discourse won’t mess with your tables but you may need do some work to keep up with the changes in core.

discourse-subscriptions and discourse-calendar are good examples of official plugins having their own tables.

2 Likes

@fzngagan I just started using Discourse 3-4 months ago, along with ruby :slight_smile:. When you say “do some work”, what are the exact steps you are referring to? Is it something like write a migration script that creates the table if it is non existing? If you can help with a small list, it would really ease my documentation time :expressionless:

Thank you in advance

If you see something breaking (and you’ll need to check regularly) you’ll have to keep fixing it. Also, you can write some unit tests which will help you locate the issues easily.

1 Like