/polls/voters.json devuelve usuarios duplicados en solicitudes paginadas

When calling the /polls/voters.json endpoint, using pagination, we’ve found that the first call returns 25 users as expected, but the 2nd page returns 26 users, one of which was returned in the first call. This is consistent whenever there are over 25 users.

The problem is here in the poll plugin for discourse:

The offset is calculated assuming that postgress SQL is exclusive with a bound of the row using BETWEEN, when it is inclusive.

The first query actually returns 25, because the calculated rows actually start a 1, not 0.

My proposed fix is as simple as:

   params = {
      offset: offset + 1,
      offset_plus_limit: offset + limit,
      option_digest: opts[:option_id].presence,
    }

Or, a more elegant solution might be to use postgres LIMIT and OFFSET

    params = {
      limit: limit,
      offset: offset,
      option_digest: opts[:option_id].presence,
    }
          WHERE pv.poll_id IN (:poll_ids)
                /* where */
        ) v
        ORDER BY digest, CASE WHEN rank = 'Abstain' THEN 1 ELSE CAST(rank AS integer) END, username
		LIMIT :limit OFFSET :offset
      SQL
2 Me gusta

Nice catch, putting a pr-welcome on this in case anyone wants to give fixing this a go.

We need to confirm we don’t break existing stuff in polls in case the front end incorrectly depends on wrong assumptions here.

1 me gusta

Thanks for quick reply!
I’ll try to set up an env, didn’t wanna PR without testing. Anyone feel free to beat me to it. I’m not a ruby dev at all. I also don’t know about changing spec or tests.

Hilariously in the UI there is code that dumps the results into a set to bypass this bug.

2 Me gusta

For anyone else viewing, we put in a similar UI fix for our raffle software here.

Cheers to @lukecyca from VHS for identifying the area of issue.

There is also a related report about the loading of more voters in polls from January.

1 me gusta

Great catch.

It’s even more funny than that. I submitted a couple of very large PRs a while back to modernise the front end and add Ranked Choice.

The huge scale of those PRs definitely contributed to me missing this and back end fixes were not really in scope (though in the process of doing those PRs I did actually resolve a lot of issues that weren’t strictly for those PRs to fix)

I admit to matching that front end precedent that was set back in 2018 for ranked choice without really thinking about it :sweat_smile:. (Which you guys just also followed, albeit knowingly :laughing:)

Amazing how workarounds can persist without the underlying issue being identified.

3 Me gusta

So I thought I’d take a stab at this one as my first contribution, but it looks like it might be a bit hairier than I expected. :wink:

I have a Draft PR up: FIX: do not return duplicates from /polls/voters.json by clechasseur · Pull Request #1 · clechasseur/discourse · GitHub

In this PR, I first added tests that reproduce the problem, then I applied Rob’s simple fix and it made the tests pass.

I also tried the more elegant solution (the one I’d have preferred), but although it does prevent duplicate voters, it also changes which voter is returned in which page, including the first, which could potentially be considered a breaking change (depending on how the front end deals with it - I haven’t yet looked).

Taking a step back, however, I am left wondering what is the true meaning of the limit parameter when you call this endpoint - it doesn’t really limit the numbers of total voters returned, merely the number of voters returned for each poll option. You can see this effect in the test I added for the multi-choice poll here - the first page is indeed limited to 2 voters per option, but in total, three different voters are returned (scattered across options). Switching to the elegant solution (i.e., using LIMIT :limit OFFSET :offset) results in the limit being applied to the total number of votes and not voters. I’m not 100% sure it’s better or more intuitive.

Anyway, I am new to this so I may be overthinking this. The simple solution does remove duplicate voters and doesn’t wreak too much havok, so it might be the way to go. I’ll wait for input before submitting a PR to the parent repo.

As an aside, I think there is another bug in this part of the code. The query to load voters is ordered by digest, rank and username - but when ordering by rank, it uses this condition:

CASE WHEN rank = 'Abstain' THEN 1 ELSE CAST(rank AS integer) END

However, ’Abstain’ actually corresponds to rank 0, not 1 - rank 1 can also be returned as ’1’. This potentially makes the ordering non-deterministic across queries, which means that depending on the number of voters and the limit value used, it could be possible to actually miss voters when doing paginated queries. In my new tests, I had to sort the returned voters to get around the non-deterministic nature. (Because it’s non-deterministic, I assume it’s not easy to reproduce in a test, but I can take a shot at it…)

1 me gusta

I followed the link above to a previous report, which includes a like to a previous previous report which includes something that seems to point to this possibility as well:

That’s a long time ago however so the code might have been totally different back then. (I tried going back in time with Git but the history stopped in 2021 so I assume the code moved at some point.)

After some experimentation, I was unable to craft a test that caused the non-deterministic ordering to cause a problem with returned voters. I think this is because of the way the query is crafted and because of how the poll votes are created, but I am not 100% certain.

I’ve created a PR with my test and the fixes I propose: FIX: Avoid returning duplicate voters from `/polls/voters.json` endpoint by clechasseur · Pull Request #34433 · discourse/discourse

1 me gusta