So I thought I’d take a stab at this one as my first contribution, but it looks like it might be a bit hairier than I expected. 
I have a Draft PR up: FIX: do not return duplicates from /polls/voters.json by clechasseur · Pull Request #1 · clechasseur/discourse · GitHub
In this PR, I first added tests that reproduce the problem, then I applied Rob’s simple fix and it made the tests pass.
I also tried the more elegant solution (the one I’d have preferred), but although it does prevent duplicate voters, it also changes which voter is returned in which page, including the first, which could potentially be considered a breaking change (depending on how the front end deals with it - I haven’t yet looked).
Taking a step back, however, I am left wondering what is the true meaning of the limit
parameter when you call this endpoint - it doesn’t really limit the numbers of total voters returned, merely the number of voters returned for each poll option. You can see this effect in the test I added for the multi-choice poll here - the first page is indeed limited to 2 voters per option, but in total, three different voters are returned (scattered across options). Switching to the elegant solution (i.e., using LIMIT :limit OFFSET :offset
) results in the limit
being applied to the total number of votes and not voters. I’m not 100% sure it’s better or more intuitive.
Anyway, I am new to this so I may be overthinking this. The simple solution does remove duplicate voters and doesn’t wreak too much havok, so it might be the way to go. I’ll wait for input before submitting a PR to the parent repo.
–
As an aside, I think there is another bug in this part of the code. The query to load voters is ordered by digest, rank and username - but when ordering by rank, it uses this condition:
CASE WHEN rank = 'Abstain' THEN 1 ELSE CAST(rank AS integer) END
However, ’Abstain’
actually corresponds to rank 0, not 1 - rank 1 can also be returned as ’1’
. This potentially makes the ordering non-deterministic across queries, which means that depending on the number of voters and the limit
value used, it could be possible to actually miss voters when doing paginated queries. In my new tests, I had to sort the returned voters to get around the non-deterministic nature. (Because it’s non-deterministic, I assume it’s not easy to reproduce in a test, but I can take a shot at it…)