Posts sometimes randomly not displaying


#1


BUG: Something appears to be preventing post content from displaying. It is affecting every topic, seemingly at random. You can hit Reply and then quote the full post to see what it said, but you can’t see it in the actual post body itself.
EXPECTED: Forum that works.

No, we are not faking blank posts with empty HTML characters. The div.cooked contents are completely empty. It is kind of getting in the way of normal discourse.


(Jeff Atwood) #2

Can you repro it here or on try.discourse? Does it happen across different browsers?


#3

It is happening to all our users, seemingly regardless of browser (I’ve used Chrome and IE11). Try reading: The Home Stretch - What the Daily WTF?

There were some guesses as to what caused it, but we haven’t homed in on any single cause yet. It started about 1-2 hours ago and now happens in every topic someone posts in.

The very first post anyone noticed vanishing was Login to your account - What the Daily WTF?

After that there were several more posts that displayed fine, and now roughly 95% of posts fail to display.


(Jeff Atwood) #4

All right, I can confirm I see the same weirdness on Chrome Android as well.

Let’s try a few things:

  1. Update to latest revision. OK, didn’t help.

  2. Check the JavaScript console (F12) for errors. Nope, don’t see any.

  3. Check /logs for errors. Definitely a few around user-created SQL query badges.

  4. Check /admin/logs/staff_action_logs to see if any weird CSS, HTML, or JS was added by an admin. Don’t see anything serious in the last day or two…

  5. Disable custom CSS. Nope, didn’t help.

  6. Disable custom HTML head/footer. Nope, didn’t help.

  7. Disable custom badges that are throwing errors in /logs. Not sure if this has helped yet. Edit: it definitely stopped the SQL badge errors from appearing in /logs, so that’s something.

Looks to me like possibly a rogue admin-added badge query? cc: @PJH


(Jeff Atwood) #5

More specifically

Job exception: PG::Error: ERROR:  column q.post_id does not exist
LINE 15: ...RE (ub.badge_id IS NULL AND q.user_id <> -1) AND (q.post_id ...
                                                              ^
: INSERT INTO user_badges(badge_id, user_id, granted_at, granted_by_id, post_id)
            SELECT 139, q.user_id, q.granted_at, -1, NULL
            FROM ( SELECT user_id, current_timestamp granted_at FROM badge_posts  where topic_id not in (

SELECT topic_id FROM badge_posts GROUP BY topic_id HAVING count(topic_id) <4

) and topic_id not in (

1000, 1673

) GROUP BY user_id HAVING count(*) >= 512 ) q
            LEFT JOIN user_badges ub ON
              ub.badge_id = 139 AND ub.user_id = q.user_id
              
            WHERE (ub.badge_id IS NULL AND q.user_id <> -1) AND (q.post_id in (64971))
            RETURNING id, user_id, granted_at

Which I guess is the 2^x badge series around awarding badges for certain post counts? And the attendance badge too?

I disabled those badge series in /admin/badges to see if that helps.

I suspect all users with post counts > whatever the threshold is might be affected when they post?
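
For readers following along, here is a minimal Ruby sketch of why PostgreSQL reports that error, based only on the statement in the log. The generated wrapper references `q.post_id`, but the custom badge query aliased as `q` selects only `user_id` and `granted_at`. The helper name `missing_columns` and the abbreviated SQL are illustrative assumptions, not Discourse code.

```ruby
# Abbreviated copy of the failing statement from the log. The inner badge
# query is replaced by a placeholder; only its output columns matter here.
WRAPPER_SQL = <<~SQL
  SELECT 139, q.user_id, q.granted_at, -1, NULL
  FROM ( /* custom badge query */ ) q
  LEFT JOIN user_badges ub ON ub.badge_id = 139 AND ub.user_id = q.user_id
  WHERE (ub.badge_id IS NULL AND q.user_id <> -1) AND (q.post_id in (64971))
SQL

# Columns the custom query exposes: user_id and "current_timestamp granted_at".
EXPOSED_COLUMNS = %w[user_id granted_at].freeze

# Hypothetical helper: list the columns referenced as q.<name> in the wrapper
# that the aliased subquery does not expose.
def missing_columns(wrapper_sql, exposed)
  referenced = wrapper_sql.scan(/\bq\.(\w+)/).flatten.uniq
  referenced - exposed
end

puts missing_columns(WRAPPER_SQL, EXPOSED_COLUMNS).inspect # prints ["post_id"]
```

If that reading is right, a custom query that is evaluated when a post is made would presumably need to expose a `post_id` column as well, though that is an inference from the error message rather than something confirmed in this thread.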


(Sam Saffron) #6

To me this looks like someone figured out how to inject JS somehow, probably in profile data.

Will download a backup and muck around.


(Jeff Atwood) #8

Hmm. Not sure.

Posts from the same user can be randomly “hidden”, then right below, show up.

Also, editing these hidden posts as an admin causes them to re-appear. Which implies rebaking is a factor, and that the problem is on the server side, not the client?


(Sam Saffron) #9

To me the one thing that screams out in the logs is:

ActionView::Template::Error (incompatible character encodings: ASCII-8BIT and UTF-8)

Coming from

<b><%= post.user.username_lower %></b> <%= "(#{post.user.name})" if SiteSetting.display_name_on_posts %> — <%= post.created_at.to_formatted_s(:iso8601) %> — #<%= post.post_number %>

in the topics show.html.erb template. I tried splitting the line up to see which fragment is causing this.

Strangely, it appears that the restart stopped this error from happening.
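
As background on that exception, here is a small Ruby sketch of when string concatenation raises `Encoding::CompatibilityError`. It is not a reproduction of the template failure; the sample names and the helper `concatenable?` are made up for illustration. The point is that Ruby only raises when both strings contain non-ASCII bytes and carry different encodings, so the same line of code can work for most inputs and fail for a few.

```ruby
# Hypothetical helper: true when Ruby can concatenate the two strings.
def concatenable?(a, b)
  a + b
  true
rescue Encoding::CompatibilityError
  false
end

utf8_name   = "Zo\u00eb"       # UTF-8 string with a non-ASCII character
binary_name = "Zo\xC3\xAB".b   # same bytes, but tagged ASCII-8BIT (binary)
ascii_bin   = "zoe".b          # binary-tagged, ASCII-only content

puts concatenable?(binary_name, utf8_name) # prints false
puts concatenable?(ascii_bin, utf8_name)   # prints true

# Re-tagging the bytes as UTF-8 (valid here) makes the strings compatible.
retagged = binary_name.dup.force_encoding(Encoding::UTF_8)
puts concatenable?(retagged, utf8_name)    # prints true
```

Whether a binary-tagged username, display name, or some other fragment was involved here is not established in this thread; the sketch only shows why such an error can look intermittent.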

This is a very hard one to diagnose.


(Sam Saffron) #10

FYI, I did a full rebake of all posts from the last 24 hours, a full Docker upgrade, and a full Discourse Docker image upgrade.

Watching the logs to see if the encoding issue creeps back. Let me know if anyone knows how to repro it or if it starts happening again.


(Jeff Atwood) #11

The encoding issue only appeared after the upgrade and after we cleared all the custom SQL badge errors by disabling those badges that have custom SQL.

Since rebaking worked, even many hours ago (an admin editing a post would fix it), I still suspect this is a server-side error triggered by the custom badge SQL errors…


(Sam Saffron) #12

This is an indicator that post baking was bust, possibly a ruby racer bug. I am not sure how a SQL syntax error would throw this off.


(PJH) #13

While there may be a problem with the badges, they were added some time ago, not last night.

I’ll be looking at the badges when I get back Tues.


(Sam Saffron) #14

I changed all the 2^x badges to run daily. I need to post here explaining how the trigger stuff works.


(PJH) #15

Please include when the daily stuff runs relative to other jobs. I’m thinking of the attendance badges versus the user_visits table here: I suspect daily would be insufficient, causing off-by-one incidents if not breaking the badges altogether, and requiring even more complicated SQL. Crontab-style triggers could mitigate this…


(Jeff Atwood) #16