Long Running Sidekiq Job Restarting Internal Code

I am currently creating a plugin related to meritmoot.com (which is in development and in rolling releases), and because it pulls in outside data, I rely on a long-running Sidekiq job. Unfortunately, the job keeps restarting its internal code without either failing or producing any error output. Is there something I am overlooking in Sidekiq that could cause it to trigger a restart?

On my last job run (which is supposed to happen only once a day) related to roll calls, the job restarted at these intervals without an error:

  • 0h-58m-42s (first iteration start time)
  • 1h-30m-12s (first restart)
  • 2h-1m-1s (second restart)
  • 3h-46m-49s
  • 4h-17m-11s
  • 4h-47m-33s

The job did not finish, and all progress was lost on each restart. It is worth noting that I have my own logging process which redirects stderr and stdout for my internal code, though I doubt it would interfere with Sidekiq. (Ask me if you want to take a look, it's very useful for dev!)

It is possible for me to save progress made within the code, but I would rather have a simpler process, as checkpointing adds overhead. Is there something I am overlooking in Sidekiq that could cause my code to trigger a restart?

So you want a Sidekiq job that intentionally takes over an hour per run? Can you explain a little more why?

2 Likes

I am taking in a lot of external data and storing it internally on my website. The data has to do with US Congressional information.

edit: like publicly available bills / rollcalls and members

1 Like

Are you running out of memory? On our hosting, we have Sidekiq set to auto-restart if it begins using too much memory.
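
This isn't the poster's actual setup, just a minimal sketch of what such a memory watchdog can look like; the limit, poll interval, and start command are all assumptions. The relevant point for the original question is the last line: when a long job is interrupted this way, Sidekiq pushes it back onto the queue during shutdown, so it re-runs from the beginning, which would look exactly like silent restarts.

    # Illustrative only: restart a worker when its resident memory passes a limit.
    MAX_RSS_KB  = 500_000                 # assumed limit, ~500 MB
    SIDEKIQ_CMD = "bundle exec sidekiq"   # assumed start command

    def rss_kb(pid)
      `ps -o rss= -p #{pid}`.to_i         # resident set size in KB (0 if the pid is gone)
    end

    pid = Process.spawn(SIDEKIQ_CMD)
    loop do
      sleep 30
      next if rss_kb(pid) <= MAX_RSS_KB
      Process.kill("TERM", pid)           # Sidekiq's graceful shutdown signal
      Process.wait(pid)
      pid = Process.spawn(SIDEKIQ_CMD)    # jobs still running get requeued and start over
    end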

2 Likes

Here are some ideas:

  • Push the data via the API (but then you’ll hit rate limits; a rough sketch of this option is below)
  • Run an external script, especially if it just needs to get the initial batch of data pulled in and the updates will take less time
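
For the first option, this is roughly what a push through the Discourse REST API looks like; the host, post id, and content are placeholders, and on large batches the site's rate limits will kick in:

    require "net/http"
    require "uri"

    # Update one post's raw content through the Discourse API (placeholder host and id).
    uri = URI("https://example-forum.test/posts/123.json")
    req = Net::HTTP::Put.new(uri)
    req["Api-Key"]      = ENV.fetch("DISCOURSE_API_KEY")
    req["Api-Username"] = "system"
    req.set_form_data("post[raw]" => "Updated bill text goes here")

    res = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(req) }
    sleep 30 if res.code == "429"   # rate limited: back off before retrying (the wait is site-dependent)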

Memory as in database / hard-drive storage? I am using a lot of that, yeah. Currently looking into a slightly modified version of @pfaffman’s suggestion where I fork the process, allowing the main thread to exit but spawning a child process that has the same context (essentially an external script as far as Sidekiq is concerned).

1 Like

Testing the

pid = Process.fork
if pid.nil?
  # In the child: replace this process with the long-running command
  exec "whatever --take-very-long"
else
  # In the parent: don't wait for the child, let a detached thread reap it
  Process.detach(pid)
end

pattern from ruby - How to fire and forget a subprocess? - Stack Overflow to solve the issue. It is a bit of a weird problem, but the API I am connecting to doesn't have update functionality, so I am basically just refreshing the data by re-downloading large portions of the API every day :expressionless: :man_shrugging:

I’ll let you know how it goes in a couple of hours

edit: it stopped again - I think I will save my progress periodically and look into ways to make it more efficient

1 Like

Instead of doing this, why not teach your job to work in smaller chunks? Does it really need to take 4 hours? Sync 10 topics, then another 10, and so on.

Sidekiq has nothing that will terminate long-running jobs; rebuilds of the app will do that, and so will upgrades via the web UI.
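
A minimal sketch of the chunked approach using plain Sidekiq; RollCallSource.fetch_batch and upsert_topic are hypothetical stand-ins for the plugin's own fetch and update code:

    class SyncRollCallsJob
      include Sidekiq::Worker

      BATCH_SIZE = 10

      def perform(offset = 0)
        records = RollCallSource.fetch_batch(offset, BATCH_SIZE)  # hypothetical data source
        return if records.empty?                                  # nothing left, sync finished

        records.each { |record| upsert_topic(record) }            # hypothetical per-record update

        # Re-enqueue for the next slice instead of looping for hours in one job;
        # each run stays short, so a worker restart only loses one small batch.
        self.class.perform_in(5, offset + BATCH_SIZE)
      end
    end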

4 Likes

Thanks for all the help,

I ended up removing the fork/process code; it was not effective. The restarting job was just covering up an underlying issue of the code being really inefficient :sweat_smile:. I have instead been:

  • Writing bulk SQL code (which is much faster than sequential updates), detecting when an item really needs to update, and allowing me to skip using the PostRevisor class to re-update items that have not changed
  • Increasing efficiency in retrieving the data over HTTP by using persistent connections and fetching zipped responses where possible (a sketch of this is below)
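
For the HTTP side, here is a sketch assuming a hypothetical data host and paths; Net::HTTP.start keeps one TCP/TLS connection open for every request in the block, and Ruby's Net::HTTP already asks for gzip by default and decompresses it transparently:

    require "net/http"
    require "uri"

    uri   = URI("https://api.example-congress-data.test")
    paths = ["/bills/1.json", "/bills/2.json", "/rollcalls/1.json"]  # placeholders

    Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
      paths.each do |path|
        res = http.get(path)      # reuses the same connection; body arrives decompressed
        store_payload(res.body)   # hypothetical: parse the JSON and save it
      end
    end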

I have found that writing bulk SQL commands gives an immense speedup. What I am updating is:

posts table: cooked and updated_at columns
topics table: title and updated_at columns

My next idea is to skip PostRevisor entirely by doing something along the lines of:

1 - move the data to a temp table

2 - UPDATE topics SET title = temp_table.title, updated_at = temp_table.updated_at
      FROM temp_table
      WHERE topics.id = temp_table.topic_id AND topics.title != temp_table.title

3 - UPDATE posts SET raw = temp_table.raw, updated_at = temp_table.updated_at
      FROM temp_table
      WHERE posts.id = temp_table.post_id AND posts.raw != temp_table.raw

4 - and then trigger the search re-index job, since the title and content changed.
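
A rough sketch of steps 2-4 from plugin code, assuming the temp table from step 1 is already filled (called bulk_sync here, with topic_id / post_id columns as an assumed layout); DB.query_single is Discourse's mini_sql helper and SearchIndexer is its search-index class:

    # Step 2: update changed topic titles and collect the ids that actually changed.
    updated_topic_ids = DB.query_single(<<~SQL)
      UPDATE topics SET title = bulk_sync.title, updated_at = bulk_sync.updated_at
      FROM bulk_sync
      WHERE topics.id = bulk_sync.topic_id AND topics.title <> bulk_sync.title
      RETURNING topics.id
    SQL

    # Step 3: same idea for the posts' raw content.
    updated_post_ids = DB.query_single(<<~SQL)
      UPDATE posts SET raw = bulk_sync.raw, updated_at = bulk_sync.updated_at
      FROM bulk_sync
      WHERE posts.id = bulk_sync.post_id AND posts.raw <> bulk_sync.raw
      RETURNING posts.id
    SQL

    # Step 4: re-index only what actually changed.
    Post.where(id: updated_post_ids).find_each { |post| SearchIndexer.index(post, force: true) }
    Topic.where(id: updated_topic_ids).find_each { |topic| SearchIndexer.index(topic, force: true) }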

Is there something I am overlooking? Discourse is complex, and by skipping PostRevisor I feel like I could tread on tables I don't have experience with (post_stats, post_timings, post_uploads, and quoted_posts are some I see in the database). However, I also don't need all the validation that PostRevisor provides, as the system is getting these revisions from a trusted, predictable source. It seems like a pretty hit-or-miss solution.

What do you think?

1 Like

Update - I was doing some code checks, since there was a strange number of updates over time, and found that something is causing unwarranted updates on data items that have not actually changed in their raw JSON format. Once this error is resolved the above will probably not be necessary :tipping_hand_man: I should have done testing... it would have saved me a lot of time. I think I may still try out the above, though - it just won't be a priority. It will help for quick updates when I change the format of how data is presented. Plus, it's already written out, just not tested.
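
One way to make that "only update when the source actually changed" check explicit (just a sketch; the custom field name is made up): store a digest of each item's raw JSON on the topic and skip anything whose digest has not changed.

    require "digest"
    require "json"

    # Returns true (and records the new digest) only when the source JSON changed.
    def source_changed?(topic, raw_json)
      new_digest = Digest::SHA256.hexdigest(raw_json.to_json)
      return false if topic.custom_fields["rollcall_source_digest"] == new_digest

      topic.custom_fields["rollcall_source_digest"] = new_digest   # placeholder field name
      topic.save_custom_fields
      true
    end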

1 Like

Finished the bulk update code - would you be interested in having this pushed to a specific branch once it's more stable? It's rather specific in its use case, but for what it does it can update thousands of records quickly, including tags. It's built to extend TopicsBulkAction. Here's the readme I wrote if you want more in-depth information:

  # input
  #   - list of hashes containing cooked, post_id, topic_id, title, updated_at, tags (raw will point to cooked)
  #     [{post_id: #, cooked: "", topic_id: #, title: "", updated_at: date_time, tags: [{tag: "", tagGroup: ""}, ... ] }, ... ]
  #   - category_name, the name of the category being updated. This is used for search indexing.

  # optional hash attributes to include in list items:
  #   - raw, if not included it will be equal to cooked
  #   - fancy_title, if not included it will be equal to title
  #   - slug, if not included it will be processed from title (this is related to its URL)

  # use case: updating topics regularly and efficiently from a changing non-Discourse backend datasource,
  # to mirror updated information. Note that this is not made for general post or topic posting, but for updating
  # a topic's title and the topic's MAIN post. For general post revision, go to PostRevisor in lib/post_revisor.rb

  # - Assumes pre-cooked, custom cooked, or viewed as-is. Data is not validated.
  # - Posts should have (cook_method: Post.cook_methods[:raw_html]) set on creation if your raw == cooked.
  #     You would do this if you are writing custom HTML to display inside the post.
  #     Otherwise Discourse may re-cook it in the future, which would be bad. Make sure the source of the information
  #     is trusted and its contents escaped.
  # - If the above is not ideal, then make sure to set the correct cook method on your post's creation
  #     (in case the system re-cooks), run raw through your chosen cook method, and include both raw and the
  #     resulting cooked in your hashes.
  # - Keeps track of word_count by noting the difference between the before and after word counts of the post, and
  #     passing that to the topic.
  # - Keeps track of tag counts in a similar manner
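
For reference, a single made-up item in the shape the readme describes (every value here is invented):

  records = [
    {
      post_id:    42,
      topic_id:   7,
      title:      "H.R. 1234 - Example Appropriations Act",
      cooked:     "<p>Roll call results rendered as HTML</p>",
      updated_at: Time.now,
      tags:       [{ tag: "house", tagGroup: "chamber" }]
    }
  ]
  # raw, fancy_title, and slug are optional and default as described above.
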
1 Like

https://gist.github.com/LukeClancy/d63cd7a89d6b43217bdb407a42c388ca