Database errors cached for 30 minutes in client_settings_json - site fails to load

Priority/Severity

Rare but critical: the site becomes unusable for up to 30 minutes after a transient database error. Pages fail to load properly because the missing client settings cause JavaScript errors.

Platform

  • Affects: All Discourse installations
  • Observed in: Production environments experiencing transient database connectivity issues

Description

Actual Result
When a transient database error occurs (connection timeout, pool exhaustion, network glitch), the error gets cached for 30 minutes. During this time:

  • The site fails to load properly - pages appear broken or non-functional
  • Client-side JavaScript errors occur due to missing/invalid settings
  • Every request logs “Nil client_settings_json from the cache for ‘client_settings_json_[git_version]’”
  • Empty client settings are returned to the browser
  • This continues even after the database has fully recovered

Expected Result
When a transient database error occurs, the system should not cache the error. Once the database recovers, the next request should successfully fetch and cache the client settings, and the site should resume normal operation immediately.

Reproducible Steps

  1. Start a Discourse instance with client settings configured
  2. Simulate a database connectivity issue (e.g., exhaust connection pool, introduce network delay, or temporarily block port 5432)
  3. Make a request that triggers SiteSetting.client_settings_json (any page load will do this)
  4. Observe error: “Error while generating client_settings_json_uncached: [database error]”
  5. Pages fail to render correctly, JavaScript console shows errors related to missing settings
  6. Restore database connectivity
  7. Make additional requests over the next 30 minutes. The “Nil client_settings_json from the cache” error keeps appearing even though the database is healthy
  8. After 30 minutes, the cache expires and the site finally recovers

User Impact

The site is effectively down during this period:

  • Pages don’t render correctly
  • JavaScript applications fail to initialize due to missing configuration

This turns a 1-second database hiccup into a 30-minute outage.

Root Cause

In lib/site_setting_extension.rb, the client_settings_json_uncached method catches exceptions and returns nil. This nil gets cached for 30 minutes, breaking the site.
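
To make the mechanism concrete, here is a minimal, self-contained sketch of the pattern. It is not the actual Discourse code: ToyCache, settings_json_uncached and the db_healthy flag are hypothetical stand-ins, and the toy cache simply assumes that fetch stores whatever the block returns, including nil.

```ruby
# Toy stand-in for a fetch-style cache; NOT Discourse's real cache class.
class ToyCache
  def initialize
    @store = {}
  end

  # Returns the stored value if the key exists (even when that value is nil);
  # otherwise runs the block and stores whatever it returns.
  def fetch(key)
    return @store[key] if @store.key?(key)
    @store[key] = yield
  end
end

# Hypothetical generator that mimics the buggy pattern: rescue, return nil.
def settings_json_uncached(db_healthy)
  raise "connection timeout" unless db_healthy
  '{"title":"Example"}'
rescue StandardError
  nil # swallowing the error turns the failure into a cacheable value
end

cache = ToyCache.new
during_outage  = cache.fetch("client_settings_json") { settings_json_uncached(false) }
after_recovery = cache.fetch("client_settings_json") { settings_json_uncached(true) }

p during_outage   # nil
p after_recovery  # nil, the block is never run again because nil was stored
```

The second fetch never calls the generator, which is why a healthy database makes no difference until the entry expires.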

Proposed Fix

A pull request has been submitted to fix this issue. The fix requires a simple change to re-raise the exception instead of returning nil.

Why it works:

  1. Re-raising the exception prevents Discourse.cache.fetch from caching the error
  2. The outer client_settings_json method already has proper exception handling that returns “” (empty string) without caching it
  3. The site can continue to function during database issues (though degraded)
  4. The site automatically recovers on the next request once the database is healthy
  5. As a result, transient database errors no longer cause 30-minute outages
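
The reasoning in this list can be sketched with a toy model. This is an illustration under stated assumptions, not the code from the pull request: ToyCache, settings_json, settings_json_uncached and the db_healthy flag are all hypothetical names, and the outer rescue returning "" is taken from the description above.

```ruby
# Toy stand-in for a fetch-style cache; NOT Discourse's real cache class.
class ToyCache
  def initialize
    @store = {}
  end

  # Stores the block's return value only if the block finishes without raising.
  def fetch(key)
    return @store[key] if @store.key?(key)
    @store[key] = yield
  end
end

# Hypothetical generator with the proposed behavior: log, then re-raise.
def settings_json_uncached(db_healthy)
  raise "connection timeout" unless db_healthy
  '{"title":"Example"}'
rescue StandardError => e
  warn "Error while generating settings: #{e.message}"
  raise # propagate, so the cache never receives a value to store
end

# Hypothetical outer method: handles the error outside the cached block.
def settings_json(cache, db_healthy)
  cache.fetch("client_settings_json") { settings_json_uncached(db_healthy) }
rescue StandardError
  "" # degraded response for this request only; nothing is written to the cache
end

cache = ToyCache.new
during_outage  = settings_json(cache, false)
after_recovery = settings_json(cache, true)

p during_outage   # ""
p after_recovery  # "{\"title\":\"Example\"}"
```

Because the exception leaves the block before any assignment happens, the first healthy request after the outage regenerates and caches the settings.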

Impact

This bug affects all Discourse installations experiencing transient database issues:

  • Connection pool exhaustion during traffic spikes
  • Network glitches between app and database
  • Database failover scenarios (e.g., primary/replica switchover)
  • Lock contention on the site_settings table
  • Query timeouts due to slow queries

Severity: Any of these common, transient issues causes the entire site to become unusable for 30 minutes, even if the underlying problem resolves in seconds.

Workaround

If you’re experiencing this issue right now, you can manually clear the cache to restore service:

In the Rails console:

Discourse.cache.delete(SiteSettingExtension.client_settings_cache_key)

Great catch, I will take a look at your PR.
