ICS → Discourse importer via the REST API

Behaviour notes from testing ics_to_discourse.py

I’ve been running a series of tests on this script (with and without --time-only-dedupe) and thought it would be useful to document the update/adoption flow in detail.


1. How uniqueness is determined

  • Default mode: adoption requires start + end + location to match exactly.
  • With --time-only-dedupe: adoption requires only start + end to match; location is not compared.

If no existing topic matches these rules, a new topic is created.
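
To make the two modes concrete, here is a minimal sketch of the matching rule. It is not the script's actual code: adoption_key and can_adopt are hypothetical names, and events are assumed to be plain dicts with "start", "end" and "location" keys.

```python
from typing import Optional, Tuple


def adoption_key(start: str, end: str, location: Optional[str],
                 time_only: bool = False) -> Tuple[str, ...]:
    """Build the tuple that two events must share to be treated as the same event."""
    if time_only:
        # --time-only-dedupe: location is left out of the comparison
        return (start, end)
    return (start, end, location or "")


def can_adopt(existing: dict, incoming: dict, time_only: bool = False) -> bool:
    """True when an existing topic's attributes match the incoming event exactly."""
    return (
        adoption_key(existing["start"], existing["end"], existing.get("location"), time_only)
        == adoption_key(incoming["start"], incoming["end"], incoming.get("location"), time_only)
    )
```

With this rule, a venue change blocks adoption in default mode but not with time_only=True, while a time change blocks adoption in both modes.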


2. The role of the UID marker

  • Every event topic gets a hidden HTML marker in the first post:
  <!-- ICSUID:xxxxxxxxxxxxxxxx -->
  • On subsequent runs, the script looks for that marker first.
  • If found, the topic is considered a UID match and updated directly, regardless of how noisy or stale the DESCRIPTION text might be.
  • This makes the UID the true identity key. Visible description fields don’t affect matching.
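
For illustration, marker handling could look roughly like the sketch below. strip_marker is the name the script uses, but this body is my own guess at its behaviour; extract_uid, add_marker and the exact token character set in the regex are assumptions.

```python
import re
from typing import Optional

# Assumed marker shape: <!-- ICSUID:<token> --> with a token of letters, digits, "_" or "-"
MARKER_RE = re.compile(r"<!--\s*ICSUID:([0-9A-Za-z_-]+)\s*-->")


def extract_uid(raw: str) -> Optional[str]:
    """Return the UID token from a post body, or None if there is no marker."""
    match = MARKER_RE.search(raw)
    return match.group(1) if match else None


def strip_marker(raw: str) -> str:
    """Remove the hidden marker so bodies can be compared without it."""
    return MARKER_RE.sub("", raw).strip()


def add_marker(body: str, uid: str) -> str:
    """Append the hidden marker to a post body (hypothetical helper)."""
    return f"{body.rstrip()}\n\n<!-- ICSUID:{uid} -->"
```

Because the marker is an HTML comment, it does not show up in the rendered post, so readers never see it.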

3. Update flow with UID match

  1. Script fetches the first post and strips the marker:
       old_clean = strip_marker(old_raw)
       fresh_clean = strip_marker(fresh_raw)
  2. If old_clean == fresh_clean: no update (avoids churn).
  3. If they differ: check whether the change is “meaningful”:
       meaningful = (
           _norm_time(old_attrs.get("start")) != _norm_time(new_attrs.get("start"))
           or _norm_time(old_attrs.get("end")) != _norm_time(new_attrs.get("end"))
           or _norm_loc(old_attrs.get("location")) != _norm_loc(new_attrs.get("location"))
       )
     • If meaningful = True → update with bump (topic rises in Latest).
     • If meaningful = False → update quietly (bypass_bump=True → revision only, no bump).
  4. Tags are merged (ensures static/default tags are present, never removes moderator/manual ones).
  5. Title and category are never changed on update.
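
The decision and the tag merge described above can be sketched as two small functions. This is only my reading of the behaviour, not the script's code: decide_update and merge_tags are hypothetical names, and the "skip" / "bump" / "quiet" labels are made up for the example.

```python
from typing import Iterable, List


def decide_update(old_clean: str, fresh_clean: str, meaningful: bool) -> str:
    """Map the comparison results onto one of three outcomes."""
    if old_clean == fresh_clean:
        return "skip"    # identical bodies: no revision at all
    if meaningful:
        return "bump"    # start/end/location changed: topic rises in Latest
    return "quiet"       # other body changes: revision only, no bump


def merge_tags(existing: Iterable[str], defaults: Iterable[str]) -> List[str]:
    """Add missing default tags while keeping every tag already on the topic."""
    merged = list(existing)
    for tag in defaults:
        if tag not in merged:
            merged.append(tag)
    return merged
```

Note that merge_tags only ever adds tags, which is why tags set by moderators survive every sync.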

4. Update flow with no UID match

  1. Script attempts adoption:
    • Builds candidate triples of start/end/location (or start/end only with --time-only-dedupe).
    • Searches /search.json and /latest.json for an existing event with matching attributes.
    • If found → adopt that topic, retrofit UID marker + tags (body left unchanged at this stage).
    • If not found → create a brand new topic with the marker and tags.
  2. Once adopted or created, all future syncs will resolve directly by UID.
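
Putting the two flows together, the resolution order might be sketched like this. It works on an in-memory list of topic dicts rather than calling /search.json or /latest.json, and resolve is a hypothetical name. Restricting adoption to topics without a marker is my assumption, not something I have confirmed in the script.

```python
from typing import List, Optional, Tuple


def resolve(event: dict, topics: List[dict],
            time_only: bool = False) -> Tuple[str, Optional[dict]]:
    """Return (action, topic) where action is 'uid_match', 'adopt' or 'create'."""
    # 1. A UID marker match always wins.
    for topic in topics:
        if topic.get("uid") == event["uid"]:
            return "uid_match", topic

    # 2. Otherwise try adoption by attributes.
    fields = ("start", "end") if time_only else ("start", "end", "location")
    for topic in topics:
        # Assumption: only topics that carry no marker yet are adoption candidates.
        if topic.get("uid") is None and all(topic.get(f) == event.get(f) for f in fields):
            return "adopt", topic

    # 3. Nothing matched: a new topic is needed.
    return "create", None
```

After an "adopt" or "create" result the topic carries the marker, so the next run for the same event would stop at step 1.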

5. Practical consequences

  • Time changes
    • Default: adoption fails (times differ) → new topic created.
    • With --time-only-dedupe: adoption fails the same way; new topic created.
  • Location changes
    • Default: adoption fails (location differs) → new topic created.
    • With --time-only-dedupe: adoption succeeds (times match), but location difference is flagged as “meaningful” → update with bump.
  • Description changes
    • If DESCRIPTION text changes but start/end/location do not:
      • Body is updated quietly (bypass_bump=True).
      • Topic revision created, but no bump in Latest.
    • If DESCRIPTION is unchanged (or only noise such as Last Updated: that normalizes away), no update occurs at all.
  • UID marker
    • Ensures reliable matching on future syncs.
    • Means noisy DESCRIPTION fields don’t affect whether the correct topic is found.

6. Why the DESCRIPTION sometimes “stays the same”

The script compares the entire body (minus the UID marker).
If the only difference is in a volatile line such as Last Updated:, and that difference normalizes away (e.g. whitespace, line endings, Unicode), old_clean and fresh_clean compare as identical → no update is made.
This is by design, to prevent churn from feed noise.
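
As an illustration of how such noise can disappear before the comparison, here is a minimal normalization sketch. normalize_body is a hypothetical name, and the exact steps the script applies are an assumption on my part. It only covers the three kinds of noise mentioned above: whitespace, line endings and Unicode form.

```python
import re
import unicodedata


def normalize_body(raw: str) -> str:
    """Reduce a post body to a canonical form for comparison."""
    # Unicode: fold compatibility characters (e.g. non-breaking space -> space)
    text = unicodedata.normalize("NFKC", raw)
    # Line endings: treat CRLF and CR as LF
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Whitespace: collapse runs of spaces/tabs and trim each line
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.split("\n")]
    # Collapse consecutive blank lines into one
    out = []
    for line in lines:
        if line == "" and out and out[-1] == "":
            continue
        out.append(line)
    return "\n".join(out).strip()
```

Two bodies that differ only in these respects normalize to the same string, so the equality check reports no change. A real text difference, such as a different date on the Last Updated: line, still survives normalization in this sketch.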


Summary

  • Time defines uniqueness for adoption (without a UID match, a time change always creates a new topic).
  • Location changes → visible bump (so users notice venue updates).
  • Description changes → quiet update (revision but no bump).
  • UID marker = reliable identity key, ensures the correct topic is always found, even if DESCRIPTION is stale or noisy.

This strikes a good balance: important changes surface in Latest, unimportant churn stays invisible.