Behaviour notes from testing ics_to_discourse.py
I’ve been running a series of tests on this script (with and without --time-only-dedupe) and thought it would be useful to document the update/adoption flow in detail.
1. How uniqueness is determined
- Default mode: adoption requires start + end + location to match exactly.
- With --time-only-dedupe: adoption requires only start + end; location is treated as “close enough.”
If no existing topic matches these rules, a new topic is created.
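To make the two modes concrete, here is a minimal sketch of the matching rule as I understand it from testing. The names EventAttrs, adoption_key and find_adoptable are my own for illustration and are not taken from the script; the real code may normalize values before comparing.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class EventAttrs(NamedTuple):
    """Hypothetical container for the three attributes used in matching."""
    start: str
    end: str
    location: str


def adoption_key(attrs: EventAttrs, time_only: bool = False) -> Tuple[str, ...]:
    """Return the tuple that two events must share for adoption to succeed."""
    if time_only:
        # Mirrors --time-only-dedupe: location is left out of the key.
        return (attrs.start, attrs.end)
    return (attrs.start, attrs.end, attrs.location)


def find_adoptable(new: EventAttrs,
                   existing: Dict[int, EventAttrs],
                   time_only: bool = False) -> Optional[int]:
    """Return the id of the first existing topic whose key matches, else None."""
    wanted = adoption_key(new, time_only)
    for topic_id, attrs in existing.items():
        if adoption_key(attrs, time_only) == wanted:
            return topic_id
    return None  # caller would create a new topic
```

With one existing topic in "Room A" and an incoming event at the same time in "Room B", the default mode finds nothing to adopt, while the time-only mode returns the existing topic.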
2. The role of the UID marker
- Every event topic gets a hidden HTML marker in the first post:
<!-- ICSUID:xxxxxxxxxxxxxxxx -->
- On subsequent runs, the script looks for that marker first.
- If found, the topic is considered a UID match and updated directly, regardless of how noisy or stale the DESCRIPTION text might be.
- This makes the UID the true identity key. Visible description fields don’t affect matching.
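For reference, reading and stripping a marker of this shape could look roughly like the sketch below. The regular expression and extract_uid are my assumptions, and this strip_marker is only a stand-in for the script's own function of that name, whose real implementation I have not reproduced. I am also assuming the UID part is plain alphanumerics.

```python
import re
from typing import Optional

# Assumed marker shape: <!-- ICSUID:<alphanumeric id> -->
MARKER_RE = re.compile(r"<!--\s*ICSUID:([0-9A-Za-z]+)\s*-->")


def extract_uid(raw: str) -> Optional[str]:
    """Return the UID stored in the hidden marker, or None if there is none."""
    match = MARKER_RE.search(raw)
    return match.group(1) if match else None


def strip_marker(raw: str) -> str:
    """Remove the marker so that only the visible body is compared."""
    return MARKER_RE.sub("", raw).strip()
```

Because the lookup only reads the marker, edits to the visible text cannot change which topic is found.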
3. Update flow with UID match
- Script fetches the first post and strips the marker:
    old_clean = strip_marker(old_raw)
    fresh_clean = strip_marker(fresh_raw)
- If old_clean == fresh_clean: no update (avoids churn).
- If they differ: check whether the change is “meaningful”:
    meaningful = (
        _norm_time(old_attrs.get("start")) != _norm_time(new_attrs.get("start"))
        or _norm_time(old_attrs.get("end")) != _norm_time(new_attrs.get("end"))
        or _norm_loc(old_attrs.get("location")) != _norm_loc(new_attrs.get("location"))
    )
- If meaningful = True → update with bump (topic rises in Latest).
- If meaningful = False → update quietly (bypass_bump=True → revision only, no bump).
- Tags are merged (ensures static/default tags are present, never removes moderator/manual ones).
- Title and category are never changed on update.
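Putting the steps of this section together, the decision can be sketched as below. plan_update and merge_tags are hypothetical names of mine, and the attribute comparison is done on raw values here, standing in for the script's _norm_time and _norm_loc helpers.

```python
from typing import Dict, List


def plan_update(old_clean: str, fresh_clean: str,
                old_attrs: Dict[str, str], new_attrs: Dict[str, str]) -> str:
    """Return "skip", "bump" or "quiet" for a topic matched by UID."""
    if old_clean == fresh_clean:
        return "skip"  # identical bodies: no revision at all
    # Raw comparison; the real script normalizes times and locations first.
    meaningful = any(
        old_attrs.get(key) != new_attrs.get(key)
        for key in ("start", "end", "location")
    )
    # "quiet" corresponds to updating with bypass_bump=True.
    return "bump" if meaningful else "quiet"


def merge_tags(existing: List[str], defaults: List[str]) -> List[str]:
    """Add missing default tags while keeping every tag already on the topic."""
    merged = list(existing)
    for tag in defaults:
        if tag not in merged:
            merged.append(tag)
    return merged
```

The tag merge only ever adds, which is why tags set by moderators survive a sync.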
4. Update flow with no UID match
- Script attempts adoption:
• Builds candidate triples of start/end/location (or start/end only with --time-only-dedupe).
• Searches /search.json and /latest.json for an existing event with matching attributes.
• If found → adopt that topic, retrofit UID marker + tags (body left unchanged at this stage).
• If not found → create a brand new topic with the marker and tags.
- Once adopted or created, all future syncs will resolve directly by UID.
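The whole resolve order (UID first, then adoption, then creation) can be sketched with an in-memory list in place of the /search.json and /latest.json lookups. resolve_topic and the topic dict layout are hypothetical. Restricting adoption to topics that do not yet carry a UID is my own assumption; I have not confirmed that the script does this.

```python
from typing import Dict, List, Optional, Tuple

Topic = Dict[str, object]


def resolve_topic(uid: str, attrs: Dict[str, str], topics: List[Topic],
                  time_only: bool = False) -> Tuple[str, int]:
    """Return (outcome, topic_id); outcome is uid_match, adopted or created."""
    # 1. A topic already carrying this UID wins outright.
    for topic in topics:
        if topic.get("uid") == uid:
            return ("uid_match", int(topic["id"]))
    # 2. Otherwise try adoption by start/end(/location).
    keys = ("start", "end") if time_only else ("start", "end", "location")
    for topic in topics:
        # Assumption: only topics without a marker are candidates.
        unmarked: Optional[bool] = topic.get("uid") is None
        if unmarked and all(topic.get(k) == attrs.get(k) for k in keys):
            topic["uid"] = uid  # retrofit the marker; body left as it is
            return ("adopted", int(topic["id"]))
    # 3. Nothing matched: create a new topic that carries the marker.
    new_id = max((int(t["id"]) for t in topics), default=0) + 1
    topics.append({"id": new_id, "uid": uid, **attrs})
    return ("created", new_id)
```

Running the same event through twice shows the hand-over: the first call adopts, the second resolves by UID without looking at the attributes.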
5. Practical consequences
• Time changes
  • Default: adoption fails (times differ) → new topic created.
  • With --time-only-dedupe: adoption fails the same way; new topic created.
• Location changes
  • Default: adoption fails (location differs) → new topic created.
  • With --time-only-dedupe: adoption succeeds (times match), but location difference is flagged as “meaningful” → update with bump.
• Description changes
  • If DESCRIPTION text changes but start/end/location do not:
    • Body is updated quietly (bypass_bump=True).
    • Topic revision created, but no bump in Latest.
  • If DESCRIPTION is unchanged (or only noise such as Last Updated: that normalizes away), no update occurs at all.
• UID marker
  • Ensures reliable matching on future syncs.
  • Means noisy DESCRIPTION fields don’t affect whether the correct topic is found.
6. Why the DESCRIPTION sometimes “stays the same”
The script compares the entire body (minus the UID marker).
If the only difference is in a volatile line such as Last Updated:, and that difference normalizes away (e.g. whitespace, line endings, Unicode form), old_clean and fresh_clean compare as identical, so no update is made.
This is by design, to prevent churn from feed noise.
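As an illustration of what “normalizes away” can mean, here is a small sketch that irons out the three kinds of noise mentioned above before comparing. normalize_body is a hypothetical helper, and the choice of NFC as the Unicode form is my assumption; I have not checked which normalization the script actually applies.

```python
import re
import unicodedata


def normalize_body(raw: str) -> str:
    """Normalize Unicode form, line endings and whitespace runs for comparison."""
    text = unicodedata.normalize("NFC", raw)  # assumed normal form
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Collapse runs of spaces/tabs and trim each line.
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.split("\n")]
    return "\n".join(lines).strip()
```

Two bodies that differ only in CRLF versus LF, doubled spaces, or composed versus decomposed accents come out equal, so a comparison on the normalized text would report no change.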
Summary
- Time defines uniqueness (always creates new topic when times change).
- Location changes → visible bump (so users notice venue updates).
- Description changes → quiet update (revision but no bump).
- UID marker = reliable identity key, ensures the correct topic is always found, even if DESCRIPTION is stale or noisy.
This strikes a good balance: important changes surface in Latest, unimportant churn stays invisible.