Hello Discourse Community,
I’d like to introduce myself — I’m a system designer currently exploring the Discourse ecosystem. While I’m not yet an expert in Discourse’s internal architecture, I chose this platform for my upcoming project because of its strong data model, extensibility, and community-driven governance.
I’m currently preparing a proposal to present to my supervisor for development approval. By sharing this here, my goal is to gather feedback, insights, and constructive criticism from people who know Discourse best — particularly around feasibility, overlap with core features, and alignment with Discourse best practices.
Note: The proposal below is fairly detailed and technical.
I may not be able to reply immediately, but I will read and respond to every comment as soon as possible.
Plain-Language Overview (Quick Context)
The problem:
In many community discussions — especially around news, public policy, science, or controversial topics — claims are frequently challenged, corrected, or refined over time. While Discourse excels at discussion, it is still difficult to clearly see:
- where a claim originally came from,
- how it evolved through replies and quotes,
- which posts support, correct, or contradict it,
- and how confidence in that claim changes as the discussion grows.
The idea:
This project explores a plugin that turns discussion threads into transparent evidence graphs, helping communities perform collective fact-checking without relying on a single authority or binary “true/false” labels.
Instead of declaring facts, the system focuses on traceability, structure, and signals — allowing readers and moderators to judge credibility based on how information is sourced, challenged, and verified within the community.
Who might benefit:
- Moderators & admins:
  - Supporting fair, evidence-based moderation decisions
  - Identifying corrections, disputes, and unstable claims more quickly
- Fact-checkers & researchers:
  - Tracing claim lineage and verification paths inside discussions
- Community members:
  - Engaging in debates with clearer context and shared references
- Readers:
  - Understanding why a claim is trusted or disputed without reading every reply
[Facto Map]
This is an example of how I imagine the graph could look in practice.
It is intended for specialized use cases or for topics with an overwhelming number of replies, where manually tracking, verifying, and understanding every argument becomes impractical.
The Facto Map helps visualize claims, evidence, contradictions, and verification paths, making it easier for communities to fact-check, assess credibility, and conduct structured debates without losing context.
What follows is a technical specification / RFC-style proposal focused on architecture and feasibility.
Technical Specification
Project Title: Discourse OriginGraph & Facto-Mapper
Subtitle: Native Data Provenance Tracking & Reliability Analysis System
Version: 1.0.0 (Proposal)
1. Executive Summary
In an era of rapid information dissemination, discussion platforms like Discourse excel at structured argumentation but lack native tools for data provenance analysis, claim evolution tracking, and transparent fact-checking workflows.
Discourse OriginGraph & Facto-Mapper is a plugin designed to augment Discourse with a community-driven fact-mapping layer. Rather than enforcing absolute truth, it provides visual and statistical tools that help communities prove, challenge, and contextualize claims through structured discussion.
The system emphasizes:
- traceability over authority,
- confidence signals over verdicts,
- and collective reasoning over centralized judgment.
2. Technical Objectives
- Traceability: Directed acyclic graph (DAG) for Source → Expansion → Verification → Correction
- Fact-Checking Support: Provide structural evidence for claims, rebuttals, and corrections without labeling content as true or false
- Visualization: Interactive “Facto Maps” embedded directly in topic views
- Heuristic Analysis: Weighted confidence-based scoring derived from discussion structure and verification patterns
- Performance: Asynchronous processing via Sidekiq
- Integration: Strict adherence to Discourse plugin architecture (Rails / Ember.js)
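Since the traceability objective depends on the graph staying acyclic, it may be worth checking that property whenever edges are added. Below is a minimal sketch using Kahn's algorithm, written in Python purely for illustration (the plugin itself would be Ruby). The function name `is_acyclic` and the `(source_post_id, target_post_id)` tuple format are my own assumptions, not an existing Discourse API.

```python
from collections import defaultdict, deque
from typing import Iterable, Tuple


def is_acyclic(edges: Iterable[Tuple[int, int]]) -> bool:
    """Return True if the (source_post_id, target_post_id) edges form a DAG."""
    successors = defaultdict(list)
    in_degree = defaultdict(int)
    nodes = set()
    for source, target in edges:
        successors[source].append(target)
        in_degree[target] += 1
        nodes.update((source, target))

    # Kahn's algorithm: repeatedly remove nodes that have no incoming edges.
    queue = deque(n for n in nodes if in_degree[n] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for nxt in successors[node]:
            in_degree[nxt] -= 1
            if in_degree[nxt] == 0:
                queue.append(nxt)

    # Nodes that sit on a cycle never reach in-degree zero, so they are never visited.
    return visited == len(nodes)
```

One thing I am unsure about: mentions and later edits can point in both directions, so some relation types might have to be excluded or stored separately to keep the graph a true DAG.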
3. Scope of Work
3.1 In-Scope
- Intra-topic relationship graphing (reply, quote, mention, correction, contradiction)
- Signal extraction from cooked HTML and reply metadata
- Configurable Origin / Stability / Dispute scores
- Governance via Discourse Trust Levels
- Moderator-visible fact-checking signals (non-authoritative)
3.2 Out-of-Scope
- Automated truth classification (no true/false labels)
- NLP / LLM semantic analysis (out of scope for Phase 1)
- Global search replacement
- Cross-instance federation
4. System Architecture
4.1 Technology Stack
- Backend: Ruby on Rails (Discourse Core), Sidekiq
- Frontend: Ember.js, D3.js or Cytoscape.js
- Database: PostgreSQL 13+, Redis
- Data Interchange: Internal JSON API
4.2 Conceptual Architecture Diagram
       [Client: Ember.js] <-- JSON --> [Controller: Rails]
               |                                |
    (Interactive Facto Map)           (Request Validation)
               |                                |
               v                                v
     [Visualizer Library]             [Sidekiq Worker Pool]
                                                |
                                       +--------+--------+
                                       |                 |
                                [Graph Engine]   [Scoring Engine]
                                       |                 |
                                       +--------+--------+
                                                |
                                          [PostgreSQL]
                                   (Edges / Snapshots / Logs)
5. Data Model (Schema Design)
5.1 Table: provenance_edges
| Column | Type | Index | Description |
|---|---|---|---|
| id | BigInt | PK | Unique edge ID |
| topic_id | Integer | IDX | Topic reference |
| source_post_id | Integer | IDX | Origin node |
| target_post_id | Integer | IDX | Destination node |
| relation_type | Enum | | reply, quote, ref, correction, contradiction |
| weight | Float | | Edge strength |
| metadata | JSONB | | Context data |
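To make the shape of one row concrete, here is a rough in-memory sketch of an edge record. It is Python for illustration only; in the plugin this would be an ActiveRecord model. The class names, the default weight of 1.0, and the two validation rules are assumptions on my part and are not defined anywhere in the proposal.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict


class RelationType(str, Enum):
    REPLY = "reply"
    QUOTE = "quote"
    REF = "ref"
    CORRECTION = "correction"
    CONTRADICTION = "contradiction"


@dataclass(frozen=True)
class ProvenanceEdge:
    id: int
    topic_id: int
    source_post_id: int
    target_post_id: int
    relation_type: RelationType
    weight: float = 1.0  # assumed default; the proposal does not specify one
    metadata: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Assumed sanity checks, not part of the schema above.
        if self.source_post_id == self.target_post_id:
            raise ValueError("an edge must connect two different posts")
        if self.weight < 0:
            raise ValueError("weight must be non-negative")


edge = ProvenanceEdge(
    id=1,
    topic_id=10,
    source_post_id=101,
    target_post_id=105,
    relation_type=RelationType("quote"),
)
```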
5.2 Table: facto_graph_snapshots
| Column | Type | Index | Description |
|---|---|---|---|
| id | BigInt | PK | Snapshot ID |
| topic_id | Integer | UNIQUE | Associated topic |
| version | Integer | | Graph version |
| graph_payload | JSONB | | Nodes & edges |
| computed_at | Datetime | | Generated time |
| is_public | Boolean | | Visibility flag |
5.3 Redis Keys
- facto:quota:user:{id}:daily
- facto:job:topic:{id}:status
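As a rough illustration of how these two key patterns might be used, the sketch below builds the keys and applies a simple daily quota. A plain dictionary stands in for Redis here; in practice I would expect something like Redis `INCR` with an expiry instead. The function names and the `daily_limit` parameter are hypothetical.

```python
from typing import Dict


def quota_key(user_id: int) -> str:
    return f"facto:quota:user:{user_id}:daily"


def job_status_key(topic_id: int) -> str:
    return f"facto:job:topic:{topic_id}:status"


def consume_quota(store: Dict[str, int], user_id: int, daily_limit: int) -> bool:
    """Count one analysis request; return False once the daily limit is reached."""
    key = quota_key(user_id)
    used = store.get(key, 0)
    if used >= daily_limit:
        return False
    store[key] = used + 1
    return True
```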
6. Internal API Specification
POST /facto/analyze
- Auth: TL1+
- Params: topic_id, force_recalc
- Response: job_id, status = queued
GET /facto/graph/:topic_id
- Response (example payload):
    version: 5
    nodes:
      - id: 101
        group: source
        score: 0.8
    edges:
      - source: 101
        target: 105
        type: verification
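For reference, this is roughly how I picture the graph response being assembled on the server side before it is serialized to JSON. It is an illustrative Python sketch, not the actual controller code. The function name, the `"unknown"` fallback group, and the score for post 105 are made up for the example.

```python
import json
from typing import Any, Dict, Iterable, Mapping, Tuple


def build_graph_payload(
    version: int,
    scores: Mapping[int, float],
    groups: Mapping[int, str],
    edges: Iterable[Tuple[int, int, str]],
) -> Dict[str, Any]:
    """Assemble the nodes/edges structure returned by GET /facto/graph/:topic_id."""
    nodes = [
        {"id": post_id, "group": groups.get(post_id, "unknown"), "score": round(score, 3)}
        for post_id, score in sorted(scores.items())
    ]
    edge_list = [
        {"source": source, "target": target, "type": relation}
        for source, target, relation in edges
    ]
    return {"version": version, "nodes": nodes, "edges": edge_list}


payload = build_graph_payload(
    version=5,
    scores={101: 0.8, 105: 0.55},  # 0.55 is an illustrative value only
    groups={101: "source"},
    edges=[(101, 105, "verification")],
)
print(json.dumps(payload, indent=2))
```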
7. Algorithms & Logic
7.1 Signal Extraction Logic
- Iterate all posts in topic
- reply_to_post_number → Reply edge
- Parse cooked HTML → Quote edge
- Regex @username → Mention edge
- Moderator annotations → Correction / Contradiction edges
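The extraction steps above could be sketched as follows, in Python for illustration. Several things here are assumptions that should be checked against a running Discourse instance:

- quotes appear in cooked HTML as `<aside class="quote" data-post="N" data-topic="T">`,
- mentions appear as `<a class="mention" href="/u/username">`, which I read instead of using a raw regex so that e-mail addresses are not matched,
- edges point from the referenced post to the referencing post,
- a mention resolves to the mentioned user's latest earlier post in the topic.

Moderator annotations are left out.

```python
from html.parser import HTMLParser
from typing import Dict, List, Tuple

Edge = Tuple[int, int, str]  # (referenced post number, referencing post number, relation)


class CookedSignals(HTMLParser):
    """Collects quoted post numbers and mentioned usernames from cooked HTML."""

    def __init__(self, topic_id: int) -> None:
        super().__init__()
        self.topic_id = topic_id
        self.quoted_posts: List[int] = []
        self.mentions: List[str] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if tag == "aside" and "quote" in classes:
            data_post = attr.get("data-post") or ""
            same_topic = attr.get("data-topic") in (None, str(self.topic_id))
            if data_post.isdigit() and same_topic:  # ignore quotes from other topics
                self.quoted_posts.append(int(data_post))
        elif tag == "a" and "mention" in classes:
            href = attr.get("href") or ""
            self.mentions.append(href.rstrip("/").rsplit("/", 1)[-1].lower())


def extract_edges(posts: List[dict], topic_id: int) -> List[Edge]:
    edges: List[Edge] = []
    latest_post_by_user: Dict[str, int] = {}
    for post in sorted(posts, key=lambda p: p["post_number"]):
        number = post["post_number"]
        if post.get("reply_to_post_number"):
            edges.append((post["reply_to_post_number"], number, "reply"))
        signals = CookedSignals(topic_id)
        signals.feed(post.get("cooked", ""))
        for quoted in signals.quoted_posts:
            if quoted != number:
                edges.append((quoted, number, "quote"))
        for username in signals.mentions:
            # Assumption: a mention links to the user's latest earlier post, if any.
            if username in latest_post_by_user:
                edges.append((latest_post_by_user[username], number, "mention"))
        latest_post_by_user[post["username"].lower()] = number
    return edges
```

The proposal does not yet say how a mention (which targets a user) maps to a post node, so the rule used here is only one possible choice.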
7.2 Scoring Algorithm
Weighted centrality (PageRank-style):
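Score(P) = (1 - d) + d × Σ((Score(Pi) × Weight(Ei,P)) / OutDegree(Pi))
Here d is the damping factor, and the sum runs over every post Pi that has an edge Ei,P pointing to P.
Negative or contradictory edges apply penalty multipliers rather than binary rejection.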
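A minimal sketch of how this scoring could be iterated, again in Python for illustration. The penalty values, the damping factor of 0.85, the fixed iteration count, and the decision to fold penalties into the edge weight are all assumptions; the proposal only says that penalties are multiplicative. Edge weights are assumed to lie between 0 and 1.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Edge = Tuple[int, int, str, float]  # (source, target, relation_type, weight)

# Hypothetical multipliers for negative relations.
PENALTY = {"correction": 0.5, "contradiction": 0.25}


def score_posts(edges: Iterable[Edge], d: float = 0.85, iterations: int = 50) -> Dict[int, float]:
    out_degree = defaultdict(int)
    incoming = defaultdict(list)
    nodes = set()
    for source, target, relation, weight in edges:
        out_degree[source] += 1
        # Penalised relations still contribute, just with a reduced weight.
        incoming[target].append((source, weight * PENALTY.get(relation, 1.0)))
        nodes.update((source, target))

    scores = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # Each pass recomputes every score from the previous pass's values.
        scores = {
            node: (1 - d)
            + d * sum(scores[src] * w / out_degree[src] for src, w in incoming[node])
            for node in nodes
        }
    return scores
```

With a single reply edge from post 1 to post 2, post 1 settles at 1 - d and post 2 at (1 - d) + d × (1 - d). Marking the same edge as a contradiction lowers post 2's score without dropping it to zero.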
8. UX / UI
- Entry point: “Graph View” button in topic map
- Fullscreen modal Facto Map
- Hover node: post snippet + author + confidence signals
- Click node: scroll to post
- Filters: show/hide corrections, contradictions, or only verified paths
9. Security & Governance
- Rate limiting via Discourse RateLimiter
- JSONB sanitization to prevent XSS
- Private topics inherit Discourse ACL
- No automated moderation actions — signals are advisory only
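For the JSONB sanitization point, one possible approach is to HTML-escape every string before it is stored or sent to the client. The sketch below is Python for illustration only, and the function name is hypothetical. Escaping at render time in the frontend may well be the better option, so treat this as one idea rather than a recommendation.

```python
import html
from typing import Any


def sanitize(value: Any) -> Any:
    """Recursively HTML-escape every string in a JSON-like structure."""
    if isinstance(value, str):
        return html.escape(value, quote=True)
    if isinstance(value, list):
        return [sanitize(item) for item in value]
    if isinstance(value, dict):
        return {sanitize(key): sanitize(item) for key, item in value.items()}
    return value  # numbers, booleans and None pass through unchanged
```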
10. Development Roadmap
- Phase 1: MVP graph extraction + basic visualization
- Phase 2: Advanced confidence scoring + moderator annotations
- Phase 3: External fact-checking API & research exports
