Question about the extent of data tracking / pseudo-AI

(Michelle C. Basey) #1

I’m reading this in the Terms Of Service:

That special license allows the company to copy, publish, and analyze content you submit to the forum.

I wanted to double-check on the meaning before making wild promises to my community that aren’t true. If I’m reading the whole of the TOS correctly …

A user posting in my forum can safely assume:

  • I own my content and Discourse honors that.

  • My use of the forum is being tracked only so far as to ensure the safety and health of the community and reward me for my awesome engagement.
    (e.g. trolls, spammers, hackers, search & find goodness, new rooms to explore, no one’s been engaging this room for a while so let’s get rid of it, etc)

  • My profile data, private messages, and content are never scraped for advertising data (assuming the host/I/me has all ad features turned off).

Please let me know if I’m making things up, misunderstanding, or otherwise failing to fully understand this wonderful tool.

(There could easily be features I haven’t stumbled into yet that run stats I haven’t even imagined I would be needing.)

(Joshua Rosenfeld) #2

Hi Michelle,

Those are the standard terms provided to all Discourse installs by default. You are welcome to change them as you see fit for your community, including removing that line. We recommend consulting with a lawyer to develop terms that fit your community.

Assuming you host your own site, we - Civilized Discourse Construction Kit, Inc., the company that develops Discourse - have no access to your data.

We have no access to any data on your site, unless such data is already public (and thus anyone on the internet has access to said data).

We do not track content on your forum. It is up to you to ensure the health and safety of your community. The only data we regularly collect about sites we do not host is their URL and installed Discourse version. We collect this data when your site pings us to check for new updates. Both the URL and version of your site are public information; we have no special access that allows us to obtain them. If your site were behind a firewall (off the internet), we would not even know it exists.

100% correct. There are no ad features as part of Discourse core, so there is nothing to disable. You’d need to install Advertising Plugin for Discourse - Serve Ads on your Discourse Forum (Official Endorsed) or a similar plugin in order to have ads.


If we host your site, we clearly have access to all of your data. See Discourse Standard Hosting Terms | Discourse - Civilized Discussion for the terms that govern our usage of hosted customer data. (Spoiler alert, we don’t scrape our customer sites for advertising data, sell their data, or anything else nefarious)


Hope this clears things up! Let me know if you have any further questions.

(Michelle C. Basey) #4

Yay!! Thank you for your swift and thorough reply.

In my mind there was a difference between tracking content, and tracking usage.

Where “usage” (in my mind) was things like - you clicked 10 likes, replied 20 times, and 184 people replied to your topic.

The kinds of things being “tracked” to increase levels - 0, 1, 2 …

But even as I’m typing this I’m thinking, “It’s just a SQL call counting rows. That’s not really ‘tracking’ anyone!”
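
To make the “counting rows” idea concrete, here is a minimal sketch using an in-memory SQLite database. The table names, columns, and level thresholds are all made up for illustration; they are not Discourse’s actual schema or trust level rules.

```python
import sqlite3

# Hypothetical schema for illustration only; NOT Discourse's real tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE likes   (user_id INTEGER, post_id  INTEGER);
    CREATE TABLE replies (user_id INTEGER, topic_id INTEGER);
""")

# Sample activity: user 1 gave 10 likes and wrote 20 replies.
conn.executemany("INSERT INTO likes VALUES (?, ?)", [(1, p) for p in range(10)])
conn.executemany("INSERT INTO replies VALUES (?, ?)", [(1, t) for t in range(20)])


def usage_counts(conn, user_id):
    """Count a user's rows in each activity table."""
    likes = conn.execute(
        "SELECT COUNT(*) FROM likes WHERE user_id = ?", (user_id,)
    ).fetchone()[0]
    replies = conn.execute(
        "SELECT COUNT(*) FROM replies WHERE user_id = ?", (user_id,)
    ).fetchone()[0]
    return {"likes_given": likes, "replies": replies}


def level_for(counts):
    """Map counts to a level using made-up thresholds (an assumption)."""
    if counts["likes_given"] >= 10 and counts["replies"] >= 20:
        return 2
    if counts["likes_given"] >= 1 or counts["replies"] >= 1:
        return 1
    return 0


counts = usage_counts(conn, 1)
print(counts, level_for(counts))
```

Everything here runs against a local database, which is the point: the counts are derived from rows the site already stores.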

(p.s. is that a bug up there in this post - when a quoted “quote within a quote” isn’t pulling over both people being quoted?
(I used the “select quote that automatically starts the reply for you” method.))

(Joshua Rosenfeld) #5

Correct, the software is tracking everything that happens on the site. It has to, otherwise no trust level system, no rate limits, no … lots of other stuff. But that’s all happening locally on your server, it’s how Discourse works! Nothing is being sent anywhere else.

Not a bug, just a limitation of the quote system. Nested quotes are tricky - you really need to create them manually for them to work.
