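Questions about moving an existing forum to Discourse

Hello all :slight_smile:

Apologies for the long post, but someone who knows Discourse well may be able to answer these right away.

I co-run and moderate a hobby forum. There are two of us running it, and the other person developed the software (which is also written in Ruby). The existing forum is entirely custom software whose main feature is its simplicity compared with phpBB, vBulletin, etc. (which keep getting hacked). The database is about 40GB, with 200k posts. For various reasons we are considering migrating the database to another platform, and Discourse looks like a good fit.

Preliminary tests show that the overall functionality is very good, including image and video embedding. Multiple image upload from an Android phone works fine too!

However, we would need some customisations, mainly to simplify the user interface. In no particular order of importance, for example: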

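  1. Not displaying each user's total post count, so that new members are not intimidated.

  2. Blocking users from editing their posts after a set time (currently set to 2 hours). This prevents a type of trolling that is common in this field.

  3. A classifieds section with paid ads (with PayPal payment) would be nice. I realise there are complications, such as configuring the pricing and the payment links.

  4. Showing the year explicitly in post dates.

  5. A way for an admin to go to a particular user and see which other users have been active from the same browser installation (basically cookies). Discourse appears to have an IP-based feature already, but IPs are not effective nowadays (especially as people who want to run multiple accounts often use mobile data). Having read this thread, Handling trolls with multiple accounts over VPNs - #18 by ljpp, and others, I can see that many people are going down the same road. There is no solution against someone who is savvy with VPNs etc.; in the end they usually give themselves away through their posting style, or by getting banned for posting something really nasty. It would also be useful to detect identical password hashes, since many people use the same password for all their accounts :wink:

  6. A simple linear list of posts for admins, so that the last x posts can be checked very quickly on a phone. I think this could be implemented with code on a subdomain that accesses the database directly. The list would have "DELETE" and "BAN" buttons so that someone posting nasty stuff (sadly not rare on forums) can be removed quickly.

  7. This may already be implemented: an admin merging selected (or all) posts from one thread into another thread, with the merged posts ending up in correct chronological order. This can break links, but not if the links are site-wide (e.g. the post number in the database rather than the post number within the thread).

  8. A way for an admin to generate a CSV email list of all users who have logged in within the past 12 or 24 months. We have found that mailing old, inactive users increases the chance of getting blacklisted (RBLs etc.), even though we minimise the risk by sending very slowly (only one email per minute) and by excluding disposable addresses such as sharklasers.com from the mailing list.

  9. For GDPR compliance, a setting in the user's profile to opt in or out of these emails.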

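I have just read a thread on GDPR here. As I understand it, in the UK a poster has no right to demand deletion of their posts; deletion of their login details is possible. I wonder whether Discourse is any more exposed in this area. On our forum almost everyone uses a nickname.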

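  10. The ability for admins to read PMs (private messages). This is a must, because many spammers sign up, never post, and only send PMs. You only notice if someone complains. Many new signups are suspicious (but not clear-cut), so we watch them for a while. For example, the user profile has a "country" setting which must be specified at signup; someone who sets it to Germany but connects from a Thai IP is suspicious, yet could be a German in Thailand!

  11. Making the "country" setting for the user's location mandatory at signup (I realise people can enter whatever they like).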

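I realise that if we modify the code, applying updates may become difficult or impossible.

Dodgy signups are a serious problem. At present I reckon 10 to 20% of signups are dodgy. If you do nothing, you get a lot of trouble later. A common pattern is to sign up, wait a week, and then start posting spam.

Unfortunately I know nothing about Ruby. I have done a little PHP. My IT expertise is more general: POP and SMTP servers, VMs, VPNs, FTP, SPF, DKIM, router configuration. I can do simple HTML but not CSS. My older IT expertise is in embedded systems hardware and software (assembler and C). The person who wrote the original software has offered to help with the database migration. I also have contacts who could help with other parts, but no direct Ruby expert at the moment. We run several sites on Linode servers, which have been very reliable, so that is our first choice for hosting.

Thank you for reading this far. I would be grateful for some hints on which of these are already implemented, how much work the rest would be, or whether something similar exists :slight_smile: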

Hi Peter!

There are people much more knowledgeable than me, but I can try to answer at least some of these. :slight_smile:

  1. This seems like a trivial CSS fix, I am sure someone will help you here.
  2. I believe there is a site setting for this so it should be trivial.
  3. I guess you’ll need a separate post for this with more specific info and examples. You can try searching for plugins here; there are some that might do the job, but it’s hard to say for sure.

Ad 6. Generally, Discourse relies on users flagging nasty posts to help with moderation. You can set it up in a way that only one flag from a trusted user will hide the post. For your specific ask you might need a plugin.

Ad 7. This is not possible at this point I believe. Search around on meta, there are discussions about this.
Ad 8. Yep, you can easily export user data and then filter out based on “last seen” column

Ad. 10: Yep, out of the box. Admins can access everything.
Ad. 11: Yep, doable in site customization, the keyword is custom user fields, and you can enforce them at sign up.
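
For point 8, the filtering step after the export could look roughly like the Python sketch below. It assumes the exported CSV has `email` and `last_seen_at` columns holding ISO 8601 timestamps; those column names and the sample rows are assumptions for illustration, so check them against the real export. It also drops disposable domains such as sharklasers.com, as the original post describes.

```python
import csv
import io
from datetime import datetime, timedelta, timezone

# Assumed column names; adjust them to match the real export.
EMAIL_COL = "email"
LAST_SEEN_COL = "last_seen_at"

# Disposable domains to exclude from mailings (extend as needed).
DISPOSABLE_DOMAINS = {"sharklasers.com"}


def recently_seen_emails(csv_text, now, max_age_days=365):
    """Return emails of users last seen within max_age_days of `now`."""
    cutoff = now - timedelta(days=max_age_days)
    emails = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        last_seen = (row.get(LAST_SEEN_COL) or "").strip()
        email = (row.get(EMAIL_COL) or "").strip().lower()
        if not last_seen or "@" not in email:
            continue  # never seen, or no usable address
        if email.rsplit("@", 1)[1] in DISPOSABLE_DOMAINS:
            continue
        seen_at = datetime.fromisoformat(last_seen.replace("Z", "+00:00"))
        if seen_at.tzinfo is None:
            seen_at = seen_at.replace(tzinfo=timezone.utc)
        if seen_at >= cutoff:
            emails.append(email)
    return emails


# Hypothetical sample data standing in for a real export.
SAMPLE = """email,last_seen_at
alice@example.com,2019-01-10T12:00:00Z
bob@example.com,2016-05-01T08:30:00Z
carol@sharklasers.com,2019-01-12T09:00:00Z
dave@example.com,
"""

NOW = datetime(2019, 2, 1, tzinfo=timezone.utc)
print(recently_seen_emails(SAMPLE, NOW))
```

Passing `max_age_days=730` would give the 24-month variant.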

Hey Peter :smiley:.

Already some good answers from Daniel, let me see if I can assist with the rest and/or provide more details.

  1. Definitely trivial CSS. Just need to know where you are seeing this number (it may appear in more than one location) and we can help you hide it.
  2. Yep, this is a site setting: post edit time limit. Default is 60 days; 2 hours seems really short - I’d expect it to lead to lots of “oh by the way” or “edit” replies to one’s own post.
  3. Definitely need more details, but this sounds like plugin territory.
  4. You can customize the post timestamp to display year (instead of relative time since post), could also likely add the year separately via a theme-component or plugin.
  5. Check out GitHub - discourse/discourse-fingerprint: A plugin that computes user fingerprints to help administrators combat internet trolls. - not sure the status of the plugin though, @dan can likely share more.
  6. You can see all posts sorted by recency via RSS, but there is obviously no delete or suspend buttons there. We’ve found that context is key, and showing posts outside their topic can lead to mistakes. A seemingly innocuous post by itself may in fact be quite bad in context. A seemingly trolling post by itself may be a nice friendly joke in context.
    That said, you can build this via the API if you need to.
  7. You can merge posts from one topic to another, but it does not support reordering the topic - all merged posts are added to the end of the topic. Merging won’t completely break links - they won’t point directly to the post’s new location, but there’s a “x posts moved to …” post added to the topic where the posts were moved from, which can be clicked to jump to the new topic.
  8. Check out Discourse Data Explorer as well.
  9. Users have full control over emails they receive. By default (assuming no modified site settings), they’re emailed when mentioned, PM’d, directly replied to, or quoted. They’re also emailed when someone replies to a topic they started, and one email every 7 days when they haven’t logged into the site, up to a maximum of 52 “weekly digest” emails. Users can modify these preferences at any time, and all emails have an unsubscribe link. If you’re planning to email users from an external tool (say mailgun), you can add a custom user field (checkbox type) which users can use to indicate if they wish to receive emails.
  10. Yep.
  11. Yep.
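
As a rough illustration of the API route mentioned in point 6, the Python sketch below turns a "latest posts" response into a flat list an admin page could render. It assumes the `GET /posts.json` endpoint, the `Api-Key` / `Api-Username` headers, and the field names shown (`latest_posts`, `topic_title`, `raw`, and so on); verify all of these against the current Discourse API documentation before relying on them. The sample payload is made up. Each row keeps a link back to the topic so the surrounding context is one tap away.

```python
import json
import urllib.request


def fetch_latest_posts(base_url, api_key, api_username):
    """Fetch recent posts. Needs network access and a valid API key."""
    req = urllib.request.Request(
        f"{base_url}/posts.json",
        headers={"Api-Key": api_key, "Api-Username": api_username},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def linear_list(payload, limit=20):
    """Flatten a latest-posts payload into rows for a simple admin list."""
    rows = []
    for post in payload.get("latest_posts", [])[:limit]:
        rows.append({
            "post_id": post["id"],
            "username": post["username"],
            "topic": post.get("topic_title", ""),
            # Link back to the post in its topic, so context can be checked.
            "link": f"/t/{post['topic_id']}/{post['post_number']}",
            "excerpt": post.get("raw", "")[:80],
        })
    return rows


# Hypothetical payload with the assumed field names.
SAMPLE_PAYLOAD = {
    "latest_posts": [
        {"id": 514248, "username": "peter", "topic_id": 104948,
         "post_number": 4, "topic_title": "Moving a forum",
         "raw": "Thank you Daniel."},
        {"id": 514250, "username": "spammer1", "topic_id": 200001,
         "post_number": 1, "topic_title": "Cheap passports",
         "raw": "Buy now!"},
    ]
}

for row in linear_list(SAMPLE_PAYLOAD):
    print(row["post_id"], row["username"], row["link"], row["excerpt"])
```

Delete and suspend actions would be separate API calls and are deliberately left out of this sketch.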

Talking about dodgy signups - we don’t see too many dodgy signups as being a JavaScript application prevents most old-fashioned scrapers from being able to see the signup fields, let alone verify email.

Thank you Daniel.

I have been doing some testing with a sample site (running on a laptop :slight_smile: )

Regarding point 7) I see that post moving is supported but I see the original link does break. So the “share” link is not a unique link to the post. It just takes you into where the post was in the thread before the move.

We had this issue on the existing forum too (where I do a lot of thread merging, to keep the site informative). It was initially addressed by entirely removing the link to the post after the move… This proved to be a hassle in the long run; the only way you could get a link to such a post was by doing a search for some fairly unique string within it and then you could extract a link to it from the search results. Eventually it was solved using the cunning method of using the database post ID (a large number on a forum with 200k posts) as the link, and that works great while looking slightly odd. However if a forum used the database ID # from the start, it would work great.
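
To make the difference concrete, here is a toy Python model of the idea. It is not how Discourse or our existing forum actually stores posts; the `Post` class and the helper names are made up for illustration. The position of a post within a thread changes when posts are moved, while the database ID does not, so a link built on the ID keeps working and the merged thread can still be shown in chronological order.

```python
from dataclasses import dataclass


@dataclass
class Post:
    post_id: int      # database ID, never changes
    topic_id: int     # changes when the post is moved
    created_at: str   # same-format ISO 8601 strings sort chronologically
    text: str


def move_posts(posts, post_ids, dest_topic_id):
    """Reassign the selected posts to dest_topic_id (mutates in place)."""
    for p in posts:
        if p.post_id in post_ids:
            p.topic_id = dest_topic_id


def topic_view(posts, topic_id):
    """Posts of one topic in chronological order, numbered from 1."""
    in_topic = sorted(
        (p for p in posts if p.topic_id == topic_id),
        key=lambda p: (p.created_at, p.post_id),
    )
    return list(enumerate(in_topic, start=1))


def find_by_id(posts, post_id):
    """Resolve an ID-based link; unaffected by moves."""
    return next(p for p in posts if p.post_id == post_id)


posts = [
    Post(101, 1, "2019-01-01T10:00", "first post in thread 1"),
    Post(102, 2, "2019-01-01T11:00", "first post in thread 2"),
    Post(103, 1, "2019-01-01T12:00", "reply in thread 1"),
    Post(104, 2, "2019-01-01T13:00", "reply in thread 2"),
]

move_posts(posts, {103}, dest_topic_id=2)

# Thread 2 now interleaves the moved post by time: 102, 103, 104.
print([(n, p.post_id) for n, p in topic_view(posts, 2)])
# The old positional link "thread 1, post #2" no longer resolves to anything,
# but the ID-based link still finds the post in its new thread.
print(len(topic_view(posts, 1)), find_by_id(posts, 103).topic_id)
```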

Many thanks too Joshua. A real pity about the merging issue; that is a big thing on the original site.

Re point 6) I currently see the Section / Thread Name / post text, so the whole context is there. I know what you mean; you can’t tell what the post is a reply to because that post could be 50 posts down. But usually an offensive post is obviously offensive in a standalone manner. Especially if you see the poster’s name… I find that within the past 30 days about 200 people have posted and the rude ones run around the 2% mark. It is a bizarre observation that you could discharge the mod job by marking the other 98% invisible and just getting an RSS feed with the 2% :slight_smile: :slight_smile:

BTW I find that all the dodgy signups are clearly humans. They even fill in their profile intelligently in some cases, then come back a week later and drop the spam in :slight_smile: There is an army of them in India, Africa and similar places and they know how to do it. The majority are selling fake passports but we had one recently who was much more clever. He took a post from earlier in the thread and pasted it in (so producing a sensible looking post, which was actually a question). He then changed the question mark into a live link, which most people would not spot. But Chrome will prefetch all the links on a page… a nice attack vector! At that point we changed it so every signup is manually approved. Not a major workload on a forum of this size.

Discourse post links use the following form by default:

{scheme}://{hostname}/t/{topic-slug}/{topic-id}/{post-number-within-topic}

The topic-id is the most important part; using an invalid topic-slug, or even skipping it, will still work.

For example, for your post above, any of these would work:

https://meta.discourse.org/t/questions-about-moving-an-existing-forum-to-discourse/104948/4
https://meta.discourse.org/t/fake-slug-here/104948/4
https://meta.discourse.org/t/104948/4

You can link directly to a post, by using the following format (you will need to manually create such links, or use a plugin to help create them for you in the Share UI):

{scheme}://{hostname}/p/{post-id}

Again, for your post above:

https://meta.discourse.org/p/514248

You can obtain the post-id by looking at the json data for a given post: https://meta.discourse.org/t/questions-about-moving-an-existing-forum-to-discourse/104948/4.json
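
If you want to generate these links yourself, a small Python sketch of both URL forms follows. The function names are made up for illustration. The JSON helper assumes the topic JSON exposes posts under `post_stream.posts` with `id` and `post_number` fields; check that against a real response before relying on it.

```python
def topic_post_url(base, topic_id, post_number, slug=None):
    """Build /t/{slug}/{topic-id}/{post-number}; the slug is optional."""
    parts = [base.rstrip("/"), "t"]
    if slug:
        parts.append(slug)
    parts += [str(topic_id), str(post_number)]
    return "/".join(parts)


def permalink_url(base, post_id):
    """Build the /p/{post-id} form, which refers to the post by its ID."""
    return f"{base.rstrip('/')}/p/{post_id}"


def post_id_from_topic_json(payload, post_number):
    """Find a post's ID in topic JSON (assumed shape: post_stream.posts)."""
    for post in payload.get("post_stream", {}).get("posts", []):
        if post.get("post_number") == post_number:
            return post.get("id")
    return None


BASE = "https://meta.discourse.org"
SLUG = "questions-about-moving-an-existing-forum-to-discourse"

print(topic_post_url(BASE, 104948, 4, SLUG))
print(topic_post_url(BASE, 104948, 4))
print(permalink_url(BASE, 514248))

# Hypothetical, heavily trimmed topic JSON.
sample = {"post_stream": {"posts": [{"id": 514248, "post_number": 4}]}}
print(post_id_from_topic_json(sample, 4))
```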


I disagree with you that the category, topic title, and post text are enough context in all cases. Some examples:

  • Topic started talking about “Favorite Pizza toppings”. Over time, the discussion has somehow shifted to favorite restaurants. If solely considering category, title, and post, a reply about a hamburger joint would seem spammy, or at least off topic. But by reviewing a few preceding replies you can see that the whole topic has digressed, rather than one specific user going off topic.
  • Topic started talking about “how to fix a flat tire”. Two users are going back and forth discussing how to do so. User B replies with the solution. User A realized he missed something obvious, and makes a self-deprecating remark. Without seeing the preceding replies, one may be quick to delete the post as inappropriate, even when it would be fine (and possibly even humorous if done nicely) within the topic.

Interesting to hear about the human spammers. Good news in the specific example you shared about profiles: we compile a list of all users who sign up, modify their profile, and don’t post. You can view the list of “suspect users” from your admin dashboard - and quickly delete them, optionally blocking the email and IP from further sign ups if you want.

Saw your edit: manual approval of new accounts is a simple setting change.

Joshua - indeed, I have been testing the post links. Only the x/y number is required. It is the same on ours. The issue is that moving the post to a different thread breaks that link.

The drawback with blocking people who create a profile but never post is that lots of genuine people do that (on our forum at least) because having a login gives you various privileges e.g. seeing new posts. I am aware one can achieve that with just cookies but I am told that presenting user-specific context without a login is regarded as bad practice these days (well, Amazon does it, to a certain level, so it must be good). We have ~3.5k signed up of which 1.5k have posted.

That fingerprint feature would be hugely controversial if it was discovered, which it will be very quickly if implemented with client-side keystroke timing i.e. javascript. And if you do the timing server-side (I apologise for lack of clarity / wrong terminology) then you break various browser features like spell checking, because each keystroke goes straight to the server separately.

Regarding the spam, you might find some interesting ideas in a relatively recent discussion here:

Yes; that tactic works because most forums don’t check posts except when originally posted. Edits are not checked :slight_smile:
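
A check on edits does not need to be elaborate. As a minimal Python sketch (not something Discourse or Akismet is known to do in this form; the function name and the simple URL regex are assumptions), you can compare the links in the old and new revision and flag any edit that introduces a new one:

```python
import re

# Deliberately simple URL matcher; real-world link detection is messier.
URL_RE = re.compile(r"https?://[^\s<>\"')\]]+", re.IGNORECASE)


def links_added_by_edit(before, after):
    """Return URLs present in the edited text but not in the original."""
    old = set(URL_RE.findall(before))
    seen = set()
    added = []
    for url in URL_RE.findall(after):
        if url not in old and url not in seen:
            seen.add(url)
            added.append(url)
    return added


# Hypothetical example: a question mark quietly turned into a link.
before = "Has anyone tried this with the newer firmware?"
after = "Has anyone tried this with the newer firmware[?](http://spam.example/passports)"

print(links_added_by_edit(before, after))
```

An edit that returns a non-empty list could be queued for review instead of going live immediately.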

Another tactic is to put something in your profile or your avatar. Those don’t get checked either, but they get picked up by google.

It’s all about getting SEO.

Lots of answers to your questions here. I think that the human spam is also mitigated by Akismet.

I have written several custom importers. You can see my notes about them here. It’s how I make my living, so it is largely about paying me to do the job, but also gives you some idea of what the issues are should you write the importer yourself.

We have decided to leave the migration for the time being. The requirement for post moving is quite important.

If this feature is sorted out one day, we would be happy to re-visit the project.

So I thank you all for your answers :slight_smile:

You definitely want to run the Akismet plugin, however, as there are a large number of 100% human spammers running around out there today. Big increase in human spam numbers over the last 10 years.

Slightly off topic; should Discourse admins have access to member PMs (P emphasized for obvious reasons)?

Whether or not admins should have access to Personal Messages was answered in the OP (#10): spammers.

Admins on every forum need full access, for many reasons. Legal (e.g. member harassment by another member), spam management (e.g. many spammers join up and just send out lots of PMs without ever posting), subversion (disgruntled posters sometimes embark on a background mailing campaign to trash the site, spread allegations about the admins, etc), covert advertising (a real person joins up and sends out loads of PMs to promote his forum related business)… Any forum gets some mixture of this stuff… unfortunately, in addition to spammers, there are several % of people who cause trouble. In reality I am sure most forum admins never look at this stuff routinely (I certainly don’t) because they have better things to do, but you want there to be triggers e.g. more than 10 PMs sent within a day, within an hour, whatever.
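
A trigger like that can be a simple sliding-window counter. The Python sketch below is an illustration only: the class name, the default threshold of 10 and the one-day window are assumptions, and it expects events for each sender to arrive in time order.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


class PMRateTrigger:
    """Flag a sender who exceeds max_messages PMs within a sliding window."""

    def __init__(self, max_messages=10, window=timedelta(days=1)):
        self.max_messages = max_messages
        self.window = window
        self._sent = defaultdict(deque)  # sender -> timestamps inside window

    def record(self, sender, sent_at):
        """Record one PM; return True if the sender is now over the limit."""
        q = self._sent[sender]
        q.append(sent_at)
        # Drop timestamps that have fallen out of the window.
        while q and sent_at - q[0] >= self.window:
            q.popleft()
        return len(q) > self.max_messages


trigger = PMRateTrigger()
start = datetime(2019, 1, 1, 9, 0)

# Hypothetical burst: 11 PMs, one per minute.
burst = [trigger.record("new_signup", start + timedelta(minutes=i)) for i in range(11)]
print(burst.index(True) + 1)  # which message tripped the trigger

# Hypothetical normal user: 11 PMs, one every three hours.
steady = [trigger.record("regular", start + timedelta(hours=3 * i)) for i in range(11)]
print(any(steady))
```

The admin would then only look at PMs from flagged senders rather than reading anything routinely.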

Ultimately absolutely everything you do on some website is visible to whoever has admin access to the server.

The only Q is what access moderators have, and on most small sites they are the same person(s). This then leads to the topic of renegade mods (of which I have seen many examples, having been on forums since the internet got going, Compu$erve onwards :slight_smile: ) and this is a real problem. If you have a large site and have to appoint a load of mods, this policy needs to be done carefully because a renegade mod can do tremendous damage.

I have another Q on Discourse: is there a feature or a plug-in which can implement a question/answer session at signup time? For example if running a space flight related forum, you may want to put in some multiple choice question about the order of the planets’ distances from the sun, to keep the less intelligent spammers from signing up.

This plugin is ready for production and it did in fact run on a well-known Discourse for some time, but it was not a “toxic” community that would return lots of results. From what I can remember, there was a problem with Apple devices, as there is not much diversity when it comes to those and Safari does not leak very much info either. So, please take the results with a grain of salt.

If you decide to use it, please let me know. I would be very interested in seeing how it works in a real-world situation.

That is something I definitely do not recommend. It would decrease the overall security strength of the authentication system.

Presumably because you would need to have an indexed database of the hashes, which is yet another thing that can be hacked?

How come Safari doesn’t leak enough info? I know nothing about this topic but AIUI the only way to do keyboard fingerprinting is to time the keystrokes somehow, with millisecond resolution. But didn’t browsers do something very recently on the timing front, to block the Kaiser / Meltdown exploit which of necessity relies on accurate timing?

As regards other means of user fingerprinting, there are a ton of methods. There was one website (which now escapes me) which you went to and it produced a breakdown of everything it could see about you, e.g. browser type, screen pixel size, Java version, and about 20 other things. It then calculated how unique the ID is (based on e.g. how many Chrome v41.5.6 users were running a specific combination of plugins etc.). And it very quickly got to c. 99.99%. So even if someone clears their cookies on every visit, they are still about 99.99% identifiable… well, for a while, until their browser gets an update etc. But the server then has to keep all that stuff in the logfiles and index it all up; I am sure most forums don’t bother.
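
The server-side bookkeeping for attribute-based fingerprinting can be sketched in a few lines of Python. This is only an illustration of the general idea, not how the discourse-fingerprint plugin works; the attribute names, account names and sample values are all made up. Each visit's attributes are hashed into one identifier, and accounts that share an identifier are grouped together.

```python
import hashlib
import json
from collections import defaultdict


def fingerprint(attributes):
    """Hash a dict of browser attributes into a short, stable identifier."""
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def shared_fingerprints(visits):
    """Map fingerprint -> sorted accounts, for fingerprints seen on 2+ accounts."""
    accounts = defaultdict(set)
    for account, attributes in visits:
        accounts[fingerprint(attributes)].add(account)
    return {fp: sorted(names) for fp, names in accounts.items() if len(names) > 1}


# Hypothetical visit log: (account, attributes reported by the browser).
desktop = {"browser": "Chrome 71", "screen": "1920x1080", "tz": "Europe/London", "fonts": 212}
phone = {"browser": "Safari 12", "screen": "375x812", "tz": "Europe/Berlin", "fonts": 60}

visits = [
    ("polite_pete", desktop),
    ("rude_rick", dict(desktop)),  # same machine, different account
    ("honest_hans", phone),
]

print(list(shared_fingerprints(visits).values()))
```

A matching fingerprint is a hint rather than proof, since devices with little variation can collide.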

And all someone needs to do to defeat all that is to use a different client device on a mobile IP. I have just had someone create their 4th character this way (he didn’t last long because he started posting his usual trolls). Or if they want to do it all from their home PC, they just need to run a throwaway browser instance in a VM and go out via a VPN terminating in the Peoples’ Republic of Cameroon (yes this is a real example too :slight_smile: ). I’ve seen one chap (who created about 10 identities) use the TOR browser, but his alter egos were a dead giveaway, because who would be browsing a special interest (technology related) forum using such a heavy duty method intended for illegal activities, and whose IP really does map to various African countries?

We had one guy who ran two characters, one polite and one who posted rude stuff, including one post which IMHO would have resulted in a police visit (to us), but in hindsight he would have been detectable simply on cookies. I think cookies are a powerful enough method, and throw in the IP for good measure.

Regarding the user’s post count, it was something I recall seeing when testing the sample installation. If it is not shown, I apologise!

Would a CSS change really suppress it? I ask that because I recall one forum I used to visit years ago (another technology one) which had a “print thread” feature (actually printing threads is where most forums, along with most other websites, fail miserably, but I suppose not many people print nowadays… although one may want to print to a PDF) and when you used that feature, every user’s email address would appear for about 100ms :slight_smile: Someone malicious could have written a spider which grabbed the stuff, for whatever dodgy purpose.

Had Discourse had proper functionality for moving posts from one thread to another, with links to posts retained (which AFAICS requires the link to be the unique post ID # from the database, or some long hash of something) we would have probably bitten the bullet and gone ahead with the move.

There are only about ten kinds of iPhone, and they are all identical. Thus, no useful device fingerprinting.

As for timing, the browsers clamp you to about 2ms. That’s still plenty of accuracy for user behavior fingerprinting; you need microsecond-accurate timing to do Spectre.
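
A quick Python sketch of why that is. It models the clamp as rounding timestamps down to a 2 ms grid, which is a simplification of what browsers do, and all the timestamps are made up. Keystroke gaps of a hundred milliseconds or so survive almost unchanged, while microsecond-scale differences collapse to zero.

```python
def clamp(timestamp_ms, resolution_ms=2.0):
    """Round a timestamp down to the timer resolution (simplified model)."""
    return (timestamp_ms // resolution_ms) * resolution_ms


def intervals(timestamps_ms):
    """Gaps between consecutive timestamps."""
    return [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]


# Hypothetical key-down times in milliseconds.
keystrokes = [0.0, 143.7, 251.2, 420.9]
# Hypothetical events about a microsecond apart (0.0004 ms = 0.4 microseconds).
fast_events = [0.0, 0.0004, 0.0011]

print(intervals([clamp(t) for t in keystrokes]))
print(intervals([clamp(t) for t in fast_events]))
```

Each clamped keystroke gap is off by less than 2 ms from the true gap, which still leaves a usable typing rhythm.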

Let’s not be fallacious here. There are two types of sockpuppets you know about; the ones that do the tech solid, but flub the character, and the ones who do the character solid, but flub the tech. You don’t know about the ones that do both correctly. They’re not “trolls”, per se, since the whole point of trolling is to post flame bait, so you will eventually identify the tree by its fruit. But astroturfing, things like fake product reviews and subtler SEO spam, those are the worse part.

No, obviously CSS won’t prevent the post count from showing up in the API and the DOM. But there’s really no point in trying to hide the post count; unlike the email address, which you actually might have a prayer of keeping secret, the post count can by definition be mined by Mallory just scraping your entire forum and counting the posts herself.

Reminder that this is not possible on Discourse, as new users cannot PM until they have reached trust level 1.

We have global rate limits on topic creation and PMs, there is also a limit on how long PMs can get. But you are right, there should be good visibility into anomalous PM patterns by TL1 users.

You don’t know about the ones that do both correctly.

That’s true, but do you care? Anyone who can pull off both is quite clever and by definition making a good contribution to the forum :slight_smile:

Most forums would be happy to have Vlad the Impaler posting, provided he was polite and on-topic, etc :slight_smile:

Side note, highlight text, click quote.
