Migrate a mailing list to Discourse (mbox, Listserv, Google Groups, etc)

Discourse · February 4, 2018, 9:08pm

This guide is for you if you want to migrate a mailing list to Discourse.
It also contains instructions for importing messages from Google Groups.

1. Importing using Docker container

This is the recommended way for importing content from your mailing lists into Discourse.

1.1. Installing Discourse

The import script most likely won’t work on systems with less than 4GB of RAM. 8GB of RAM or more is recommended. You can scale back the RAM usage after the import if you like.

Install Discourse by following the official installation guide. Afterwards it’s a good idea to go to the Admin section and configure a few settings:

Enable login_required if imported topics shouldn’t be visible to the public.
Enable hide_user_profiles_from_public if user profiles shouldn’t be visible to the public.
Disable download_remote_images_to_local if you don’t want Discourse to download images embedded in posts.
Enable disable_edit_notifications if you enabled download_remote_images_to_local and don’t want your users to get lots of notifications about posts edited by the system user.
Change the value of slug_generation_method if most of the topic titles use characters which shouldn’t be mapped to ASCII (e.g. Arabic). See this post for more information.

The following steps assume that you installed Discourse on Ubuntu and that you are connected to the machine via SSH or have direct access to the machine’s terminal.

1.2. Preparing the Docker container

Copy the container configuration file app.yml to import.yml and edit it with your favorite editor.

cd /var/discourse
cp containers/app.yml containers/import.yml
nano containers/import.yml

Regular import

Add - "templates/import/mbox.template.yml" to the list of templates. Afterwards it should look something like this:

templates:
  - "templates/postgres.template.yml"
  - "templates/redis.template.yml"
  - "templates/web.template.yml"
  - "templates/web.ratelimited.template.yml"
## Uncomment these two lines if you wish to add Lets Encrypt (https)
  #- "templates/web.ssl.template.yml"
  #- "templates/web.letsencrypt.ssl.template.yml"
  - "templates/import/mbox.template.yml"

That’s it. You can save the file, close the editor and build the container.

Google Groups import

You need to add two entries to the list of templates:

  - "templates/import/chrome-dep.template.yml"
  - "templates/import/mbox.template.yml"

Afterwards it should look something like this:

templates:
  - "templates/postgres.template.yml"
  - "templates/redis.template.yml"
  - "templates/web.template.yml"
  - "templates/web.ratelimited.template.yml"
## Uncomment these two lines if you wish to add Lets Encrypt (https)
  #- "templates/web.ssl.template.yml"
  #- "templates/web.letsencrypt.ssl.template.yml"
  - "templates/import/chrome-dep.template.yml"
  - "templates/import/mbox.template.yml"

That’s it. You can save the file, close the editor and build the container.

/var/discourse/launcher stop app
/var/discourse/launcher rebuild import

Building the container creates an import directory within the container’s shared directory. It looks like this:

/var/discourse/shared/standalone/import
├── data
└── settings.yml

1.3. Downloading messages from Google Groups (optional)

You can skip this step unless you want to migrate from Google Groups.

Instructions for Google Groups

1.3.1. Preparation

Make sure you don’t have any pinned posts in your group, otherwise the crawler might fail to download some or all messages.

Make sure the group settings allow posting, otherwise you might see “Failed to scrape message” error messages. If you changed those settings recently, it might take a couple of minutes before the scraping works.

Google account: You need a Google account that has the Manager or Owner role for your Google Group, otherwise the downloaded messages will contain censored email addresses.

Group name: You can find the group name by visiting your Google Group and looking at the browser’s address bar.

Domain name: The URL might look a little different if you are a G Suite customer. You need to know the domain name if the URL contains something like example.com.

1.3.2. Cookies

In order to download messages, the crawler needs to have access to a Google account that has the owner role for your group. Please visit https://myaccount.google.com/ in your browser and sign in if you aren’t already logged in. Then use a browser extension of your choice to export your cookies for google.com in a file named cookies.txt.

The recommended browser extension is Export Cookies for Mozilla Firefox.

Upload the cookies.txt file to your server and save it within the /var/discourse/shared/standalone/import directory.

1.3.3. Download messages

Tip: It’s a good idea to download messages inside a tmux or screen session, so that you can reconnect to the session in case of SSH connection loss.

Let’s start by entering the Docker container.

/var/discourse/launcher enter import

Replace the <group_name> (and if applicable, the <domain_name>) placeholders within the following command with the group name and domain name from step 1.3.1 and execute it inside the Docker container in order to start the download of messages.

If you didn’t find a domain name in step 1.3.1, this is the command for you:

script/import_scripts/google_groups.rb -g <group_name>

Or, if you found a domain name in step 1.3.1, use this command instead:

script/import_scripts/google_groups.rb -g <group_name> -d <domain_name>

Downloading all messages can take a long time. It mostly depends on the number of topics in your Google Group. The script will show you a message like this when it’s finished: Done (00h 26min 52sec)

Tip: You can abort the download anytime you want by pressing Ctrl+C
When you restart the download it will continue where it left off.

1.4. Configuring the importer

You can configure the importer by editing the example settings.yml file that has been copied into the import directory.

nano /var/discourse/shared/standalone/import/settings.yml
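
A syntax error in settings.yml stops the importer before it reads any messages, so a quick parse check after editing can save a rebuild cycle. The following is a minimal sketch in Ruby using only the standard library; yaml_syntax_error is a hypothetical helper name and not part of the import script, and the two sample strings are made up for illustration.

```ruby
require "yaml"

# Hypothetical helper (not part of the import script): returns nil when the
# text is syntactically valid YAML, otherwise the parser's error message.
def yaml_syntax_error(text)
  YAML.parse(text) # builds the syntax tree only; no objects are instantiated
  nil
rescue Psych::SyntaxError => e
  e.message
end

valid  = "staged: true\nprefer_html: false\n"
broken = "split_regex: \"^From .+@.+\n" # closing quote is missing

puts yaml_syntax_error(valid).inspect # nil
puts yaml_syntax_error(broken)        # message text depends on the Psych version
```

To check the real file, pass File.read("/var/discourse/shared/standalone/import/settings.yml") to the helper on the host, or File.read("/shared/import/settings.yml") inside the container.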

The settings file comes with sensible defaults, but here are a few tips anyway:

The settings file contains multiple examples on how to split data files:
- mbox files usually are separated by a From header. Choose a regular expression that works for your files.
- If each of your files contains only one message, set the split_regex to an empty string. This also applies to imports from Google Groups.
- There’s also an example for files from the popular Listserv mailing list software.
prefer_html allows you to configure whether the import should use the HTML part of emails when it exists. You should choose what suits you best; it heavily depends on the emails sent to your mailing list.
By default each user imported from the mailing list is created as a staged user. You can disable that behaviour by setting staged to false.
If your emails do not contain a Message-ID header (like messages stored by Listserv), you should enable the group_messages_by_subject setting.
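
To see whether a split_regex fits your files before starting a long import, you can count how many lines it matches in a sample. This is a minimal sketch in Ruby (the language the import script is written in); count_separators is a hypothetical helper, the sample messages are made up, and the pattern is only one plausible choice for mbox files.

```ruby
# Hypothetical helper: counts the lines of an mbox text that match a
# separator regex. Assumes each matching line starts a new message.
def count_separators(text, split_regex)
  text.each_line.count { |line| line.match?(split_regex) }
end

sample = <<~MBOX
  From alice@example.com Mon Dec  4 10:00:00 2017
  Subject: Hello

  First message body.

  From bob@example.com Tue Dec  5 11:30:00 2017
  Subject: Re: Hello

  Second message body.
MBOX

split_regex = /^From .+@.+/
puts count_separators(sample, split_regex) # prints 2
```

If the count differs from the number of messages you expect in a file, adjust the expression before running the import.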

1.5. Prepare files

Each subdirectory of /var/discourse/shared/standalone/import/data gets imported as its own category and each directory should contain the data files you want to import. The file names of those do not matter.

Example: The import directory should look like this if you want to import two mailing lists with multiple mbox files:

/var/discourse/shared/standalone/import
├── data
│   ├── list 1
│   │   ├── foo
│   │   ├── bar
│   ├── list 2
│   │   ├── 2017-12.mbox
│   │   ├── 2018-01.mbox
└── settings.yml
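
Because every subdirectory of data becomes a category, it can be worth listing what the importer will see before you start. This is a minimal sketch in Ruby, assuming the layout shown above; category_preview is a hypothetical helper, and the demo builds a throwaway copy of the example tree in a temporary directory rather than touching /var/discourse.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper: maps each subdirectory of data_dir (one future category
# each) to the number of regular files it contains.
def category_preview(data_dir)
  Dir.children(data_dir).sort.each_with_object({}) do |name, preview|
    path = File.join(data_dir, name)
    next unless File.directory?(path) # plain files directly in data_dir are ignored

    preview[name] = Dir.children(path).count { |entry| File.file?(File.join(path, entry)) }
  end
end

# Recreate the example layout in a temporary directory and preview it.
preview = Dir.mktmpdir do |root|
  data_dir = File.join(root, "data")
  { "list 1" => %w[foo bar], "list 2" => %w[2017-12.mbox 2018-01.mbox] }.each do |list, files|
    FileUtils.mkdir_p(File.join(data_dir, list))
    files.each { |file| FileUtils.touch(File.join(data_dir, list, file)) }
  end
  category_preview(data_dir)
end

preview.each { |category, count| puts "#{category}: #{count} file(s)" }
```

On a real server you would call category_preview("/var/discourse/shared/standalone/import/data") instead of building the temporary tree.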

1.6. Executing the import script

Tip: It’s a good idea to start the import inside a tmux or screen session, so that you can reconnect to the session in case of SSH connection loss.

Let’s start the import by entering the Docker container and launching the import script inside the Docker container.

/var/discourse/launcher enter import
import_mbox.sh # inside the Docker container

Depending on the size of your mailing lists, the import can take a while.
The import script will show you a message like this when it’s finished: Done (00h 26min 52sec)

Tip: You can abort the import anytime you want by pressing Ctrl+C
When you restart the import it will continue where it left off.

You can exit and stop the Docker container after the import has finished.

exit # inside the Docker container
/var/discourse/launcher stop import

1.7. Starting Discourse

Let’s start the app container and take a look at the imported data.

/var/discourse/launcher start app

Discourse will start and Sidekiq will begin post-processing all the imported posts. This can take a considerable amount of time. You can watch the progress by logging in as admin and visiting http://discourse.example.com/sidekiq

1.8. Clean up

So, you are satisfied with the result of the import and want to free some disk space? The following commands will delete the Docker container used for importing as well as all the files used during the import.

/var/discourse/launcher destroy import
rm /var/discourse/containers/import.yml
rm -R /var/discourse/shared/standalone/import

1.9. The End

Now it’s time to celebrate and enjoy your new Discourse instance!

2. FAQ

2.1. How can I remove list names (e.g. `[Foo]`) from topic titles during the import?

You can use an empty tag to remove one or more prefixes from topic titles. The settings file contains an example.

2.2. How can I prevent the import script from detecting messages as already being imported?

The following steps will reset your Discourse forum to the initial state! You will need to start from scratch.

The following commands will stop the container, delete everything except the mbox files and the importer configuration and restart the container.

Commands

cd /var/discourse

./launcher stop app
./launcher stop import

shopt -s extglob # enables the !(import) pattern in bash
rm -r ./shared/standalone/!(import)
rm ./shared/standalone/import/data/index.db

./launcher rebuild import

./launcher enter import
import_mbox.sh # inside the Docker container

2.3. How can I manipulate messages before they are imported into Discourse?

Enable index_only in settings.yml and take a look at the index.db (a SQLite database) before you run the actual import.

You can use SQL to update missing values in the database if you want. That way you don’t need to reindex any messages. The script uses only data from the index.db during the import phase. Simply disable the index_only option when you are done and rerun the importer. It will skip the indexing if none of the mbox files were changed, recalculate the content of the user and email_order tables and start the actual import process.

2.4. How can I find messages which cause problems during the import?

You can split mbox files into individual files to make it easier to find offending emails.

Commands

apt install procmail
mkdir -p split
export FILENO=0000
formail -ds sh -c 'cat > split/msg.$FILENO' < mbox
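
If procmail is not available, the same split can be approximated in a few lines of Ruby. This is a sketch under a simplifying assumption: every line that starts with "From " begins a new message, which holds for well-formed mbox files where such lines in message bodies are escaped. split_mbox is a hypothetical helper, not part of Discourse, and the sample messages are made up.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper: writes each message of an mbox text to its own file
# (msg.0000, msg.0001, ...) in out_dir and returns the file paths.
# Assumes every line starting with "From " begins a new message.
def split_mbox(text, out_dir)
  FileUtils.mkdir_p(out_dir)
  messages = text.each_line.slice_before { |line| line.start_with?("From ") }.map(&:join)
  messages.each_with_index.map do |message, index|
    path = File.join(out_dir, format("msg.%04d", index))
    File.write(path, message)
    path
  end
end

sample = <<~MBOX
  From alice@example.com Mon Dec  4 10:00:00 2017
  Subject: Hello

  First message body.

  From bob@example.com Tue Dec  5 11:30:00 2017
  Subject: Re: Hello

  Second message body.
MBOX

out_dir = File.join(Dir.mktmpdir, "split")
paths = split_mbox(sample, out_dir)
puts paths.map { |path| File.basename(path) }.inspect
```

For a real archive, replace the sample with File.read on one of your mbox files and point out_dir at a directory with enough free space.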

2.5. I have already imported a group. How can I import another group?

Create a new directory in the import/data directory and restart the import script.

2.6. I don’t have access to Mailman archives in mbox format. Is there any other way to get them?

You could give this script a try.

Last edited by @JammyDodger 2024-05-27T14:56:11Z

ravenzachary · December 6, 2019, 8:26pm

@gerhard - I managed to migrate an mbox archive of 22,000 messages using this script on a Digital Ocean droplet with only 1 GB of RAM. No problems. Thanks for the guide. Everything worked perfectly. The only mistake I made on my first attempt was naming the subfolder /var/discourse/shared/standalone/import/data/X after a new category that I had created before running the script. That caused the messages to end up in the Uncategorized category. On my second attempt, I deleted the new category and tried again. This time the script created the category for me and automatically put the messages into the correct category.

Jeremias_Volker · March 13, 2020, 9:22am

Thanks for this guide.

I’m trying to run an import from Google Groups. Unfortunately, when I run import_mbox.sh, I get this error:

The mbox import is starting...

Traceback (most recent call last):
5: from script/import_scripts/mbox.rb:9:in `<main>'
4: from script/import_scripts/mbox.rb:10:in `<module:ImportScripts>'
3: from script/import_scripts/mbox.rb:13:in `<module:Mbox>'
2: from /var/www/discourse/script/import_scripts/mbox/support/settings.rb:9:in `load'
1: from /var/www/discourse/script/import_scripts/mbox/support/settings.rb:9:in `new'

/var/www/discourse/script/import_scripts/mbox/support/settings.rb:42:in `initialize': undefined method `each' for nil:NilClass (NoMethodError)

All the files in /var/discourse/shared/standalone/import/data/Foo are .eml files, not mbox. Could that be a problem?

Thanks!

gerhard · March 18, 2020, 4:33pm

The latest version of the import script fixes that problem. Alternatively, update your settings file. There have been some recent updates.

Jeremias_Volker · March 19, 2020, 9:31am

Thank you very much. Could you please give some advice on how to update the import script?

Is it enough to update only the import scripts, or do I have to redo other steps of the guide (which ones)? I can’t find the scripts, so I don’t know how to update them.

I updated the settings file as you mentioned, treating it as an alternative, but I’m running into the same problems.

Thanks.

gerhard · March 19, 2020, 12:39pm

You can run /var/discourse/launcher rebuild import to update the import script and everything related to it.

Jeremias_Volker · March 19, 2020, 5:18pm

Thanks.

While running import_mbox.sh, almost all messages are skipped with messages like this:

script/import_scripts/mbox.rb:12:in `<module:Mbox>'
script/import_scripts/mbox.rb:10:in `<module:ImportScripts>'
script/import_scripts/mbox.rb:9:in `<main>'
41 / 215 ( 19.1%) [59096 items/min] Failed to map post for 36a37072-e5b6-4009-878f-f0824e40eac6@googlegroups.com
undefined method `each' for nil:NilClass
/var/www/discourse/script/import_scripts/mbox/importer.rb:179:in `block in remove_tags!'
/var/www/discourse/script/import_scripts/mbox/importer.rb:176:in `loop'
/var/www/discourse/script/import_scripts/mbox/importer.rb:176:in `remove_tags!'
/var/www/discourse/script/import_scripts/mbox/importer.rb:150:in `map_first_post'
/var/www/discourse/script/import_scripts/mbox/importer.rb:104:in `block (2 levels) in import_posts'
/var/www/discourse/script/import_scripts/base.rb:503:in `block in create_posts'
/var/www/discourse/script/import_scripts/base.rb:502:in `each'
/var/www/discourse/script/import_scripts/base.rb:502:in `create_posts'
/var/www/discourse/script/import_scripts/mbox/importer.rb:98:in `block in import_posts'
/var/www/discourse/script/import_scripts/base.rb:882:in `block in batches'
/var/www/discourse/script/import_scripts/base.rb:881:in `loop'
/var/www/discourse/script/import_scripts/base.rb:881:in `batches'
/var/www/discourse/script/import_scripts/mbox/importer.rb:84:in `batches'
/var/www/discourse/script/import_scripts/mbox/importer.rb:92:in `import_posts'
/var/www/discourse/script/import_scripts/mbox/importer.rb:36:in `execute'
/var/www/discourse/script/import_scripts/base.rb:47:in `perform'

And further down:

60 / 215 ( 27.9%) [58321 items/min] Parent message 1b46f337-95a3-4b4a-a14a-689636941580@googlegroups.com doesn't exist. Skipping 5634208e-e6df-4bd8-b361-0735f73fe554@googlegroups.com:

What could be the cause? Thanks.

gerhard · March 26, 2020, 4:24pm

The problem should be fixed. Please rebuild the import container one more time.

Jeremias_Volker · March 26, 2020, 5:13pm

Awesome, it worked perfectly. Thank you very much for your support.

pfaffman · April 8, 2020, 9:25pm

I’m trying to download from Google Groups and I’m getting

Login failed. Check the content of your cookies.txt

I used the recommended Firefox extension to download the cookies. I did it yesterday and again today. I confirmed that the script is reading the file by renaming it to something wrong and getting a “not found” error. I downloaded all cookies, not just the Google ones. I logged out and back in, then downloaded the cookies again.

I can see that I’m an admin because I have the “manage group” options.

I triple-checked that I’m using the correct group name by copying and pasting it and verifying that it has the form of a group name and not a domain name.

Is something broken, or is it just me?

@gerhard, sorry for the public ping, but do you have a quick suggestion on how to debug this? Maybe a login endpoint changed?

EDIT: I found it. I’ll submit a PR shortly. The login endpoint changed and I managed to guess the new one.

sturdy2 · April 16, 2020, 8:40pm

Newbie here, trying to import mbox files from Yahoo Groups. I have followed these instructions several times, but I always get the same error message. I can see that others have succeeded, so this is probably a beginner’s mistake. The error seems to indicate that split_regex: "^From .+@.+" does not find the email key needed to split the file, but I tested the regular expression in a text editor and it works as expected. Line 2 of the import file looks like Message-ID: <35690.0.1.959300741@eGroups.com>.

Any ideas? Thanks in advance…

The mbox import is starting...

Traceback (most recent call last):
	12: from script/import_scripts/mbox.rb:9:in `<main>'
	11: from script/import_scripts/mbox.rb:10:in `<module:ImportScripts>'
	10: from script/import_scripts/mbox.rb:12:in `<module:Mbox>'
	 9: from script/import_scripts/mbox.rb:12:in `new'
	 8: from /var/www/discourse/script/import_scripts/mbox/importer.rb:11:in `initialize'
	 7: from /var/www/discourse/script/import_scripts/mbox/support/settings.rb:8:in `load'
	 6: from /usr/local/lib/ruby/2.6.0/psych.rb:577:in `load_file'
	 5: from /usr/local/lib/ruby/2.6.0/psych.rb:577:in `open'
	 4: from /usr/local/lib/ruby/2.6.0/psych.rb:578:in `block in load_file'
	 3: from /usr/local/lib/ruby/2.6.0/psych.rb:277:in `load'
	 2: from /usr/local/lib/ruby/2.6.0/psych.rb:390:in `parse'
	 1: from /usr/local/lib/ruby/2.6.0/psych.rb:456:in `parse_stream'
/usr/local/lib/ruby/2.6.0/psych.rb:456:in `parse': (/shared/import/settings.yml): did not find expected key while parsing a block mapping at line 2 column 1 (Psych::SyntaxError)

gerhard · April 17, 2020, 1:29pm

It looks like you made a mistake in the settings.yml file. I suggest validating the configuration at http://www.yamllint.com/

sturdy2 · April 17, 2020, 9:54pm

Thanks @gerhard. Sigh… I should have spotted that problem; this is my first experience with Ruby. Now I think I’m a little closer to a solution, but there is another error (see below). Since the import script now gets as far as loading groups etc., I assume the new error occurs after the initial problem. I also assume that the db file being referred to is import/index.db, which is created by the import script (but it wasn’t created).

The mbox import is starting...

Loading existing groups...
Loading existing users...
Loading existing categories...
Loading existing posts...
Loading existing topics...
Traceback (most recent call last):
	9: from script/import_scripts/mbox.rb:9:in `<main>'
	8: from script/import_scripts/mbox.rb:10:in `<module:ImportScripts>'
	7: from script/import_scripts/mbox.rb:12:in `<module:Mbox>'
	6: from script/import_scripts/mbox.rb:12:in `new'
	5: from /var/www/discourse/script/import_scripts/mbox/importer.rb:14:in `initialize'
	4: from /var/www/discourse/script/import_scripts/mbox/importer.rb:14:in `new'
	3: from /var/www/discourse/script/import_scripts/mbox/support/database.rb:10:in `initialize'
	2: from /var/www/discourse/script/import_scripts/mbox/support/database.rb:10:in `new'
	1: from /var/www/discourse/vendor/bundle/ruby/2.6.0/gems/sqlite3-1.4.2/lib/sqlite3/database.rb:89:in `initialize'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/sqlite3-1.4.2/lib/sqlite3/database.rb:89:in `open_v2': unable to open database file (SQLite3::CantOpenException)

sturdy2 · April 18, 2020, 10:04pm

The SYSTEM won’t let me edit my comment, so I’m posting this reply instead.

EDIT: To close the loop… The import of my Yahoo group now works, at least up to the point of indexing 9951 emails. I haven’t finished the full import yet, so there will be updates. I edited settings.yml many times and have now gone back to the original version, which suddenly seems to work without the syntax error! I don’t understand why I got so many error messages that seem inconsistent to me. The original syntax error in settings.yml is a mystery again. The error message above makes no sense to me… sigh.

deeplow · September 5, 2020, 3:26pm

@gerhard. I think I found a much simpler method to do exactly the same thing as your guide, but without requiring technical knowledge or admin access to any server. Let me know what you think.

Overview

We will essentially set up a mailing list mirror and use an email archive to send the past conversations in order. These emails will be forwarded, but not the way the “Forward” button in mail clients does it (which would rewrite the headers and mess up the indentation). What we want to do is resend them (sending them as if they had originally been sent to Discourse).

Requirements and assumptions

Access to the past email exchanges: someone who has them archived in their email client and can volunteer to forward them. Let’s call this person John Doe.
Time: forwarding the emails will be very slow, but manageable by Discourse (maybe a few days with a computer left running to upload the emails, depending on the size of the archive).
Thunderbird client: We also assume that John Doe uses the “Thunderbird” email client. It might be possible to do this with other clients, but I haven’t checked.

The following guide uses two email addresses as placeholders. You need to replace them with your real addresses.

johndoe@example.com: John Doe’s email address (the person who will forward the whole mailing list archive)

discourse+mailinglist-3@discoursemail.com: the Discourse email address for forwarding emails to the mailing list category (see step 1 of the instructions for how to obtain it)

Instructions

Here is a basic overview of the instructions:

1. Follow the guide Mirroring a read-only mailing list in Discourse to create a mirror of your mailing list.
   Note: this will only mirror your mailing list from now on. The past conversations will still be missing. That’s what the rest of this guide is for.
2. Change the way Discourse forwards emails to (I’m not sure this is necessary)
3. Edit the category settings and, under the setting Custom incoming email address:, append |johndoe@example.com to what is already there.
   The pipe here works like a comma, indicating that you also want johndoe@example.com to be able to post to that category.
4. John Doe installs the Mail Redirect extension in Thunderbird.
   This is needed because this is not regular email forwarding. What it does is send the email as if it had been sent directly to the Discourse email address rather than to John Doe’s.
5. John Doe goes to the extension’s settings and sets the number of SMTP connections to 1 (the default is 5).
   This ensures that the replies arrive in order: otherwise Discourse isn’t fast enough to work out that the replies are linked and simply creates a new topic for each reply. It will, however, make the forwarding process very slow.
6. John Doe selects all the past emails of the mailing list, right-clicks and clicks Redirect. A new window opens, and he adds discourse+mailinglist-3@discoursemail.com as Resend-To.

After this, John Doe’s email client will slowly send the email archives to Discourse. Check back after a while to see if the Discourse category is filling up with nostalgic conversations from the past.

Cleanup

Remove John Doe’s email address from that category’s Custom incoming email address: setting (and don’t forget to remove the |).
Uninstall the Mail Redirect extension, since you probably won’t need it anymore, or at least increase the SMTP connections back to 5.

miri64 · September 21, 2020, 7:57pm

We are trying to migrate our Mailman lists into an already running Discourse instance. This includes several private lists, for which permissions need to be set on the corresponding category. When we create these categories before the import, all posts from the private lists are added to “Uncategorized” (and are therefore automatically public).

So we have two alternative questions:

Is it possible to set permissions for the imported mailing lists before the import (if they were visible only to admins, that would already be enough for us)?
Is it possible to add the mailing list to an existing category (with preset permissions)?

danb35 · September 21, 2020, 8:18pm

My Discourse is the continuation of a Yahoo group, which in turn was the continuation of an AOL listserv. Last fall, facing the great Yahoo purge, I managed to download an .mbox archive of the Yahoo group and import those messages by following these instructions. I have now obtained a partial archive of the AOL listserv and would like to import those messages too.

Easy, right? Just create import/data/foo, put the messages there and run the import script. But I’m wondering: if I later manage to get a complete (or more complete) archive, can I simply put those files in import/data/foo, run the import script again, and have it add the new messages to the same category?

Will it deduplicate? Or will I see multiple copies of the messages that are in both archives?
- Would the answer to this question change if one, the other, or both archives were missing message-id headers?
Will a new import into the same category overwrite the existing messages?
Most of my users are in mailing list mode. If I don’t want to spam them with hundreds (or thousands) of notifications, not to mention run up an expensive Mailgun bill, I assume I’ll want to disable emails site-wide during the import?

gerhard · September 28, 2020, 6:51pm

Unfortunately, that’s not possible.

Yes, you can trick the import script into reusing existing categories.

./launcher enter app
rails c

# Use the category ID shown in the URL, for example
# it's 56 when the category path looks like this: /c/howto/devs/56
category = Category.find(56)

# Use the name of the directory where the mbox files are stored. For example,
# if the files are stored in import/data/foo, you should use "foo" as the directory name.
category.custom_fields["import_id"] = "directory_name"
category.save!

That’s unexpected. I’ve never seen that happen, but I’ve never tried to import into existing categories with non-default permissions.

If you can’t get it to work, I recommend posting an announcement on your forum, putting your site into read-only mode, creating a backup, restoring the backup on a different server, running the import, configuring the category permissions, creating another backup and restoring it on your production site.

gerhard · September 28, 2020, 7:07pm

Yes, you can do that. You might want to keep the import/data/index.db file in case you want to look at previously imported data, need to modify the generated message IDs, or for other needs…

Yes, it won’t import already imported messages as long as the Message-ID header stays the same. If the Message-ID header is missing in only one of the archives, you are out of luck. When the header is missing, we use the MD5 hash of the message. You will need to make sure that both messages either have the same Message-ID header or produce the same MD5 hash.

No.

All outgoing emails are disabled during imports.
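
The deduplication rule described above can be pictured as a function that derives one key per message. The sketch below is an illustration in Ruby, not the importer’s actual code: message_key is a hypothetical helper, header folding is ignored, and hashing the complete raw message is an assumption about what “the MD5 hash of the message” covers.

```ruby
require "digest"

# Illustration only (hypothetical helper, not the importer's implementation):
# use the Message-ID header when present, otherwise fall back to an MD5 hash.
# Assumption: the hash is taken over the complete raw message text.
def message_key(raw_message)
  headers = raw_message.split(/\r?\n\r?\n/, 2).first.to_s # header section only
  match = headers.match(/^Message-ID:\s*<([^>\s]+)>/i)
  match ? match[1] : Digest::MD5.hexdigest(raw_message)
end

with_id    = "Message-ID: <35690.0.1.959300741@eGroups.com>\nSubject: Hi\n\nBody\n"
without_id = "Subject: Hi\n\nBody\n"

puts message_key(with_id)                                       # the Message-ID value
puts message_key(without_id) == message_key(without_id.dup)     # true: identical text, identical key
puts message_key(without_id) == message_key(without_id + " ")   # false: any difference changes the hash
```

Two copies of a message deduplicate only if they yield the same key, which is why an archive that lost its Message-ID headers will not line up with one that kept them.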

miri64 · September 29, 2020, 8:41am

Yes, you can trick the import script into reusing existing categories.

OK, that is essentially what we ended up doing (we used Category.find_by_name() instead, but I guess that’s just a matter of semantics). Good to know we picked the “correct” way. Thanks!
