So, further developments. This tool:
https://github.com/IgnoredAmbience/yahoo-group-archiver
seems to work quite nicely for bulk-downloading the contents of a group: it gets all the messages, files, attachments, etc. Each message is downloaded as two .json files, one “raw” and the other HTML. The first looks like:
{
"userId": 185744666,
"authorName": "vhsproducts@aol.com",
"from": "vhsproducts@...",
"profile": "vhsproducts",
"replyTo": "LIST",
"senderId": "fc-T6L4xNaFRDleu_7gutRzgA_WWujKXanij68LOf7iz0WXh-BolDsmiqlo19adwRPTjwe0FpCYycg",
"spamInfo": {
"isSpam": false,
"reason": "0"
},
"subject": "Re: [MicroTrak] Mint-Trak300 completed",
"postDate": "1181013131",
"msgId": 4,
"canDelete": false,
"contentTrasformed": false,
"systemMessage": false,
"headers": {
"messageIdInHeader": "PGM3ZC5lNWZlOTFjLjMzOTYyZThiQGFvbC5jb20+"
},
"prevInTopic": 3,
"nextInTopic": 6,
"prevInTime": 3,
"nextInTime": 5,
"topicId": 3,
"numMessagesInTopic": 4,
"msgSnippet": "Outstanding work! I see you have the first gen of the Micro-Trak ( although we still sell them for people with TT3 SMT s) How long will a 9 volt run your GPS? ",
"rawEmail": "Return-Path: <VHSProducts@...>\r\nX-Sender: VHSProducts@...\r\nX-Apparently-To: MicroTrak@yahoogroups.com\r\nReceived: (qmail 18487 invoked from network); 5 Jun 2007 03:13:19 -0000\r\nReceived: from unknown (66.218.67.36)\n by m50.grp.scd.yahoo.com with QMQP; 5 Jun 2007 03:13:19 -0000\r\nReceived: from unknown (HELO imo-m23.mx.aol.com) (64.12.137.4)\n by mta10.grp.scd.yahoo.com with SMTP; 5 Jun 2007 03:13:19 -0000\r\nReceived: from VHSProducts@...\n\tby imo-m23.mx.aol.com (mail_out_v38_r9.2.) id r.c7d.e5fe91c (29679)\n\t for <MicroTrak@yahoogroups.com>; Mon, 4 Jun 2007 23:12:11 -0400 (EDT)\r\nMessage-ID: <c7d.e5fe91c.33962e8b@...>\r\nDate: Mon, 4 Jun 2007 23:12:11 EDT\r\nTo: MicroTrak@yahoogroups.com\r\nMIME-Version: 1.0\r\nContent-Type: multipart/alternative; boundary="-----------------------------1181013131"\r\nX-Mailer: 9.0 Security Edition for Windows sub 5365\r\n(snip)"
}
…while the latter looks like:
{
"userId": 185744666,
"authorName": "vhsproducts@aol.com",
"from": "vhsproducts@...",
"profile": "vhsproducts",
"replyTo": "LIST",
"senderId": "oChpSVZSELyeHvFRyDX_nG5dfpdVZTLBKFMDvOg33fSsrDk5l-zpPohl42rhz6OhM9tFfSjAxxGsRg",
"spamInfo": {
"isSpam": false,
"reason": "0"
},
"subject": "Re: [MicroTrak] Mint-Trak300 completed",
"postDate": "1181013131",
"msgId": 4,
"canDelete": false,
"contentTrasformed": false,
"systemMessage": false,
"headers": {
"messageIdInHeader": "PGM3ZC5lNWZlOTFjLjMzOTYyZThiQGFvbC5jb20+"
},
"prevInTopic": 3,
"nextInTopic": 6,
"prevInTime": 3,
"nextInTime": 5,
"topicId": 3,
"numMessagesInTopic": 4,
"msgSnippet": "Outstanding work! I see you have the first gen of the Micro-Trak ( although we still sell them for people with TT3 SMT s) How long will a 9 volt run your GPS? ",
"messageBody": "<div id=\"ygrps-yiv-810547383\">\n<html><head>\n \n</head> \n\n<font id=\"ygrps-yiv-810547383role_document\"\n face=\"Arial\" color=\"#000000\" size=\"2\">\n<div>Outstanding work! I see you have the first gen of the Micro-Trak ( although \nwe still sell them for people with TT3 SMT's) How long will a 9 volt run your \nGPS?</div>\n(snip)",
"specialLinks": []
}
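For anyone poking at these files, here is a minimal Python sketch of pulling a few usable values out of one record. It relies on two things that hold for the sample above: postDate is a Unix timestamp in seconds stored as a string, and messageIdInHeader is the base64-encoded Message-ID. The function name summarize_message is made up for illustration and is not part of the archiver.

```python
import base64
import email
from datetime import datetime, timezone


def summarize_message(record):
    """Pull a few useful values out of one downloaded message record.

    `record` is the dict obtained by json.load()-ing either file.
    (Hypothetical helper, not part of yahoo-group-archiver.)
    """
    # postDate is a string holding seconds since the Unix epoch.
    posted = datetime.fromtimestamp(int(record["postDate"]), tz=timezone.utc)

    # messageIdInHeader appears to be the base64-encoded Message-ID header.
    encoded_id = record["headers"]["messageIdInHeader"]
    message_id = base64.b64decode(encoded_id).decode("utf-8", "replace")

    summary = {
        "msg_id": record["msgId"],
        "topic_id": record.get("topicId"),
        "subject": record.get("subject"),
        "posted": posted,
        "message_id": message_id,
    }

    # Only the "raw" file carries rawEmail; parse its headers when present.
    raw = record.get("rawEmail")
    if raw is not None:
        parsed = email.message_from_string(raw)
        summary["date_header"] = parsed["Date"]
    return summary
```

The same function works on both files, since the HTML one simply lacks rawEmail and the date_header key is then left out.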
Depending on the group, there can be tens or even hundreds of thousands of these files. Yahoo, being Yahoo, masks the email addresses from “normal” users; group owners can see them, and maybe moderators, but the rest can’t. Now to see if there’s a relatively straightforward way to bulk-import these into a Discourse instance, or if it’d be better to use the tools I mentioned above.
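One possible route, sketched here and not tested against Discourse, is to collect the rawEmail field from every “raw” file into a single mbox file, since mbox is a format that mail-archive importers commonly accept. The sketch assumes the raw files sit in one directory and end in _raw.json; that naming is an assumption, so adjust the glob pattern to match what the archiver actually wrote. The function name raw_json_to_mbox is hypothetical.

```python
import json
import mailbox
from pathlib import Path


def raw_json_to_mbox(email_dir, mbox_path, pattern="*_raw.json"):
    """Append the rawEmail of every matching JSON file to an mbox file.

    Returns the number of messages written. The default pattern is an
    assumption about how the raw files are named.
    """
    box = mailbox.mbox(mbox_path)  # created if it does not exist
    count = 0
    try:
        # Sorted lexically for repeatability; this is not msgId order.
        for path in sorted(Path(email_dir).glob(pattern)):
            record = json.loads(path.read_text(encoding="utf-8"))
            raw = record.get("rawEmail")
            if not raw:
                continue  # skip records without a raw message
            # Normalise line endings, then hand bytes to mailbox so that
            # non-ASCII text does not trip the str-to-ASCII conversion.
            raw = raw.replace("\r\n", "\n")
            box.add(raw.encode("utf-8"))
            count += 1
    finally:
        box.close()  # flushes pending messages to disk
    return count
```

Note that the addresses inside rawEmail are still masked in the sample above (VHSProducts@...), so whatever imports the mbox would have to map users from authorName or profile instead of the From header.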
Files and Photos are also downloaded by this tool, along with polls, calendars, and other stuff that I don’t really care about but no doubt others would.
One other point: a more careful reading of Yahoo’s message indicates that not only are they getting rid of files and photos, they’re also doing away with message archives. That’s really going to make the groups useless for any purpose.