Re-purposing a Discourse installation for a yearly event

I’m using Discourse for a yearly event (a preparatory course at a university), both for internal use for the organizers and for discussion among participants. This year’s event has finished, and once usage has dropped below significance, I’d be okay with shutting the installation down.

Next year, however, I want to have another installation ready, and the old topics available as an archive to the organizers. This leaves multiple options, and I’m not sure what is best:


Archive HTML, New Installation

I could simply drop the current installation and bootstrap a new install. After copying over a few site settings, it would be mostly ready! However, I need an archive of the old posts. I could try to download the HTML view of all old posts, and offer them as an archive to the new team of organizers.

This should be relatively easy, but the archive is hard to search, and you cannot link to old posts.

Move Installation, New Installation

I could also try to move my current installation somewhere else (a different subdomain), and leave it up and running! This would serve as a nice searchable archive where I can link to old topics, but this needs server resources and updates. I’d also need to create accounts for next year’s organizers in the old installation so they have the right permissions.

I don’t think this approach really scales over multiple years.

Category Permissions

I could also put this year’s participants into a large group, and use category permissions to restrict access to old topics to the corresponding participants.

Since I don’t want this year’s participants to see next year’s posts, I would also need to put next year’s participants into their own group – which can’t simply be done in the SSO payload.

I also cannot move all current categories under a new category 2015 since Discourse doesn’t support sub-subcategories.

Deleting all Topics

Another idea would be to leave the forum as-is, but clean it up:

  • delete all topics
  • prevent SSO for this year’s participants
  • disable all user accounts (so no-one gets mail notifications until he signs in again)

The resulting Discourse installation would look mostly empty, and could be used by next year’s organizers and participants without the old users noticing. However, using the ?status=deleted query string, the organizers (all with moderator permissions) could browse the archive.

I don’t see a way to allow searching these deleted posts.

Hiding all Topics

Similarly, I could proceed as before, but hide all topics instead of deleting them. For the organizers, this would mean that old topics are still easily available and searchable, only distinguished by the “hidden” icon.

Category Archiving

There’s a discussion on how category archiving should work. This might work once it is implemented, but I think it is not really tailored towards this situation.


I’m interested in opinions on how to proceed. Which method is the best in this case? Did I overlook a better way?

Use groups, one for each year, and use the API to manage them.

We definitely need a better archiving strategy but I am not sure what that is at the moment.
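As a rough illustration of the "one group per year, managed via the API" idea, here is a minimal sketch that only builds a request description and sends nothing. The `groups/{id}/members.json` path and the `usernames` field reflect my understanding of the Discourse groups API and are not confirmed anywhere in this topic; the group ID and user names are made up.

```javascript
// Sketch: describe an API call that adds a batch of users to a per-year group.
// Assumption: Discourse accepts PUT /groups/{id}/members.json with a
// comma-separated `usernames` field. Check your version's API docs first.
function buildAddMembersRequest(domain, groupId, usernames) {
    const body = new URLSearchParams({ usernames: usernames.join(',') });
    return {
        method: 'PUT',
        uri: `https://${domain}/groups/${groupId}/members.json`,
        headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
        body: body.toString()
    };
}

// Hypothetical values: group 42 stands for something like "participants-2016".
const req = buildAddMembersRequest('example.com', 42, ['alice', 'bob']);
console.log(req.method, req.uri, req.body);
```

Keeping the request construction separate from the sending makes it easy to inspect what would be changed before touching a live site.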

Could you move the topics to an ‘archive’ category that is organized with tags?

This is my favorite idea so far.

As you get deeper into doing this, please update the topic; we may want to convert some of the manual steps you take into automatic steps that Discourse can take on your behalf. Once we know what they are…

On an SSO-enabled site, if I log out a user, prevent him from signing in again via SSO, and deactivate his account, is there any way for him to interact with Discourse anymore? He should not be able to reply by mail, should not get any new mails, and should be unable to re-validate his address because he cannot log in, right?

I am trying to kick out users that took a course last year, while ensuring that they can log in again if they sign up for a new course (which grants them SSO login once again).

I think so. Can you verify the above, @sam? I wonder if we need a better process here, somehow.

Yes, I am pretty sure that if you deactivate them, they will have to re-activate prior to any interaction.

Be sure to test it out though.

I’m not really sure how a better process could work. A perfect solution for me would have been a button that logs out and disables all non-staff-accounts until they log in again, without forcing them to re-validate their address if they do. But that seems extremely domain-specific…

I’m very interested in this yearly event use case, e.g. students in a class every year. I think it’s common and very useful!

Sounds great! So let me dream a bit:

  1. Add a group containing all non-staff users.
  2. Implement a new user state, let’s call it “expired” for now: Expiring a user forces a logout. Expired users cannot interact via email. Logging in once again reinstates the account, without re-validating the address.
  3. Add a way to bulk-execute admin actions on all accounts in a group (or all accounts matching an SQL query), like logout, expire, suspend, …

That’s a lot of feature requests ;-), but would help with this use case and hopefully be general enough to also help in other cases.

Well, also I feel the year’s worth of “class” or “event” content should be archived somehow, too, which might play into the archiving tool @neil worked on.

You know, every year there is a Maker Faire or XOXO or Enterprise JavaBeans Pro Mega Hyper conference or whatever, and that year (2014, 2015, 2016, and on…) is unique and topical for that particular set of speakers, content, presentations, etc.

There is an interesting distinction here: In some of these recurring-event cases, the old topics should remain visible (but clearly marked as “old” to avoid confusion); in other cases, they should be restricted to staff (e.g. because the topics contain discussion of solutions to exercises) and also be marked as old.

It’s time to update this topic! I just finished the process of archiving all topics. I went with this suggestion: move the topics to an “archive” category that is organized with tags.

Of course, the devil lies in the details, so here are all the details, for anyone interested. Note that I’m not claiming this is the perfect way to do it (I know it’s not), I just want to document this as a starting point for others. Prepare for a wild ride :wink:


Locking out all users

The first step is to prevent all those pesky users from getting in again. I did this by configuring my SSO provider to only allow system admins to log in. How to do this depends on the SSO provider. Either way, it’s good to leave yourself a way to log in :wink:

Deactivated users also cannot reply by mail :thumbsup:

The next step is to disable and lock out all users. Unfortunately, admins and moderators cannot be disabled (Why?), so this also means revoking all access rights. Oh, and revoking nonexistent access rights fails with a 403 error (Why?), so the script has to tolerate that failure.

Of course, doing this for hundreds of users isn’t fun, so let’s talk to the API via some JavaScript:

var request = require('request');
var q = require('q');
var _ = require('lodash');

var processRequest = function (requestData, allowForbidden) {
    var deferred = q.defer();

    var req = request(requestData.request, function (error, response, body) {
        if (error) {
            console.log("Got error: " + error.message);
            deferred.reject(error.message);
        } else {
            var status = response.statusCode;

            if (status === 200 || (status === 403 && allowForbidden)) {
                deferred.resolve(body);
                process.stdout.write(".");
            } else if (status === 429 || status === 502) {
                // rate limited (429) or bad gateway (502): retry after a random delay of up to 10 s
                deferred.resolve(q.delay(Math.random() * 10000).then(() => {
                    return processRequest(requestData, allowForbidden);
                }));
            } else {
                console.log(`Got error ${response.statusCode} from server when requesting ${requestData.request.uri}: ${body}`);
                deferred.reject(response.statusCode);
            }
        }
    });

    return deferred.promise;
};

var buildRequestBuilder = function (domain, apiKey) {
    return (method, fragment, headers, body) => ({
        request: {
            port: 443,
            uri: `https://${domain}/${fragment}?api_key=${apiKey}&api_username=system`,
            method,
            headers: headers || {},
            body: body
        }
    });
};

var requestBuilder = buildRequestBuilder('example.com', '■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■');

var logOutUser = function (id) {
    return processRequest(requestBuilder('POST', `admin/users/${id}/log_out`));
};

var deactivateUser = function (id) {
    if (id < 0) return q(); // don't mess with system; q() is an already-fulfilled promise

    // Return every step so that failures propagate and callers can wait for completion.
    return logOutUser(id).then(() => {
        return processRequest(requestBuilder('PUT', `admin/users/${id}/revoke_admin`), true);
    }).then(() => {
        return processRequest(requestBuilder('PUT', `admin/users/${id}/revoke_moderation`), true);
    }).then(() => {
        return processRequest(requestBuilder('PUT', `admin/users/${id}/deactivate`));
    });
};

This code includes a retry if a request is rejected by rate limiting.
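The retry above keeps trying forever. A minimal sketch of a bounded variant with exponential backoff and random jitter could look like the following. It uses plain promises instead of q, and `retryWithBackoff`, its options, and the `{ status, body }` response shape are names invented for this illustration, not part of the script above.

```javascript
// Sketch: retry a promise-returning function with capped, jittered backoff.
// `attempt` is any function returning a promise for an object with a `status`.
function retryWithBackoff(attempt, { maxRetries = 5, baseDelayMs = 500, retryable = [429, 502] } = {}) {
    const run = (retriesLeft, delayMs) => attempt().then((response) => {
        if (!retryable.includes(response.status)) return response;
        if (retriesLeft === 0) {
            throw new Error(`giving up after ${maxRetries} retries (status ${response.status})`);
        }
        // Random jitter keeps parallel requests from retrying in lockstep.
        const wait = Math.random() * delayMs;
        return new Promise((resolve) => setTimeout(resolve, wait))
            .then(() => run(retriesLeft - 1, delayMs * 2)); // double the cap each round
    });

    return run(maxRetries, baseDelayMs);
}

// Stand-in for a real request: rate limited twice, then successful.
let calls = 0;
const fakeAttempt = () => Promise.resolve({ status: ++calls < 3 ? 429 : 200, body: 'ok' });
retryWithBackoff(fakeAttempt, { baseDelayMs: 10 })
    .then((response) => console.log(`status ${response.status} after ${calls} calls`));
```

Giving up after a fixed number of retries means a permanently failing request ends in an error message rather than a script that never finishes.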

Now we have a function that can log out a user, revoke admin and moderation rights, and deactivate the account, given the user ID. Where do we get the IDs? That’s a case for the Data Explorer:

SELECT id
FROM users

The results can be downloaded as JSON (did I say that I :heart: the Data Explorer plugin?), which I just bind to a variable named input. Now, we can work with that:
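For anyone wondering how the download ends up in `input`, here is a minimal sketch. It assumes the export has the shape `{ columns: [...], rows: [[...], ...] }`; only the `rows` part is confirmed by the code in this post, and the helper names and the file path are placeholders.

```javascript
const fs = require('fs');

// Sketch: pull the values of one column out of a Data Explorer JSON export.
// Assumption: the export looks like { columns: ["id"], rows: [[1], [2]] }.
function extractColumn(exported, columnName) {
    const index = exported.columns.indexOf(columnName);
    if (index === -1) throw new Error(`column "${columnName}" not found`);
    return exported.rows.map((row) => row[index]);
}

// `path` is wherever you saved the Data Explorer download.
function loadUserIds(path) {
    const exported = JSON.parse(fs.readFileSync(path, 'utf8'));
    // Negative IDs belong to built-in accounts such as system, so skip them here.
    return extractColumn(exported, 'id').filter((id) => id > 0);
}
```

Looking the column up by name instead of hard-coding `row[0]` keeps the script working if the query later selects more columns.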

_.each(input.rows, (row) => {
    var userId = row[0];
    deactivateUser(userId);
});

The result is a program that writes a lot of dots to the console, and logs out and disables all users. Hooray! Oh, it also logs you out. Did I say you should keep a way for you to log in? :wink:
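The loop above fires all requests at once and relies on the retry to cope with rate limiting. A gentler variant, sketched here with plain promises, keeps only a few requests in flight and leaves your own account alone. `runLimited` and `myOwnId` are names made up for this example, and the stand-in task would be replaced by `deactivateUser` in real use.

```javascript
// Sketch: run `task(item)` for every item with at most `limit` tasks in flight.
// Resolves to one entry per item: { item, ok: true, value } or { item, ok: false, error }.
function runLimited(items, limit, task) {
    const results = new Array(items.length);
    let next = 0;

    // Each worker takes the next unclaimed item until none are left.
    const worker = () => {
        if (next >= items.length) return Promise.resolve();
        const index = next++;
        return Promise.resolve()
            .then(() => task(items[index]))
            .then(
                (value) => { results[index] = { item: items[index], ok: true, value }; },
                (error) => { results[index] = { item: items[index], ok: false, error }; }
            )
            .then(worker);
    };

    const workers = [];
    for (let i = 0; i < Math.min(limit, items.length); i++) workers.push(worker());
    return Promise.all(workers).then(() => results);
}

const myOwnId = 1; // hypothetical: the account you want to keep usable
const userIds = [1, 2, 3, 4].filter((id) => id > 0 && id !== myOwnId);
runLimited(userIds, 2, (id) => Promise.resolve(`deactivated ${id}`))
    .then((results) => console.log(results.filter((r) => !r.ok).length, 'failures'));
```

Because failures are recorded per item instead of aborting the run, one stubborn account does not stop the rest from being processed.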

Preparing the archive

Create one category for the archive. Allow no-one to write to it, and only your staff to read it. If you want different permissions for some topics, use sub-categories. Also, enable tagging if it hasn’t been enabled yet. You’ll probably want to lock it down so no-one can apply the tags you want to use for archiving. Easy-peasy!

Tagging and moving topics?

Next, we need to move topics and tag them. Brace for more JavaScript:

var addTagsToTopic = function (topicId, newTags) {
    return processRequest(requestBuilder('GET', `t/${topicId}.json`)).then((topicJson) => {
        var topic = JSON.parse(topicJson);
        var slug = topic.slug;
        var tags = topic.tags || [];
        tags = _.uniq(_.concat(tags, newTags));

        var tagsEncoded = '';
        _.each(tags, (tag) => {
            if (tagsEncoded) tagsEncoded += '&';
            tagsEncoded += 'tags%5B%5D=' + encodeURIComponent(tag);
        });

        return processRequest(requestBuilder('PUT', `t/${slug}/${topicId}`, { 'Content-type': 'application/x-www-form-urlencoded' }, tagsEncoded));
    });
};

var moveTopicToCategory = function (topicId, categoryId) {
    return processRequest(requestBuilder('GET', `t/${topicId}.json`)).then((topicJson) => {
        var topic = JSON.parse(topicJson);
        var slug = topic.slug;

        return processRequest(requestBuilder('PUT', `t/${slug}/${topicId}`, { 'Content-type': 'application/x-www-form-urlencoded' }, `category_id=${categoryId}`));
    });
};

Both actions require the slug (Why?), so this code retrieves the slug via the API before continuing. Yuck!
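Since both functions send a PUT to the same URL, the two changes could probably be combined so that each topic needs only one lookup and one update. The sketch below builds such a combined body. Whether the endpoint accepts `tags[]` and `category_id` in the same request is an assumption I have not seen confirmed here, and `buildTopicUpdateBody` is a name invented for this example.

```javascript
// Sketch: build one form-encoded body that sets tags and category together.
// Assumption: the topic update endpoint accepts `tags[]` and `category_id`
// in the same PUT request.
function buildTopicUpdateBody(existingTags, newTags, categoryId) {
    const params = new URLSearchParams();
    const tags = [...new Set([...existingTags, ...newTags])]; // de-duplicate, keep order
    tags.forEach((tag) => params.append('tags[]', tag));
    if (categoryId !== undefined) params.append('category_id', String(categoryId));
    return params.toString();
}

console.log(buildTopicUpdateBody(['faq'], ['archiv-2015', 'faq'], 17));
```

`URLSearchParams` takes care of the percent-encoding that the code above does by hand with `encodeURIComponent`.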

Next up, let’s get the topic IDs for a category. Data Explorer (:heart:) to the rescue:

-- [params]
-- string :category = 

SELECT id
FROM topics
WHERE archetype = 'regular'
    AND category_id = (
        SELECT id
        FROM categories
        WHERE name = :category
    )

I hope you don’t have two categories with the same name like we do ¯\_(ツ)_/¯

So let’s grab the category ID where the topics should go (hint: Go to the category page and add .json to the URL), think of some tag names, and get moving!

_.each(input.rows, (row) => {
    var topicId = row[0];
    addTagsToTopic(topicId, ["archiv-2015", "archiv-intern"]).then(() => {
        return moveTopicToCategory(topicId, 17);
    });
});

This will likely throw some errors, because deleted topics apparently either cannot be moved or cannot be tagged. Why, oh why? :crying_cat_face:
I decided to ignore that (for now), and just watch the error messages scroll through.
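Instead of watching errors scroll by, the failed topic IDs can be collected for a later look. This is a minimal sketch using `Promise.allSettled` (so it needs a reasonably recent Node); `runAndCollect` and the stand-in `fakeMove` action are hypothetical names, and the real action would be the tag-and-move chain above.

```javascript
// Sketch: run an action for every topic ID and report which IDs failed.
function runAndCollect(topicIds, action) {
    // Wrapping in Promise.resolve().then() also catches synchronous throws.
    const attempts = topicIds.map((id) => Promise.resolve().then(() => action(id)));
    return Promise.allSettled(attempts).then((outcomes) => {
        const report = { succeeded: [], failed: [] };
        outcomes.forEach((outcome, i) => {
            if (outcome.status === 'fulfilled') {
                report.succeeded.push(topicIds[i]);
            } else {
                report.failed.push({ id: topicIds[i], reason: String(outcome.reason) });
            }
        });
        return report;
    });
}

// Stand-in action: pretend topic 12 was deleted and the server answers 403.
const fakeMove = (id) => (id === 12 ? Promise.reject(403) : Promise.resolve('ok'));
runAndCollect([11, 12, 13], fakeMove).then((report) => console.log(JSON.stringify(report)));
```

The `failed` list can then be checked by hand, for example to see whether every entry really is a deleted topic.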

Rinse and repeat with all your categories! Or write more code to automate requesting the IDs and build a tag name. I decided to do that part manually, so you’re on your own. :blush:
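If you do want to automate the tag names, a minimal sketch could derive them from the category name and the year. The `archiv-` prefix mirrors the tags I used above; the slug rules and the second category name are assumptions for illustration only.

```javascript
// Sketch: derive archive tag names from a category name and an event year.
function slugify(name) {
    return name
        .toLowerCase()
        .normalize('NFKD')                 // split accented letters into base + mark
        .replace(/[\u0300-\u036f]/g, '')   // drop the combining marks
        .replace(/[^a-z0-9]+/g, '-')       // any other run of characters becomes a dash
        .replace(/^-+|-+$/g, '');          // trim leading and trailing dashes
}

function archiveTagsFor(categoryName, year) {
    return [`archiv-${year}`, `archiv-${slugify(categoryName)}`];
}

// Hypothetical category names.
['Intern', 'Übungen & Lösungen'].forEach((name) => {
    console.log(name, '->', archiveTagsFor(name, 2015).join(', '));
});
```

The resulting array can be passed straight to `addTagsToTopic` in place of the hard-coded tag list.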

Clean up

All of this mostly worked, but some cleanup was needed.

First of all, some topics were already about organizing the next event, so I manually un-tagged them and moved them back.

Also, category description topics are special: Trying to move them fails silently (Why?). I just un-tagged them manually.

Results

Now, the result is a pristine forum with a staff-readable, searchable archive.
After re-enabling SSO (with the new event), old staff users that stayed with you can log in via SSO again, which also gives them back moderator privileges. You’ll also find that they cannot access the archive, start to be infuriated when you don’t understand why, then dig up an old bug report by you :wink:


I hope this helps others in a similar situation. If you have any questions, feel free to ask.

Maybe the :discourse: team can help reduce the number of occurrences of Why? and Why, oh why? above :wink:

This procedure has the side effect of preventing incoming mails from previous users.

Maybe staging the users instead of inactivating them would work, as staged users should no longer get Digests and the like; and there should be no further activity on their topics (as they are moved to the archive).

Another year has passed, and we followed the same procedure. This time, we used actual Ruby scripts instead of my crazy API-based approach – maybe this is useful to someone else:

Log out and deactivate (almost) all users:

protected_users = ["system", "codinghorror"] # do not process these users

# get all users that should be logged out
affected_users = User.all.select { |u| !(protected_users.include? u.username) }

affected_users.each { |u|
    u.admin = false
    u.moderator = false
    u.active = false
    u.save!
    u.user_auth_tokens.destroy_all
    u.logged_out
}

Move all topics to an archive category and tag them appropriately:

protected_topics = [] # topics that will be ignored

# create a tag by running: Tag.create(name: "archive-2016")
year_tag = 19 # annotate all topics with this tag (represents the year)

categories = { # all categories that should be processed
    1 => {                 # move everything from category 1
        "target" => 2,     # to category 2
        "tags" => [10, 11] # and add tags 10 and 11
    }
    # define additional entries as needed
}

categories.each { |id, data|
    c = Category.find(id)
    topics = c.topics.select { |t| t.id != c.topic_id } # get non-description topics

    topics.each { |t|
        next if protected_topics.include?(t.id) # leave protected topics untouched

        # keep existing tags, add the year tag and the per-category tags, drop duplicates
        tags = (t.tag_ids + [year_tag] + data["tags"]).uniq
        t.tag_ids = tags
        t.category_id = data["target"]
        t.save!
    }
}


# update all topic counts

Category.all.each { |c|
    c.topic_count = c.topics.length - 1 # -1 for about post
    c.save!
}

As a note to anyone using this (…which includes myself :wink:), @zogstrip updated the topic count recalculation code here. He probably has good reasons to do so, so this might deserve an update :slight_smile:

The Category.all.each :arrow_forward: Category.find_each change is for performance reasons. The former loads all categories into memory, while the latter does it in batches.

The c.topics.length :arrow_forward: Topic.where(category: c).count is probably the same but is more explicit. I like explicit :wink:
