[bounty?] Import questions in bulk


(Stephane Maarek) #1

Hello

I have my questions and responses in a JSON format:

sample JSON document here
{
    "id": "x017c4h221p7T8sboHglB-7kQ==",
    "created": "2018-05-09T20:13:23Z",
    "title": "Docker i/o error",
    "body": "<p>Hey Stephane, I am having to restart docker to run Kafka everytime I exit since it errors out on tcp port binding. My understanding is that when I stop Kafka services and exit, the port is released.</p>",
    "course": {
        "_class": "course",
        "id": "x0190RCkGpZ6FMe4CPAF8aOoQ==",
        "title": "Apache Kafka Series - Learn Apache Kafka for Beginners v2",
        "url": "/apache-kafka/"
    },
    "replies": [{
            "_class": "answer",
            "id": "x01Qx2rNaX48kxP4NFFSSCK7g==",
            "created": "2018-05-10T07:04:10Z",
            "user": {
                "_class": "user",
                "id": "x01N-Fup_OULjTEtHPLwc8JSQ==",
                "name": "Ivan",
                "locale": "en_US"
            },
            "is_top_answer": null,
            "body": "<p>Hi Nandini,\u00a0</p><p>Can you please elaborate in more details what your problem is? If you stop Kafka services, ports used by Kafka\u00a0will be released, of course.</p><p>Regards,</p><p>Ivan</p>"
        },
        {
            "_class": "answer",
            "id": "x01bLG2QPhyLwZ_RJsbMge16A==",
            "created": "2018-05-10T20:45:39Z",
            "user": {
                "_class": "user",
                "id": "x01oX4mwhRQoLXKuhHXDHg3zg==",
                "name": "Nandini",
                "locale": "en_US"
            },
            "is_top_answer": null,
            "body": "<p>I stop kafka services and when i restart, docker ports are not released and I get a TCP binding error on ports 2181 and 3030</p>",
            "is_upvoted": false,
            "num_upvotes": 0
        },
        {
            "_class": "answer",
            "id": "x01yL8D1-inVZE3njAo08-uMw==",
            "created": "2018-05-11T00:32:46Z",
            "user": {
                "_class": "user",
                "id": "x01lNfqEyIqBf47SM76dxq0rw==",
                "name": "Stephane Maarek",
                "locale": "en_US"
            },
            "is_top_answer": true,
            "body": "<p>Restart the docker engine if you can, or your computer. See if that helps !\u00a0</p><p>Otherwise you have something running on port 2181. Please check the FAQ lecture (lecture 22) as a lot of students have been through these issues before </p>",
            "is_upvoted": false,
            "num_upvotes": 0
        }
    ],
    "user": {
        "_class": "user",
        "id": "x01oX4mwhRQoLXKuhHXDHg3zg==",
        "name": "Nandini",
        "locale": "en_US"
    }
}
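
A minimal sketch of how one such document could be flattened into an ordered list of posts to create (one topic followed by its replies). The field names come from the sample above; the `flatten_question` helper, its output shape, and the shortened sample bodies are hypothetical and not part of any Discourse API.

```python
import json

def flatten_question(doc):
    """Return the question and its replies as a list of dicts, oldest first."""
    posts = [{
        "kind": "topic",
        "title": doc["title"],
        "author": doc["user"]["name"],
        "created": doc["created"],
        "body": doc["body"],
    }]
    # ISO 8601 UTC timestamps in one format sort correctly as plain strings.
    for reply in sorted(doc.get("replies", []), key=lambda r: r["created"]):
        posts.append({
            "kind": "reply",
            "author": reply["user"]["name"],
            "created": reply["created"],
            "body": reply["body"],
            # is_top_answer may be null in the export; treat null as False.
            "accepted": reply.get("is_top_answer") is True,
        })
    return posts

# Shortened version of the sample document (bodies replaced by placeholders).
sample = json.loads("""
{
  "created": "2018-05-09T20:13:23Z",
  "title": "Docker i/o error",
  "body": "<p>question body</p>",
  "user": {"name": "Nandini"},
  "replies": [
    {"created": "2018-05-11T00:32:46Z", "user": {"name": "Stephane Maarek"},
     "is_top_answer": true, "body": "<p>second reply</p>"},
    {"created": "2018-05-10T07:04:10Z", "user": {"name": "Ivan"},
     "is_top_answer": null, "body": "<p>first reply</p>"}
  ]
}
""")

posts = flatten_question(sample)
```

Each entry in `posts` then corresponds to one item to import, which also makes it easy to count how many API calls a full import would need.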

I have developed a Python script to hit the API, but I’m getting a bunch of throttling errors from it:

You’ve performed this action too many times. Please wait X seconds before trying again.
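
One way to cope with this message on the client side is to read the wait time out of it and retry afterwards. The following is only a sketch under assumptions: `send` is a hypothetical callable standing in for whatever function performs the API request and returns `(ok, message)`, and the regular expression assumes the wording shown above with a number in place of X.

```python
import re
import time

# Matches the "wait N seconds" part of the throttling message.
WAIT_PATTERN = re.compile(r"wait (\d+) seconds?")

def parse_wait_seconds(message, default=5):
    """Extract the number of seconds from the message, or fall back to default."""
    match = WAIT_PATTERN.search(message)
    return int(match.group(1)) if match else default

def call_with_retry(send, max_attempts=5, sleep=time.sleep):
    """Call send() until it reports success or the attempts run out.

    send is a hypothetical callable returning (ok, message). Every failure
    is treated as throttling here; real code should tell other errors apart.
    """
    message = ""
    for attempt in range(max_attempts):
        ok, message = send()
        if ok:
            return message
        if attempt + 1 < max_attempts:
            # Wait the requested time plus one second of safety margin.
            sleep(parse_wait_seconds(message) + 1)
    raise RuntimeError(f"still failing after {max_attempts} attempts: {message}")
```

The `sleep` parameter is injectable so the retry logic can be exercised without actually waiting.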

I have to import about 3000 questions in total (each with an average of 2 answers), so I feel the API route might take too long.

Is there a way to disable this API throttling issue?

Is there any other way I can import my data? It is only linked to two users (so no need to create users). I’m also using a hosted Discourse, so I don’t know if I have direct access to the underlying DB.

Happy to share the Python code I have, or open a bounty if this requires a lot of effort.

Thanks!
Stephane


(Jeff Atwood) #2

Make sure the user performing the actions via the API key has staff privileges, even if only temporarily.

That will help with some of the rate limits.


(Mittineague) #3

An easy approach I used was to add
sleep(0.7)
inside the Ruby loop. (You may need to tweak that value.)

For 3000 requests that adds up to about 35 minutes of sleeping (3000 × 0.7 s = 2100 s). A bit painful, but for a one-off I don’t think it would be all that bad.
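
Since the original poster's script is in Python, the same idea could look roughly like this there. It is a sketch, not a tested importer: `run_throttled`, `estimate_minutes`, and the `action` callable are hypothetical names, and the 0.7 second delay is just the value suggested above.

```python
import time

def estimate_minutes(num_requests, delay_seconds):
    """Lower bound on run time: sleep time only, ignoring request latency."""
    return num_requests * delay_seconds / 60

def run_throttled(items, action, delay_seconds=0.7, sleep=time.sleep):
    """Apply action to each item, pausing between consecutive calls."""
    results = []
    for index, item in enumerate(items):
        if index > 0:
            sleep(delay_seconds)  # no pause needed before the first call
        results.append(action(item))
    return results
```

With these numbers, `estimate_minutes(3000, 0.7)` comes to about 35 minutes. If each answer needs its own request (an assumption), 3000 questions with roughly 2 answers each mean about 9000 requests, or around 105 minutes at the same delay.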


(Stephane Maarek) #4

I have admin permissions (using my own API key) and I still seem to get throttled every 60 API calls. I tried changing things in Settings > Rate limits, but that doesn’t seem to help.