Discourse Data Explorer Power BI integration

Discourse Data Explorer Power BI integration

I recently made a cool Power BI function which fetches Data Explorer data directly to Power BI with minimal overhead.

Thus, if you are like me, and you wanted to have the data being automatically pulled and refreshed via the Power BI app, here’s the code of the function you should put in the Advanced Editor in Power BI:

(queryID) => let
        resultCount = 1000,
        otherNameForPage = 0,

        GetPage = (otherNameForPage) =>
            let
                content1 = "params={""page"":""" & Number.ToText(otherNameForPage) & """}",
                RawData = Json.Document(Web.Contents(
                    "https://forumURL/admin/plugins/explorer/queries",
                    [RelativePath=Number.ToText(queryID) & "/run",
                        Query=
                        [
                            params="{""page"":""" & Number.ToText(otherNameForPage) & """}"
                        ],

                        Headers = [ 
                        #"api-username"="yourAPIUsername",
                        #"api-key"="yourAPIKey",
                        #"Content-Type" = "application/x-www-form-urlencoded"],
                        Content = Text.ToBinary(content1)
                    ]
                    
                ) ),
                resultCount = RawData[result_count]
            in
                if RawData[result_count] = 0 then null else RawData,
            Pages = List.Generate(
                () => [i = 0, RawData = GetPage(i)],
                each [RawData] <> null,
                each [i=[i]+1, RawData = GetPage(i)],
                each Table.Combine(let raw = [RawData] in List.Transform(raw[rows], each Table.FromRows({_}, raw[columns])))),
            Output = Table.Combine(Pages)
in
    Output

And here is a sample Data Explorer query. It includes pagination that is required for the function above to work properly (it is required even if your query returns only 1 page):

--[params]
-- integer :page = 0

SELECT count(*) from badges

OFFSET :page * 1000
LIMIT 1000

(you can easily adapt your existing queries by including first and last 2 lines of the above example)

This function will automatically continue to iterate over the pages until no more results are found.

How to get it to work?

After you load the function to Power BI, you will need to set 2 things:

  1. The api-username and api-key (see function above for the placeholders):

                         Headers = [ 
                         #"api-username"="yourAPIUsername",
                         #"api-key"="yourAPIKey",
    
  2. The query ID number.

    To fetch your data, all you have to provide is the ID of the query which is easy to find in the query URL:
    image

    This should look like so in Power BI after you click on the function:
    image

Sample output

This is the output of the sample query:

image

That’s it! You can now shape your data however you want.

11 Likes

You had too much free time on your hands. :laughing:

Thank you tho :slight_smile: very useful.

2 Likes

For others like me who see Power BI and are confused.

5 Likes

Dzięki @loginerror for providing that! I’ll be trying to achieve the same thing in Tableau. Not sure if it has similar advanced editor to what Power BI offers but will see

1 Like

For others like me who do not know about Tableau.

https://www.tableau.com/

Get a full picture of your business, inside and out.
Connect all of your data with critical external sources like public health status, economic indicators, and government policy.


From Wikipedia

Tableau products query relational databases, online analytical processing cubes, cloud databases, and spreadsheets to generate graph-type data visualizations. The products can also extract, store, and retrieve data from an in-memory data engine.

1 Like

hey @loginerror thanks for providing this!

I’m just getting into PowerBI and would be a fun project to work on a Discourse dashboard, and thanks to you I may be able to achieve the initial steps and get faster to the fun part.

But for this I’d like to ask a question…
I’ve tried to run the suggested query using a newly generated API key (to my user name) but can’t make it work for some reason.

I see that there is something going on Discourse - it indicates that the API key has been used but the PowerBI endlessly runs the query and doesn’t provide any data.

I’m not proficient in queries at all, so I may be missing something obvious… Could you please help?

It should simply work when plugged in.

Have you tried running it with the sample query that only returns one value?

--[params]
-- integer :page = 0

SELECT count(*) from badges

OFFSET :page * 1000
LIMIT 1000

Otherwise, if you have a bit more pages (and we are limited to 1000 records per page), it might take longer to load all the data.

I’ve just tested the code by making a quick Power BI project with the code from the original post and it runs just fine (it seems to run just fine on either a System API key or a user one with admin permissions).

4 Likes

Oh, I didn’t realize you answered so fast - thanks!
I may come as a complete noob, and probably that’s because I am. But where do I use this piece of code? Sorry, I’m really new to powerbi…

1 Like

This is a sample Data Explorer query you should save in your Data Explorer plugin. It is designed to only return 1 record, which should test out the function without forcing it to download a lot of data.

Afterwards, grab its ID from the URL and plug it in Power BI function created from the code from the first post:

/admin/plugins/explorer?id=260
1 Like

Ah, I see now! Thanks so much. I was just looking at the things from Power BI point of view… how silly :slight_smile:

I’ll see how it goes from here.

Thanks again, great stuff.

1 Like

Quick update, in case someone wants to improve the script from the first post.

It is actually possible to pull all data in one go, seemingly independent of how much data there is :sweat_smile:
(I stress-tested this on a sizeable posts table with id and raw columns and it managed!)

Sample query:

SELECT      'id'
            || string_agg(E'\n' || case when posts.id::text is null then '-1' else posts.id::text end
            ,''
            ORDER BY posts.id asc) as value
FROM        posts

When you run it, it will print your almost-csv file as one value, so also on one page. You would then need to trim the value" and another " at the end and it becomes a useful CSV:
image

You can naturally add more data to it by adding a new column for CSV file and a new data for that column:

SELECT      'id,newcolumn'
            || string_agg(E'\n' || case when posts.id::text is null then '-1' else posts.id::text end
            || string_agg(E'\n' || case when posts.newcolumn::text is null then '-1' else posts.newcolumn::text end
            ,''
            ORDER BY posts.id asc) as value
FROM        posts

When updated to support such a query, Power BI wouldn’t need to traverse across multiple pages, resulting in faster data retrieval.

2 Likes

Hi! Newish to Power BI, and very new to the Discourse Data Explorer, so bear with me.

I’m trying to connect, I’ve put in my forum URL, my username, and my password into the script at the top. I’ve run the test badges script in Data Explorer. However, when I try to connect on Power BI
I get this error:

I connect anonymously because if I try any of the other permissions in PowerBI it says the data must be collected anonymously.

Any tips?

I’m eventually trying to connect a whole bunch of our forums (several hundred) to PowerBI so we can have the analytics. If there’s an easier way to do that then manually like this I’d also be happy to hear it!

EDIT: I’m also a bit confused with the updated query for pulling all the data. Do I have to export as a CSV first? This isn’t a live link?

1 Like

You should not use your Forum user credentials, but an API key that you can grab from Admin → API
image

I wouldn’t start with this method before understanding the original one because it’s basically one giant workaround to get a lot of data at once.

3 Likes

Fair enough!

Sorry, I have no idea why I typed password, I meant API key and I was thinking API key! I went through those steps to grab it and inserted it into the area you noted in the PowerBI script.

2 Likes

I’d first try Postman to see if the API call goes through correctly.

The API call itself can be found when you navigate to the data explorer query page in your browser, open your Network console and run the query. The XHR filter will present you with all the required parameters that you could then use in Postman to verify all goes well.

2 Likes

Turns out I had a typo in Power BI, and that was causing the issue. Silly me! Its working correctly in Power BI, now I begin the process of trying to pull all or parts of the data for me to start sifting through.

2 Likes

@loginerror - I am wondering if I am just missing something silly. (Very new to PowerBI but I have done some basic Excel based Dashboarding)…

Continually gives me issues with credents. I know they are good. I have checked against Postman. I’m wondering in your query if I have to somehow remove the #'s or " around the parameter names (tried but got numerous syntax errors.)

I think I am loading all of it in properly even though your query icon looks slightly different than mine.

1_Discourse_Invoked_Function
2_Discourse_Enter_Params

If I do try and manually enter credentials it (PowerBI) continually tells me I have to enter a name in a field that they never provide me ;). I’m sure that is the wrong way to use this anyways as that never appears in your examples above.

It may be that I am just blocked on my network somewhere else but wanted to rule out silly errors first. Please see if anywhere in here looks wrong and let me know what you think. Thanks.

Hi @mattyoung

For Power BI, you should go for Anonymous credentials. This is because the API key is already hard coded in the function.

3 Likes

I was able to get it to work now with your sample query and a new api key and single user with that api key.

I’ll list some mistakes I was making and some confusing areas that were not clear to me… that might help others as well as what the steps are that actually work for me:

In Discourse I actually had to create an api and use a “Single User” not “All Users” API and specify an actual username.

In PowerBI, in order to use the Query system when entering PowerBI, select “Transform Data” in order to get into the Power Query area. Select New Source > Blank Query. Then when it shows your cursor in a new field by fx, from the top row of buttons select “Advanced Editor” paste in the big query given at the beginning of this topic with your already replaced Discourse username and API key. Select Done. A field asking for your Discourse Query ID number will appear. Enter the number… and hit Invoke.

An initial warning will appear asking for you to specify how you will want to authenticate:
4_howtoconnect

Select Edit Credentials.

You will see that this is listed as an “Anonymous” connection, which is what you want. Leave the url alone (as long as it is just a single URL with no subdirectories listed out.) Formerly I have been putting in the full URL that was listed in the query. Clearly this was overthinking on my part and a big mistake. :wink:

You should see output generated similar to this and you will know this basic query is now working!

Screen Shot 2021-10-12 at 11.13.28 AM

3 Likes

Hi! Me again!

I’ve understood the original way to pull data. I’ve successfully created several views and imported them to Power BI. Yay!

However, now I’m running into a speed problem. I’m trying to pull just the raw data in, so for example I use Select * from user_visits/users/topics, etc. I also have several forums. Currently I’m testing with about 15, but the refresh has started getting way too long, for example with User Visits. I’m currently sitting on an refresh for the past hour, for example. I’m eventually going to be importing data from hundreds of forums, so obviously that won’t work in the long term.

I’m truly lost with your work around though. I tried adjusting it, but it didn’t work (mainly because I’m rather poor at writing SQL queries). Can you help me in reworking your work around to allow me to say get Select * From User_visits in a quicker way?