How to make the database (or part of it) accessible to a cloud data processor?

I was interested in using ETL services like Stitch Data or Skyvia to integrate different data sources (including my Discourse database), but I have been told by someone at Skyvia that this is not possible:

Skyvia can connect to PostgreSQL through SSH, however it is not possible to connect to it when it’s inside a docker container while SSH server is not in the container but in front of it.

These are their requirements to connect to Postgres.

Is there an obvious workaround?

You could enable ssh in the discourse container (on a non standard port) and then allow them to connect there. I think there may be an example in the samples directory of Discourse_docker.

3 Likes

Thanks, Jay. I ended up using docker-ssh with publicKey authentication. :+1:

2 Likes

I seem to be missing a key concept because I’m able to connect via ssh with the custom port and then do su postgres -c 'psql discourse' without a problem. Everything works in this two-step approach but I think what I need in order to connect directly via pgAdmin (for example) is something slightly different.

This is the command I’m using to expose a custom port:

docker run -d -p 2222:22 \
        -v /var/run/docker.sock:/var/run/docker.sock \
        -v ~/.ssh/authorized_keys:/authorized_keys \
        --name my-sshd \
        -e FILTERS={\"name\":[\"^/app$\"]} -e AUTH_MECHANISM=publicKey \
        -e AUTHORIZED_KEYS=/authorized_keys \
        jeroenpeeters/docker-ssh

Which allows me to later do this directly (without running the docker container via launcher enter app):

ssh whatever@host -p 2222
su postgres -c 'psql discourse'

I’ve tried several things, but unsuccessfully. I feel like there should be a way to do ssh whatever@host -p XXXX and connect directly to the database (which is probably what pgAdmin is expecting)

Are you not connecting or are you having a permissions problem?

I can connect in the command line via ssh and then psql. I am not able to connect via pgAdmin.

1 Like

You need to expose the PostgreSQL port directly for you to be able to connect via pgAdmin.

In the app.yml, near the top, you see the 80 and 443 ports open. You can add another line for the port 5432 for PostgreSQL.

That said, this is mostly certainly a very bad idea. The database went from accepting only local connections to being exposed to the whole internet.

If all you need is some occasional reporting, downloading some CSVs from Data Explorer and loading those in your favorite tool may be enough. You can also download Discourse backups (without uploads) and those are just the standard PostgreSQL dump format. With that in hand you can restore it to a local PostgreSQL instance for analysis.

1 Like

Thanks, Rafael

I’ve done this and rebuilt the container. But this still doesn’t work (I have the real IP instead of the XX.XX.XX.XX)

And in the SSH Tunnel tab:

This is the error I get

Regarding this, I understand it’s one less layer of protection, but it still requires having the SSH private key. Doesn’t it?

If you add 5432 to the app.yml it is exposed directly, without needing the SSH tunnel.

I can’t give advice on the pgAdmin SSH tunnel, as I have never used it. I assume it expects the port to list to local connections, so it doesn’t need to be exposed to the internet.

Try:

expose:
  - "80:80"
  - "443:443"
  - "5432"
1 Like

But there isn’t a postgres password because it requires being a superuser: the pg_hba.conf file has “local” connection permissions set to “peer”, so it depends on the UNIX user, which requires logging via SSH, no?

This doesn’t work: psql -h XX.XX.XX.XX -p 5432 -U postgres -d discourse

You can connect to psql as the superuser

./launcher enter app
su postgres
psql

and create the necessary user with the necessary permissions for your reporting.

Right, I have no problem connecting from the app docker container. My problem is connecting directly to the postgres DB from my local machine (so I can use pgAdmin) or from a cloud data processor like Stitch. Both of these expect a host IP address and SSH credentials, but I haven’t been able to get them to work (I get the error I showed above).

The only thing I’ve been able to do is use docker-ssh to access the app docker container directly (via publicKey) from my local computer (without doing launcher enter app), but I still need to do su postgres 'psql discourse' in order to access the DB, which I assume is the problem with pgAdmin/Stitch—they expect a direct connection.

Did you try creating a new PostgreSQL with a password and supplying that to your service?

Yes, they have a pretty lengthy procedure.

But I have the same issue simply using pgAdmin from my local computer.

I believe that the How to move from standalone container to separate web and data containers may have instructions for setting a password.

Also, I think you could bind the postgres port only to 127.0.0.1.

expose:
  - "80:80"
  - "443:443"
  - "127.0.0.1:5432:5432"
1 Like

I’ve decided to take a step back to see if I can connect to the DB without exposing any ports. :grin:

If I enter the container I see this:

# netstat -lp | grep postgres
tcp        0      0 0.0.0.0:postgresql      0.0.0.0:*               LISTEN      -
tcp6       0      0 [::]:postgresql         [::]:*                  LISTEN      -
unix  2      [ ACC ]     STREAM     LISTENING     263612292 -                    /var/run/postgresql/.s.PGSQL.5432

If I exit the container and I’m in my remote server (not on my local computer yet), shouldn’t I be able to connect using this?

/var/discourse# psql -h localhost -d discourse -U postgres

The problem is that I get a password prompt. Since the postgres user doesn’t have one, I tried creating a different user and assigning them a password:

CREATE USER whatever_user WITH ENCRYPTED PASSWORD '<whatever password>';
GRANT CONNECT ON DATABASE discourse TO whatever_user;
GRANT USAGE ON SCHEMA public TO whatever_user;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO whatever_user;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO whatever_user;

I added a line for that user with md5 in pg_hba.conf and restarted PG with service postgresql restart

# Database administrative login by Unix domain socket
local   all             postgres                                peer
local   all             whatever_user                      md5

However, when I try to connect from the remote server I get an authentication failure:

# psql -h localhost -d discourse -U whatever_user
Password for user whatever_user:
psql: FATAL:  password authentication failed for user "whatever_user"
FATAL:  password authentication failed for user "whatever_user"

What am I missing? I’m trying to at least be able to connect to the DB from the same server. Step 2 would be doing the same using an SSH tunnel, but I guess I need to take care of step 1 first. Any help is appreciated.

Ok. I finally figured it out :tada:

I changed this:

to this - "127.0.0.1:5433:5432" because I got an error saying the port was already used.

I rebuilt the container and checked that the port was indeed open:

$ sudo docker ps
CONTAINER ID        IMAGE                           COMMAND             CREATED             STATUS              PORTS                      NAMES
whatever_id        local_discourse/app             "/sbin/boot"        20 minutes ago      Up 20 minutes       127.0.0.1:5433->5432/tcp   app

I am now able to create an SSH tunnel and connect from my remote server using a user with password:

# create the tunnel (you could also use ssh -f to run it in the background)
ssh -v -N -L 5433:localhost:5433 SERVER_IP_ADDRESS

# connect in another tab and enter the password 
psql -h localhost -d discourse -U whatever_user -p 5433

If anyone is trying to do this and runs into any issues, let me know.

4 Likes