EC2 Origin Exposed By Google Web Crawler With Discourse & Cloudflare

I apologize if this belongs under web hosting; it was a toss-up. I am running the site behind an NGINX reverse proxy, and the problem seems to be on the Discourse side, since the other site hosted on this server is unaffected. It appears Google is crawling the public DNS name of my AWS EC2 instance, which exposes the EC2 hostname and IP to others and circumvents Cloudflare's protection. For example, a Google result for the EC2 hostname shows the landing page of the forum.

I am not entirely sure how to address this. One solution I found is adding AWS security group rules that only allow inbound traffic from Cloudflare IPs, forcing all traffic through Cloudflare, though it seems like it could break things. My concern is that it may break the NGINX reverse proxy and the NGINX rewrite rules I use to expose users' real IPs, instead of Cloudflare's IP, for moderation and logging. Is this the correct solution, or is there something I should fix so that Discourse cannot be crawled under the EC2 public DNS name?
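The security-group approach described above amounts to replacing "open to everyone" rules on ports 80 and 443 with one rule per Cloudflare range. A minimal sketch, assuming you script it in Python: the function below only builds the rule data and validates each CIDR. The three ranges are an illustrative subset of Cloudflare's published list, not the full or current list, and `build_ingress_permissions` is a hypothetical helper name.

```python
import ipaddress

# Illustrative subset of Cloudflare's published ranges; fetch the current
# list from https://www.cloudflare.com/ips/ before applying any rules.
CLOUDFLARE_CIDRS = [
    "173.245.48.0/20",
    "103.21.244.0/22",
    "2400:cb00::/32",
]


def build_ingress_permissions(cidrs, ports=(80, 443)):
    """Build IpPermissions entries that open TCP `ports` only to `cidrs`."""
    v4_ranges, v6_ranges = [], []
    for cidr in cidrs:
        # ip_network() raises ValueError for malformed CIDRs or host bits set.
        net = ipaddress.ip_network(cidr)
        if net.version == 4:
            v4_ranges.append({"CidrIp": str(net), "Description": "Cloudflare"})
        else:
            v6_ranges.append({"CidrIpv6": str(net), "Description": "Cloudflare"})
    # One permission per port, each listing every allowed range.
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": list(v4_ranges),
            "Ipv6Ranges": list(v6_ranges),
        }
        for port in ports
    ]
```

The returned list has the shape boto3's EC2 client accepts for the `IpPermissions` argument of `authorize_security_group_ingress`. Security group rules are allow-only, so any existing 0.0.0.0/0 rules on the same ports would also have to be revoked, or they keep the origin reachable.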

Guidance is much appreciated. Thanks.

Edit: I found that I could edit the VPC settings and disable DNS hostnames under “Edit DNS Hostnames”. I am unsure whether this will solve the problem: the URL still points to my server. Is this the right move? In the AWS console, the instance's public DNS name, which caused the problem above, is now missing.


Did you use the AWS FQDN to configure your Discourse instance? It shouldn’t even be responding to that hostname unless you used it in your configuration.

Discourse doesn’t need to know the DNS name AWS has assigned; you just need an A record pointed at the public IP of your ELB. Your machine doesn’t need to be remotely addressable at all.
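Whatever hostname Discourse was configured with, one way to make sure the origin never serves the forum under the EC2 name is to accept only requests whose Host header matches your own domain. In NGINX this is commonly done with a catch-all `server` block marked `default_server` that returns 444. The Python below is only a sketch of that decision, assuming a hypothetical domain `forum.example.com`; it is not how Discourse itself routes requests.

```python
# Hypothetical forum domain; replace with the hostname Discourse was installed with.
ALLOWED_HOSTS = {"forum.example.com"}


def host_is_allowed(host_header, allowed=ALLOWED_HOSTS):
    """Return True only if the Host header names a configured site."""
    if not host_header:
        return False
    host = host_header.strip().lower()
    # Drop an optional ":port" suffix. IPv6 literals are not parsed specially;
    # they can never equal a domain name, so they are simply rejected.
    if ":" in host:
        host = host.rsplit(":", 1)[0]
    # Treat a trailing dot (fully qualified form) the same as no dot.
    host = host.rstrip(".")
    return host in allowed
```

A request for `ec2-203-0-113-10.compute-1.amazonaws.com` (a made-up EC2 name) would be refused, so a crawler following that hostname gets nothing to index.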

I did not use the AWS FQDN to set up the server; I used my own domain name when installing Discourse. I am not using an ELB; I am just pointing Cloudflare at my EC2 origin, if that matters. I checked today, and the public DNS URL is still active and points to my server even though it is now missing from the AWS panel, which is quite weird. The AWS documentation does not explain this well.

The only thing I can think of is moving the server and ensuring it is not assigned a public DNS name when it is created, since removing it in the VPC settings does not appear to remove it, as far as I can tell. That is a lot of work for an uncertain outcome, though. Any advice is appreciated.

If you’re using SSL, just reconfigure Discourse to use the correct address. Search engines will then no longer be able to crawl the old URL, and its entries in the index will gradually drop out the next time they crawl.
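The SSL errors seen on the EC2 hostname come from the hostname check a TLS client performs: the requested name must match one of the names on the certificate, where a `*` may stand in for exactly one leftmost label. The helper below is a simplified, hypothetical sketch of that rule for DNS names only (RFC 6125 has further details, and real clients also handle IP addresses); the certificate names in the usage are placeholders.

```python
def cert_name_matches(hostname, cert_name):
    """Simplified DNS-name match: '*' may replace only the leftmost label."""
    host_labels = hostname.lower().rstrip(".").split(".")
    cert_labels = cert_name.lower().rstrip(".").split(".")
    # A wildcard covers exactly one label, so label counts must agree.
    if len(host_labels) != len(cert_labels):
        return False
    if cert_labels[0] == "*":
        return host_labels[1:] == cert_labels[1:]
    return host_labels == cert_labels


def hostname_verifies(hostname, san_names):
    """True if any subject alternative name on the certificate matches."""
    return any(cert_name_matches(hostname, name) for name in san_names)
```

With a certificate issued for `forum.example.com` and `*.example.com`, the forum's own name verifies, while an `ec2-…amazonaws.com` name can never match, which is why browsers report a mismatch there.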

I am using SSL and it is working; when I go to the public DNS name, it throws SSL errors (as you would expect). Yet Google still seems to crawl it despite the certificate mismatch, and the result is even still being updated in search, which is quite odd. I have not changed how I handle SSL since the site was installed, and I have not pointed anything at the public DNS name, unless Discourse did something on its own during install. A second site running on this server is unaffected by this problem.

The thing is, Discourse can’t infer the DNS name for a server; it has to be specified somewhere.

How are you using Cloudflare? Are you using an A record or a CNAME?

My Cloudflare DNS setup is not pointing to the AWS DNS name, just to the IP of the server.

Edit: I also just checked Google Domains and there are no records there, so I am not sure what is pointing to the public DNS name.

This is the correct solution, and your concern is unnecessary: Cloudflare passes the visitor's real IP in an HTTP header.
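Locking the origin to Cloudflare does not lose the visitor's address, because the address travels in the `CF-Connecting-IP` header. In NGINX this is usually handled by the realip module (`set_real_ip_from` for each Cloudflare range plus `real_ip_header CF-Connecting-IP;`). The Python below is a sketch of the same logic under stated assumptions: `client_ip` is a hypothetical helper, headers arrive as a plain dict with that exact key, and the trusted ranges are a small illustrative subset. The header is trusted only when the connection really comes from Cloudflare, so a direct visitor cannot spoof it.

```python
import ipaddress

# Illustrative subset of Cloudflare ranges; use the full published list in practice.
TRUSTED_PROXIES = [
    ipaddress.ip_network(cidr)
    for cidr in ("173.245.48.0/20", "103.21.244.0/22", "2400:cb00::/32")
]


def client_ip(peer_ip, headers, trusted=TRUSTED_PROXIES):
    """Return the visitor's IP, honouring CF-Connecting-IP only from Cloudflare."""
    peer = ipaddress.ip_address(peer_ip)
    # Membership across IP versions is simply False, so mixed lists are safe.
    if any(peer in net for net in trusted):
        forwarded = headers.get("CF-Connecting-IP")
        if forwarded:
            try:
                return str(ipaddress.ip_address(forwarded.strip()))
            except ValueError:
                pass  # malformed header: fall back to the peer address
    return str(peer)
```

If the security group already admits only Cloudflare, the fallback branch should rarely run; it mainly guards against a misconfigured rule.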


Would this break updating the server and performing maintenance, or does everything work fine with only Cloudflare HTTP/HTTPS traffic allowed? Or would I need to disable the rules every now and then to update things?

Updating would be outbound traffic (i.e. initiated from the “inside”), while the solution is about setting up inbound rules (affecting traffic initiated on the “outside”). So you will be fine if the rules are set up correctly.
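The inbound/outbound distinction works because security groups are stateful: replies to a connection the server itself opened are let back in regardless of inbound rules. The toy model below is a hypothetical sketch of that behaviour, not AWS's implementation; it tracks only (remote IP, remote port) rather than a full connection tuple. Note that SSH for maintenance is inbound, so the sample rule set includes a placeholder rule for port 22 from an assumed admin address.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class StatefulFirewall:
    # Allowed inbound rules as (network, destination port) pairs.
    inbound_rules: list
    # Remote endpoints of connections the server initiated.
    established: set = field(default_factory=set)

    def outbound(self, remote_ip, remote_port):
        # New security groups allow all outbound traffic by default; remember
        # the flow so its replies are accepted.
        self.established.add((remote_ip, remote_port))
        return True

    def inbound(self, src_ip, src_port, dst_port):
        if (src_ip, src_port) in self.established:
            return True  # reply to a connection the server opened
        addr = ipaddress.ip_address(src_ip)
        return any(addr in net and dst_port == port for net, port in self.inbound_rules)


fw = StatefulFirewall([
    (ipaddress.ip_network("173.245.48.0/20"), 443),  # sample Cloudflare range
    (ipaddress.ip_network("192.0.2.10/32"), 22),     # placeholder admin IP for SSH
])
```

Here `fw.outbound("198.51.100.20", 443)` stands in for fetching updates; the reply from that host is accepted even though no inbound rule names it.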
