Hi Quincy @ossia
Let’s take a quick step back for a second and look at this from a professional cybersecurity perspective, sans the speculation and “grasping at straws” approach.
The key concept in all cybersecurity tasks is “situational awareness”; in this context, it’s called “cyber situational awareness” (CSA).
In order to know “what is happening”, in a definitive way, you need to develop the best situational knowledge you can without speculation or guessing. Just the facts.
How do you do this?
Well, very briefly:
We do this by fusing information from all our sensors, and for web-based applications this normally comes from the log files and the session data. I don’t think (off the top of my head) that discourse maintains session information in the PG database (the last time I checked there was no session table like in some LAMP web apps), but that’s not a showstopper at all.
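If you want to confirm that for yourself, something along these lines should work on a standard install (just a sketch, assuming the default database name “discourse” and the container name “socket” used later in this reply):
# cd /var/discourse
# ./launcher enter socket
# su postgres -c 'psql discourse -c "\dt"' | grep -i session
If that grep comes back empty, there is no session table to mine, and the web logs below become your primary sensor.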
You have most everything you need in the nginx log files, both for your reverse proxy outside the container (I recall reading in this topic that you were using nginx as a proxy) and inside the container, where the same logging information is available. In both setups, the log files are in the standard OOTB location.
Here is an example in one of our setups (outside the container) for the reverse proxy:
# cd /var/log/nginx
# ls -l
total 779964
-rw-r----- 1 www-data adm 0 Jun 17 06:25 access.log
-rw-r----- 1 www-data adm 660766201 Jun 25 18:26 access.log.1
-rw-r----- 1 www-data adm 107367317 Jun 17 03:18 access.log.2.gz
-rw-r----- 1 www-data adm 21890638 May 21 03:08 access.log.3.gz
-rw-r----- 1 www-data adm 7414232 May 5 07:26 access.log.4.gz
-rw-r----- 1 www-data adm 63289 Apr 18 09:12 access.log.5.gz
-rw-r----- 1 www-data adm 0 Jun 17 06:25 error.log
-rw-r----- 1 www-data adm 904864 Jun 25 18:19 error.log.1
-rw-r----- 1 www-data adm 96255 Jun 17 03:17 error.log.2.gz
-rw-r----- 1 www-data adm 79065 May 21 02:58 error.log.3.gz
-rw-r----- 1 www-data adm 70799 May 5 06:54 error.log.4.gz
-rw-r----- 1 www-data adm 1977 Apr 18 05:49 error.log.5.gz
Here is the same basic logging information inside the discourse container:
# cd /var/discourse/
# ./launcher enter socket
# cd /var/log/nginx
# ls -l
total 215440
-rw-r--r-- 1 www-data www-data 87002396 Jun 25 18:28 access.log
-rw-r--r-- 1 www-data www-data 101014650 Jun 25 08:02 access.log.1
-rw-r--r-- 1 www-data www-data 8217731 Jun 24 08:02 access.log.2.gz
-rw-r--r-- 1 www-data www-data 6972317 Jun 23 07:53 access.log.3.gz
-rw-r--r-- 1 www-data www-data 3136381 Jun 22 07:50 access.log.4.gz
-rw-r--r-- 1 www-data www-data 2661418 Jun 21 07:45 access.log.5.gz
-rw-r--r-- 1 www-data www-data 5098097 Jun 20 07:38 access.log.6.gz
-rw-r--r-- 1 www-data www-data 6461672 Jun 19 07:40 access.log.7.gz
-rw-r--r-- 1 www-data www-data 0 Jun 25 08:02 error.log
-rw-r--r-- 1 www-data www-data 0 Jun 24 08:02 error.log.1
-rw-r--r-- 1 www-data www-data 20 Jun 23 07:53 error.log.2.gz
-rw-r--r-- 1 www-data www-data 254 Jun 23 02:36 error.log.3.gz
-rw-r--r-- 1 www-data www-data 20 Jun 21 07:45 error.log.4.gz
-rw-r--r-- 1 www-data www-data 20 Jun 20 07:38 error.log.5.gz
-rw-r--r-- 1 www-data www-data 20 Jun 19 07:40 error.log.6.gz
-rw-r--r-- 1 www-data www-data 274 Jun 18 15:40 error.log.7.gz
Note: That “in container” info is also available from outside the container on the shared volume.
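For example, on a standard install (a sketch assuming the default volume mapping and the “socket” container name from above; yours may be “app” or something else):
# ls -l /var/discourse/shared/socket/log/var-log/nginx/
# tail -f /var/discourse/shared/socket/log/var-log/nginx/access.log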
Hence (and to keep this reply short), @ossia, just about everything you need to gain situational knowledge of what is happening is in these robust log files. No speculation is necessary. The data is all there.
There is even more great data available in the rails log. For example, on one of our setups, here is the rails production log:
tail -f /var/discourse/shared/socket/log/rails/production.log
The rails log has a lot of great user logging information as well, for example:
Started GET "/embed/comments?topic_id=378686" for 73.63.114.60 at 2020-06-25 18:36:15 +0000
Started GET "/embed/comments?topic_id=378686" for 195.184.106.202 at 2020-06-25 18:36:16 +0000
Started GET "/embed/comments?topic_id=378686" for 17.150.212.174 at 2020-06-25 18:36:16 +0000
Started GET "/embed/comments?topic_id=378686" for 76.235.99.73 at 2020-06-25 18:36:18 +0000
Started GET "/embed/comments?topic_id=378686" for 124.253.211.42 at 2020-06-25 18:36:19 +0000
Started GET "/embed/comments?topic_id=378686" for 103.96.30.11 at 2020-06-25 18:36:21 +0000
Started GET "/embed/comments?topic_id=378686" for 72.191.206.59 at 2020-06-25 18:36:22 +0000
Started GET "/embed/comments?topic_id=378686" for 68.252.68.76 at 2020-06-25 18:36:23 +0000
Started GET "/embed/comments?topic_id=378686" for 69.17.252.83 at 2020-06-25 18:36:23 +0000
Started GET "/embed/comments?topic_id=378686" for 98.109.33.230 at 2020-06-25 18:36:24 +0000
Note: Here (above, as an example) we see the IP addresses of clients pulling the discourse embedded code from another server.
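As a quick illustration of how easy this data is to work with (just a rough one-liner, using the rails log path from above; adjust the topic_id to whatever you are investigating):
# grep 'GET "/embed/comments?topic_id=378686' /var/discourse/shared/socket/log/rails/production.log | awk '{print $5}' | sort | uniq -c | sort -rn | head -20
That prints the unique client IPs hitting that embed endpoint, ranked by hit count.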
The task at hand....
Back to the task at hand, the “trick” is to move past speculation and guessing and do the fun part: (1) filtering / data cleansing, (2) data fusion, and (3) analysis of your sensor data (log files) to create (4) the situational awareness (SA) of what is happening at your site.
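Here is a very rough sketch of what (1) through (3) can look like with nothing more than standard shell tools (this assumes nginx’s default “combined” log format on the reverse proxy, where the client IP is field 1 and the request path is field 7):
# cd /var/log/nginx
# zcat -f access.log access.log.1 access.log.*.gz | grep -vE '\.(css|js|png|jpg|svg|ico|woff2?) ' > /tmp/fused.log
# awk '{print $1}' /tmp/fused.log | sort | uniq -c | sort -rn | head -20
# awk '{print $7}' /tmp/fused.log | sort | uniq -c | sort -rn | head -20
The first command fuses the current and rotated logs and cleanses the obvious static-asset noise; the two awk lines then give you the top client IPs and the most-requested paths, which is often all the SA you need to spot a rogue bot or a scraper.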
For older LAMP apps, I have custom code I wrote years ago which writes all this information to a DB table, does the analysis in real time, and counts the “hits” by IP address (as one example), so I can quickly see what, who, and from where is hitting the site; it does take some code to do this kind of data cleansing, filtering, and fusion. (Useful during DDoS attacks and rogue bot activity, for example.)
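I have not ported that code to discourse, but even a throwaway stand-in gives you a near-real-time view of hits per IP (again, just a sketch against the reverse-proxy access log):
# watch -n 5 "tail -n 5000 /var/log/nginx/access.log | awk '{print \$1}' | sort | uniq -c | sort -rn | head -15"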
That’s no problem for you @ossia, because you are freeCodeCamp.org, so you have the knowledge both to find great log file analysis tools (there are many out there in the cyberverse) and/or to create your own custom code to do the analysis quickly and easily, based on the scenario you wish to understand (your topic and issue).
I wrote my custom code for an old legacy LAMP app in a few hours many years ago, and I’m no coding genius by any stretch of the imagination, even though I am sometimes referred to as a “legend” by many in the cybersecurity field, LOL.
To summarize....
Well, to summarize…
You have all the data you need to create deep situational knowledge of “what is going on” on your site, and you can create that SA by cleaning, filtering, fusing, and doing some basic analysis of your logfile data. There are tools out there which can help, but I always find it easier to bang out some custom code based on the objective of the analysis (the analysis is objective-dependent), YMMV; but you can easily do this because you are freeCodeCamp.org and have a lot of tech skills.
I recommend you move away from trying to gain SA from Google Analytics and other JS-based third-party apps. Nothing is better than your own web log files (and DB session data if you have it), and you don’t have to worry about “what may or may not be blocked”, etc. Your web server log files contain the data to gain the CSA you need (and the logging can also be customized when needed).
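(If you ever want to see, or change, exactly what nginx records, the log_format and access_log directives in your reverse-proxy config are the place to look; a quick way to find them, assuming the stock Debian/Ubuntu layout:)
# grep -rnE 'log_format|access_log' /etc/nginx/nginx.conf /etc/nginx/sites-enabled/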
In some of my CSA code, I actually intercept session info and log information from the HTTP requests that nginx, apache2, and other web servers do not log (for additional info); but I have not written this kind of CSA code for discourse (yet), as I’m not as “easy as pie” a discourse plugin developer (like the meta discourse team gurus here) as I am with LAMP apps, having only started with discourse a few months ago (and trying to code less this year, to be honest).
CSA is based on the fusion of sensor data and from CSA comes the knowledge to understand what actions you need to take to remediate any cybersecurity issue.
All the best in your quest and hope this helps you to have more rest
Cheers!
Original (Historical) CSA Reference:
https://www.researchgate.net/publication/220420389_Intrusion_Detection_Systems_and_Multisensor_Data_Fusion
(Reference only for people interested in the origins and core tech of CSA)