SQL Error with `screened_ip_addresses` (API throws 500)

Hi friends,

The API is giving me an error 500 when I call to create a new post (in an existing topic). In the logs I see:

ActiveRecord::StatementInvalid (PG::InvalidTextRepresentation: ERROR:  invalid input syntax for type inet: ""
LINE 1: ..._addresses".* FROM "screened_ip_addresses" WHERE ('' <<= ip_...
                                                             ^
)
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/rack-mini-profiler-2.0.1/lib/patches/db/pg.rb:69:in `exec_params'

Failed to handle exception in exception app middleware : PG::InvalidTextRepresentation: ERROR:  invalid input syntax for type inet: ""
LINE 1: ..._addresses".* FROM "screened_ip_addresses" WHERE ('' <<= ip_...
                                                             ^

Here is my Screened IP list - in addition to the screen shot below, I have also whitelisted the IP for the machine that is calling the API. (Iā€™m using a system-level API key to mass-import old topics/messages from my old forum software.)

Just for kicks, I also asked the API for the Screened IP listā€¦ same results. (https://mydiscourse.com/admin/logs/screened_ip_addresses.json)

Iā€™m not sure what else to check. :man_shrugging:t2:

Anyone know:

1. What is causing this error, and
2. How can I fix it now and prevent it from happening in the future?

Help :slight_smile:

Thanks!

Or is this more about some other SQL trying to insert into that table?

Can you post the code you are using to make an API request with credentials removed so that we can see how you are making the api request?

2 Likes

Hey @blake, thanks for the reply. Just to be completely sure it wasnā€™t a problem with my code, I put together the basic API call in Insomnia (similar to Postman but IMHO simpler/easier). Unfortunately, same results, but at least now itā€™s very clear:

Hereā€™s the call I prepared to create a new post (I have already ā€œminimum number of words in postsā€ turned down to 1):

And here are the results

And the 2 error messages in the logs from these test calls:

Error 1:

Message (15 copies reported)

ActiveRecord::StatementInvalid (PG::InvalidTextRepresentation: ERROR:  invalid input syntax for type inet: ""
LINE 1: ..._addresses".* FROM "screened_ip_addresses" WHERE ('' <<= ip_...
                                                             ^
)
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/rack-mini-profiler-2.0.1/lib/patches/db/pg.rb:69:in `exec_params'

Backtrace

rack-mini-profiler-2.0.1/lib/patches/db/pg.rb:69:in `exec_params'
rack-mini-profiler-2.0.1/lib/patches/db/pg.rb:69:in `exec_params'
activerecord-6.0.1/lib/active_record/connection_adapters/postgresql_adapter.rb:672:in `block (2 levels) in exec_no_cache'
activesupport-6.0.1/lib/active_support/dependencies/interlock.rb:48:in `block in permit_concurrent_loads'
activesupport-6.0.1/lib/active_support/concurrency/share_lock.rb:187:in `yield_shares'
activesupport-6.0.1/lib/active_support/dependencies/interlock.rb:47:in `permit_concurrent_loads'
activerecord-6.0.1/lib/active_record/connection_adapters/postgresql_adapter.rb:671:in `block in exec_no_cache'
activerecord-6.0.1/lib/active_record/connection_adapters/abstract_adapter.rb:718:in `block (2 levels) in log'
/usr/local/lib/ruby/2.6.0/monitor.rb:235:in `mon_synchronize'
activerecord-6.0.1/lib/active_record/connection_adapters/abstract_adapter.rb:717:in `block in log'

Error 2:

Message (15 copies reported)

Failed to handle exception in exception app middleware : PG::InvalidTextRepresentation: ERROR:  invalid input syntax for type inet: ""
LINE 1: ..._addresses".* FROM "screened_ip_addresses" WHERE ('' <<= ip_...
                                                             ^


Backtrace

rack-mini-profiler-2.0.1/lib/patches/db/pg.rb:69:in `exec_params'
rack-mini-profiler-2.0.1/lib/patches/db/pg.rb:69:in `exec_params'
activerecord-6.0.1/lib/active_record/connection_adapters/postgresql_adapter.rb:672:in `block (2 levels) in exec_no_cache'
activesupport-6.0.1/lib/active_support/dependencies/interlock.rb:48:in `block in permit_concurrent_loads'
activesupport-6.0.1/lib/active_support/concurrency/share_lock.rb:187:in `yield_shares'
activesupport-6.0.1/lib/active_support/dependencies/interlock.rb:47:in `permit_concurrent_loads'
activerecord-6.0.1/lib/active_record/connection_adapters/postgresql_adapter.rb:671:in `block in exec_no_cache'
activerecord-6.0.1/lib/active_record/connection_adapters/abstract_adapter.rb:718:in `block (2 levels) in log'
/usr/local/lib/ruby/2.6.0/monitor.rb:235:in `mon_synchronize'
activerecord-6.0.1/lib/active_record/connection_adapters/abstract_adapter.rb:717:in `block in log'

I just created this user a moment before, so I wonder if this is a bug related to trust levels, somehow, or something else that isnā€™t fully authorized for a new user. Iā€™ll experiment a bit with that and see if I can figure out what will let the post get through the API.

But either way, it seems like I have surfaced some kind of bugā€¦ ?!

So hereā€™s how I got around the bugā€¦ I started experimenting:

  1. I changed the user in my API call to ā€˜systemā€™ and that worked
  2. So I thought, hmm, and I changed the user to a 3rd user that I hadnā€™t tried. That also worked.
  3. Then I changed the user back to the original username, which means I would be repeating the exact API call that threw a 500 error before. Except now it worked. :man_shrugging:t2:

If it comes up again Iā€™ll see if I can figure out how to reproduce it reliably. In the meantime, perhaps someone who knows the Discourse code better than me :wink: can glance over that part and be sure thereā€™s no obvious bugs jumping out.

1 Like

Hmmm there is definitely something weird going on here. Iā€™ve fired up my script again, and itā€™s starting to bulk-create topics and post messagesā€¦ and once again, now with a different user, Iā€™m getting the 500 error.

I take that username into Insomnia and test the APIā€¦ sure enough, 500 error. I change to ā€˜systemā€™ and it goes throughā€¦ then I go back to the original username and it fails with a 500 again!

The user throwing the 500 (and all my users being imported except ā€˜systemā€™) are TL1, and Iā€™ve changed the limits so TL1 can post and

This definitely feels like something to do with rate limiting, somehow? I would suspect that nginx, or something else upstream, could be causing the 500, but Iā€™ve removed all rate limiting from nginx, and the 500 error is very much showing up as that SQL bug in the error logs.

I noticed the SQL error is crapping out around a column that references IP addressā€¦ could there be anything strange going on with how rate limiting might be working that could cause this problem? I tried logging in with a VPN (to change my IP) but I still get the 500 error.

In the meantime, I see the problem is in rack-mini-profiler which is not necessary. Letā€™s see if I can turn that off and if that fixes the problem. ā€¦ Nope. Now I no longer see the mini-profiler on my admin account, but Iā€™m still getting the same HTTP 500 with the same errors in the error log :frowning:

Update: by changing the user to TL3, the problem seems to go away. Switching back to TL1 causes it happen again. Iā€™m testing this theory nowā€¦ nope. Sometimes that works, but other times even if I manually edit the userā€™s TL, the user seems to get ā€œstuck.ā€

@blake or someone else on @staffā€¦ help! :wink: whatā€™s left to try? Is there something I can flush completely here (related to the code throwing the error) and reset to factory settings?

1 Like

It looks like a possible networking problem to me, based on the stack trace sometimes no IP address is seen for the user, somehow? Or you have bad database data in admin, logs, screened IP addresses?

1 Like

Agreed - <<= is ā€œLHS contained within RHSā€ so it seems like a bad/blank ip address is getting to the app.

(Iā€™m curious as to how you have NO IP address available to the app; my only guess is the request is coming in a unix socket with no forwarded IP info header)

2 Likes

Agreed, but I canā€™t see why sometimes the IP would be there and sometimes not. In my import script, I get this 500 error after at least 5-7 successful API calls before it. Thatā€™s why I am thinking itā€™s corrupt data in the database somehow.

Yeah, itā€™s weird. But at the same time, if I call with a ā€œbadā€ username, it fails; if I call with ā€œsystemā€ or another username, it works. So I think weā€™re pointing to corrupt data in my database, possibly.

Can someone give me some easy Ruby command-line things I can type to clear/reset any database tables that might be screwed up?

I donā€™t think anything is wrong with your db because the error is checking if there is a blank ip address in the screened ip address table not that there is empty ip addresses in your db.

The interesting thing though is that this code is checking for the existence of an ip_address before making that sql request that is throwing the 500.

Can you describe how your Discourse instance is setup? Is it from an import? Are you on the latest version? How much ram does it have? Did you follow this guide?

Can you check this site setting: max new accounts per registration ip? Iā€™m not sure it matters in this case but perhaps it is causing an issue.

In your bulk script could you slow it down and have a 1 sec pause between requests and see if that makes a difference?

What is the purpose of these api calls? If it is a one time script to build import users and posts then perhaps an import script that runs on the actual server that doesnā€™t make api calls would be better?

Sorry for all the questions, but we havenā€™t run into this issue before and the code around screened_ip_addresses hasnā€™t been updated in a long while. Iā€™m not saying we donā€™t have a bug in the code base somewhere, but after having a quick look nothing is standing out at the moment.

Sure ā€“ itā€™s completely Docker, on Digital Ocean. I followed that excellent guide to the T.

Sure, I just changed that setting from default 3 to 99999. No different, still getting the 500 error.

Tried that, no difference. Note that I still get the 500 error with ONE ā€œbadā€ account, through Insomnia. So at this point, itā€™s like that one account is poisoned, and even if I just make the one ā€œcreate messageā€ API call with it (none before or after), I still get the 500 error. But yes, my import script is getting the 500 also :wink:

Yes, Iā€™m an experienced coder but donā€™t know RoR/Ruby at all, so I canā€™t really use the off-the-shelf options that yā€™all provide, though I recognize they are likely superior to me manually walking my existing forums and creating users, etc on the fly via API. Thus my marketplace postā€¦ I would love to get this all working for myself but I also have a hard deadline :wink:

Totally get it, and I appreciate your attention to this.

So hereā€™s something that I can offer which might help a lot: since itā€™s an off-the-shelf installation, and I havenā€™t done much of any customizing, AND the bug is easily reproducible without my code (just use Insomnia) AND I havenā€™t launched the forums, I could pass you the root login for the Digital Ocean instance, my API key etc., and I have no problem with you mucking around in there. My Discourse forums are currently a ton of empty categories and a few other special intro messages weā€™ve set up, but itā€™s basically empty and no real users are in there yet (just admins). So it would be fine if you want to test things, create/erase topics and messages, etc.

This would definitely be the fastest way to see the bug first-hand for yourself. And then since you would be in there as root, you can also muck with whatever low-level Discourse things you want, to find out why this is happening.

E

1 Like

What happens if you disable SSO?

1 Like

No difference, still throws the 500 error.

Another idea: is it possible that every account itā€™s worked for is an admin? I know some of the application-level ratelimits are bypassed for admins.

In general, however, itā€™s a lot easier to just write an import script instead. This entire topic is what ā€œa lotā€ looks like :wink:

Hmm, I thought I had something thereā€¦ my test ā€œpoisonedā€ problem user, george21, was at TL0. So I changed him to TL1 and then it worked. Ok! Maybe thatā€™s it! So then I changed george21 back to TL0ā€¦ and now he is no longer ā€œpoisonedā€-- he can make the API call even as TL0.

Now Iā€™ll run my import script again, and aha. Now george21 is throwing the 500 error in the import script. And when I try in Insomnia, it fails. So now Iā€™ll put george21 back to TL1 andā€¦ yes, he can run the HTTP call.

So hereā€™s what I seem to be able to reproduce:

  1. If a series of API calls are made (?) it somehow causes a subsequent API call to fail with a TL0 user.
  2. Changing the TL0 user to a TL1 allows the API call to get through.
  3. And strangely, then changing that same user back to TL0 still allows the API call to get through.
  4. Running the script again is fine until it once again fails with another TL0 user.

Note that:

  1. As far as I know all minimums etc for TL0 have been raised (i.e. I have tried to remove every block that would prevent a TL0 user from posting), and
  2. Even if this is an issue with some kind of internal rate limit for TL0 users, the API shouldnā€™t be throwing a 500 and putting a SQL error in the error log. So I think we can say at this point there is definitely a bug in there somewhere.

Yeah, um, I know, and Iā€™ve gone through why Iā€™m not writing my own import script (based on the examples given) like four times already. :wink: Thus my change of approach.

And in the meantime, Iā€™m continuing to contribute here to try to help find and fix this bug. Today itā€™s impacting my import script. Tomorrow it could be some important script you have on your site that needs the APIā€¦

1 Like