Do all the things like ++ or -- rants, post your own rants, comment on others' rants and build your customized dev avatarSign Up
From the creators of devRant, Pipeless lets you power real-time personalized recommendations and activity feeds using a simple APILearn More
ltlian21201y"Okay, then we have a great opportunity where any user can take down our database"
Actual quote from the office.
ltlian21201y@Oktokolo Just a poorly formulated query that could lead to a timeout, which lead to an internal retry, which made the next query more likely to time out, repeat. We were all aware of it but had some trouble convincing the higher ups that it needs fixing when it "works".
"It works, but it's a big problem. This is not okay", "We can't just talk about problems all day. Are there any opportunities here?", "Yes, an opporrunity for anyone to bring down the whole system".
Luckily it was all in jest and we got the point across. Pretty sure we spent more time justifying the time needed to fix it than the less-than-a-day someone spent fixing it.
@Oktokolo One non-injection exploit example is a Regex Dos -- If a database supports regex and the backend uses a regex to either find or validate a record somewhere, it can crash the DB daemon.
Lets say for example we want to either find certain subsequences in a field containing a DNA sequence, or add a check constraint so only certain types of sequences can be added.
Take the underlying pattern, which matches an A and a G, with any combination of Cs and Gs in-between. So ACCG, ATTG, ACTTTCTG are all valid, but ATTCTA is not because it goes back to A instead of G.
It works quite well for finding all matches globally even on a somewhat large string.
But one specific short malicious string causes even the Regex101 website to call it quits.
I've used this to bring down PostGres systems which used a regex check constraint on the email field.
@Oktokolo Yeah but the DoS exploit/bug works in reverse: The user doesn't provide the regex, they provide the string. Regex is basically a language on its own, and the above example is a string which triggers a nasty nested loop with some exponentially growing time complexity depending on the amount of "T" characters.
The bug often occurs non-maliciously when writing validation algorithms for things which aren't easily regexable, like email or URL validation.
A regex is a non-deterministic finite automaton, which has to be transformed into a deterministic finite automaton by its engine. This transformation can require NP time.
"^(xx+)$" means: Match one x, and then another 1-or-more x's. In practice, that means it matches xx, xxx, xxxx, xxxxx, etc.
Now try: "^(xx+?)\1+$". That means we're matching the above group of 2, 3, 4, 5, 6, etc x's, but now 2-or-more times.
Boom, you've got a prime number sieve. (More accurate would be ^x?$|^(xx+?)\1+$ to account for 1 as a non-prime).
I wouldn't expect "^(xx+)$" to have complexity explosion potential.
I always assumed strictly regular patterns like that to be safe.
But "^(xx+?)\1+$" is using a back reference and therefore isn't defining a regular language anymore.
Complexity explosions in non-regular languages aren't surprising.
Back references aren't necessary for a regex to have high degrees of time complexity.
It takes 6 steps to see that ax doesn't match, 12 for aax, 24 for aaax -- In other words, 3 * 2^n, or O(2^n), with n being the amount of "a" characters before it finds the x.
This happens because for aaaax it tries
8 possible groupings, 6 steps per grouping = 48 steps total.
But yes, this weakness exists specifically to accommodate for references, even when the references are UNUSED in the regular expression.
On the other hand, the complexity in a regex without groups, like for ^a+$ is just a linear 2+n, or O(n).
Many languages actually also have the concept of atomic non-capturing groups to avoid DoS vulnerability: ^(?>a+)+$ is also a regex with linear time complexity.
It's also quite funny how often large documentation sources get this wrong, especially regarding email validation.
If I test "firstname.lastname@example.org" on the above, it will just hang the tab and max out a core on my CPU.
Imagine using that on some autoscaling NodeJS cluster.
If I replace it with this:
It's still a very bad regex for actually matching email addresses correctly, but at least it's not vulnerable anymore.
Oh yeah - email verification.
It is like parsing HTML with regexps. Email addresses are pretty overcomplicated even when not supporting comments...
I ended with just making sure that it contains an @ and that the link from the verification email gets clicked.
Thanks for the insights into how regular expressions work in practice.
Most of the DoS opportunities looked odd to me before (like having +/* quantfiers inside a group and on the group itself or using non-regular features).
But i really didn't know, that making sure to actually parse a regular language isn't enough in practice.
This optimism is ruining me.
This isn’t right.