7
typosaurus
136d

Finished my regex validator. But now the edge cases keep coming. It seems you can write sub-ranges like a-d or 3-8. OK, makes sense (otherwise ranges would just be copies of \w and \d), but has anyone ever seen someone use it? I only knew a-z and 0-9.
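
For illustration, here is a minimal sketch of sub-ranges in use, written with Python's `re` module rather than the validator from this post. One place they tend to show up is matching a number in a bounded range, such as 0-255. The names `OCTET` and `is_octet` are made up for this example.

```python
import re

# Sub-ranges inside a character class:
# [a-d] is one of a, b, c, d; [3-8] is one digit from 3 to 8.
LETTER_A_TO_D = re.compile(r"[a-d]")
DIGIT_3_TO_8 = re.compile(r"[3-8]")

# A number from 0 to 255, built from sub-ranges like [0-5], [0-4] and [1-9].
# OCTET and is_octet are hypothetical names for this sketch.
OCTET = re.compile(r"25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9]")


def is_octet(text: str) -> bool:
    """Return True if text is a decimal number 0..255 without leading zeros."""
    return OCTET.fullmatch(text) is not None
```

Here `is_octet("255")` is accepted and `is_octet("256")` is rejected, which plain `\d` cannot express on its own.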

Thing is, I've now got the perfect design for the interpreter. Adding features is easy now, and not so exciting.

Still, I have a big plan for it: making it possible to validate nesting like (()) or {{"}"}}, or anything you treat as a start / close tag, while keeping the regex generic. I'm not going to teach it that characters between certain delimiters (" or ') have special rules. That would be specialization.
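
To show what generic start / close tag checking can look like, here is a minimal stack-based sketch in Python. It is not the design planned in this post, only an illustration of the idea. The function name `is_balanced`, the `pairs` parameter, and the restriction to single-character tags where the opener differs from the closer are all assumptions of the sketch.

```python
def is_balanced(text: str, pairs: dict[str, str]) -> bool:
    """Check that every opening tag in text is closed in the right order.

    pairs maps each opening character to its closing character
    (hypothetical interface; the opener must differ from the closer).
    Characters that are not tags are ignored.
    """
    closers = set(pairs.values())
    expected = []  # closing characters still owed, innermost last
    for ch in text:
        if ch in pairs:
            # An opener: remember which closer must come back later.
            expected.append(pairs[ch])
        elif ch in closers:
            # A closer: it must match the innermost open tag.
            if not expected or expected.pop() != ch:
                return False
    # Balanced only if nothing is left open.
    return not expected
```

For example, `is_balanced("(())", {"(": ")"})` passes and `is_balanced("(()", {"(": ")"})` fails. The tag pairs are plain data, so nothing about the checker is specific to brackets.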

Fun fact: my regex is six times slower than native C code (not the C regex) validating the same input. In half of the test cases it is faster than the C regex. I consider it a success.

Thanks for listening

Comments
  • 2
    IP address regex uses sub ranges
  • 2
    You are writing it in C?
  • 1
    @atheist hmm, could work around it with [123456789] 😁. But thanks for the use case
  • 2
    @Lensflare yes! Wanted to compete with the C regex. When it comes to compiling and executing together, I kick its ass. With executing only, I win 7/14 tests on performance. The C regex compiler can have serious performance issues. For some reason it doesn't handle ".{33}" very well, and the executor can crash on a very simple (but long) date validation I had. So I win when it comes to one-time validation. My performance is more consistent. I balanced what the compiler does and what the executor does pretty well, I guess. The C regex seems to put more of the work in the compiler. My compiler only reorders the expression and often even generates code. For [3]{3} the hard work is done by the compiler, which converts it to [3][3][3]. [34]? becomes ?[34], so my executor only has to look forward. The C regex compiler probably does some advanced calculations that often backfire
  • 1
    Retoorgex?
  • 2
    @retoor have you implemented the finite automata algo? i.e. are you protected against what took out Cloudflare https://blog.cloudflare.com/details...
  • 1
    @atheist I didn't, but I'm probably safe since I don't support ` and ™. I'm sure it's possible to make an infinite loop somehow. The Boost lib had it too; they just fixed it afaik.

    Dammit, again two more things to do. ` and ~.

    I never really made the effort to learn regex before starting this project.

    I learned regex from gpt and the flaws are there. My code is not from gpt, let me make that clear :)

    The issue with learning from gpt is that you decide yourself what to learn. You can end up starting in the middle of a subject.

    Regex comes in many flavors anyway. After testing what my regex can do: it's compatible with Rust regex, which in my opinion is the best. I allow [[abc][def]] nesting. In my opinion it's not syntactically wrong, and my implementation allows it automatically by design. So does Rust. I tested per language at regex101
  • 4
    I'd say [0-9a-f] is pretty common
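
The compile-time rewrite mentioned in the thread, where `[3]{3}` is turned into `[3][3][3]`, can be sketched roughly as follows. This is a Python illustration, not the C implementation discussed above. The names `COUNTED_CLASS` and `expand_counted_classes` are hypothetical, and the sketch assumes simple classes only: no nested classes, no escaped `]`, and no `{m,n}` ranges.

```python
import re

# A simple character class followed by a fixed count, e.g. "[3]{3}".
# Assumption: no nested classes, no escaped "]", no {m,n} ranges.
COUNTED_CLASS = re.compile(r"(\[[^\[\]]+\])\{(\d+)\}")


def expand_counted_classes(pattern: str) -> str:
    """Rewrite every "[...]{n}" in pattern into n copies of "[...]"."""
    return COUNTED_CLASS.sub(
        lambda m: m.group(1) * int(m.group(2)),  # repeat the class n times
        pattern,
    )
```

After this step the executor sees only plain classes in sequence, so it needs no counter at match time. Applied to the hex range from the last comment, `[0-9a-f]{4}` becomes four copies of `[0-9a-f]`, which still matches a string like `beef`.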