11

If you were to write a regular expression to match phone numbers in the format of either:
(123) 456-7890 or
123-456-7890

Would you prefer a regular expression that looked like:
A) /^(?:\(\d{3}\) |\d{3}-)\d{3}-\d{4}$/g
B) /^\(\d{3}\) \d{3}-\d{4}$|^\d{3}-\d{3}-\d{4}$/g
C) Other
D) I hate regex

Reasoning? Alternative? Discuss.

(I'm curious about preferences surrounding the readability of regular expressions)

Comments
  • 12
    Strip out all non-digits first.
    Check for allowed digits, order, length.

    Then in code check for country codes, validity, etc.
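A minimal Python sketch of this strip-then-check approach. The assumptions are mine, not the commenter's: only US/NANP numbers are accepted, a leading country code of 1 is allowed, and the "validity" step is just the NANP rule that area and exchange codes can't start with 0 or 1. Under that rule, the question's placeholder 123-456-7890 is rejected.

```python
import re
from typing import Optional


def normalize_us_number(raw: str) -> Optional[str]:
    """Strip non-digits, then check length, country code and digit rules.

    Assumption: only US/NANP numbers are accepted (10 digits, optionally
    preceded by the country code 1). Other countries need other rules.
    """
    # Step 1: strip out everything that isn't a digit
    digits = re.sub(r"\D", "", raw)
    # Step 2: handle an optional leading country code
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    # Step 3: length check
    if len(digits) != 10:
        return None
    # Step 4: a basic validity rule (NANP area and exchange codes
    # cannot start with 0 or 1)
    if digits[0] in "01" or digits[3] in "01":
        return None
    return digits
```

Because formatting is thrown away up front, "(212) 555-0123", "212-555-0123" and "+1 212.555.0123" all normalize to the same "2125550123".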
  • 4
    Either option A or my specific solution would be

    ^(\(\d{3}\)\s|\d{3}-)\d{3}-\d{4}$

    Not sure I can give you many tips on regex readability, but I'd personally prefer the shorter one, with the first group matching either "(123) " or "123-".

    I'd also use \s rather than a literal space character, mostly because it's more explicit about the intention.

    I'd drop the non-capturing group in favor of a normal group: fewer characters, and no real impact other than an extra capture. Unless you're coding this on a microcontroller, the memory/perf hit is negligible.

    Anyone who has written a few dozen of these should be able to read any of the options just fine, but I like to minimize the context a reader has to hold in their head, which is why I'd rather use A than B, even though B is simpler to understand once you've read it all. A just seems more natural to read and parse as I go.
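To make the comparison concrete, here is a rough Python check of pattern A, pattern B and the commenter's variant against a few sample strings (the samples are mine). The JavaScript /g flag is dropped: Python's re has no equivalent, and an anchored validator doesn't need it (in JavaScript, g also makes test() stateful through lastIndex). One behavioral difference shows up: \s matches tabs and newlines too, not only a space.

```python
import re

# re.ASCII makes \d mean [0-9] (and \s ASCII whitespace), as in JavaScript.
PATTERNS = {
    "A": re.compile(r"^(?:\(\d{3}\) |\d{3}-)\d{3}-\d{4}$", re.ASCII),
    "B": re.compile(r"^\(\d{3}\) \d{3}-\d{4}$|^\d{3}-\d{3}-\d{4}$", re.ASCII),
    "commenter": re.compile(r"^(\(\d{3}\)\s|\d{3}-)\d{3}-\d{4}$", re.ASCII),
}

SAMPLES = [
    "(123) 456-7890",   # format 1
    "123-456-7890",     # format 2
    "(123)\t456-7890",  # tab instead of space: only \s accepts this
    "123 456-7890",     # neither format
]

for sample in SAMPLES:
    # fullmatch so a trailing newline can't sneak past Python's "$"
    results = {name: bool(p.fullmatch(sample)) for name, p in PATTERNS.items()}
    print(repr(sample), results)
```

All three accept exactly the two formats from the question; they differ only on whitespace other than a plain space.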
  • 5
    I'd go for D in this case.
  • 3
    I love regex as a search (and replace) tool for refactoring code.

    I dislike using it in code and avoid it when possible, although that's not always doable.

    For phone numbers, I'd look for a way to sanitize/normalize it at the input as much as possible, and then make sure my API only accepts E.164 formatted input for example.

    Ideally I'd use an existing package for things like phone/email/zipcode verification, because there are a lot of pitfalls in doing it properly yourself. Google's libphonenumber (available for many languages) does validation, sanitizing & parsing quite well, for example.

    If there is no package, then a quick and dirty E.164 regex would be ^\+[1-9]\d{1,14}$

    But I would prefer a method of actually verifying the phone number, like sending a text — if the budget would allow for such a service.
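For the "no package available" case, the quick E.164 regex from this comment could be wrapped like this in Python (the function name is illustrative). It only checks shape: a "+", a first digit of 1-9, and at most 15 digits in total. It says nothing about whether the number exists, which is what libphonenumber (its Python port is the phonenumbers package) or an SMS check is for.

```python
import re

# "Quick and dirty" E.164 check from the comment: a "+", a first digit of
# 1-9, then 1 to 14 more digits (so 2 to 15 digits in total).
E164 = re.compile(r"^\+[1-9]\d{1,14}$")


def is_e164(number: str) -> bool:
    # fullmatch also rejects a trailing newline, which a bare "$" allows
    return E164.fullmatch(number) is not None


for candidate in ["+12125550123", "2125550123", "+1 212 555 0123"]:
    print(candidate, is_e164(candidate))
```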
  • 0
    I’m on 2 hrs of sleep from a foot infection, so I’m not sure this is right, but... split it at the dashes into an array. If it’s not of size 2 or 3, fail. If it’s size 2, look for the space in the first element (because the first format has a space after the area code) and split it there; fail if you can’t split it in two. Either way, you now have three pieces to check, and the regex for each piece is much simpler from here! First check the area code, then, um... what are the next seven numbers called (lol)? The idea is to break it into three smaller pieces so you can use smaller regexes for each.
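A rough Python sketch of the split-then-check idea, assuming exactly the two formats from the question. In North American Numbering Plan terms, the remaining seven digits are a three-digit exchange (central office) code and a four-digit line number; the names below are only illustrative.

```python
import re

# One tiny pattern per piece, which is the point of the approach.
THREE_DIGITS = re.compile(r"\d{3}")
FOUR_DIGITS = re.compile(r"\d{4}")
PAREN_AREA = re.compile(r"\(\d{3}\)")  # "(123)"


def is_phone(text: str) -> bool:
    parts = text.split("-")
    if len(parts) == 3:
        # "123-456-7890": area code without parentheses
        area, exchange, line = parts
        area_ok = THREE_DIGITS.fullmatch(area) is not None
    elif len(parts) == 2:
        # "(123) 456-7890": split the first piece at the space
        head = parts[0].split(" ")
        if len(head) != 2:
            return False
        area, exchange = head
        line = parts[1]
        area_ok = PAREN_AREA.fullmatch(area) is not None
    else:
        return False
    return (
        area_ok
        and THREE_DIGITS.fullmatch(exchange) is not None
        and FOUR_DIGITS.fullmatch(line) is not None
    )
```

It accepts the same two formats as the single-regex versions. The trade-off is more lines of code in exchange for patterns that are trivial to read.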
  • 2
    Definitely D. If I ever really need regex, I hope I know someone kind-hearted who knows it. But I'd still prefer human-readable code if possible.
  • 0
    Regex is a bit niche and easy to forget. That’s not to say it’s useless; it’s still good to know. But the less complex the better, since long-winded regex isn’t easy to read at all.
  • 1
    There is no such thing as “readability of regular expressions”.
  • 3
    I wish all the sites I have to enter my phone number into let me just type it like 1234567890. Requiring me to format it with () or - is kind of pointless. It's great if the site does that automatically, but getting an error because I didn't type (123)-456-7890 is annoying. This seems especially common on job applications.
  • 2
    I absolutely adore regex. I avoid it in my actual work code because it sometimes confuses people, but I use it relatively often in my personal projects to speed things up during prototyping. It's simply amazing and powerful as hell. A single line of regex can save me several lines of code.

    Also, OP didn't really specify what she needs it for, so we don't know whether this is an input sanitization problem or a search/replace one. If the task is to find all phone numbers in a document, firing up Python (or, god forbid, something heavier) is a heavy approach when you can just do "grep -e 'regex'".

    I really recommend that everyone use regex more and stop being scared of it. It's like programming a tiny computer inside your computer!
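For the search use case, the validation anchors have to go: ^ and $ only match at the ends of the string. One possible Python version of pattern A for finding numbers in running text replaces them with lookarounds, so a match can sit anywhere but not inside a longer run of digits (the sample text is made up). From the shell, plain grep -E has no \d, so you'd spell it [0-9] there; GNU grep's -P option does accept this Perl-style syntax.

```python
import re

# Search version of pattern A: the ^ and $ anchors are replaced by
# lookarounds so a number can sit anywhere in the text, but not inside a
# longer run of digits.
PHONE_IN_TEXT = re.compile(r"(?<!\d)(?:\(\d{3}\) |\d{3}-)\d{3}-\d{4}(?!\d)")


def find_phone_numbers(text: str) -> list[str]:
    # With no capturing groups, findall returns the full matches
    return PHONE_IN_TEXT.findall(text)


if __name__ == "__main__":
    doc = "Call (123) 456-7890 or 123-456-7890 today; 1234-567-8901 is not a match."
    print(find_phone_numbers(doc))  # ['(123) 456-7890', '123-456-7890']
```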
  • 1
    Most regex classes support initializing a pattern object.

    Write a small method that contains the initialization and creates the pattern only if it hasn't been initialized yet.

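    E.g.
    // return the pattern if it's already initialized
    if ($rePattern !== null) {
        return $rePattern;
    }

    // match (...) at beginning of string
    $re = '...';
    // 3 digits following
    $re .= '...';
    ...

    // RegexPattern stands in for whatever pattern class your language provides
    $rePattern = new RegexPattern($re);
    return $rePattern;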

    It's not hard to understand a regular expression if you split it.

    A simple factory for the regex pattern (or string), and you have good documentation.

    If you need to change the regex, that will be easier, too.

    Don't split it up over multiple functions, keep it simple.
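In Python, the same lazy-initialize-and-document idea might look like the sketch below. Here functools.lru_cache takes the place of the null check, and the piece names (area, exchange, line) are illustrative. Python's re already caches recently compiled patterns, so the main benefit is the commented, piece-by-piece construction rather than speed.

```python
import re
from functools import lru_cache


@lru_cache(maxsize=None)
def phone_pattern() -> re.Pattern:
    """Build the pattern once, piece by piece, and cache the compiled result."""
    # match "(123) " or "123-" at the beginning of the string
    area = r"^(?:\(\d{3}\) |\d{3}-)"
    # 3 digits and a hyphen following
    exchange = r"\d{3}-"
    # 4 digits to finish the string
    line = r"\d{4}$"
    return re.compile(area + exchange + line)
```

Python's re.VERBOSE flag is another way to get the same effect inside a single pattern string. Note that in verbose mode a literal space must be escaped (`\ `) or written as a class (`[ ]`).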

    Back to the regular expression itself...

    The validation is very strict.

    E.g. instead of a single space, use \s+ to match any run of whitespace.

    I'd write the regular expression differently, although it would be less performant, I guess.

    E.g.
    $separator = '([\s-]+)';

    This allows line feeds, spaces, tabs, hyphens...
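A Python sketch of a pattern built around this more lenient separator. It assumes the area code may still be written with or without parentheses, and it uses a non-capturing group where the comment uses a capturing one. Any mix of whitespace and hyphens is accepted between the groups, including line feeds.

```python
import re

# The commenter's lenient separator: one or more whitespace characters
# (space, tab, line feed, ...) or hyphens, in any combination.
SEPARATOR = r"(?:[\s-]+)"

LENIENT_PHONE = re.compile(
    r"^(?:\(\d{3}\)|\d{3})"  # area code, with or without parentheses
    + SEPARATOR
    + r"\d{3}"               # next 3 digits
    + SEPARATOR
    + r"\d{4}$"              # last 4 digits
)
```

The flip side of this leniency is that "123 - 456\t7890" passes too, so it suits input you'll normalize afterwards better than a strict format check.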
  • 0
    I would google "x-language regex phone y-year" and then copy and paste whatever the fuck somebody came up with this year on SO without even a cursory glance at the regex to see if it even makes sense to me.

    On rare occasions I can't find a regex that works for me, and then I'd probably do something closer to @bittersweet's approach: normalizing and sanitizing with more standard string manipulation so the regex is easy enough that I can write it myself.
  • 0
    Thanks, friends. The question was really born out of having solved a LeetCode question with regex and then discussing both my solution and the one marked as fastest for the language I was using with a friend who has significantly more industry experience. He intimated that he would wish physical harm on anyone who wrote a regex like A, and he strongly preferred B for readability. I was curious whether that was a common reaction or whether every programmer has a different preference.

    Mostly because I had a ridiculous momentary idea of writing a program to reduce regular expressions. XD
  • 0
    @AmyShackles

    Hey, if you ever make that RegEx reducer, link it on devRant; I'd love to see it. Could be useful here and there, like packing a longer RegEx for delivery to users (like compiled/minified JavaScript), or if you just want to write a minimal-size framework/lib where every byte helps.

    If you made something that turned B into A, it would be quite fun, because then we could just write a separate RegEx for each case like in B, and the reducer would produce a single optimized regex... It might even make it easier to spot flaws or blind spots if the optimized version clearly misses something or does something extra...

    It would be pretty niche, but still sounds like fun!
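As a toy illustration of the B-to-A idea, the sketch below factors out the longest common tail shared by several alternatives. factor_common_suffix is a hypothetical helper, not an existing tool. It assumes the alternatives are passed without ^/$ anchors and contain only escapes, literal characters and {n} / {n,m} quantifiers: no groups, character classes, or *, +, ?. A real reducer would need a proper regex parser.

```python
import re

# An "atom" is an escape like \d or \( or one literal character,
# optionally followed by a {n} or {n,m} quantifier.
ATOM = re.compile(r"(?:\\.|[^\\])(?:\{\d+(?:,\d*)?\})?")


def tokenize(alternative: str) -> list[str]:
    tokens = ATOM.findall(alternative)
    # Every character must belong to some atom (fails on a trailing "\")
    if "".join(tokens) != alternative:
        raise ValueError(f"cannot tokenize {alternative!r}")
    return tokens


def factor_common_suffix(alternatives: list[str]) -> str:
    """Merge ^a$|^b$ style alternatives into ^(?:prefixes)suffix$."""
    token_lists = [tokenize(alt) for alt in alternatives]
    suffix: list[str] = []
    # Walk backwards while every alternative ends with the same atom,
    # always leaving at least one atom in each prefix.
    while all(len(t) > len(suffix) + 1 for t in token_lists):
        candidate = token_lists[0][-len(suffix) - 1]
        if all(t[-len(suffix) - 1] == candidate for t in token_lists):
            suffix.insert(0, candidate)
        else:
            break
    prefixes = ["".join(t[: len(t) - len(suffix)]) for t in token_lists]
    return "^(?:" + "|".join(prefixes) + ")" + "".join(suffix) + "$"


merged = factor_common_suffix([r"\(\d{3}\) \d{3}-\d{4}", r"\d{3}-\d{3}-\d{4}"])
print(merged)
```

Fed the two branches of B, it prints pattern A (without the slashes and flag): ^(?:\(\d{3}\) |\d{3}-)\d{3}-\d{4}$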
  • 0
    1. Regex is awesome
    2. I literally just wrote this on my phone lol

    ^[\d]{0,3}[(.-]?[\d]{3}[).-]?[\d]{3}[.-]?[\d]{4}$

    It's not that bad at all if you invest a little time into it and just lock it down
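A quick way to see how this pattern behaves is to run it against a few strings; the samples below are mine. As written, it accepts "123-456-7890" and "(123)456-7890" but not the question's "(123) 456-7890" (there's no room for the space). It also lets through input with an unbalanced parenthesis, such as "(123.456.7890".

```python
import re

# The pattern from the comment above, unchanged ([\d] is equivalent to \d).
LOOSE_PHONE = re.compile(r"^[\d]{0,3}[(.-]?[\d]{3}[).-]?[\d]{3}[.-]?[\d]{4}$")

SAMPLES = [
    "123-456-7890",    # question's second format
    "(123) 456-7890",  # question's first format (note the space)
    "(123)456-7890",   # no space after the area code
    "1-123-456-7890",  # leading country code
    "(123.456.7890",   # unbalanced parenthesis
]

for sample in SAMPLES:
    # fullmatch so Python's "$" can't accept a trailing newline
    print(f"{sample!r:18} -> {bool(LOOSE_PHONE.fullmatch(sample))}")
```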
  • 0
    Let me validate your email
    (?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
  • 0
    @jak645 you don't validate emails.

    Simple solution.

    Email validation is wrong. No matter how hard you try.
  • 0
    Accept only digits and format them for the user XD
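A small Python sketch of "accept only digits and format for the user". It assumes a 10-digit US-style number and the question's "(123) 456-7890" display format. Whatever separators the user types are discarded, so nobody gets an error for leaving out the parentheses.

```python
import re
from typing import Optional


def format_us_number(raw: str) -> Optional[str]:
    """Keep only the digits the user typed and format them as (123) 456-7890.

    Assumption: a 10-digit US-style number; anything else is rejected.
    """
    digits = re.sub(r"\D", "", raw)  # drop everything but 0-9 style digits
    if len(digits) != 10:
        return None
    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"


print(format_us_number("1234567890"))  # (123) 456-7890
```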