
Every once in a while I come across a challenge that's actually challenging. Most recently: "Develop a regex for validating and extracting the quantity of each ingredient in a recipe."

The regex should correctly identify the numbers in each of the following lines:

1 cup of ingredient
Diced 1/2 cup of ingredient
.5 tsp of ingredient
1 1/2 packed cup of ingredient
1.5 cup of Heavy whipping cream

My answer is the first comment in case you want to solve it yourself. I'd love to know what others come up with.
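If you want to try the challenge yourself first, a small checker makes it easy to test a candidate pattern against the samples. Below is a minimal Python sketch; the expected quantities are read off the five sample lines above, and `SAMPLES` and `failures` are made-up names. It compares the exact text of the first match that `re.search` finds, so stray whitespace in a match counts as a miss.

```python
import re
from typing import Optional

# Sample lines from the challenge, each paired with the quantity we expect.
SAMPLES = [
    ("1 cup of ingredient", "1"),
    ("Diced 1/2 cup of ingredient", "1/2"),
    (".5 tsp of ingredient", ".5"),
    ("1 1/2 packed cup of ingredient", "1 1/2"),
    ("1.5 cup of Heavy whipping cream", "1.5"),
]


def failures(pattern: str) -> list[tuple[str, str, Optional[str]]]:
    """Return (line, expected, got) for every sample the pattern gets wrong."""
    regex = re.compile(pattern)
    bad = []
    for line, expected in SAMPLES:
        match = regex.search(line)  # first (leftmost) match, if any
        got = match.group(0) if match else None
        if got != expected:
            bad.append((line, expected, got))
    return bad


if __name__ == "__main__":
    # A naive first attempt: it only grabs the first run of digits.
    for line, expected, got in failures(r"\d+"):
        print(f"{line!r}: expected {expected!r}, got {got!r}")
```

With the naive `\d+` as the candidate, four of the five lines fail because it stops at the first run of digits; only `1 cup of ingredient` passes.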

Comments
  • 0
    (\d?\s?\d?[\/.]?\d)
  • 3
    Just identifying the numbers isn't enough when the goal is to convert the obscure ingredient amounts into SI unit amounts. You definitely need to get the unit (cup/tsp/packed cup...) too.

    Also "whipping cream" sounds pretty kinky to me...
  • 0
    @Oktokolo Step two is the measurements. It seems easier to split them into separate searches than to try to glean everything out of one massive, brittle regex.
  • 1
    @devphobe I would actually go for one relatively strict regexp per ingredient line variant. That way, you know for sure whether there is a variant you didn't take into account: instead of returning incomplete data, the parser will fail.

    N unit of ingredient => ^(\d+|\d*\.\d+)\s(unit1|unit2|packed\sunit1)\sof\s
  • 1
    You need the unit; otherwise the recipe is useless, imho.

    A unit is usually well defined and should be matchable via a word boundary.

    What one could do is use regex groups and the alternation separator, e.g.:

    \b(?P<unit>cup|tsp|g)\b

    (?P<name>...) is Python's syntax for a named capture group.

    The unit alternation could be imploded (joined) from, e.g., an array; after all, the units should be well defined and matchable.

    Now to the numbers:

    This is a bit meh.

    \s is a bit dangerous: besides spaces, it also matches newlines, tabs, and carriage returns.

    It's better to match the literal space character directly; that prevents funky stuff.

    The rest is lazy matching, to make sure we don't play Pac-Man and eat too much.

    https://regex101.com/r/E3FCw1/1

    (Updated to include remaining examples)
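The reply suggesting one relatively strict regexp per ingredient line variant can be sketched in Python roughly as follows. This is a minimal sketch, assuming only the units that appear in the samples (`cup`, `tsp`, `packed cup`) and the two line shapes they show; `UNITS`, `QTY`, `VARIANTS`, `parse_line`, and the variant names are all hypothetical. `fullmatch` forces each pattern to cover the whole line, and a line no variant covers raises instead of returning partial data.

```python
import re

# Hypothetical unit list taken from the samples; extend it for real recipes.
UNITS = r"(?:packed cup|cup|tsp)"
# Quantity forms seen in the samples: mixed number, fraction, decimal, integer.
# Literal spaces (not \s), so a quantity can never span a line break.
QTY = r"(?P<qty>\d+ \d+/\d+|\d+/\d+|\d*\.\d+|\d+)"

# One strict pattern per known line variant (variant names are illustrative).
VARIANTS = {
    "qty_unit": re.compile(rf"{QTY} (?P<unit>{UNITS}) of (?P<ingredient>.+)"),
    "prep_qty_unit": re.compile(
        rf"(?P<prep>[A-Za-z]+) {QTY} (?P<unit>{UNITS}) of (?P<ingredient>.+)"
    ),
}


def parse_line(line: str) -> dict[str, str]:
    """Return the named groups of the first variant that matches the WHOLE line.

    Raises ValueError for any line no variant covers, so an unexpected format
    fails loudly instead of producing incomplete data.
    """
    for name, pattern in VARIANTS.items():
        match = pattern.fullmatch(line)
        if match:
            return {"variant": name, **match.groupdict()}
    raise ValueError(f"unrecognised ingredient line: {line!r}")


for sample in ["1 1/2 packed cup of ingredient", "Diced 1/2 cup of ingredient"]:
    print(parse_line(sample))
```

A line such as `a pinch of salt` raises `ValueError` here, which is the point of the approach: you find out about the missing variant rather than silently getting partial data.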
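The named-group idea, with the unit alternation imploded from a list, might look like this in Python. It is a minimal sketch: the unit list and the names `units`, `UNIT_RE`, and `find_unit` are hypothetical, and `re.escape` is only there as a guard in case a unit name ever contains a regex metacharacter.

```python
import re
from typing import Optional

# Hypothetical unit list; in practice this could come from a config file or DB.
units = ["cup", "tsp", "g", "packed cup"]

# "Implode" the list into one alternation. Longest names go first, a common
# safeguard when one unit name is a prefix of another; re.escape neutralises
# any regex metacharacters inside a unit name.
alternation = "|".join(re.escape(u) for u in sorted(units, key=len, reverse=True))
UNIT_RE = re.compile(rf"\b(?P<unit>{alternation})\b")


def find_unit(line: str) -> Optional[str]:
    """Return the first unit mentioned in the line, or None if there is none."""
    match = UNIT_RE.search(line)
    return match.group("unit") if match else None


print(find_unit("1 1/2 packed cup of ingredient"))  # packed cup
print(find_unit(".5 tsp of ingredient"))            # tsp
```

The `\b` anchors keep the single-letter unit `g` from matching inside words such as `ingredient`.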
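To see why `\s` is risky, compare the pattern from the first comment with the same pattern using a literal space. This is a small Python sketch; the input text with one quantity on each of two lines is made up for illustration.

```python
import re

# Pattern from the first comment: \s? accepts ANY whitespace, newlines included.
LOOSE = re.compile(r"(\d?\s?\d?[\/.]?\d)")
# Same shape, but only a literal space may separate "1" from "1/2".
STRICT = re.compile(r"(\d? ?\d?[\/.]?\d)")

# Made-up input: two quantities on adjacent lines.
text = "1\n2 cups of ingredient"

print(LOOSE.findall(text))   # ['1\n2']  (glued together across the newline)
print(STRICT.findall(text))  # ['1', '2']

# Either pattern may start on the space before a number, so strip the result.
diced = STRICT.search("Diced 1/2 cup of ingredient").group()
print(repr(diced), repr(diced.strip()))  # ' 1/2' '1/2'
```

Note that both versions can begin the match on the space in front of a number, which is why the `Diced` line comes back as `' 1/2'`; stripping the match, or listing the allowed quantity forms explicitly, avoids that.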