Dunky13
8y

I had to do a modular deduplication project that could read, parse and clean up the data.
The data? Personal information: Name, Surname, phone, address and more.
Imagine the zip code in any of the following formats: ####AA, #### AA. Names with and without dashes. Addresses with or without spaces, dashes, underscores etc., as well as typos! Now clean it up and dedup.
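To give an idea of what the zip code part looks like, here is a minimal Java sketch (not the actual project code). It assumes the canonical form is ####AA and that the only noise is whitespace, dashes, underscores and letter case; the class and method names (`PostalCodeCleaner`, `normalize`, `dedup`) are made up for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PostalCodeCleaner {
    // Assumption: four digits, optional separators (space, underscore, dash), two letters.
    private static final Pattern POSTAL =
            Pattern.compile("^\\s*(\\d{4})[\\s_-]*([A-Za-z]{2})\\s*$");

    /** Returns the canonical "####AA" form, or null if the input does not match. */
    public static String normalize(String raw) {
        if (raw == null) {
            return null;
        }
        Matcher m = POSTAL.matcher(raw);
        if (!m.matches()) {
            return null;
        }
        return m.group(1) + m.group(2).toUpperCase(Locale.ROOT);
    }

    /** Deduplicates by canonical form, keeping first-seen order and skipping unparseable values. */
    public static Set<String> dedup(List<String> rawCodes) {
        Set<String> seen = new LinkedHashSet<>();
        for (String raw : rawCodes) {
            String code = normalize(raw);
            if (code != null) {
                seen.add(code);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("1234AB", "1234 ab", "1234-AB", "5678 CD", "oops");
        System.out.println(dedup(input)); // prints [1234AB, 5678CD]
    }
}
```

That is the easy field. Names and addresses with typos need fuzzy matching on top of this, which is where a pure regex approach starts to run out.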

But which files have priority over the others? Which data is newer? How do you process address changes?

Deadline: 2 months, an impossible deadline for a (at the time, 4 years ago) rookie developer.
Anyway, the night before the deadline, the code (Java) was somewhat running and the regex-based address cleanup handled about 70-80% of the data.

My boss comes in to check the progress, sits me down next to him and says: "Not good enough, let's do it together tonight." It was 4pm; the day normally ends at 5pm.

No thank you, I can't do that. If you don't want this code, then I can't meet your deadline.
Bye.
