6
GTom
6y

How do you approach generating "random" unique numbers/strings? Specifically, when you have to be sure the generated stuff stays unique over time? E.g. as few collisions in the future as possible.

Now I don't mean UUIDs, but cases where some functionality needs data with a defined length and a specific symbol set that is definitely unique every time it does its stuff.

TLDR STORY: Generating 8-digit numbers so they are (deterministically - wink wink) unique is hard, but Format Preserving Encryption saves the day. (for me)

FULL STORY:
I had to deal with both strings and codes today.
One was to generate a shortlink word for a URL; luckily I found a library that does exactly this. (Hashids)
BUT generating an 8-digit, somewhat random number was harder than I thought. I found something on SO like "sha256(seed) => bytes => ascii/numbers mangling", but that had a lot of collisions because of how the hash got mangled to actually output numbers and also to fit the length.
After some hours I stumbled upon Format Preserving Encryption (pyffx) and man, it did what I wanted and it had max 2 collisions in 100k values. Still, the solution with this feels hacky af. (encrypting a straddled Unix timestamp with lots of decimals)
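
For anyone wondering why the hash mangling collides while the encryption approach barely does: squeezing a hash into 8 digits behaves like drawing random 8-digit numbers, and for 100k draws out of 10^8 the birthday estimate is roughly n^2 / (2N) = 50 colliding pairs. An encryption over the 8-digit domain is a one-to-one mapping, so outputs can only collide when the inputs repeat. Below is a minimal stdlib-only sketch of that idea, a keyed Feistel permutation over 0..99,999,999. It is NOT pyffx's implementation and not a reviewed cipher; the names `permute`, `unpermute`, `format_code`, the key and the round count are all made up for illustration, and it assumes the input is a unique integer such as a counter.

```python
import hashlib
import hmac

HALF = 10_000  # each half holds 4 decimal digits, so the domain is 0..99_999_999


def _round_value(key: bytes, round_no: int, half: int) -> int:
    """Keyed pseudo-random value in 0..HALF-1 for one Feistel round."""
    msg = f"{round_no}:{half}".encode()
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") % HALF


def permute(n: int, key: bytes, rounds: int = 8) -> int:
    """Map n to another 8-digit value; distinct inputs give distinct outputs."""
    if not 0 <= n < HALF * HALF:
        raise ValueError("n must fit in 8 decimal digits")
    left, right = divmod(n, HALF)
    for r in range(rounds):
        # Arithmetic Feistel step: swap halves and mix with modular addition.
        left, right = right, (left + _round_value(key, r, right)) % HALF
    return left * HALF + right


def unpermute(code: int, key: bytes, rounds: int = 8) -> int:
    """Inverse of permute(): undo the rounds in reverse order."""
    left, right = divmod(code, HALF)
    for r in reversed(range(rounds)):
        left, right = (right - _round_value(key, r, left)) % HALF, left
    return left * HALF + right


def format_code(n: int, key: bytes) -> str:
    """Zero-padded 8-digit string for the n-th code."""
    return f"{permute(n, key):08d}"
```

Usage would be something like `format_code(next_id, b"some-secret-key")`, where `next_id` is whatever unique integer you already have. Because every round is invertible, the whole mapping is a bijection on the 8-digit range, which is why uniqueness of the input is all that matters.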

Comments
  • 2
    Umm.. But you don't need distributed generation, do you? I think a db sequence would be a nice way to get unique values for your formatted string seed [instead of the timestamp]
  • 2
    @netikras
    Friiick, I forgot the KISS again.
    You are right.

    EDIT: After reading your comment, I went from red to green before I fully facepalmed myself. I don't know why, but I HAD tried seeding it with a single n++ number, which obviously worked, but I was under this illusion that I HAVE TO use the timestamp somehow.
  • 2
    int randnum = 6; // random number from a dice
  • 1
    So basically you are doing tiny URL.
    I feel like you may get a lot of collisions.
    You need to redesign your architecture for tiny URL. It's not that simple to generate tiny URLs at the app level.
  • 0
    @zotigapo why do you think it's not easy? There's almost nothing hard that I can see about it.
  • 0
    If you are trying to make a URL shortener, look into base62 conversion. At least that's what I used for mine.

    Edit: After 100k records it uses 3 characters as an ID
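
A rough sketch of the base62 conversion mentioned above, assuming the ID comes from an auto-incrementing integer (e.g. a db sequence). The alphabet order and the function names are my own choice for illustration, not taken from any particular shortener.

```python
import string

# 0-9, a-z, A-Z: 62 symbols in total (the ordering is an arbitrary choice here).
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)


def base62_encode(n: int) -> str:
    """Turn a non-negative integer ID into a short base62 string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, rem = divmod(n, BASE)
        chars.append(ALPHABET[rem])
    # Digits come out least-significant first, so reverse them.
    return "".join(reversed(chars))


def base62_decode(s: str) -> int:
    """Turn a base62 string back into the integer ID."""
    n = 0
    for ch in s:
        n = n * BASE + ALPHABET.index(ch)
    return n
```

Since 62^2 = 3,844 and 62^3 = 238,328, an ID of 100,000 does land on 3 characters, which matches the edit above. Keep in mind that sequential IDs encoded this way are guessable.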