0
Wolle
1y

So I was thinking of using Hexatrigesimal strings (base 36 numbers) for indexing like in a database. It could be very short indexes for long numbers, and still be completely ordered. However it doesn't seem to be supported a lot in programming languages. Does anyone know why it could be a bad idea?

Comments
  • 1
    Ordering is important, thats why UUIDs are usually not the best idea for database.

    UUID v7 is the exemption, as it is timestamp based.

    But Base36 will end up with collisions.

    Its too compact.
  • 2
    Why not 64bit integers? With strings you'll end up using more memory just to store the same information as the integers
  • 2
    @IntrusionCM if ordering is important in your use case, and you implement it by _ids_ - then you just stink at your job.

    an ID should be treated as an ID and not as a string or number.
  • 1
    @IntrusionCM what? Different bases don‘t have different ranges for values. A base100 number won‘t have more collisions than a base2 number.
    I think you confuse it with bits.
  • 0
    @IntrusionCM You con't have collisions. Hexatrigesimal is not a UUID it is a number system with a base of 36 digits instead of 10. Every number is has exactly one higher and one lower neighbor and can be translated into exactly one number in a different base system.
  • 0
    @iiii this probably seems to be the answer. My knowledge on how the numbers are actually stored on disk is somewhat flaky in this matter.
  • 1
    @Wolle

    My bad.

    Brain mixed some stuff up.

    Reddit for example uses it.

    I thought you meant conversion of arbitrary strings to a base 36 representation. Brain made oops.

    @tosensei

    For databases, this is as wrong as it can get.

    For a small database or ID only access maybe, but as soon as you start with things like partitioning or having large tables, you definitely want the ordering to play a role.

    Can make the decisive difference between a database stuck and a database running smooth.
  • 2
    @IntrusionCM on the contrary. for small databases, you _might_ get away with it, but as soon as you go distributed, ordered IDs flies out the window.

    ordering is important, yes. but _ordering by IDs_ is like taking the square root of your social security number - you can do it, but it's idiotic.
  • 2
    @tosensei i think you misunderstood something here and I didn't pick up on it early on.

    When a database is *implicitly* ordered by time by its primary key, it has a lot of benefits.

    I'm not talking about query execution here.

    Hence the comment regarding partioning - which plays a role in query execution sure, but is more a necessity for organizing larger tables to prevent locking.

    When you have *implicitly* a time based ordering, inserting and querying data can be made more efficient.

    Chance for gap locking / row locking is like zero.

    Again. I'm not talking about querying itself, rather how choosing a ID with a natural ordering is relevant for storage. Storage then of course influences performance for queries / indexes / modifying operations.
  • 2
    @IntrusionCM sorry, the way to order by time is to use a motherflipping timestamp.

    and again: implicitly ordering by a timed identifier completely falls out the windows the very moment you enter the realm of distributed database servers.

    unless you want _every single insert_ to synchronise its ID generation with _every single node_ of the cluster. (hint: it's slow.)
Add Comment