Another gem from my Database Fundamentals class, this time it's from the textbook:

So right now we're learning about data modeling with ERDs and the book is explaining a few things about attributes. I got to a part where the book was explaining when you should split an attribute into many (the book mixes up conceptual modelling and logical modelling). The first example the book gave was an address, splitting it up by street name, address number, city, postal code, etc. So far so good. Now we get to the second example: a phone number. The book split the the number 55 11 9784-8900 into four parts:

Country code: 55
Area code: 11
Number prefix: 9784
Number suffix: 8900

At this point I was like "WHAT?". Separating area and country codes from the rest of the number is ok, that's useful, but splitting the number itself in half? Why the fuck would you want to do that? Correct me if I'm wrong but the dash in the middle of the number is just used for "chunking", to make it easier for our brains to read the number. Why would you want to split the number in half? There's literally no reason to do it, at least not in the example the book was showing.

Every time I open this book I keep wondering why the hell my teacher chose it to be our textbook. He's a great teacher, his lectures are awesome, he explains stuff super well, but he chose this book. A book that's filled with shitty literal translations to domain-specific words and acronyms, shitty examples, and convoluted sentences.

  • 1
    i'm not familiar with that book.

    But somehow familiar with the antiquated and fundamentally wrong way of education.

    And that seems like one of these examples....

    Eg. in Germany:

    +49 - country code
    234 - city code, Bochum
    Xxxxxxx - telephone number

    The telephone system had a hierarchical database.

    Some of the examples in textbooks still stem from that era - at least that's the only _sane_ example I could come up with.

    When you look up hierarchical databases it makes more sense, since the prefix is a leaf node.

    The hierarchical database pattern was - as far as I can remember - internationally common, so it shouldn't be specific for Germany.

    If anyone has sources for/against,I'd really like to hear them.
  • 1
    @IntrusionCM I'm not sure I understand what you said, but my point is that splitting area and country codes from the rest of the number is a sensible decision, because they have a meaning by themselves, but splitting the number itself in two just because there's a dash in the middle is stupid.

    In the number 55 11 9784-8900, the number 55 has a standalone meaning, it's the country code; the number 11 has a standalone meaning, it's an area code; but 9784 has no meaning if it's not together with 8900, so why split it up?

    My whole point is that this is a terrible example to give to a newbie, nothing to do with _how_ the book explains stuff (which is awful too, but that's a whole different discussion).
  • 2
    @neeno sorry a tad drunk.

    Before rational databases were a thing (and probably before you and I were born) humans needed a system to store telephone numbers.

    Binary trees? That's a hierarchical database, but with a few specialties and restrictions.


    Look eg here.

    Many educational books took over the examples of "hierarchical databases" to relational databases... I think.

    I might be totally wrong.

    But for my brain and psyche it's more soothing than saying "they didn't know what they we're doing and called it education".

    When you look at a hierarchical structure a prefix makes sense. It's a leaf of a node.

    And the suffix is the leaf node.
  • 1
    @IntrusionCM lol np

    Oh okay, I get what you mean. But the thing is, why split the number into "prefix" and "suffix"? There's no reason to do that, even the names are misleading. A prefix is something that comes before another thing and a suffix is something that comes after another thing. Splitting the number in half and calling the first half a prefix and the second one a suffix makes no sense, they're not prefixes and suffixes to anything else, only to each other.

    In this example I'm concerned with the meaning of the separation of 9784-8900 into two. Each half of the number don't have meanings only by themselves, the dash in the middle does not have a meaning, it's just to make it pretty. 9784-8900 is the same as 97848900, but prettier.
  • 0
    However, 55 11 9784-8900 is not the same as 551197848900. The spaces between 55, 11 and 9784-8900 have meaning: they separate what's a country code, what's an area code, and what's the number itself. Thus, they should be split into separate attributes since that might be useful for some queries.

    There is a hierarchical structure here:


    But 9784-8900 is not two pieces of data, they're just a single one with a symbol in the middle for readability.
  • 1
    @neeno yes.

    You're completely right.

    In a relational database it makes no sense.

    Didn't want to confuse you.

    That's the gist: the example is bogus because it stems from an different _era_ in history.

    Many educational books used in schools are wild copy pasta slightly changed in dozen of years.

    In a relational database, you _could_ Go down the teached theorical normalization way.

    Fully functional dependency,....

    In practical.

    +49% with a LIKE on a column consisting of normalized data and a single index will be better than 3 columns (49… 234... xxxxxx ) with a compound index.

    The "practical" part in databases boils down to the fact that a database _must consist of_ normalized data - otherwise it becomes a shitfest.

    Normalization is a tool ;) Not a fixed guideline
Add Comment