Why does MySQL use latin1_swedish_ci by default?
And why don't they switch to UTF-8?

  • 12
    Because MySQL (the company) was Swedish.

    Tech debt from the 90's 😅
  • 3
    One of the many quirks that make most devs prefer to work with PostgreSQL (https://www.postgresql.org/) if it has to be SQL.
  • 1
    @C0D4 Yeah, but what stops them from just switching? Isn't Unicode a superset of Latin1? Does the DB behave differently for different locales?
  • 4
    Latin1 is an ANSI codepage.
    ANSI codepages like this one are single-byte, whereas Unicode is either multibyte or wide encoded.
    I'd guess that some of the internals rely on single-byte encoding.
    Also, depending on which codepage we're talking about, the upper half of ANSI (>=128) usually takes multiple bytes per character in UTF-8.
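
To make the single-byte vs. multibyte point concrete, here is a minimal Python sketch. The helper name `byte_lengths` and the sample strings are made up for illustration, and it only shows how the encodings behave in general, not anything about MySQL's internals.

```python
# Compare how many bytes the same text needs in Latin-1 and in UTF-8.
# byte_lengths is a hypothetical helper, not part of any library.
def byte_lengths(text):
    """Return (latin-1 byte count, utf-8 byte count) for text."""
    return len(text.encode("latin-1")), len(text.encode("utf-8"))

samples = [
    "abc",                 # plain ASCII, code points below 128
    "\u00e9",              # e with acute accent, upper half of Latin-1
    "\u00e5\u00e4\u00f6",  # the three extra Swedish letters
]

for s in samples:
    latin1_len, utf8_len = byte_lengths(s)
    print(f"{s!r}: latin-1={latin1_len} bytes, utf-8={utf8_len} bytes")

# Every Latin-1 byte value maps onto the Unicode code point with the same
# number, so Unicode is a superset as a character set...
roundtrip = bytes(range(256)).decode("latin-1")

# ...but the byte encodings differ above 127, and UTF-8 can also store
# characters that Latin-1 cannot represent at all.
try:
    "\u20ac".encode("latin-1")  # euro sign
    euro_fits_latin1 = True
except UnicodeEncodeError:
    euro_fits_latin1 = False
```

So code that assumes "one character is one byte" holds for Latin-1 but not for UTF-8, which is one plausible reason a switch is not free.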
  • 2
    Hey, they made Caramelldansen, so they have every right to do that.
  • 1
    @Lor-inc In a word: compatibility stops them from switching, despite the fact that UTF-8 would be a better choice these days for nearly everyone.
  • 3

    There are many things wrong in your statement.

    Newer versions of MySQL use utf8mb4 by default.

    What your _distribution_ sets as a default configuration has nothing to do with MySQL per se.

    MySQL's default configuration has gotten better since 5.6.

    And it's _again_ amazing what people have in their minds.

    What distribution are you using and what version of MySQL?

    The original reason was sentimental, but later on the holdup was that InnoDB as a storage engine still needed to mature.

    It was problematic until the Barracuda file format and large_prefix support: with the previous Antelope file format, and without large prefix, the index key prefix was limited to 767 bytes.

    With MB3, the limited 3-byte charset of UTF-8, this allows an indexed string of at most 255 chars.

    With MB4, 191 chars.

    Note that the limit applies to _each_ index separately.
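
The arithmetic behind those character counts can be sketched in a few lines of Python. The function name `max_indexable_chars` is made up for illustration, and the 767-byte figure is an assumption taken from the limit commonly documented for the older InnoDB row formats; pass in a different limit if your setup differs.

```python
# How many characters fit in an index key prefix of a given byte length,
# when the charset reserves a fixed maximum number of bytes per character.
# max_indexable_chars is a hypothetical helper, not a MySQL function.
def max_indexable_chars(limit_bytes, max_bytes_per_char):
    """Worst case: every character needs the charset's maximum width."""
    return limit_bytes // max_bytes_per_char

OLD_PREFIX_LIMIT = 767  # bytes; assumed limit without large prefix support

for charset, width in (("latin1", 1), ("utf8mb3", 3), ("utf8mb4", 4)):
    chars = max_indexable_chars(OLD_PREFIX_LIMIT, width)
    print(f"{charset}: up to {chars} chars per indexed column prefix")
```

This is why VARCHAR(255) columns that index fine under latin1 or utf8mb3 can hit the limit once the table is converted to utf8mb4.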

    It was "fixed", or rather large prefix became the default, in MariaDB 10.2 / MySQL 5.7, with the option itself to be removed later.

    But you could configure it all the way back to MySQL 5.5.

    Which was a long, long, long time ago.

    Please. Stop spreading nonsense.
  • 2
    because god forbid any component of a standard php website stack didn't have all those tiny bits of needless obtuseness XD
  • 2
    They switched to utf8mb4 a long time ago.