4
galena
1y

string vs wstring
byte size vs double byte size

Pain! Agony even!

Comments
  • 2
    C++'s string handling is comically underdesigned for the rest of the language
  • 4
    Just use UTF-8.
  • 0
    @happygimp0

    Windows doesn't speak UTF-8. You gotta face the pain one way or another.
  • 1
    @CoreFusionX UTF-16 or conversion on I/O depending on whether your program is more about throughput or retention. Either way, statically sized chars are universally a bad idea.
  • 0
    @happygimp0 I really wish! I literally have to use a function block to translate UTF-8 URLs from requests into this garbage.
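
    For anyone wondering what that kind of translation involves, here is a minimal sketch in C++ rather than PLC code. The name `utf8_to_utf16` is hypothetical and not taken from any PLC library. It assumes the target is a sequence of UTF-16 code units, and it does not reject overlong encodings.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert UTF-8 bytes to UTF-16 code units.
// Throws std::invalid_argument on malformed or truncated input.
// Sketch only: overlong encodings are not rejected.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    for (std::size_t i = 0; i < in.size();) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t len = 0;
        // The lead byte tells us how many bytes the sequence has.
        if (b0 < 0x80)                { cp = b0;        len = 1; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + len > in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");

        // Each continuation byte contributes its low 6 bits.
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("not a Unicode scalar value");

        if (cp < 0x10000) {
            out += static_cast<char16_t>(cp);            // one 16-bit unit
        } else {
            const char32_t v = cp - 0x10000;             // surrogate pair
            out += static_cast<char16_t>(0xD800 | (v >> 10));
            out += static_cast<char16_t>(0xDC00 | (v & 0x3FF));
        }
        i += len;
    }
    return out;
}
```

    Throwing on bad input is just one choice. A real handler for request URLs might prefer to substitute U+FFFD and keep going.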
  • 1
    Also note that CONCAT only works with strings, not wstrings. And brsstrcat ignores \0 and just appends the entire thing, causing the PLC to crash.
  • 2
    @Nanos

    Problem IMO comes from incomplete understanding of the whole thing. Unicode is not an encoding. (Mathematically it is, I know, but bear with me).

    UTF-8, 16, 32, etc, are different encodings for the Unicode codespace.

    Like @lorentz said, it all stems from the ingrained belief that strings are sequences of fixed size chars.

    Windows is just doubly painful because when dealing with ANSI, it ain't really ANSI, cuz they use the extended ASCII part for their code page shenanigans.

    But when dealing with what they call "Unicode", it ain't really Unicode (makes no sense, it's not an encoding), nor UTF-16 (because it does not handle things like surrogate pairs, which are standard in UTF-16).

    It just globs many of their infamous code pages in the high 9 bits and calls it UTF-16.

    So yeah. Windows is a fucking pain no matter what you do.

    If you don't need to send those out or read them from external sources, just abuse the _T macro and TCHAR type every fucking where.
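
    To make the "one codespace, several encodings" point concrete, here is a small C++ sketch. The function names are made up for illustration. Each one returns how many bytes a single Unicode code point takes in that encoding form, assuming the input is a valid code point.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers: bytes needed to store one code point
// in each encoding form.

std::size_t utf8_bytes(char32_t cp) {
    assert(cp <= 0x10FFFF);
    if (cp < 0x80)    return 1;  // ASCII range
    if (cp < 0x800)   return 2;
    if (cp < 0x10000) return 3;  // rest of the Basic Multilingual Plane
    return 4;                    // supplementary planes
}

std::size_t utf16_bytes(char32_t cp) {
    assert(cp <= 0x10FFFF);
    return cp < 0x10000 ? 2 : 4; // one unit, or a surrogate pair
}

std::size_t utf32_bytes(char32_t cp) {
    assert(cp <= 0x10FFFF);
    return 4;                    // always one 32-bit unit
}
```

    Same code point, three different sizes: U+20AC (the euro sign) is 3 bytes in UTF-8, 2 in UTF-16 and 4 in UTF-32. Only UTF-32 is fixed width, which is why "a string is an array of fixed size chars" breaks down everywhere else.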
  • 0
    @Nanos

    In UTF-8 *encoding*, yes, they can be.

    In correct implementations of UTF-16, they can only be 2 or 4 bytes (surrogate pairs).
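
    A rough C++ sketch of what that means in practice: the length of a UTF-16 string in 16-bit units is not its length in code points. `count_code_points` is a hypothetical name. Counting an unpaired surrogate as one item is an assumption of this sketch, not something a standard mandates.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: count code points in a UTF-16 string.
// A high surrogate followed by a low surrogate counts once;
// an unpaired surrogate is counted as one item (sketch assumption).
std::size_t count_code_points(const std::u16string& s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char16_t u = s[i];
        const bool high = u >= 0xD800 && u <= 0xDBFF;
        if (high && i + 1 < s.size() &&
            s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            ++i;  // skip the low surrogate of the pair
        }
        ++n;
    }
    return n;
}
```

    For `u"a\U0001F600"`, `size()` reports 3 code units while this function reports 2 code points.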
  • 0
    @CoreFusionX Isn't the Windows encoding called UCS-2?
  • 0
    @happygimp0

    Windows used UCS-2 up until Windows 2000, when it switched to UTF-16.

    Still, UTF-16 is a superset of UCS-2, so no problems there.

    However, I've seen several inconsistencies in Windows' advertised UTF-16 support over the years, though I admit they were always related to obscure cases you rarely run into.