string vs wstring, byte size vs double byte size. Pain! Agony even!

Ranter

galena

7405

Comments

2

lorentz

15308

2y

C++'s string handling is comically underdesigned for the rest of the language
4

happygimp0

935

2y

Just use UTF-8.
0

CoreFusionX

3498

2y

@happygimp0

Windows doesn't speak UTF-8. You gotta face the pain one way or another.
1

lorentz

15308

2y

@CoreFusionX UTF-16 or conversion on I/O depending on whether your program is more about throughput or retention. Either way, statically sized chars are universally a bad idea.
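For the "conversion on I/O" route, a minimal sketch of what the boundary conversion could look like in portable C++: keep UTF-8 on the wire and turn it into UTF-16 code units only when handing text to an API that wants them. The function name `utf8_to_utf16` is made up for illustration, and the error policy (malformed input becomes U+FFFD, overlong forms are not rejected) is an assumption of this sketch, not something from the thread.

```cpp
#include <cstddef>
#include <string>

// Convert UTF-8 bytes to UTF-16 code units.
// Assumptions of this sketch: malformed input (bad lead byte, truncated or
// broken continuation, surrogate or out-of-range value) becomes U+FFFD,
// and overlong encodings are not rejected.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0xFFFD;   // replacement character by default
        std::size_t len = 1;    // bytes consumed for this code point

        if (lead < 0x80)                { cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }

        if (len > 1) {
            // Every continuation byte must exist and look like 10xxxxxx.
            bool ok = i + len <= in.size();
            for (std::size_t k = 1; ok && k < len; ++k) {
                const unsigned char c = static_cast<unsigned char>(in[i + k]);
                ok = (c & 0xC0) == 0x80;
                cp = (cp << 6) | (c & 0x3F);
            }
            if (!ok) { cp = 0xFFFD; len = 1; }
        }
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) cp = 0xFFFD;
        i += len;

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Outside the BMP: split into a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

Calling `utf8_to_utf16("\xC3\xA9")` gives a single code unit (U+00E9), while a 4-byte sequence such as `"\xF0\x9F\x98\x80"` comes back as two code units, which is exactly why a fixed "one char = N bytes" assumption falls over.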
0

galena

7405

2y

@happygimp0 I really wish! I literally have to use a function block to translate UTF-8 URLs from requests into this garbage.
1

galena

7405

2y

Also note that CONCAT only works with strings, not wstrings. And brsstrcat ignores \0 and just appends the entire thing, which crashes the PLC.
2

CoreFusionX

3498

2y

@Nanos

The problem, IMO, comes from an incomplete understanding of the whole thing. Unicode is not an encoding. (Mathematically it is, I know, but bear with me.) It's a character set: a mapping from characters to code points.

UTF-8, UTF-16, UTF-32, etc. are different encodings of the Unicode codespace.

Like @lorentz said, it all stems from the ingrained belief that strings are sequences of fixed-size chars.

Windows is just doubly painful because when dealing with ANSI, it ain't really ANSI, cuz they use the extended ASCII part for their code page shenanigans.

But when dealing with what they call "Unicode", it ain't really Unicode (makes no sense, it's not an encoding), nor strictly UTF-16 (because plenty of APIs treat strings as plain arrays of 16-bit units and don't handle things like surrogate pairs, which are standard in UTF-16).

It just globs many of their infamous code pages in the high 9 bits and calls it UTF16.

So yeah. Windows is a fucking pain no matter what you do.

If you don't need to send those out or read them from external sources, just abuse the _T macro and TCHAR type every fucking where.
0

CoreFusionX

3498

2y

@Nanos

In UTF-8 *encoding*, yes, they can be.

In correct implementations of UTF-16, a code point can only be 2 or 4 bytes (4 bytes being a surrogate pair).
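To put numbers on that, a small sketch that stores one and the same code point (U+1F600, picked here only as an example of something outside the BMP) in three encodings and measures it. The helper names are hypothetical, and the UTF-8 bytes are written out by hand so the snippet doesn't depend on how a given C++ standard types `u8""` literals.

```cpp
#include <cstddef>
#include <string>

// Bytes occupied by a string's code units (terminator excluded).
template <typename CharT>
std::size_t byte_size(const std::basic_string<CharT>& s) {
    return s.size() * sizeof(CharT);
}

// The same code point, U+1F600, in three encodings.
inline std::string    emoji_utf8()  { return "\xF0\x9F\x98\x80"; }  // 4 code units of 1 byte
inline std::u16string emoji_utf16() { return u"\U0001F600"; }       // surrogate pair: 2 code units
inline std::u32string emoji_utf32() { return U"\U0001F600"; }       // 1 code unit
```

On the usual platforms, where `char16_t` is 2 bytes and `char32_t` is 4, all three come out at 4 bytes for this code point, but `size()` reports 4, 2 and 1 respectively. `size()` counts code units, not characters, and that is the trap.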
0

happygimp0

935

2y

@CoreFusionX Isn't the Windows encoding called UCS-2?
0

CoreFusionX

3498

2y

@happygimp0

Windows used UCS-2 up until Windows 2000, when it switched to UTF-16.

Still, UTF-16 is a superset of UCS-2, so no problems there.

However, I've seen several inconsistencies in Windows' advertised support for UTF-16 over the years, though I admit they were always related to obscure cases you don't run into often.
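A rough sketch of what "superset" means in practice: text without surrogates reads identically as UCS-2 and UTF-16, and the two only diverge once surrogate pairs show up. The function names are invented for this example, and counting an unpaired surrogate as one code point is an assumption of the sketch.

```cpp
#include <cstddef>
#include <string>

inline bool is_high_surrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
inline bool is_low_surrogate(char16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

// True if the text uses no surrogates, i.e. it reads the same as UCS-2.
bool fits_in_ucs2(const std::u16string& s) {
    for (char16_t u : s) {
        if (is_high_surrogate(u) || is_low_surrogate(u)) return false;
    }
    return true;
}

// Count code points; a well-formed surrogate pair counts once.
// An unpaired surrogate is counted as one code point (sketch assumption).
std::size_t count_code_points(const std::u16string& s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i, ++n) {
        if (is_high_surrogate(s[i]) && i + 1 < s.size() && is_low_surrogate(s[i + 1])) {
            ++i;  // skip the low half of the pair
        }
    }
    return n;
}
```

Code that only ever looks at `size()` is effectively treating the string as UCS-2; it works until the first character outside the BMP arrives.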

