encoding

Ranter

Kaji

2073

Comments

2

BobbyTables

4917

5y

I've spent way too much of my life fighting with MySQL character encodings.

Unfortunately I dont have any answers just want to let you know you are not alone.
2

AlmondSauce

15618

5y

Have you definitely checked you have the correct encoding on all tables and columns, not just globally?

Had a similar issue a while back, and turns out we still had one sneaky column that was stuck on something non-unicode.
1

Kaji

2073

5y

@AlmondSauce All of the tables and columns were set to utf8mb4 from the outset, and I double-checked them as I updated the collations after changing that globally (was originally on utf8mb4_general_ci). Another check just now indicates it's set at the table/column level as well.
1

IntrusionCM

13947

5y

@Kaji Have you looked up the char?

U+24B6F is it if I'm not mistaken...

That's part of CJK Unified Ideographs Extension B... What's the servers mysql version?
0

Kaji

2073

5y

@IntrusionCM Yeah, the 𠮟 character in particular seems to account for half the issues.

Dev server is running MariaDB 10.4.11 (which came with the latest version of XAMPP); the server I'm hosting things on is using MySQL 5.5.65.

Interestingly, I went ahead and did a hack where it base64 encoded the information and flagged it if it got this error, and I was able to fix it manually on the server directly using Sequel Pro. The SQL dump to send things over to the remote server converted the offending characters to question marks, but it's letting me fix them there as well without any issues.

Based on that, I'm almost wondering if it's just PHP's MySQLi having issues with it, perhaps?
1

IntrusionCM

13947

5y

@Kaji when you tested against mariadb 10.4 it's definitely Unicode 3.1 support.

Did you set the correct client encoding?

@highlight
$mysqli = new mysqli(...)
$mysqli->set_charset('utf8mb4');

Or (DSN)
charset=utf8mb4

To verify, I'd dump the Session variables straight from that connection, eg.:

@highlight
SHOW SESSION VARIABLES LIKE 'character\_set\_%';

SHOW SESSION VARIABLES LIKE 'collation\_%';

Do not use SET NAMEs... Or stuff like that.
0

Kaji

2073

5y

@IntrusionCM Just checked in my config.php file for the project and saw the following:

$db->set_charset("utf8");

Think you may have just hit on the missing link there. The variable dumps in the top of the post were gotten using the following:

SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';
2

IntrusionCM

13947

5y

@Kaji would be nice. It's pretty common to forget this....

Happens all the time xD
0

Kaji

2073

5y

@IntrusionCM Just did a check to be certain, and yeah, it was the set_charset() call in the PHP code. Changed it to $db->set_charset("utf8mb4"); and then ran a query inserting the string '𠮟る叱る' (which should definitely trip things, based on past tests) and it entered it without issue.

Thanks to everyone for the assistance sorting this one out!
1

IntrusionCM

13947

5y

@Kaji Happy to hear that. :)

Related Rants

Add Comment

question

mysql

unicode