Fixing SQLSTATE[HY000]: General error: 1366 Incorrect string value: '\xF0\x9F\x98\x8A\xF0\x9F…'

Reason
The culprit is the emoji in the tweets. Some emoji are encoded as four-byte UTF-8 sequences; the bytes \xF0\x9F\x98\x8A in the error message, for example, are the code point U+1F60A.

MySQL's character set named utf8 uses a maximum of three bytes per character and contains only BMP characters. As of MySQL 5.5.3, the utf8mb4 character set uses a maximum of four bytes per character and also supports supplementary characters (those outside the BMP). http://dev.mysql.com/doc/refman/5.5/en/charset-unicode-utf8mb4.html
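To make the three-byte limit concrete, here is a minimal sketch that checks whether a string contains any character outside the BMP, meaning one that a utf8 column would reject. The function name hasSupplementaryChars is made up for this illustration, and the code assumes PHP 7 or later.

```php
<?php
// Hypothetical helper: true if $s contains a code point above U+FFFF,
// which needs four bytes in UTF-8 and does not fit MySQL's utf8.
function hasSupplementaryChars(string $s): bool
{
    // The /u modifier makes PCRE treat pattern and subject as UTF-8.
    // preg_match returns 1 on a match, 0 on no match, false on error.
    return preg_match('/[\x{10000}-\x{10FFFF}]/u', $s) === 1;
}

$emoji = "\xF0\x9F\x98\x8A"; // the bytes from the error message (U+1F60A)

echo strlen($emoji), "\n";                        // 4 (strlen counts bytes)
var_dump(hasSupplementaryChars("caf\xC3\xA9"));   // bool(false): two-byte character only
var_dump(hasSupplementaryChars("hi " . $emoji));  // bool(true)
```

A check like this can be run before an INSERT to decide whether a string needs stripping or a utf8mb4 column.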

Proposed solutions

Solution 1
Strip every character that needs more than three bytes in UTF-8 from the string before inserting it.

Solution 2
Use MySQL 5.5.3 or later and change the column encoding from utf8 to utf8mb4. This encoding allows storage of characters that occupy four bytes in UTF-8.
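Solution 2 might look roughly like the sketch below. The table name tweets, the database name example_db, the credentials, the collation choice utf8mb4_unicode_ci and the helper utf8mb4ConvertSql are all placeholders for illustration, not taken from the original setup. Note that the client connection should use utf8mb4 as well, otherwise four-byte characters can still fail on the way to the server; with PDO this is done through the charset part of the DSN.

```php
<?php
// Hypothetical helper: builds the ALTER statement that converts one table
// (and its character columns) to utf8mb4.
function utf8mb4ConvertSql(string $table): string
{
    // Double any backtick so the identifier can be quoted safely.
    $quoted = '`' . str_replace('`', '``', $table) . '`';
    return "ALTER TABLE $quoted CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci";
}

// Placeholder connection details; charset=utf8mb4 makes the connection
// itself use utf8mb4.
$dsn = 'mysql:host=localhost;dbname=example_db;charset=utf8mb4';
$sql = utf8mb4ConvertSql('tweets');
echo $sql, "\n";

// With a reachable server, the statement could be run like this:
// $pdo = new PDO($dsn, 'user', 'password');
// $pdo->exec($sql);
```

Converting a large table rewrites it, so it is worth trying on a copy of the data first.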

Solution 1 in detail (from http://stackoverflow.com/questions/8491431/remove-4-byte-characters-from-a-utf-8-string):

Since 4-byte UTF-8 sequences always start with a byte in the range 0xF0-0xF7 and are followed by three more bytes, the following should work on valid UTF-8 input:

$str = preg_replace('/[\xF0-\xF7][\x00-\xFF]{3}/s', '', $str);

Alternatively, you could use preg_replace in UTF-8 mode but this will probably be slower:

$str = preg_replace('/[\x{10000}-\x{10FFFF}]/u', '', $str);

This works because 4-byte UTF-8 sequences are used for code points in the supplementary Unicode planes starting from 0x10000.
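The two patterns above can be combined into one helper. This is only a sketch: the name stripFourByteChars is hypothetical, and falling back to the byte-level pattern is one possible design choice. It is there because preg_replace with the /u modifier returns null when the input is not valid UTF-8, which would otherwise silently turn the string into null.

```php
<?php
// Hypothetical wrapper around the two patterns shown above.
function stripFourByteChars(string $str): string
{
    // UTF-8 mode: removes code points U+10000..U+10FFFF.
    $clean = preg_replace('/[\x{10000}-\x{10FFFF}]/u', '', $str);
    if ($clean !== null) {
        return $clean;
    }
    // preg_replace returned null (for example, $str is not valid UTF-8):
    // fall back to the byte-level pattern, which does not validate input.
    return preg_replace('/[\xF0-\xF7][\x00-\xFF]{3}/s', '', $str) ?? $str;
}

echo stripFourByteChars("Nice day \xF0\x9F\x98\x8A!"), "\n"; // prints "Nice day !"
```

Characters of up to three bytes, such as accented letters or CJK text, are left untouched by both patterns.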
