tag:blogger.com,1999:blog-3993498847203183398.post7515552238325003490..comments 2024-03-28T09:19:27.451+00:00 Comments on RevK<sup>®</sup>'s ramblings: 💩 RevK http://www.blogger.com/profile/12369263214193333422 noreply@blogger.com Blogger 9 1 25

tag:blogger.com,1999:blog-3993498847203183398.post-6433795311412092995 2014-07-02T18:10:28.837+01:00 2014-07-02T18:10:28.837+01:00
Those aren't ORCs. The Object Replacement Character looks like a dotted box; its code is U+FFFC and it means roughly "A thing was supposed to be here, but it couldn't be represented as text". What you're seeing (or at least, should be seeing) is U+FFFD Replacement Character, which appears as a diamond with an inverse question mark symbol and means roughly "A character was supposed to be here, but some sort of error occurred". Unicode specifies that when something goes wrong while processing Unicode data and real error handling (e.g. throwing a Java Exception) is not possible, each code unit causing an error should be replaced by U+FFFD instead. This prevents many text processing bugs from becoming security bugs.
Anonymous noreply@blogger.com

tag:blogger.com,1999:blog-3993498847203183398.post-4928836179381632736 2014-06-15T15:21:13.556+01:00 2014-06-15T15:21:13.556+01:00
I don't know about TTRSS, but I can help with MySQL... Unicode is composed of 17 planes. The first plane contains most of the characters for existing languages, so MySQL (and I suspect quite a few other programs) has implemented only the first plane and calls that UTF-8, which is wrong wrong wrong. OK, the 16 remaining planes contain mostly dead language stuff, but that's also where Pile of Poo is, and who can live without that?<br /><br />What you want to use is what MySQL calls 4 byte UTF-8 (utf8mb4), which is really the bona fide UTF-8 with a fancy name.
To activate that across the board when I first installed MariaDB (it should be the same with MySQL), I created a .cnf file in my mysql/conf.d/ folder, containing the following:<br /><br />[mysqld]<br />character-set-client-handshake = FALSE<br />character-set-server = utf8mb4<br />collation-server = utf8mb4_unicode_ci<br /><br />I suspect you'll have to look into converting your current tables before using the above, but at least that should give you a good starting point.
P Dot https://www.blogger.com/profile/05057181794474851375 noreply@blogger.com

tag:blogger.com,1999:blog-3993498847203183398.post-6775856496871132307 2014-06-15T11:08:08.385+01:00 2014-06-15T11:08:08.385+01:00
...It arrived in my feed reader correctly encoded...<br /><br />I suspect what has happened is that somehow you've gotten two surrogates UTF-8 encoded in your post. Somewhere along Blogger's E-Mail chain, and somewhere along the route to my feed reader, some software has converted these to UTF-16 using a non-validating parser. At this point, the surrogates have correctly gotten shoved together in UTF-16. When they came back out, well, they came back out as valid UTF-8.<br /><br />This could quite easily happen if there was, say, some Python or Java in between the two.
Owen Shepherd https://www.blogger.com/profile/00571493467544526223 noreply@blogger.com

tag:blogger.com,1999:blog-3993498847203183398.post-1924472724642297059 2014-06-15T08:49:25.145+01:00 2014-06-15T08:49:25.145+01:00
Aha! Well, this post spectacularly killed my RSS reader. ttrss failed to insert the record into MySQL, so that gives me something to look into!
batfastad https://www.blogger.com/profile/13727627380156105031 noreply@blogger.com

tag:blogger.com,1999:blog-3993498847203183398.post-1475097067465731471 2014-06-14T17:57:07.935+01:00 2014-06-14T17:57:07.935+01:00
Well yes, but it is Blogger that generated the surrogates - I posted a UTF-8 character.
Annoying.
RevK https://www.blogger.com/profile/12369263214193333422 noreply@blogger.com

tag:blogger.com,1999:blog-3993498847203183398.post-8804736011359099599 2014-06-14T14:41:28.660+01:00 2014-06-14T14:41:28.660+01:00
Upon downloading the page with wget and then looking at a hexdump, it's actually serving up & # 55357; & # 56489; without the spaces, so there's no browser funkiness going on here.<br /><br />You would think if Blogger's going to go to the effort of replacing Unicode characters with escapes, it would be smart enough to recognise surrogates too!
Keiji https://www.blogger.com/profile/11073037482259360139 noreply@blogger.com

tag:blogger.com,1999:blog-3993498847203183398.post-1362704318832842806 2014-06-14T09:40:34.064+01:00 2014-06-14T09:40:34.064+01:00
The email sent to me to approve the post had proper UTF-8 pile-of-poo characters, but it looks like Blogger is being rather odd about this on the web page, even for comments. Strange.
RevK https://www.blogger.com/profile/12369263214193333422 noreply@blogger.com

tag:blogger.com,1999:blog-3993498847203183398.post-4370591630074965136 2014-06-14T09:08:06.502+01:00 2014-06-14T09:08:06.502+01:00
Yes, the post title seems to be two surrogate characters (which are invalid characters in UTF-8, the page encoding).<br /><br />If I put a pile of poo into this comment and hit 'Preview' and then 'Edit', then Blogger gives me two surrogate characters, so I suspect it might simply be broken, but let me try just posting without editing again...
💩
Anonymous https://www.blogger.com/profile/09770044284126887469 noreply@blogger.com

tag:blogger.com,1999:blog-3993498847203183398.post-6529518101647566319 2014-06-13T22:35:49.910+01:00 2014-06-13T22:35:49.910+01:00
Are those surrogate characters in the title, because I see two ORCs (object replacement characters, but I love the unintentional acronym!) instead of one?<br /><br />On a side note, if 09F9 was an illegal number, is 1F4A9 a (mildly) profane number now?
Keiji https://www.blogger.com/profile/11073037482259360139 noreply@blogger.com
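
For anyone following the thread, the two numbers in the escapes quoted above (55357 and 56489) appear to be the UTF-16 surrogate pair for U+1F4A9. The sketch below is only an illustration of that arithmetic, not a description of what Blogger does internally. The helper names `to_surrogates` and `from_surrogates` are made up for this example.

```python
def to_surrogates(code_point):
    """Split a code point above U+FFFF into a UTF-16 (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF use surrogates")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


def from_surrogates(high, low):
    """Recombine a UTF-16 surrogate pair into the original code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


PILE_OF_POO = 0x1F4A9

high, low = to_surrogates(PILE_OF_POO)
print(high, low)                                  # 55357 56489
print(hex(from_surrogates(high, low)))            # 0x1f4a9
print(chr(PILE_OF_POO).encode("utf-8").hex())     # f09f92a9 (four bytes)
```

The last line shows why a three-byte-only "utf8" column cannot store the character: proper UTF-8 needs four bytes for it, and the surrogate values themselves are not valid characters in UTF-8 at all.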