Converting Japanese Character Sets

A client of mine – Kyoko Inatome – who writes both in English and Japanese, had an issue with the Japanese characters (letters). She had saved the database language differently earlier. Now things no longer worked as it should and all the Japanese had become gibberish. The Japanese Character Sets needed to be converted to be displayed properly.

This is a WordPress site that has been around for a long time. At the time all data was stored as Latin and only partly converted to UTF8. And what was converted or added was not done successfully as the database character set and collation had not been adjusted and nor had the necessary table columns been adjusted.

One Step Conversion

What I tried first was a one step conversion of the post content and post titles. This I did using the following commands. I ran these from PHPMyAdmin:

ALTER DATABASE database_name CHARACTER SET utf8;
ALTER TABLE wp_posts CHARACTER SET utf8

The two queries above were to convert the database to UTF8 and to set the character set of the table with the main issues. Then I ran:

alter table wp_posts change post_content post_content LONGTEXT CHARACTER SET utf8;

This did not work at all.

Three Step Conversion

Then I read the article on the possible need of a three way conversion and the reasons behind this. I quote:

This is necessary when the charset of the table/column does not match the charset of the data that was being saved to it. Specifically it is necessary when the WordPress site was sending data in latin1 format (because DB_CHARSET was not set to UTF8) but the database was in fact formatted as UTF8. This seems crazy but it can happen because of mysterious compatibility layers in MySQL/MariaDB.

So I ran the three way conversion on the following table columsn

 alter table wp_posts change post_content post_content LONGTEXT CHARACTER SET latin1;
 alter table wp_posts change post_content post_content LONGBLOB;
 alter table wp_posts change post_content post_content LONGTEXT CHARACTER SET utf8;

alter table wp_posts change post_title post_title LONGTEXT CHARACTER SET latin1;
 alter table wp_posts change post_title post_title LONGBLOB;
 alter table wp_posts change post_title post_title LONGTEXT CHARACTER SET utf8;

alter table wp_terms change name name LONGTEXT CHARACTER SET latin1;
 alter table wp_terms change name name LONGBLOB;
 alter table wp_terms change name name LONGTEXT CHARACTER SET utf8;

Database Character Set

As they stated the three way conversion was probably needed as the database character set was not set properly. It wasn’t as I saw that the data in the database was still gibberish while it now looked fine on the site. So I added”

define( 'DB_CHARSET', 'utf8' );
define( 'DB_COLLATE', '' );

to wp-config.php .

Issues with Latin Characters

But running it got some Latin Chars mangled. These I fixed using the WordPress WYSIWYG Editor. This I could do as there was not too much text to go round and most of it was fine.

Japanese Letters Gone and Back Again!

Then when I checked all the Japanese categories and posts as well as post titles I saw they were all being displayed as gibberish again. Oh my goodness what now? then I realized reading it all I just had to repeat the three way conversion now that the database char set was all good.

So I ran the above variation three again going from latin to blob to utf8.

After that the Japanese in the database was fine:

phpmyadmin - Japanese All OK

as well as it was on the website:

Japanese on the site

Yay!

Tagged in : Tagged in : , , ,
Jasper Frumau

Jasper has been working with web frameworks and applications such as Laravel, Magento and his favorite CMS WordPress including Roots Trellis and Sage for more than a decade. He helps customers with web design and online marketing. Services provided are web design, ecommerce, SEO, content marketing. When Jasper is not coding, marketing a website, reading about the web or dreaming the internet of things he plays with his son, travels or run a few blocks.