Friday, January 29, 2010

Unicode Encoding Problems with Java-MySQL-Tomcat Web Pages and Their Solution

There are many things to take care of if we want to display UTF-8 characters from MySQL correctly. The following steps are also a good starting point if we wish to do database-backed internationalization.

Things to take care of:

First:
When entering UTF-8 data into a table from the mysql shell, be careful to set the connection charset first with the command below.
SET NAMES utf8 tells MySQL which charset the client sends and expects back. It applies only to the current session, so we need to run it every time we log into the server. If we do not, MySQL treats whatever is entered into the table as being in the default charset, and the program will not display the characters properly.

set names utf8 collate utf8_general_ci;

Alternatively, we can start the client with the matching option: mysql --default-character-set=utf8
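To see what goes wrong when the charset is not set, here is a small sketch in plain Java. It is only an illustration: the class and method names are made up, and ISO-8859-1 stands in for MySQL's latin1 default, which is an assumption (the two are close but not identical).

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Decode UTF-8 bytes as ISO-8859-1, roughly what happens when UTF-8 input
    // arrives over a connection that is labelled with the wrong charset.
    public static String misread(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // Undo misread(): ISO-8859-1 maps every byte to exactly one char,
    // so the original bytes can be recovered and decoded as UTF-8.
    public static String repair(String garbled) {
        byte[] rawBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u00e9";           // e with acute accent, written as an escape
        String garbled = misread(original);   // becomes two chars: U+00C3 U+00A9
        System.out.println(garbled.length());                  // prints 2
        System.out.println(repair(garbled).equals(original));  // prints true
    }
}
```

One accented character turns into two unrelated characters, which is the typical garbled output seen when data was entered without setting the charset.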

Check the encodings of your MySQL server with this command:

show variables like 'character\_set\_%';
Make sure the response looks similar to this:

+--------------------------+--------+
| Variable_name            | Value  |
+--------------------------+--------+
| character_set_client     | utf8   |
| character_set_connection | utf8   |
| character_set_database   | utf8   |
| character_set_filesystem | binary |
| character_set_results    | utf8   |
| character_set_server     | utf8   |
| character_set_system     | utf8   |
+--------------------------+--------+
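The same check can be done from Java. The following is a rough sketch, not a finished tool: CharsetVariableCheck and its methods are hypothetical names, and read() assumes an already opened JDBC connection to the MySQL server.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CharsetVariableCheck {

    // Runs the SHOW VARIABLES query from the text and collects name/value pairs.
    public static Map<String, String> read(Connection connection) throws SQLException {
        Map<String, String> variables = new LinkedHashMap<>();
        try (Statement statement = connection.createStatement();
             ResultSet rows = statement.executeQuery(
                     "SHOW VARIABLES LIKE 'character\\_set\\_%'")) {
            while (rows.next()) {
                variables.put(rows.getString(1), rows.getString(2));
            }
        }
        return variables;
    }

    // Returns the variables whose value does not start with "utf8".
    // character_set_filesystem is skipped because "binary" is its usual value.
    public static List<String> nonUtf8(Map<String, String> variables) {
        List<String> offenders = new ArrayList<>();
        for (Map.Entry<String, String> entry : variables.entrySet()) {
            boolean isFilesystem = entry.getKey().equals("character_set_filesystem");
            if (!isFilesystem && !entry.getValue().startsWith("utf8")) {
                offenders.add(entry.getKey());
            }
        }
        return offenders;
    }

    public static void main(String[] args) {
        // Sample values only; in practice they would come from read(connection).
        Map<String, String> variables = new LinkedHashMap<>();
        variables.put("character_set_client", "latin1");
        variables.put("character_set_database", "utf8");
        variables.put("character_set_filesystem", "binary");
        variables.put("character_set_results", "latin1");
        System.out.println(nonUtf8(variables));
        // prints [character_set_client, character_set_results]
    }
}
```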


If not, edit /etc/my.cnf so that the encodings are utf8 as shown in the table above. It is not strictly necessary to change every variable to utf8, though. The client, connection and results values are session settings, and SET NAMES overrides all three for the current connection. I have the following configuration and it works perfectly well. I face no encoding problems, provided I carry out all the other steps carefully.

+--------------------------+--------+
| Variable_name            | Value  |
+--------------------------+--------+
| character_set_client     | latin1 |
| character_set_connection | latin1 |
| character_set_database   | utf8   |
| character_set_filesystem | binary |
| character_set_results    | latin1 |
| character_set_server     | utf8   |
| character_set_system     | utf8   |
+--------------------------+--------+


Second: Create the database with character set utf8, create the tables with charset utf8, and make the text columns of the tables utf8 as well. We need all three to make sure the characters are stored and returned correctly.
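As a sketch of the second step, the class below holds the DDL for a database, a table and a column that all use utf8. The names i18n_demo, greetings and message are invented for the example, and apply() assumes an open JDBC connection with enough privileges to create databases.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Arrays;
import java.util.List;

public class Utf8SchemaSetup {

    // DDL for a hypothetical database and table; the charset is stated at
    // database, table and column level, as the step above recommends.
    public static List<String> ddl() {
        return Arrays.asList(
            "CREATE DATABASE IF NOT EXISTS i18n_demo"
                + " CHARACTER SET utf8 COLLATE utf8_general_ci",
            "CREATE TABLE IF NOT EXISTS i18n_demo.greetings ("
                + " id INT AUTO_INCREMENT PRIMARY KEY,"
                + " message VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_general_ci"
                + ") DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci");
    }

    // Executes the statements on an existing connection.
    public static void apply(Connection connection) throws SQLException {
        try (Statement statement = connection.createStatement()) {
            for (String sql : ddl()) {
                statement.executeUpdate(sql);
            }
        }
    }

    public static void main(String[] args) {
        // Print the statements so they can also be pasted into the mysql shell.
        for (String sql : ddl()) {
            System.out.println(sql + ";");
        }
    }
}
```

Stating the charset on the column is redundant when the table default is already utf8, but it keeps the intent visible if the table default is changed later.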


Third: Set the page charset and default encoding to UTF-8 if we are displaying the table data in a web page.


<%@ page pageEncoding="UTF-8" %>



Adding this directive tells the JSP container to treat the page as UTF-8, so that the browser is instructed to use UTF-8 for the page as well.
After step three we will be able to see hard-coded UTF-8 characters in the page properly, but UTF-8 data coming from the database may still be garbled. For that to work, the first and second steps must be carried out properly.


This also might not work when the response is generated by server-side code. We then have to tell the server explicitly to use UTF-8 as the encoding for our pages, which brings us to the fourth step.
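The effect of a wrong response encoding can be reproduced without a server. This is a minimal sketch under the assumption that the response would otherwise be written as ISO-8859-1, the usual servlet default. ResponseEncodingDemo and roundTrip are names made up for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ResponseEncodingDemo {

    // Encode text to bytes as a response writer would for the given charset,
    // then decode it again the way a browser using the same charset would.
    public static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String japanese = "\u65e5\u672c"; // two Japanese characters, written as escapes

        // ISO-8859-1 has no mapping for these characters, so each becomes '?'.
        System.out.println(roundTrip(japanese, StandardCharsets.ISO_8859_1)); // prints ??

        // UTF-8 can represent them, so the text survives unchanged.
        System.out.println(roundTrip(japanese, StandardCharsets.UTF_8).equals(japanese)); // prints true
    }
}
```

Question marks in the page usually point to this kind of lossy encoding on the way out, whereas pairs of odd accented characters point to bytes being decoded with the wrong charset.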

Fourth: We may also have to write a custom filter that makes the server send its responses as UTF-8. Define a filter that sets the response encoding to UTF-8.


Write a ResponseFilter class that implements Filter, and in its doFilter() method set the following to tell the server explicitly that you want a UTF-8 encoded response. The filter also has to be registered and mapped (for example in web.xml) so that it runs for the pages in question.


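request.setCharacterEncoding("UTF-8");   // call before any request parameter is read
response.setCharacterEncoding("UTF-8");  // call before the response writer is obtained
chain.doFilter(request, response);       // hand over to the rest of the filter chain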
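Why the request encoding matters can be shown with the standard URL encoder alone. The sketch below assumes that the browser submits form data percent-encoded as UTF-8 and that the server would otherwise decode it as ISO-8859-1. FormDecodingDemo and submitAndDecode are hypothetical names, and the Charset overloads used here need Java 10 or later.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class FormDecodingDemo {

    // Percent-encode a value as a browser would for a UTF-8 page,
    // then decode it with the charset the server is configured to use.
    public static String submitAndDecode(String value, Charset serverCharset) {
        String onTheWire = URLEncoder.encode(value, StandardCharsets.UTF_8);
        return URLDecoder.decode(onTheWire, serverCharset);
    }

    public static void main(String[] args) {
        String value = "\u00e9"; // e with acute accent, sent as %C3%A9

        // Server decodes as UTF-8: the value arrives intact.
        System.out.println(submitAndDecode(value, StandardCharsets.UTF_8).equals(value)); // prints true

        // Server decodes as ISO-8859-1: one character becomes two.
        System.out.println(submitAndDecode(value, StandardCharsets.ISO_8859_1).length()); // prints 2
    }
}
```

Setting the request encoding in the filter corresponds to the first call here: submitted text is decoded with the same charset the browser used to send it.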



This should display any utf8 characters properly in any web page or console.


Fifth: This is not necessary if we performed all the steps above, at least not with recent MySQL connectors. With older versions of the driver, however, we need to add the following parameters to the connection URL:

jdbc:mysql://localhost/table-test?useUnicode=yes&characterEncoding=UTF-8
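The same two settings can also be passed as connection properties instead of being appended to the URL. This is a sketch only: Utf8Connection is a made-up helper, the user name and password are placeholders, and open() assumes the MySQL JDBC driver is on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

public class Utf8Connection {

    // Builds driver properties equivalent to useUnicode and characterEncoding in the URL.
    public static Properties utf8Properties(String user, String password) {
        Properties properties = new Properties();
        properties.setProperty("user", user);
        properties.setProperty("password", password);
        properties.setProperty("useUnicode", "true");
        properties.setProperty("characterEncoding", "UTF-8");
        return properties;
    }

    // Opens a connection with the properties above; the URL then needs no query string.
    public static Connection open(String url, String user, String password) throws SQLException {
        return DriverManager.getConnection(url, utf8Properties(user, password));
    }

    public static void main(String[] args) {
        // Placeholder credentials; nothing is connected here.
        Properties properties = utf8Properties("demo", "secret");
        System.out.println(properties.getProperty("characterEncoding")); // prints UTF-8
    }
}
```

Keeping the settings in one helper avoids repeating the query string wherever a connection is opened.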


Note: The charset has to be set to utf8 before entering any data into a table from the mysql prompt.

Must-read article:
unicode-how-to-get-characters-right


If problems persist, feel free to post your problem on this blog and I shall try to solve the issue. This approach worked for me with Japanese, Arabic, Devanagari and other characters.