I am parsing RSS news feeds in more than 10 different languages.
All the parsing is done in Java, and the data is stored in MySQL; my APIs, written in PHP, then serve it to the clients.
I constantly come across garbage characters when I read the data.
What I have tried:
- I have configured MySQL to store UTF-8 data. My database, tables, and even the columns have utf8 as their default charset.
- When connecting to the database, I set the character set results to UTF-8.
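
As a rough sketch of this kind of connection setup (not necessarily the exact code in use): the host, port, database name, and class name below are placeholders, and `useUnicode`, `characterEncoding`, and `characterSetResults` are assumed to be the MySQL Connector/J connection properties involved.

```java
// Hypothetical helper: builds a JDBC URL that asks Connector/J to use UTF-8
// for data sent to the server and for results coming back.
class JdbcUrlSketch {
    static String buildUrl(String host, String database) {
        return "jdbc:mysql://" + host + ":3306/" + database
                + "?useUnicode=true"
                + "&characterEncoding=UTF-8"     // encoding used for strings sent to MySQL
                + "&characterSetResults=UTF-8";  // encoding requested for result sets
    }

    public static void main(String[] args) {
        // "localhost" and "newsdb" are placeholders, not the real server or schema.
        String url = buildUrl("localhost", "newsdb");
        System.out.println(url);
        // With the Connector/J driver on the classpath, this URL would be passed to
        // java.sql.DriverManager.getConnection(url, user, password).
    }
}
```

These properties only cover the JDBC connection; they do not affect how the feed bytes are decoded into Java strings before the insert.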
When I run the jar file manually to insert the data, the characters appear fine. But when I run the same jar file from a cron job, the problem comes back.
In English, I mostly see mangled punctuation like in the sample below; in the other vernacular languages, the text is complete garbage and I can't recognize a single character.
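
One thing that might differ between the two runs is the JVM's default charset, since cron usually starts jobs with a much smaller environment than a login shell. This is only a guess at the cause, not something I have confirmed. A minimal diagnostic sketch (the class name `CharsetCheck` is made up for illustration) that logs the default charset and shows what happens when UTF-8 bytes are decoded with the wrong charset:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetCheck {
    // Decodes raw bytes with an explicitly named charset instead of the JVM default.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // Run this both from the shell and from cron and compare the two lines.
        System.out.println("default charset = " + Charset.defaultCharset());
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));

        // U+2019 (right single quotation mark) as it arrives in a UTF-8 feed.
        byte[] raw = "\u2019".getBytes(StandardCharsets.UTF_8);

        // Decoding with the correct charset round-trips the character.
        String right = decode(raw, StandardCharsets.UTF_8);
        // Decoding the same bytes as windows-1252 yields three unrelated characters.
        String wrong = decode(raw, Charset.forName("windows-1252"));

        System.out.println("decoded as UTF-8 has length " + right.length());
        System.out.println("decoded as windows-1252 has length " + wrong.length());
    }
}
```

If the two runs print different default charsets, then naming the charset explicitly wherever bytes become strings (for example `new InputStreamReader(stream, StandardCharsets.UTF_8)`), or starting the cron job with `-Dfile.encoding=UTF-8`, would be the next thing to try.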
Is there anything that I am missing?
Sample garbage characters:
Gujarati :"રેલવે મà«àª¸àª¾àª«àª°à«€àª®àª¾àª‚ સામાન ચોરી થશે તો મળશે વળતર!"
Malyalam : "നേപàµà´ªà´¾à´³à´¿à´²àµ‡à´•àµà´•àµà´³àµà´³ കോളàµâ€ നിരകàµà´•ൠകàµà´±à´šàµà´šàµ"
English : Bank Board Bureau’s ambit to widen to financial sector PSUs