I use SimpleXML to process an XML file that contains Cyrillic characters. I also use `dom_import_simplexml`, `importNode` and `appendChild` to copy trees from file to file and from place to place.
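The tree-copying step described above can be sketched roughly as follows. This is a minimal sketch, not the asker's code: the inline XML strings, the element names and the helper `appendSimpleXml()` are hypothetical, and both documents are assumed to be created with an explicit `encoding="UTF-8"` declaration.

```php
<?php
// Hypothetical source and target documents (names are illustrative only).
// Both declare encoding="UTF-8" explicitly; this is an assumption.
$source = new SimpleXMLElement(
    '<?xml version="1.0" encoding="UTF-8"?><TEST><QUEST NAME="Теория двух мечей" ID="1"/></TEST>'
);
$target = new SimpleXMLElement('<?xml version="1.0" encoding="UTF-8"?><RESULT/>');

/**
 * Hypothetical helper: deep-copy $node (with its subtree) to the end of
 * $parent's children. SimpleXML cannot move nodes itself, so both sides
 * are converted to DOM nodes that share the same underlying libxml2 tree.
 */
function appendSimpleXml(SimpleXMLElement $parent, SimpleXMLElement $node): void
{
    $domParent = dom_import_simplexml($parent);
    $domNode   = dom_import_simplexml($node);
    // importNode(..., true) makes a deep copy owned by the target document.
    $copy = $domParent->ownerDocument->importNode($domNode, true);
    $domParent->appendChild($copy);
}

appendSimpleXml($target, $source->QUEST);

// The change is visible through SimpleXML as well, since DOM and
// SimpleXML wrap the same document.
echo $target->asXML();
```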
At the end of processing I `print_r` the resulting `SimpleXMLElement`, and everything looks fine. But I also call `asXML('outputfile.xml')`, and something strange happens: all Cyrillic characters that are not wrapped in CDATA (some tag bodies and all attributes) are written as references to their Unicode code points (for example `&#x434;` instead of `д`).
For example, the output of `print_r` (just a fragment):

```
SimpleXMLElement Object ( [@attributes] => Array
( [NAME] => Государственный аппарат и механизм
[COSTYES] => 3.89983579639 [COSTNO] => 0
[ID] => 9 )
[COMMENTYES] => Вы совершенно правы.
[COMMENTNO] => Нет, Вы ошиблись. ) ) )
```
But in the file that `asXML` generates, I get something like this:

```xml
<QUEST NAME="&#x422;&#x435;&#x43E;&#x440;&#x438;&#x44F; &#x434;&#x432;&#x443;&#x445; &#x43C;&#x435;&#x447;&#x435;&#x439;"
style="educ" ID="1">
<DESC><![CDATA[Теория происхождения государства, известная как теория "двух мечей" [2, с.40],
представляет из себя...
]]></DESC>
```
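The pattern in this output (attributes escaped, CDATA untouched) can likely be reproduced in isolation. The sketch below assumes, which the question does not confirm, that the output document was created without an encoding in its XML declaration. In that case libxml2 has no declared output encoding, so it writes non-ASCII characters in attributes and text as hexadecimal character references, while CDATA content is copied through as raw UTF-8. `asXML('outputfile.xml')` should write the same serialization to the file. The one-element document is hypothetical.

```php
<?php
// Hypothetical one-element document; only the XML declaration differs.
$body = '<QUEST NAME="Теория"><DESC><![CDATA[Теория]]></DESC></QUEST>';

$noEncoding   = new SimpleXMLElement($body);
$utf8Declared = new SimpleXMLElement('<?xml version="1.0" encoding="UTF-8"?>' . $body);

// No declared encoding: the attribute value is written as &#x...; references,
// while the CDATA section keeps its Cyrillic text as UTF-8.
echo $noEncoding->asXML(), "\n";

// encoding="UTF-8" declared: the attribute is written as UTF-8 Cyrillic too.
echo $utf8Declared->asXML(), "\n";
```

If this assumption holds, locale settings would not be expected to change the result, because the escaping depends on the document's declared encoding rather than on the PHP locale.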
I set a UTF-8 locale everywhere possible and googled every combination of "simplexml", "unicode", "cyrillic", "asXML", etc., but nothing worked.
UPD: It looks like one of the functions used applies `htmlentities()`. So, thanks to voitcus, the solution is to use `html_entity_decode()`, as advised here.
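A decoding step along the lines of this update might look like the sketch below. It is a narrower variant and not necessarily the approach referenced above: calling `html_entity_decode()` on the whole string would also turn `&amp;` and `&lt;` back into raw characters and could make the file malformed, so this version applies it only to hexadecimal references to non-ASCII code points. The helper name `decodeNonAsciiCharRefs()` is hypothetical, and decimal references (`&#1076;`) are deliberately not handled.

```php
<?php
/**
 * Hypothetical helper: turn hexadecimal character references such as
 * &#x434; back into UTF-8, but only for code points >= 0x80, so that
 * references to ASCII characters (e.g. &#xA; in attributes) and named
 * entities like &amp; or &lt; stay escaped and the XML stays well-formed.
 */
function decodeNonAsciiCharRefs(string $xml): string
{
    return preg_replace_callback(
        '/&#x([0-9A-Fa-f]+);/',
        static function (array $m): string {
            if (hexdec($m[1]) < 0x80) {
                return $m[0]; // ASCII reference: keep it escaped
            }
            return html_entity_decode($m[0], ENT_QUOTES | ENT_XML1, 'UTF-8');
        },
        $xml
    );
}

// Document without an encoding declaration, so asXML() emits &#x...; references.
$xml = new SimpleXMLElement('<QUEST NAME="Теория"/>');
$result = decodeNonAsciiCharRefs($xml->asXML());
// To write the file instead: file_put_contents('outputfile.xml', $result);
echo $result;
```

Since the serialized XML declaration carries no encoding, parsers default to UTF-8, so the decoded output should remain a valid document.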