
My code fails at the following line whenever my XML file contains bytes that are not valid UTF-8:

doc = REXML::Document.new file

REXML::ParseException (#<REXML::ParseException: #<ArgumentError: invalid byte sequence in UTF-8>
Vinay

1 Answer


You can call something like this:

REXML::Document.new(file.force_encoding("FILE_ENCODING").encode("UTF-8"))

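Here file must be a String holding the file's contents (for example from File.read), not a File object, and FILE_ENCODING is a placeholder for the actual encoding of that data, such as "ISO-8859-1".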
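
As a minimal sketch of the idea above: read the file as raw bytes, label those bytes with the encoding they were really written in, transcode to UTF-8, and only then hand the string to REXML. The helper name parse_xml_with_encoding, the temporary sample file, and the choice of ISO-8859-1 are assumptions made for illustration; substitute whatever encoding your data actually uses.

```ruby
require "rexml/document"
require "tempfile"

# Hypothetical helper (not part of REXML): parse an XML file whose bytes
# are in source_encoding rather than UTF-8.
def parse_xml_with_encoding(path, source_encoding)
  raw = File.binread(path)                       # raw bytes, tagged ASCII-8BIT
  utf8 = raw.force_encoding(source_encoding)     # relabel the bytes, no conversion yet
            .encode("UTF-8")                     # now actually convert to UTF-8
  REXML::Document.new(utf8)
end

# Demo data: 0xE9 is "é" in ISO-8859-1 but an invalid byte sequence in UTF-8.
sample = Tempfile.new(["sample", ".xml"])
sample.binmode
sample.write("<note>caf\xE9</note>".b)
sample.close

doc = parse_xml_with_encoding(sample.path, "ISO-8859-1")
puts doc.root.text
```

Note that force_encoding only changes the label on the string, while encode does the real conversion, so the two calls are not interchangeable.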

Ermin Dedovic
  • I tried, but I get an undefined method error for file. Here is what I have done: file = File.open(file_name); doc = REXML::Document.new(file.force_encoding("UTF-8")). My file is UTF-8 by default, as it's an XML file. – Vinay Jun 19 '13 at 17:53
  • The file should be text. How can you have non-UTF-8 characters in a UTF-8 file? – Ermin Dedovic Jun 19 '13 at 17:57
  • I am loading .xml files. I think there is a copy-paste step from a browser while generating the XML, so it has some non-UTF-8 characters. – Vinay Jun 19 '13 at 18:12
  • Right now I am using this: REXML::Document.new(File.open(file_name,"r:iso-8859-1:utf-8")). It solves my problem, but I am not sure how it works. Do you have any idea? – Vinay Jun 19 '13 at 18:13
  • Well, this reencodes your file when opening, so you don't have problems with this. My function above does the same, but on the file contents. – Ermin Dedovic Jun 19 '13 at 18:14
  • But I am not sure why. I tried your function and it gives me an error. Did I use it correctly in the first comment? And what is :iso-8859-1? Does it accept all the characters in the XML, even special characters? How does it work? – Vinay Jun 19 '13 at 18:29
  • data = File.read("/path/to/file"); data = data.force_encoding("FILE_ENCODING").encode("UTF-8"); -- I was thinking of this. – Ermin Dedovic Jun 19 '13 at 20:33
  • OK, I will try that too. Thanks so much for taking the time to reply to my queries :) – Vinay Jun 19 '13 at 21:41
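
On the question of how "r:iso-8859-1:utf-8" works: the mode string has the form "mode:external:internal". Ruby treats the bytes on disk as the external encoding (here ISO-8859-1) and transcodes them to the internal encoding (UTF-8) as they are read. The sketch below uses a made-up temporary file to show the difference; it assumes the default internal encoding has not been changed. Because ISO-8859-1 assigns a character to every byte value, this read should not raise, but if the file was really written in some other encoding (Windows-1252, for example) some characters may come out wrong.

```ruby
require "tempfile"

# Demo data: 0xE9 is "é" in ISO-8859-1 but an invalid byte sequence in UTF-8.
sample = Tempfile.new(["latin1", ".xml"])
sample.binmode
sample.write("<note>caf\xE9</note>".b)
sample.close

# Tagging the bytes as UTF-8 does not convert or validate them on read.
as_utf8 = File.read(sample.path, encoding: "UTF-8")

# external = ISO-8859-1 (what is on disk), internal = UTF-8 (what you get back).
transcoded = File.open(sample.path, "r:iso-8859-1:utf-8") { |f| f.read }

puts as_utf8.valid_encoding?
puts transcoded.valid_encoding?
puts transcoded.encoding
```

The string read as plain UTF-8 still carries the bad byte, which is what REXML trips over; the transcoded one is a valid UTF-8 string that can be passed to REXML::Document.new.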