
I'm using Watir to fill a text_field with HTML code that I scraped earlier with another program.

The content I'm transferring is in German, so it contains some special characters that don't exist in the English alphabet.

Those characters are displayed properly in the HTML file, but once transferred into the text_field of the Joomla installation (I'm migrating a website to Joomla with this program), they are garbled.

Thanks to a user's great help, I was able to solve a previous problem and am now transferring the content with the following method:

browser.text_field(:id => "text").value=(open('my-site.html') { |f| f.read })

As a result, the special characters were shown as follows:

über => ³ber 
vergißt => vergi▀t 
wählen => wõhlen 
geförderter => gef÷rderter 

The user guessed that it had something to do with the code page I'm on and with encoding issues. Running chcp at the DOS prompt printed 850.
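The garbled output listed above is consistent with the file's bytes being ISO-8859-1 (Latin-1) while being interpreted as code page 850. A minimal sketch that reproduces the four examples, assuming Ruby 1.9 or later (for String#encode and String#force_encoding); the helper name misread_as_cp850 is made up for illustration:

```ruby
# encoding: utf-8

# Hypothetical helper: encode the text as ISO-8859-1 bytes, then
# reinterpret those same bytes as CP850 and convert back to UTF-8 for display.
def misread_as_cp850(text)
  latin1_bytes = text.encode("ISO-8859-1")
  latin1_bytes.force_encoding("CP850").encode("UTF-8")
end

%w[über vergißt wählen geförderter].each do |word|
  puts "#{word} => #{misread_as_cp850(word)}"
end
```

If the printed pairs match the ones in the question, the source file is most likely Latin-1 and something along the way treats it as CP850.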

His attempt to solve the problem was the following:

require 'iconv'
browser.text_field(:id => "text").value=(
  Iconv.iconv('CP850', 'ISO-8859-1', open('my-site.html') { |f| f.read })
)

Unfortunately, this didn't solve the problem. The special characters are now shown as escape sequences, for example \x81ber instead of über and vergi\xE1t instead of vergißt, and new lines are shown as a literal \n.
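One possible reading of this output, offered as a guess rather than a confirmed diagnosis: Iconv.iconv takes the target encoding first and the source encoding second, and it returns an Array rather than a String. The call above would therefore convert from ISO-8859-1 to CP850 and assign an array, which is printed in inspected form. The bytes \x81 and \xE1 are the CP850 codes for ü and ß. The sketch below shows the same bytes using String#encode instead of Iconv, assuming Ruby 1.9 or later; the helper name cp850_hex_bytes is hypothetical:

```ruby
# encoding: utf-8

# Hypothetical helper: convert UTF-8 text to CP850 and list its bytes in hex.
def cp850_hex_bytes(text)
  text.encode("CP850").bytes.map { |b| format("%02X", b) }
end

puts cp850_hex_bytes("ü").inspect  # ["81"]
puts cp850_hex_bytes("ß").inspect  # ["E1"]

# An Array holding the converted string is printed via inspect, so bytes that
# cannot be shown appear as escapes and the newline appears as a literal \n.
puts ["vergißt\n".encode("CP850")].to_s
```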

I scraped the pages with Mechanize, using the following code:

auszug = page.search('/html/body/table/tr/td/table/tr[2]/td/table/tr/td[4]')
outputFile << auszug

I hope you can somehow help me, as I'm just a volunteer working here with a bit of programming experience. If I don't get this program running by next week (this encoding thing is the only thing that's really stopping me), then I'll have to manually transfer a hundred pages using copy+paste :/

Thanks for taking the time and all the effort you're putting into this! :-)

Sebastian

  • "Ruby" shouldn't be all capitals: http://stackoverflow.com/questions/6053240/how-should-i-capitalize-ruby/6053314#6053314 – Andrew Grimm May 19 '11 at 12:50

1 Answer


Did you try converting to UTF-8?

require 'iconv'
browser.text_field(:id => "text").value = Iconv.conv('UTF-8', 'CP850', open('my-site.html') { |f| f.read })
Dave McNulla
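
On Ruby 1.9 or later, where Iconv is deprecated (and removed from the standard library as of 2.0), the same kind of conversion can be sketched with String#encode. The source encoding here is an assumption: the answer uses CP850, while the garbled characters in the question match ISO-8859-1 bytes, so it is passed in as a parameter and may need adjusting. The helper name read_as_utf8 and the temporary file are illustrative only:

```ruby
# encoding: utf-8
require "tempfile"

# Hypothetical helper: read a file's raw bytes, label them with the encoding
# they were saved in, and transcode the result to UTF-8.
def read_as_utf8(path, source_encoding)
  raw = File.open(path, "rb") { |f| f.read }
  raw.force_encoding(source_encoding).encode("UTF-8")
end

# Stand-in for my-site.html: a small file saved as ISO-8859-1.
sample = Tempfile.new("my-site")
sample.binmode
sample.write("<p>über vergißt</p>".encode("ISO-8859-1"))
sample.close

html = read_as_utf8(sample.path, "ISO-8859-1")
puts html

# With Watir, the result would then be assigned as in the question:
#   browser.text_field(:id => "text").value = html
```

If the wrong source encoding is passed, the result is still valid UTF-8 but contains the wrong characters, so it is worth checking a page with umlauts by eye before transferring all of them.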