
I'm using

Dir.entries("myFolder")

to get all the filenames. The problem is that filenames containing non-ASCII characters such as Č and Š come back with escaped byte placeholders instead of the characters themselves.

I've specified the source file encoding with the magic comment:

# encoding: utf-8

This worked on Linux, but on Windows it doesn't.

Result from the test in irb:

[".", "..", "MA\xC8KA.png", "PES.png", "VLAK.png", "\x8EOGA.png"]

It should be:

[".", "..", "MAČKA.png", "PES.png", "VLAK.png", "ŽOGA.png"]

Is there a way to fix this other than substituting these characters wherever they occur?

--------EDIT-----------

irb(main):001:0> Dir.entries("myFolder").map {|e| e.force_encoding('Windows-1250').encode('UTF-8')}

=> [".", "..", "MA\u010CKA.png", "PES.png", "VLAK.png", "\u017DOGA.png"]

irb(main):002:0> Dir.entries("myFolder").map {|e| e.force_encoding('UTF-8')}

=> [".", "..", "MA\xC8KA.png", "PES.png", "VLAK.png", "\x8EOGA.png"]

--------EDIT-----------
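To make the difference between the two irb attempts above easier to see, here is a minimal sketch. It assumes the bytes returned on Windows are Windows-1250, as the `\xC8` and `\x8E` values suggest; the byte strings are typed in by hand rather than read from a real directory.

```ruby
# Bytes as irb showed them on the Windows machine.
# Assumption: they are Windows-1250 encoded.
raw_names = ["MA\xC8KA.png".b, "PES.png".b, "\x8EOGA.png".b]

# force_encoding("UTF-8") only relabels the bytes; it does not convert them,
# so names with non-ASCII bytes end up as invalid UTF-8.
relabelled = raw_names.map { |name| name.dup.force_encoding("UTF-8") }
puts relabelled.map(&:valid_encoding?).inspect

# Relabel as Windows-1250 first, then transcode to real UTF-8 bytes.
utf8_names = raw_names.map do |name|
  name.dup.force_encoding("Windows-1250").encode("UTF-8")
end
puts utf8_names
```

`force_encoding` changes only the label attached to the string, while `encode` rewrites the bytes, which is why only the first irb line produced usable names.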

---------EDIT 2-------------

#encoding: utf-8
require 'green_shoes'

Shoes.app do

  button "Get sample image name" do
    @words_images = Dir.entries("myFolder").each {|word| word.gsub!(".png", "")}
    @words_images.delete(".")
    @words_images.delete("..")
    @test.append{para @words_images.sample}
  end

  @test = stack do
  end

end

---------EDIT 2-------------

Thank you.

Regards, Seba

Sebastjan Hribar
  • Looks like your results are encoded in [Windows-1250](https://en.wikipedia.org/wiki/Windows-1250). You might try `name.force_encoding('Windows-1250').encode('UTF-8')`? Note that the `\xC8` "placeholder" isn't actually part of the string, there's an actual, single char `0xC8` there -- you're just seeing how `String#inspect` renders it. – Lynn Jul 30 '15 at 12:36
  • Please see the edit above. I've put both versions. – Sebastjan Hribar Jul 30 '15 at 12:47
  • Your problem is solved; e.g. `\u010C` is the Unicode codepoint for Č. They just aren't showing up in `irb` because `Array#inspect` calls `String#inspect` which shows Unicode characters as escaped on your platform. [See this paste](https://glot.io/snippets/e5tkabbche). – Lynn Jul 30 '15 at 13:08
  • In my case, the difference is in p and puts in irb. p always outputs "\u010C" while puts results in Č. – Sebastjan Hribar Jul 30 '15 at 13:17
  • Indeed! `p x` is simply defined to do the same thing as `puts x.inspect`. – Lynn Jul 30 '15 at 13:20
  • I still don't know how to solve this, especially in my Green shoes app, though. I've tried more encoding, but no luck. – Sebastjan Hribar Jul 30 '15 at 13:34
  • How are you displaying the strings in your shoes app? – Lynn Jul 30 '15 at 13:36
  • See EDIT 2; I am using the para method and appending the string. – Sebastjan Hribar Jul 30 '15 at 13:42
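The `p` versus `puts` point from the comments can be checked with a short sketch. The string below is the converted name from the question; whether `inspect` escapes Č as `\u010C` depends on the platform's output encoding, so no expected output is shown.

```ruby
name = "MA\u010CKA.png"   # the converted filename from the question

puts name           # writes the characters themselves
puts name.inspect   # writes a quoted debugging form, possibly with escapes
p name              # prints the same text as the previous line
```

The string holds the same nine characters in all three cases; only the way it is rendered differs.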

1 Answer


I solved the issue by passing the encoding option when reading the files in the directory:

Dir.entries("myFolder", encoding: "utf-8")

Any later attempt to convert or force the encoding of the returned strings failed.
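A self-contained sketch of how this could look in practice follows. The temporary directory and the two sample files are hypothetical stand-ins for "myFolder", and `File.basename` is used here in place of the `gsub!` call from the question. On Linux the option mostly just labels the names as UTF-8; the conversion itself matters on Windows.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical stand-in for "myFolder": a temporary directory with sample files.
folder = Dir.mktmpdir
["MA\u010CKA.png", "PES.png"].each do |file|
  File.write(File.join(folder, file), "")
end

# Request the entries as UTF-8 and drop the "." and ".." pseudo-entries.
entries = Dir.entries(folder, encoding: "UTF-8") - %w[. ..]

# Strip the extension without mutating the strings in place.
words = entries.map { |entry| File.basename(entry, ".png") }.sort

puts words
puts words.map(&:encoding).uniq.inspect

# Clean up the temporary directory.
FileUtils.remove_entry(folder)
```

Names read this way are already UTF-8 strings, so they can be handed to `para` without a separate conversion step.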

A reminder to myself to read the documentation more carefully...

regards, seba

Sebastjan Hribar