
I want to read the contents of a file and save it into a variable. Normally I would do something like:

text = File.read(filepath)

Unfortunately, one of the files I'm working with is encoded in UTF-16LE. From what I've read, it looks like I need to use File.open instead and specify the encoding. One suggestion I found was to open the file and read the data line by line:

text = File.open(filepath,"rb:UTF-16LE") { |file| file.lines }

However if I run:

puts text

I get:

#<Enumerator:0x23f76a8>

How can I read in the content of the UTF-16LE file into a variable?

Note: I am using Ruby 1.9.3 and a Windows OS

Stew C
  • I doubt that you’re using Ruby 2.7, unless you’ve come from the future. 2.1.2 is the current version. – matt Jul 16 '14 at 18:11
  • lol oops it's 1.9.3 my dyslexia grabbed the 2.7 from the python 2.7 folder in the same directory – Stew C Jul 17 '14 at 17:00

2 Answers


The lines method is deprecated, and it returns an Enumerator rather than the lines themselves, which is what puts printed. If you expect text to be an array of lines, use readlines instead:

text = File.open(filepath,"rb:UTF-16LE"){ |file| file.readlines }

As the Tin Man says, it's better practice to process each line separately, if possible:

File.open("test.csv", "rb:UTF-16LE") do |file|
  file.each do |line|
    p line
  end
end
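
If the whole file really is needed in one variable, a minimal sketch along the same lines follows. The temporary file and its two sample lines are made up so the snippet can run on its own; in practice filepath would point at the real file. Transcoding to UTF-8 with String#encode is an extra step not mentioned above, added on the assumption that the text will later be printed or matched against ordinary string literals.

```ruby
require "tempfile"

# Hypothetical sample data, written as UTF-16LE so the example is self-contained.
sample = Tempfile.new("utf16le_sample")
sample.binmode
sample.write("first line\nsecond line\n".encode("UTF-16LE"))
sample.close
filepath = sample.path

# Whole file as a single String tagged as UTF-16LE.
text = File.open(filepath, "rb:UTF-16LE") { |file| file.read }

# The same file as an Array of lines.
lines = File.open(filepath, "rb:UTF-16LE") { |file| file.readlines }

# Transcode to UTF-8 before printing or comparing with plain string literals.
utf8_text = text.encode("UTF-8")
puts utf8_text
```

Both text and the elements of lines keep the UTF-16LE encoding until encode is called on them.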
steenslag

First, don't make it a practice to read a file directly into a variable unless you absolutely have to. That's called "slurping", and is not scalable. Instead, read it line by line.

Ruby's IO class, which File inherits from, supports a parameter they call open_args, which is a hash, on the majority of "read" type calls. For example, here are some method signatures:

read(name, [length [, offset]], open_args)
readlines(name, sep=$/ [, open_args])

The documentation says this about open_args:

If the last argument is a hash, it specifies option for internal open().  The
key would be the following.  open_args: is exclusive to others.

encoding:
  string or encoding

  specifies encoding of the read string.  encoding will be ignored if length
  is specified.


mode:
  string

  specifies mode argument for open().  It should start with "r" otherwise it
  will cause an error.


open_args:
  array of strings


  specifies arguments for open() as an array.
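
As a rough illustration of the options described above, the sketch below passes mode: and encoding: as a trailing hash to File.read and File.foreach. The temporary file, its contents, and the upcase step are invented for the example; only the mode: and encoding: keys come from the documentation quoted here.

```ruby
require "tempfile"

# Hypothetical UTF-16LE input so the example runs without an existing file.
sample = Tempfile.new("utf16le_open_args")
sample.binmode
sample.write("alpha\nbeta\n".encode("UTF-16LE"))
sample.close
path = sample.path

# Slurp the file: the trailing hash is forwarded to the internal open().
content = File.read(path, mode: "rb", encoding: "UTF-16LE")

# Line by line: File.foreach takes the same options and never holds
# more than one line in memory at a time.
upcased = []
File.foreach(path, mode: "rb", encoding: "UTF-16LE") do |line|
  # Transcode each line to UTF-8 before working on it (illustrative step).
  upcased << line.encode("UTF-8").chomp.upcase
end

puts content.encode("UTF-8")
p upcased
```

The foreach form is the one that scales to large files; File.read is shown only for comparison.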
the Tin Man
  • Here's what I came up with after reading the documentation: IO.read(filename, mode: 'rb', encoding: 'UTF-16LE'). Does this look right? My goal is to do a gsub on the lines I want to change in the file. – Stew C Jul 16 '14 at 18:22