
I would like to save a Ruby Set, or a Hash, to disk and load it back from the file when necessary. How can I do this?

Wayne Conrad
Konstantin
    If we can close an old question because it is no longer on topic, then we can re-open an old question because the reason it was closed is no longer valid. This is a fair question that has generated good answers, so voting to re-open. – Wayne Conrad Jan 27 '17 at 16:35

2 Answers


The Marshal module serializes an object into a string, which can be written to a file. Reading the file and passing the string to Marshal.load gives back the original object.

Marshal.dump takes an optional parameter described as 'anIO'; in practice, a file.

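h = { "hi" => 1 }
# Dumping: open in binary mode ("wb"), since Marshal output is binary data.
File.open("test.marshal", "wb") { |to_file| Marshal.dump(h, to_file) }
# Retrieving: read in binary mode ("rb") as well.
p File.open("test.marshal", "rb") { |from_file| Marshal.load(from_file) } #=> {"hi"=>1}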

Caveats

The file format that Marshal.dump creates can change with different Ruby versions or implementations. If you need to create a file that any version of Ruby can read, then prefer something like YAML or JSON over Marshal.dump.
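As a rough sketch of the portable alternative, assuming a hash with string keys and JSON-friendly values, a round trip through a JSON file could look like this. The file name settings.json and the temporary directory are illustrative choices, not part of the original answer.

```ruby
require 'json'
require 'tmpdir'

h = { "hi" => 1 }

# Hypothetical location; any writable path would do.
path = File.join(Dir.mktmpdir, "settings.json")

# Saving: serialize the hash to JSON text and write it out.
File.write(path, JSON.generate(h))

# Loading: read the text back and parse it into a Hash.
restored = JSON.parse(File.read(path))
p restored
```

Keep in mind that JSON object keys are always strings, so a hash with symbol keys comes back with string keys unless you convert them yourself.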

There are certain objects that cannot be serialized (that is, cannot be dumped using Marshal.dump). Lambdas and procs are among those objects, so if your hash has its default_proc attribute set, then you will not be able to marshal that hash. To work around this, you can set default_proc to nil before saving (e.g. my_hash.default_proc = nil), but you will need to reset default_proc to its correct value after loading the hash.
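The default_proc workaround might be sketched as follows. The names counts, plain, saved_proc and restored are made up for illustration, and clearing the proc on a copy (rather than on the original hash) is one possible choice, not something the answer prescribes.

```ruby
# A hash whose default proc initializes missing keys to 0.
counts = Hash.new { |hash, key| hash[key] = 0 }
counts["a"] += 1

# Keep a reference to the proc so it can be reattached later.
saved_proc = counts.default_proc

# Dumping the hash as-is fails, because the proc cannot be serialized.
begin
  Marshal.dump(counts)
rescue TypeError => e
  puts "cannot dump: #{e.message}"
end

# Clear the proc on a copy, so the original hash keeps working.
plain = counts.dup
plain.default_proc = nil
data = Marshal.dump(plain)

# After loading, reattach the default proc.
restored = Marshal.load(data)
restored.default_proc = saved_proc
restored["b"] += 1
```

Here data is the marshaled string; writing it to a file and reading it back works the same way as in the example above.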

Wayne Conrad
steenslag

A Set is basically a Hash in which every key maps to the same placeholder value. It's the uniqueness of the keys that makes it behave like a set.

Once we know that, saving and restoring a Set is just like saving an Array or a Hash, and can be done in a number of ways.

@steenslag mentions using Marshal.dump, which is good.

Because a Set is a variant of a Hash, you can also use YAML or JSON to serialize the data. The big advantage to either is that they are easy to reuse in other languages. YAML and JSON are commonly used to store and transfer data between hosts and are very readable formats.

Here are some examples to give you ideas:

require 'set'
require 'json'
require 'yaml'

foo = Set.new
foo << 1
foo << 2
foo << 1
foo # => #<Set: {1, 2}>

foo is a Set. It can also be converted to an Array:

foo.to_a # => [1, 2]

We can use YAML to serialize the Set:

puts YAML.dump(foo)
# >> --- !ruby/object:Set
# >> hash:
# >>   1: true
# >>   2: true

And, we can create the serialized version, then parse it again back into a Set:

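# On Ruby 3.1+ (Psych 4), YAML.load rejects arbitrary classes by default, so permit Set explicitly:
YAML.safe_load(YAML.dump(foo), permitted_classes: [Set]) # => #<Set: {1, 2}>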

Because foo is a Set, we can also convert it to an Array, and then use YAML to serialize that:

puts YAML.dump(foo.to_a)
# >> ---
# >> - 1
# >> - 2  

Then we can read the data back in, and, if we choose, convert it back to a Set:

bar = YAML.load(YAML.dump(foo.to_a)).to_set
bar.class # => Set
bar # => #<Set: {1, 2}>

Or, if the YAML is being read by a language that doesn't have a built-in Set type, like Perl, the data can simply be left as an array when that code reads and parses it.

JSON works similarly. Here's an example of a round-trip of the data via Array:

foo.to_a.to_json # => "[1,2]"
JSON.parse(foo.to_a.to_json) # => [1, 2]
JSON.parse(foo.to_a.to_json).to_set # => #<Set: {1, 2}>

JSON also has the [] method, which is smart enough to figure out whether the parameter passed in is a String or an Array or Hash. If it's a String, the parameter is parsed and returned as a Ruby object. If it's one of the latter two, it's serialized and turned into a string:

JSON[foo.to_a] # => "[1,2]"
JSON[JSON[foo.to_a]] # => [1, 2]
JSON[JSON[foo.to_a]].to_set # => #<Set: {1, 2}>

Whether you use JSON or YAML, reading and writing the resulting string is easily done using several different IO or File methods. Look at File.write and File.read, both inherited from IO, and, if you decide to work with YAML, look at YAML.load_file, which is inherited from Psych.
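Putting those pieces together, here is a minimal sketch of saving a Set to disk and loading it back via an Array, once with JSON and once with YAML. The file names foo.json and foo.yml and the temporary directory are assumptions made for the example.

```ruby
require 'set'
require 'json'
require 'yaml'
require 'tmpdir'

foo = Set.new([1, 2])
dir = Dir.mktmpdir # hypothetical scratch directory

# JSON: write the Set as an Array, then read it back and convert to a Set.
json_path = File.join(dir, "foo.json")
File.write(json_path, foo.to_a.to_json)
from_json = JSON.parse(File.read(json_path)).to_set

# YAML: same idea, using YAML.load_file to read and parse in one step.
yaml_path = File.join(dir, "foo.yml")
File.write(yaml_path, YAML.dump(foo.to_a))
from_yaml = YAML.load_file(yaml_path).to_set
```

Going through an Array keeps the files free of Ruby-specific tags, so other languages can read them too.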

the Tin Man