
I have a CSV in the following format:

name,contacts.0.phone_no,contacts.1.phone_no,codes.0,codes.1
YK,1234,4567,AB001,AK002

As you can see, this is a nested structure. The CSV may contain multiple rows. I would like to convert this into an array of hashes like this:

[
  {
    name: 'YK',
    contacts: [
        {
            phone_no: '1234'
        },
        {
            phone_no: '4567'
        }
    ],
    codes: ['AB001', 'AK002']
  }
]

The structure uses numbers in the given format to represent arrays. There can be hashes inside arrays. Is there a simple way to do that in Ruby?

The CSV headers are dynamic and can change, so I have to build the hash on the fly from whatever headers the CSV file contains.

A Node.js library called csvtojson does this kind of conversion for JavaScript.

Yedhu Krishnan
  • Why do you use an array for the address but not for the phone numbers? That doesn't look DRY. – spickermann Oct 28 '19 at 13:10
  • Your CSV file always contains just one line after the headers? If it may contain two or more, do you want to return an array of hashes, one hash per line (after the first, containing the headers)? It's unusual for a CSV to have a comma followed by one space as the field separator. Is that what you want? If you want just the comma please remove the spaces. – Cary Swoveland Oct 28 '19 at 17:22
  • @CarySwoveland I have updated the question to remove the spaces after the commas. It can contain multiple rows. – Yedhu Krishnan Oct 29 '19 at 06:16

2 Answers

Just read and parse the file line by line. After the code below runs, the arr variable holds the array of hashes that you need.

arr = []

File.readlines('README.md').drop(1).each do |line|
  fields = line.split(',').map(&:strip)

  hash = { name: fields[0], contacts: [fields[1], fields[2]], address: [fields[3], fields[4]] }
  arr.push(hash)
end
An Nguyen
  • The CSV headers can change. I have updated the question to add that information. – Yedhu Krishnan Oct 28 '19 at 12:21
  • How dynamic can it be? Does the number of columns vary? Do the header names follow any format? – An Nguyen Oct 28 '19 at 13:31
  • Number of columns can vary. For example, if there is another contact, it would have a header `contacts.2.phone_no`. The number in the header represents the position in the array – Yedhu Krishnan Oct 28 '19 at 13:41
  • The same can happen to the `codes` field. There can be other new fields as well. The CSV is completely dynamic. – Yedhu Krishnan Oct 28 '19 at 13:53
  • @YedhuKrishnan But you didn't describe how to parse it. "Completely dynamic" means there could be headers like "blahblah1", and then how do you deal with them? Put each into a hash key, or group it with another "blahbleh"? Is there a rule for dealing with them, or are you asking us to figure it out? – An Nguyen Oct 28 '19 at 14:10
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/201500/discussion-between-yedhu-krishnan-and-firice-nguyen). – Yedhu Krishnan Oct 28 '19 at 14:37
  • `name, *fields = line.split(...)` is a nice way of carving out those items. – tadman Oct 28 '19 at 17:24
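
The splat assignment suggested in the last comment can be sketched as follows. The sample line is a hypothetical stand-in for one line returned by File.readlines, and the sketch assumes no field contains a quoted comma.

```ruby
# Hypothetical sample row; in practice this would be one line of the file.
line = "YK,1234,4567,AB001,AK002\n"

# The first field is assigned to `name`; the splat collects every
# remaining field into the array `fields`, however many there are.
name, *fields = line.split(',').map(&:strip)

p name
p fields
```

Because the splat absorbs any number of trailing fields, this form keeps working when columns are added, although it still does not interpret the header names.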
Let's first construct a CSV file.

str = <<~END
name,contacts.0.phone_no,contacts.1.phone_no,codes.0,IQ,codes.1
YK,1234,4567,AB001,173,AK002
ER,4321,7654,BA001,81,KA002
END

FName = 't.csv'

File.write(FName, str)
  #=> 121

The helper method below builds a pattern from the headers. The pattern is then used to convert each data row of the CSV file (every row after the header row) into one element (a hash) of the desired array.

require 'csv'

def construct_pattern(csv)
  csv.headers.group_by { |col| col[/[^.]+/] }.
      transform_values do |arr|
        case arr.first.count('.')
        when 0
          arr.first
        when 1
          arr
        else 
          key = arr.first[/(?<=\d\.).*/]
          arr.map { |v| { key=>v } }
        end
      end
end

For the example being considered, where csv is a row (CSV::Row) of the file, the method returns:

construct_pattern(csv)
  #=> {"name"=>"name",
  #    "contacts"=>[{"phone_no"=>"contacts.0.phone_no"},
  #                 {"phone_no"=>"contacts.1.phone_no"}],
  #    "codes"=>["codes.0", "codes.1"],
  #    "IQ"=>"IQ"}
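
To see the grouping step of the helper in isolation, here is a small sketch. The headers array is hard-coded as an assumption for illustration; in the answer's code it comes from csv.headers.

```ruby
# Hard-coded copy of the example headers (an assumption for illustration).
headers = %w[name contacts.0.phone_no contacts.1.phone_no codes.0 IQ codes.1]

# col[/[^.]+/] returns the text before the first dot, or the whole header
# when it contains no dot, so headers sharing a top-level key land in the
# same group.
groups = headers.group_by { |col| col[/[^.]+/] }

groups.each { |key, cols| puts "#{key}: #{cols.inspect}" }
```

The number of dots in the first header of each group is what the case expression then uses to decide between a scalar, an array of values, and an array of hashes.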

In the code below, the assignment pattern = construct_pattern(csv) carries the modifier if pattern.empty?, which ensures the pattern is constructed only once, from the first row.

We may now construct the desired array.

pattern = {}
CSV.foreach(FName, headers: true).map do |csv|
  pattern = construct_pattern(csv) if pattern.empty?
  pattern.each_with_object({}) do |(k,v),h|
    h[k] =
    case v
    when Array
      case v.first
      when Hash
        v.map { |g| g.transform_values { |s| csv[s] } }
      else
        v.map { |s| csv[s] }
      end
    else
      csv[v]
    end
  end
end
  #=> [{"name"=>"YK",
  #     "contacts"=>[{"phone_no"=>"1234"}, {"phone_no"=>"4567"}],
  #     "codes"=>["AB001", "AK002"],
  #     "IQ"=>"173"},
  #    {"name"=>"ER",
  #     "contacts"=>[{"phone_no"=>"4321"}, {"phone_no"=>"7654"}],
  #     "codes"=>["BA001", "KA002"],
  #     "IQ"=>"81"}] 

The CSV methods I've used are described in the documentation for Ruby's CSV class. See also Enumerable#group_by and Hash#transform_values.
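
As a possible generalization, the header paths could be walked segment by segment, so that nesting deeper than two levels is handled too. This is an illustrative sketch and not the code from the answer above: deep_set and rows_to_nested are hypothetical helper names, and it assumes that an all-digit path segment always denotes an array position.

```ruby
require 'csv'

# Hypothetical helper: stores `value` inside `container` at the location
# named by `path` (an array of header segments), creating intermediate
# arrays and hashes as needed.
def deep_set(container, path, value)
  key, *rest = path
  key = key.to_i if container.is_a?(Array) # array positions are integers
  if rest.empty?
    container[key] = value
  else
    # Assumption: an all-digit next segment means the child is an array;
    # anything else means the child is a hash.
    container[key] ||= rest.first.match?(/\A\d+\z/) ? [] : {}
    deep_set(container[key], rest, value)
  end
end

# Hypothetical helper: converts CSV text with dotted headers into an
# array of nested hashes, one per data row.
def rows_to_nested(csv_text)
  CSV.parse(csv_text, headers: true).map do |row|
    row.each_with_object({}) do |(header, value), record|
      deep_set(record, header.split('.'), value)
    end
  end
end

data = <<~TEXT
  name,contacts.0.phone_no,contacts.1.phone_no,codes.0,IQ,codes.1
  YK,1234,4567,AB001,173,AK002
TEXT

p rows_to_nested(data)
```

The keys in the result are strings rather than symbols, and all values remain strings, as in the output shown above.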

Cary Swoveland
  • Thank you for the elaborate answer. Although the second and third assumptions are correct, the first one is not. There could be multiple fields without a dot; each such field is a top-level key. – Yedhu Krishnan Oct 29 '19 at 06:59
  • I made what I think are the necessary adjustments in my answer. In addition, I tweaked the code generally. – Cary Swoveland Oct 29 '19 at 17:22