I have the following test data that needs to be exported to the format shown in my desired output, using Ruby. The real data set has 1,000,000 records.

data_array1=aaaa
data_array2=bbbb
----------------
----------------
data_array8=hhhh, which means there are 8 data arrays, each with the following format:

aaaa= [a,[1,2,3,4],20]
bbbb= [b,[8,7,9,19],23]
-----------------------
-----------------------
hhhh= [h,[25,26,29,30],28]

My desired output needs to be exported to a text file (the headers are just FYI; they need not be included in the output file):

output.txt

hash tx time

a    1    20
a    2    20
a    3    20
a    4    20
b    8    23
b    7    23
b    9    23
b    19   23
------------
------------
h    25   28
h    26   28
h    29   28
h    30   28

I am a newbie in Ruby, and so far I have written this, which does not work:

def bhash
  1.upto(8) do |bid|
    blk=[bid]

    keys = %w[hash tx time ]
    data = keys.map{|key| blk[key]}

    hash, txids, time, difficulty = data
    CSV.open('output.txt', 'w', headers: keys, write_headers: true, col_sep: 
    "\t") do |csv|
    txids.each do |tx|
      csv << [hash,tx,time]
    end
  end
end 

Thanks in advance for all your help.

Rubz
  • How are you calling this function? You've got a parameter named `file` but you're not using it. Also, `[a,[1,2,3,4],20]` is not a hash, it's an array, the second element of which is another array. – Tom May 31 '18 at 03:32
  • thanks @Tom. Just edited the mistake, you pointed out. – Rubz May 31 '18 at 03:35
  • You're missing at least two `end` statements. This function won't even run as written. It's also not getting any data to work with. The line `data =....` will fail because you can't implicitly convert a string to an integer. And that's all just for starters. – Tom May 31 '18 at 03:37
  • Please provide your suggestion, my code was inconclusive, thanks. – Rubz May 31 '18 at 03:38
  • How are you getting data into the function? Where does the data come from? `1.upto(8) do |bid|` followed by `blk=[bid]` means that `blk` is now an array with one element in it with the value `1`. In other words, on the first iteration of the loop, `blk=[1]`. Then you try to access `blk` as if it were a hash, but it isn't. – Tom May 31 '18 at 03:40
  • I'm still not clear on the problem. Where does the data come from? What does it look like *exactly*? – Tom May 31 '18 at 03:41
  • Sorry, there were some spacing issues in my question and the desired output. Please check the problem again and suggest a fix, thanks. – Rubz May 31 '18 at 03:47
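
The point Tom raises about `blk` can be reproduced in isolation. The following is a minimal sketch, not the asker's code: the helper name `string_index_error` and the string-keyed `record` hash are assumptions made for illustration. It shows that indexing an Array with a String raises a `TypeError`, while the same lookup on a Hash works.

```ruby
# Hypothetical helper: returns the error message raised when the
# collection is indexed with a String, or nil if the lookup succeeds.
def string_index_error(collection)
  collection['hash']
  nil
rescue TypeError => e
  e.message
end

blk = [1]                     # what `blk=[bid]` produces when bid is 1
puts string_index_error(blk)  # prints the TypeError message

# Assumed shape: a Hash keyed by strings supports the lookup
# that the question attempts with `blk[key]`.
record = { 'hash' => 'a', 'tx' => [1, 2, 3, 4], 'time' => 20 }
keys = %w[hash tx time]
hash, txids, time = keys.map { |key| record[key] }
txids.each { |tx| puts [hash, tx, time].join("\t") }
```

Whether each record should be a Hash or a plain three-element Array depends on where the data actually comes from, which the question leaves open.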

1 Answer

Given a file json.txt with the following content:

["a",[1,2,3,4],20]
["b",[8,7,9,19],23]
["h",[25,26,29,30],28]

The following program:

require 'csv'
require 'json'

# Expand one record, e.g. ["a", [1, 2, 3, 4], 20], into one hash per tx value.
def custom_expansion(a)
  expanded = []
  h = {}
  h['hash'] = a[0]
  h['time'] = a[2]
  inside = a[1]
  inside.each do |tx|
    h['tx'] = tx
    expanded.push h.dup  # dup so each row keeps its own tx value
  end
  expanded
end

# Write tab-separated rows, reading and expanding one input line at a time
# so the whole data set is never held in memory.
CSV.open('output.txt', 'w', col_sep: "\t") do |csv|
  File.open('json.txt') do |f|
    until f.eof?
      array = JSON.parse(f.readline)
      ex = custom_expansion(array)
      ex.each do |e|
        csv << [ e['hash'], e['tx'], e['time'] ]
      end
    end
  end
end

will produce this in output.txt:

a   1   20
a   2   20
a   3   20
a   4   20
b   8   23
b   7   23
b   9   23
b   19  23
h   25  28
h   26  28
h   29  28
h   30  28
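
For comparison, the same line-at-a-time idea can be sketched more compactly. This is an illustrative variant under the same input assumption (one JSON array per line), not a replacement for the program above; the helper names `expand_record` and `export_rows` are hypothetical, and the rows are written with plain `join("\t")` rather than through `CSV`.

```ruby
require 'json'

# Hypothetical helper: turns ["a", [1, 2, 3, 4], 20] into
# [["a", 1, 20], ["a", 2, 20], ...] without intermediate hashes.
def expand_record(record)
  hash, txids, time = record
  txids.map { |tx| [hash, tx, time] }
end

# Hypothetical helper: streams the input one line at a time, so only
# one record and its expanded rows are in memory at any moment.
def export_rows(input_path, output_path)
  File.open(output_path, 'w') do |out|
    File.foreach(input_path) do |line|
      next if line.strip.empty?  # skip blank lines
      expand_record(JSON.parse(line)).each do |row|
        out.puts row.join("\t")
      end
    end
  end
end

# Usage, assuming json.txt exists as described above:
# export_rows('json.txt', 'output.txt')
```

Writing with `join` assumes no field contains a tab or newline; if that cannot be guaranteed, going through `CSV` as in the program above is the safer choice.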
Tom
  • It works perfectly! Only one issue: can the test array you defined in the code be built in a method? In my question there were 8 data arrays (aaaa, bbbb, ..., hhhh, even though I have shown only three). Can a method be defined in the code to accumulate the data arrays into one test array, instead of hard-coding just 3? – Rubz May 31 '18 at 04:24
  • You'll note that I set the variable `test` to the data to operate on. However you do that is up to you without more information. Accumulate the data *from where* ? Note that the expansion function does not care how many elements are in the array passed to it. There can be 3, 8, or 1,000. – Tom May 31 '18 at 04:32
  • This is needed because I have a large number of data arrays (around 1,000,000), which I need to combine as you did in the test array. So I need a method to combine them and then call the custom_expansion method. – Rubz May 31 '18 at 04:34
  • Combine *from where*? – Tom May 31 '18 at 04:35
  • `data_array = []; data_array << data1; data_array << data2; data_array << data3; .....` – Rubz May 31 '18 at 04:40
  • I've updated it to show how to deal with that. `custom_expansion` operates on a single array. Instead of accumulating 1,000,000 arrays in memory, you can read them one at a time in the loop in place of the `test.each` loop. If you want more than that, you have to tell me where `data1`, `data2` etc. come from. MySQL? A CSV file? How are you getting these 1,000,000 arrays? – Tom May 31 '18 at 04:45
  • I would not accumulate 1,000,000 arrays in memory, expand them into a 4,000,000 value hash in memory, and then write the whole thing out. I would do it one at a time, otherwise you'll run into memory issues. – Tom May 31 '18 at 04:50
  • Sorry for mentioning it so late. I was just thinking about it while test-running your code. Shall I add the big data issue to the question? Can you kindly modify your code to process one array at a time and then export the total output to a file? – Rubz May 31 '18 at 04:55
  • I updated the answer to read the input arrays from a JSON file containing one JSON-formatted array per line. – Tom May 31 '18 at 05:12
  • If you actually have 8 JSON arrays per line in an enclosing array, you have to do something slightly different. This is why the *input specification matters.* – Tom May 31 '18 at 05:20
  • If the file has the "data_array1=" at the beginning of each line, you can just discard that part of the string before passing it to `JSON.parse`. In that case just change the one line `array=...` to `array = JSON.parse(f.readline.split('=')[1])` – Tom May 31 '18 at 05:24
  • It has to be properly formatted JSON though, to use the JSON parser. `[a,[1,2,3,4],20]` is not JSON. `["a",[1,2,3,4],20]` is correct JSON. – Tom May 31 '18 at 05:29
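
The last few comments describe three input variations: an optional `data_array1=` prefix, several records wrapped in one enclosing array, and input that is not valid JSON. A minimal sketch of handling them follows. The helper names `parse_record` and `records_in` are hypothetical, and the rule used to detect an enclosing array (the first element is itself an array) is an assumption based on the record shape shown in the answer.

```ruby
require 'json'

# Hypothetical helper: strips an optional leading "name=" prefix,
# then parses the rest of the line as JSON.
def parse_record(line)
  JSON.parse(line.sub(/\A\s*\w+\s*=\s*/, ''))
end

# Hypothetical helper: returns a list of records whether the line holds
# a single record or an enclosing array of records. Assumes a record's
# first element is a string, so an Array there means "enclosing array".
def records_in(line)
  parsed = parse_record(line)
  parsed.first.is_a?(Array) ? parsed : [parsed]
end

p parse_record('data_array1=["a",[1,2,3,4],20]')
p records_in('[["a",[1,2],20],["b",[8,7],23]]')

# Unquoted names such as `a` are not valid JSON, so parsing fails.
begin
  parse_record('[a,[1,2,3,4],20]')
rescue JSON::ParserError
  puts 'not valid JSON'
end
```

The regular expression only removes a prefix made of word characters followed by `=`, so an `=` inside a quoted JSON string is left alone, unlike a plain `split('=')`.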