3

I'm trying to take a dataset that looks like this:

Source format of data

And transform the records into this format:

Destination Format

The resulting format would have two columns, one for the old column names and one column for the values. If there are 10,000 rows then there should be 10,000 groups of data in the new format.

I'm open to all different methods, excel formulas, sql (mysql), or straight ruby code would work for me also. What is the best way to tackle this problem?

OpenCoderX
  • 6,180
  • 7
  • 34
  • 61

3 Answers3

10

You could add an ID column to the left of your data and use a Reverse PivotTable method.

  • Press Alt+D+P to access the Pivottable Wizard with the steps:

    1.  Multiple Consolidation Ranges
    2a. I will create the page fields
    2b. Range: eg. sheet1!A1:A4 
        How Many Page Fields: 0
    3.  Existing Worksheet: H1
    
  • In the PivotTable:

    Uncheck Row and Column from the Field List
    Double-Click the Grand Total as shown
    

enter image description here

lori_m
  • 5,487
  • 1
  • 18
  • 29
1

Just for fun:

# Input file format is tab separated values

# name  search_term address code
# Jim jim jim_address 123
# Bob bob bob_address 124
# Lisa  lisa  lisa_address  126
# Mona  mona  mona_address  129


infile = File.open("inputfile.tsv")

headers = infile.readline.strip.split("\t")
puts headers.inspect
of = File.new("outputfile.tsv","w")
infile.each_line do |line|
  row = line.split("\t")
  headers.each_with_index do |key, index|
    of.puts "#{key}\t#{row[index]}"
  end
end

of.close



# A nicer way, on my machine it does 1.6M rows in about 17 sec

File.open("inputfile.tsv") do | in_file |
  headers = in_file.readline.strip.split("\t")
  File.open("outputfile.tsv","w") do | out_file |
    in_file.each_line do | line |
      row = line.split("\t")
      headers.each_with_index do | key, index | 
        out_file << key << "\t" << row[index]
      end
    end 
  end
end
Björn Nilsson
  • 3,703
  • 1
  • 24
  • 29
0
destination = File.open(dir, 'a') do |d|   #choose the destination file and open it

    source = File.open(dir , 'r+') do |s|  #choose the source file and open it
      headers = s.readline.strip.split("\t")  #grab the first row of the source file to use as headers
      s.each do |line| #interate over each line from the source

        currentLine = line.strip.split("\t") #create an array from the current line
           count = 0   #track the count of each array index
        currentLine.each do |c| #iterate over each cell of the currentline
              finalNewLine = '"' + "#{headers[count]}" + '"' + "\t" + '"' + "#{currentLine[count]}" + '"' + "\n" #build each new line as one big string
          d.write(finalNewLine) #write final line to the destination file.
          count += 1 #increment the count to work on the next cell in the line
        end

      end
  end

end
OpenCoderX
  • 6,180
  • 7
  • 34
  • 61