My text file contains protein-protein interaction data and looks like this:
transcription_factor protein
Myc Rilpl1
Mycn Rilpl1
Mycn "Wdhd1,Socs4"
Sox2 Rilpl1
Sox2 "Wdhd1,Socs4"
Nanog "Wdhd1,Socs4"
I want it to look like this, so I can see which transcription factors interact with each protein:
protein transcription_factor
Rilpl1 Myc, Mycn, Sox2
Wdhd1 Mycn, Sox2, Nanog
Socs4 Mycn, Sox2, Nanog
After running my code, this is what I get. How can I get rid of the double quotes and put the two proteins on separate lines?
protein transcription_factor
Rilpl1 Myc, Mycn, Sox2
"Wdhd1,Socs4" Mycn, Nanog, Sox2
Here is my code:
input_file = ARGV[0]
hash = {}
File.readlines(input_file, "\r").each do |line|
  transcription_factor, protein = line.chomp.split("\t")
  if hash.has_key? protein
    hash[protein] << transcription_factor
  else
    hash[protein] = [transcription_factor]
  end
end
hash.each do |key, value|
  if value.count > 2
    string = value.join(', ')
    puts "#{key}\t#{string}"
  end
end
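
One possible direction, as a minimal sketch rather than a definitive fix: strip the double quotes from the protein field and split it on commas before adding to the hash, so that "Wdhd1,Socs4" becomes two separate keys. The helper names `group_by_protein` and `print_report` are made up for illustration, and the sketch assumes the columns are tab-separated (as the `split("\t")` in my code implies) and that the first line of the file is the header.

```ruby
# Group transcription factors by protein. A protein field such as
# "Wdhd1,Socs4" is unquoted and split so each protein gets its own entry.
# (group_by_protein is a hypothetical helper name.)
def group_by_protein(lines)
  grouped = Hash.new { |hash, key| hash[key] = [] }
  lines.each do |line|
    transcription_factor, proteins = line.chomp.split("\t")
    next if proteins.nil? # skip blank or malformed lines

    # Remove the surrounding quotes, then split the comma-separated list.
    proteins.delete('"').split(',').each do |protein|
      grouped[protein.strip] << transcription_factor
    end
  end
  grouped
end

# Read the file, drop the header line, and print one protein per line.
# Splitting on \r\n, \r, or \n is an assumption that covers the "\r"
# line endings used in the original readlines call.
def print_report(path)
  lines = File.read(path).split(/\r\n|\r|\n/).drop(1)
  puts "protein\ttranscription_factor"
  group_by_protein(lines).each do |protein, factors|
    puts "#{protein}\t#{factors.join(', ')}"
  end
end

print_report(ARGV[0]) if __FILE__ == $PROGRAM_NAME && ARGV[0]
```

This sketch prints every protein; the `value.count > 2` filter from my code could be added back inside the final loop if only proteins with more than two transcription factors should be shown.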