
I have 2 files like below:

file1:

a1,b1,c1,d1,e1,f1,g1,h1
a2,b2,c2,d2,e2,f2,g2,h2
a3,b3,c3,d3,e3,f3,g3,h3
a4,b4,c4,d4,e4,f4,g4,h4

file2:

x1,y1,z1
x2,y2,z2
x3,y3,z3
x4,y4,z4

I want to read from both files simultaneously and output the fields interleaved in the pattern below:

a1,b1,c1,d1,x1,e1,f1,y1,g1,z1,h1
a2,b2,c2,d2,x2,e2,f2,y2,g2,z2,h2
a3,b3,c3,d3,x3,e3,f3,y3,g3,z3,h3
a4,b4,c4,d4,x4,e4,f4,y4,g4,z4,h4

Good news - I've managed to achieve it!

Bad news - too many arrays and while loops (too much computation!). I am looking for something simpler, as the script will have to read through a lot of data (4k lines and ~1M words).

Limitation - BASH shell (probably not a limitation!)

This is what I've done

exec 5<file1 # Open file into FD 5
exec 6<file2 # Open file into FD 6

while IFS= read -r line1 <&5
      IFS= read -r line2 <&6
do
    IFS=',' read -r -a array1 <<< "$line1"   # split line1 on commas
    IFS=',' read -r -a array2 <<< "$line2"   # split line2 on commas
    array3=( "${array1[@]}" "${array2[@]}" )
    ( IFS=','; echo "${array3[*]}" ) >> tmpline
done
while IFS=',' read -r var1 var2 var3 var4 var5 var6 var7 var8 var9 var10 var11
do
    echo "$var1,$var2,$var3,$var4,$var9,$var5,$var6,$var10,$var7,$var11,$var8" >> tcomb
done < tmpline

exec 5<&- # Close FD 5
exec 6<&- # Close FD 6

Thanks in advance -- I'm waiting patiently :) !!

Marcos
  • I now understand how to edit in this forum.. was struggling from last couple of days to do that .... anyone looking for classes on "How To" ? :) ! – Marcos Mar 17 '13 at 13:01
  • Modified my code a bit, but still too much computation: `while read -r line1 <&5 && read -r line2 <&6; do stuff; done`, then `while IFS=","; do stuff; done < tmpline`, then `exec 5<&- # Close FD 5` and `exec 6<&- # Close FD 6` – Marcos Mar 17 '13 at 13:18
  • I'm using this to do the stuff now.. looks simpler, thnx everyone! --> `paste -d , file1 file2 | awk -F , -v OFS=, '{print $1,$2,$3,$4,$9,$5,$6,$10,$7,$11,$8}'` – Marcos Mar 17 '13 at 20:49

3 Answers


Try this:

exec 5<file1 # Open file into FD 5
exec 6<file2 # Open file into FD 6

while IFS=, read -r -a t <&5 &&
      IFS=, read -r -a u <&6
do
    echo -n "${t[0]},${t[1]},${t[2]},${t[3]},${u[0]},${t[4]},"
    echo    "${t[5]},${u[1]},${t[6]},${u[2]},${t[7]}"
done >| tcomb

exec 5<&- # Close FD 5
exec 6<&- # Close FD 6
Edouard Thiel
  • Thanks Edouard.... I think there's a reduction in computation cycles here.. can this be reduced further ? Also what is the "|" doing at last ?? – Marcos Mar 17 '13 at 13:30
  • the `>|` forces overwriting, even when `set -o noclobber` is active. I like your `exec`! No other idea to reduce the computation times… – Edouard Thiel Mar 17 '13 at 13:40
  • I've modified ur "echo" part a bit -- echo -e "${t[0]},${t[1]},${t[2]},${t[3]},${u[0]},${t[4]},${t[5]},${u[1]},${t[6]},${u[2]},${t[7]}" -- hope you like it! – Marcos Mar 17 '13 at 13:49
  • Modified my script again: `paste -d , file1 file2 > file3` followed by `while IFS=, read var1 var2 var3 var4 var5 var6 var7 var8 var9 var10 var11; do echo "$var1,$var2,$var3,$var4,$var9,$var5,$var6,$var10,$var7,$var11,$var8" >> tcomb; unset IFS; done < file3`. Still on the lookout for a better one! – Marcos Mar 17 '13 at 14:58
  • The `IFS=,` prefix only lasts for that one `read`, so the `unset IFS` removes the old `IFS` value you wanted to keep! – Edouard Thiel Mar 17 '13 at 15:16
  • To remove the complexe lines with `echo` you could combine the 2 arrays in a single array and loop on indexes. It becomes: `v=("${t[@]}" "${u[@]}"); for n in 0 1 2 3 8 4 5 9 6 10; do echo -n "${v[$n]},"; done; echo "${v[7]}"`. – jfg956 Mar 17 '13 at 16:04
  • @Edouard... I didn't get the 'unset IFS' thing that you said earlier.. plz explain! – Marcos Mar 17 '13 at 18:19
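As a sanity check, the answer's loop can be run end to end on the question's sample data; the snippet below is a self-contained sketch (the `file1`/`file2` contents are recreated for the demo, and plain `>` is used instead of `>|`):

```shell
#!/bin/bash
# Recreate the question's sample inputs (demo files only).
printf '%s\n' a1,b1,c1,d1,e1,f1,g1,h1 a2,b2,c2,d2,e2,f2,g2,h2 > file1
printf '%s\n' x1,y1,z1 x2,y2,z2 > file2

exec 5<file1   # Open file into FD 5
exec 6<file2   # Open file into FD 6

# IFS=, makes read -a split each CSV line into an array.
while IFS=, read -r -a t <&5 &&
      IFS=, read -r -a u <&6
do
    echo "${t[0]},${t[1]},${t[2]},${t[3]},${u[0]},${t[4]},${t[5]},${u[1]},${t[6]},${u[2]},${t[7]}"
done > tcomb

exec 5<&-   # Close FD 5
exec 6<&-   # Close FD 6

cat tcomb   # one interleaved CSV line per input row
```

The first output line is `a1,b1,c1,d1,x1,e1,f1,y1,g1,z1,h1`, matching the pattern asked for in the question.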

You can use paste to combine the lines of the two files; then you have to reorder the columns, which I did with Perl:

paste file1 file2 -d, | \
    perl -F, -ane 'chomp $F[-1]; $"=","; print "@F[0..3,8,4,5,9,6,10,7]\n"'
choroba
  • I can't use perl.. but I'll still try this out... thnx – Marcos Mar 17 '13 at 13:29
  • Instead of `perl`, use awk then (it is standard on all UNIXes): `awk -F , -v OFS="," '{print $1, $2, $9, $3, $4, $10, $5, $6, $11, $7, $8}'`. – jfg956 Mar 17 '13 at 15:46
  • @jfgagne, I like ur 'awk' thing... can you please tell me which one will use less computation cycles, ur 'awk' one or 'array' one that I've put up in my comments?... this is kind of important for the script that I'm working on! – Marcos Mar 17 '13 at 18:22
  • @Marcos: my guess is that `paste` + `awk` will be faster than a single (but complex and sub-optimized) bash, but there is only one way to know it, and it is to test it ;-). Moreover, do not use temporary files: instead of `paste ... > tmp_file; awk ... tmp_file > final_file` do `paste ... | awk ... > file`. – jfg956 Mar 17 '13 at 19:36
  • @jfgagne, the awk command seems to have missed something. I have added paste + awk like this --> `paste -d , file1 file2 | awk '{FS=","} {print $1,$2,$3,$4,$9,$5,$6,$7,$10,$7,$11,$8}' | tr -s " " ","` -- you'll notice that I've added a `tr -s`, as the awk was not sending the output as CSV but as space-separated. My requirement is that the output comes as CSV, but adding `tr -s` doesn't help much: if any of my values has a space, it adds a comma there.. :( ! Any ideas? – Marcos Mar 17 '13 at 20:33
  • @Marcos: `awk -F , -v OFS="," ...` – jfg956 Mar 17 '13 at 20:47
  • @jfgagne, sorry for so many comments... u made a small mistake, hence so many comments... here's the correct paste + awk --> `paste -d , file1 file2 | awk -F , -v OFS=, '{print $1,$2,$3,$4,$9,$5,$6,$10,$7,$11,$8}'`, thnx! I'll stick to this now! – Marcos Mar 17 '13 at 20:48
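Putting the thread's conclusion together, the `paste` + `awk` pipeline can be sketched and verified like this (sample files recreated from the question; `combined` is just a demo output name):

```shell
#!/bin/bash
# Recreate the question's sample inputs (demo files only).
printf 'a1,b1,c1,d1,e1,f1,g1,h1\na2,b2,c2,d2,e2,f2,g2,h2\n' > file1
printf 'x1,y1,z1\nx2,y2,z2\n' > file2

# paste glues line N of file1 to line N of file2 with a comma, so awk
# sees 11 comma-separated fields; -v OFS=, keeps the output as CSV.
paste -d , file1 file2 |
    awk -F , -v OFS=, '{print $1,$2,$3,$4,$9,$5,$6,$10,$7,$11,$8}' > combined

cat combined
```

Fields `$9`–`$11` are the three `file2` columns, which is why the requested interleaving comes out as `a,b,c,d,x,e,f,y,g,z,h`.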

If you allow yourself to read the files more than once, you can use bash process substitution:

paste -d , <(cut -d , -f 1-4 file1) \
           <(cut -d , -f 1 file2) \
           <(cut -d , -f 5-6 file1) \
           <(cut -d , -f 2 file2) \
           <(cut -d , -f 7 file1) \
           <(cut -d , -f 3 file2) \
           <(cut -d , -f 8 file1)
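On the question's sample data this produces the requested interleaving directly; a quick self-contained check (demo files recreated here, one row each, `interleaved` is a demo output name):

```shell
#!/bin/bash
# Recreate one row of the question's sample inputs (demo files only).
printf 'a1,b1,c1,d1,e1,f1,g1,h1\n' > file1
printf 'x1,y1,z1\n' > file2

# Each cut extracts one slice of columns; paste -d , stitches the
# seven streams back together with commas, in the desired order.
paste -d , <(cut -d , -f 1-4 file1) \
           <(cut -d , -f 1 file2) \
           <(cut -d , -f 5-6 file1) \
           <(cut -d , -f 2 file2) \
           <(cut -d , -f 7 file1) \
           <(cut -d , -f 3 file2) \
           <(cut -d , -f 8 file1) > interleaved

cat interleaved
```

The trade-off is that `file1` is read four times and `file2` three times, which matters for the 4k-line inputs mentioned in the question.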
jfg956
  • I'm afraid I can't use this method.. the data I'm going to work on has many files, with each file containing about 4k lines and ~1M words.. PS: I know I shouldn't be using bash! .... but ! – Marcos Mar 17 '13 at 18:16