2

I want the KEGG pathways of a gene printed side by side on a single line, no matter how many pathway lines there are. For example, a gene TT123456 is involved in several pathways:

Valine, leucine and isoleucine degradation
Histidine metabolism
Ascorbate and aldarate metabolism
Lysine degradation
Glycerolipid metabolism

By using the sed command

sed '$!N;s/\n/\t/'

I am able to join the lines two at a time:

Valine, leucine and isoleucine degradation  Histidine metabolism
Ascorbate and aldarate metabolism   Lysine degradation
Glycerolipid metabolism

But I would like to have the output as

Valine, leucine and isoleucine degradation  Histidine metabolism    Ascorbate and aldarate metabolism   Lysine degradation  Glycerolipid metabolism

I have been searching around, but I failed to find a good solution.

Could the community please share its expertise with me?

Thank you.

KJ Lim

6 Answers

4

Using awk:

awk 'ORS="\t"' file

$ awk 'ORS="\t"' file
Valine, leucine and isoleucine degradation      Histidine metabolism    Ascorbate and aldarate metabolism       Lysine degradation      Glycerolipid metabolism 
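Note that setting ORS to a tab replaces every newline, including the last one, so the output ends with a tab and has no final newline. If that matters, here is a minimal sketch of a variant that avoids it. The temporary file from mktemp and its three lines are assumptions made up for this illustration:

```shell
# Illustrative sample input; the temporary file is an assumption for this sketch.
sample=$(mktemp)
printf 'Lysine degradation\nHistidine metabolism\nGlycerolipid metabolism\n' > "$sample"

# Print a tab before every line except the first, then a single newline at the end.
joined=$(awk '{ printf "%s%s", (NR > 1 ? "\t" : ""), $0 } END { print "" }' "$sample")

printf '%s\n' "$joined"
rm -f "$sample"
```

The separator is written before each line rather than after it, which is why no tab is left dangling at the end.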

If you wish to use sed then:

$ sed ':a;N;s/\n/\t/;ba' file
Valine, leucine and isoleucine degradation      Histidine metabolism    Ascorbate and aldarate metabolism       Lysine degradation      Glycerolipid metabolism
jaypal singh
  • Thanks a lot. Your method works. If I may, could you please explain ':a;N;s/\n/\t/;ba' in more detail? Thanks. – KJ Lim Mar 10 '14 at 14:10
  • @KJLim `:a` defines a label named `a`, and `N` appends the next line to the pattern space, separated by `\n`. A normal substitution then replaces that newline with a tab. `ba` is the critical part: it branches back to the label, so the substitution repeats until the end of the file. – jaypal singh Mar 10 '14 at 14:18
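To make the loop easier to follow, here is a sketch with one sed command per -e option. It is a variant, not the exact command from the answer: it uses `$!N` and `ta` so that it does not depend on what sed does when N runs out of input, and it passes a literal tab because `\t` in the replacement is not guaranteed by POSIX. The sample file is an assumption for illustration:

```shell
# Illustrative sample input; the temporary file is an assumption for this sketch.
sample=$(mktemp)
printf 'Lysine degradation\nHistidine metabolism\nGlycerolipid metabolism\n' > "$sample"

# A literal tab character to use in the replacement.
tab=$(printf '\t')

# :a       define a label named "a"
# $!N      unless this is the last line, append the next line (with its \n)
# s/\n/../ replace that embedded newline with a tab
# ta       if the substitution succeeded, branch back to label "a"
joined=$(sed -e ':a' -e '$!N' -e "s/\n/$tab/" -e 'ta' "$sample")

printf '%s\n' "$joined"
rm -f "$sample"
```

On the last line nothing is appended, the substitution fails, the branch is not taken, and sed prints the accumulated line.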
4

This is really what paste(1) is for:

$ paste -s "$file"
Valine, leucine and isoleucine degradation  Histidine metabolism    Ascorbate and aldarate metabolism   Lysine degradation  Glycerolipid metabolism

Here's what the manpage says the -s flag should do:

Concatenate all of the lines of each separate input file in command line order. The <newline> of every line except the last line in each input file shall be replaced with the <tab>, unless otherwise specified by the -d option.

You can also process standard input by using a - instead of the filename.

somecommand | paste -s -

What's the difference between tr '\n' '\t' and paste -s (with an implied tab delimiter)? The former translates every newline, including the trailing one, so its output ends with a tab and no newline; paste leaves the final newline intact. Also, paste accepts file operands as well as standard input, whereas tr reads only standard input.
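A quick way to see that difference is to count the tabs and newlines each command emits for the same input. This is only a sketch; the three-line temporary file is an assumption for illustration:

```shell
# Illustrative sample input; the temporary file is an assumption for this sketch.
sample=$(mktemp)
printf 'Lysine degradation\nHistidine metabolism\nGlycerolipid metabolism\n' > "$sample"

# Count the tabs and newlines each command emits for the same three-line input.
tr_tabs=$(tr '\n' '\t' < "$sample" | tr -cd '\t' | wc -c | tr -d ' ')
tr_newlines=$(tr '\n' '\t' < "$sample" | wc -l | tr -d ' ')
paste_tabs=$(paste -s "$sample" | tr -cd '\t' | wc -c | tr -d ' ')
paste_newlines=$(paste -s "$sample" | wc -l | tr -d ' ')

echo "tr:    $tr_tabs tabs, $tr_newlines newlines"
echo "paste: $paste_tabs tabs, $paste_newlines newlines"
rm -f "$sample"
```

For three input lines, tr produces three tabs and no newline, while paste -s produces two tabs and one newline.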

kojiro
  • +1 because this does not produce an extraneous tab at the end of the output. (Since the shell lets you trivially redirect stdin to a file, I don't really see the value-added of "paste can handle both standard input and files".) – rici Mar 10 '14 at 15:20
  • @rici it's useful if you don't want to consume standard input, such as in a read loop or when piping a command to ssh. – kojiro Mar 10 '14 at 16:33
  • @kojiro: `tr – rici Mar 10 '14 at 17:57
3

You could use tr:

tr '\n' '\t' < inputfile

For your input, it'd produce:

Valine, leucine and isoleucine degradation      Histidine metabolism    Ascorbate and aldarate metabolism       Lysine degradation      Glycerolipid metabolism

Using sed:

sed '$!{:a;N;s/\n/\t/;ta}' inputfile
devnull
  • Please see [my answer](http://stackoverflow.com/a/22301883/418413) to differentiate `tr '\n' '\t'` and `paste -s`. – kojiro Mar 10 '14 at 13:52
  • @kojiro I understand the difference between `tr` and `paste`. Your note seems to indicate that there is something wrong with it. If so, please indicate and feel free to downvote. – devnull Mar 10 '14 at 13:58
  • There's nothing wrong with using `tr`, but I felt making the distinction was important. I originally described the difference in my comment, but then decided it would be better to put it in my answer. – kojiro Mar 10 '14 at 14:01
  • @kojiro So performing a _comparative analysis_ of different answers is your hobby? – devnull Mar 10 '14 at 14:02
  • StackOverflow doesn't pay as well as you might think. – kojiro Mar 10 '14 at 14:08
  • @devnull Thanks. Your method works. Could you please explain '$!{:a;N;s/\n/\t/;ta}' more? Thanks. – KJ Lim Mar 10 '14 at 14:11
  • @KJLim Unless it's the last line in the file, it appends the next line to the pattern space, replaces the newline with a tab character, and loops again for as long as the substitution succeeds. – devnull Mar 10 '14 at 14:15
1

You can use paste in serial mode:

paste -s file
Timmah
0

You can use xargs like this:

$ xargs -n15 <file
Valine, leucine and isoleucine degradation Histidine metabolism Ascorbate and aldarate metabolism Lysine degradation Glycerolipid metabolism

Note that 15 is the number of words in your file. You could use a bigger number, such as xargs -n50 < file, to make sure everything is printed on the same line. The words are joined with single spaces rather than tabs.
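If counting words is inconvenient, the -n limit can probably be dropped altogether. The sketch below relies on xargs running its default command, echo, once for all the words, which holds as long as the input fits within the command-line size limit. Be aware that xargs treats quotes and backslashes in the input specially, so this only suits plain text like the pathway names here. The sample file is an assumption for illustration:

```shell
# Illustrative sample input; the temporary file is an assumption for this sketch.
sample=$(mktemp)
printf 'Lysine degradation\nHistidine metabolism\nGlycerolipid metabolism\n' > "$sample"

# With no -n limit, xargs passes every word to echo in a single invocation
# (provided the input fits within the command-line size limit).
joined=$(xargs < "$sample")

printf '%s\n' "$joined"
rm -f "$sample"
```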

fedorqui
0

Also printf '%s ' $(< file), or printf '%s ' $(cat file) if your shell doesn't support $(< ...).
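Because the command substitution is unquoted, the shell splits it on all whitespace, so individual words end up separated by spaces. If each line should stay intact and be separated by tabs instead, a read loop is one option. This is a sketch only, and the sample file is an assumption for illustration:

```shell
# Illustrative sample input; the temporary file is an assumption for this sketch.
sample=$(mktemp)
printf 'Lysine degradation\nHistidine metabolism\nGlycerolipid metabolism\n' > "$sample"

tab=$(printf '\t')

# Read whole lines (IFS= and -r keep them verbatim) and print a tab
# between consecutive lines, but not before the first or after the last.
joined=$(
  sep=''
  while IFS= read -r line; do
    printf '%s%s' "$sep" "$line"
    sep=$tab
  done < "$sample"
)

printf '%s\n' "$joined"
rm -f "$sample"
```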

Etan Reisner