
I have the following lines in 2 chunks (the real file has about 10K of them). In this example each chunk contains 3 lines. The chunks are separated by an empty line, so they are like "paragraphs".

xox
91-233
chicago

koko
121-111
alabama

I want to turn it into tab-delimited lines, like so:

xox  91-233  chicago
koko 121-111 alabama

How can I do that?

I tried tr "\n" "\t", but it doesn't do what I want.

Timur Shtatland
neversaint

5 Answers

$ awk -F'\n' '{$1=$1} 1' RS='\n\n' OFS='\t' file
xox     91-233  chicago
koko    121-111 alabama 

How it works

Awk divides input into records and it divides each record into fields.

  • -F'\n'

    This tells awk to use a newline as the field separator.

  • $1=$1

    This tells awk to assign the first field to the first field. While this seemingly does nothing, it causes awk to treat the record as changed. As a consequence, the record is rebuilt and printed using our assigned value for OFS, the output field separator.

  • 1

    This is awk's cryptic shorthand for print the line.

  • RS='\n\n'

    This tells awk to treat two consecutive newlines as a record separator.

  • OFS='\t'

    This tells awk to use a tab as the field separator on output.
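The effect of `$1=$1` can be seen in isolation with a minimal sketch. The function names `unchanged` and `rebuilt` and the sample input `a b c` are illustrative assumptions, not part of the answer above, and the input relies on awk's default whitespace field splitting rather than `-F'\n'`.

```shell
# Without an assignment to a field, awk prints the record exactly as read,
# so OFS is never applied.
unchanged() {
  printf 'a b c\n' | awk -v OFS='\t' '1'
}

# Assigning to any field makes awk rebuild $0 by joining the fields with OFS.
rebuilt() {
  printf 'a b c\n' | awk -v OFS='\t' '{$1=$1} 1'
}

unchanged   # prints the line with its original spaces
rebuilt     # prints the three fields separated by tabs
```

The only difference between the two functions is the assignment, which is what triggers the rebuild.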

John1024

This answer offers the following:
* It works with blocks of nonempty lines of any size, separated by any number of empty lines; John1024's helpful answer (which is similar and came first) works with blocks of lines separated by exactly one empty line.
* It explains the awk command used in detail.

A more idiomatic (POSIX-compliant) awk solution:

awk -v RS= -F '\n' -v OFS='\t' '$1=$1""' file
  • -v RS= tells awk to operate in paragraph mode: consider each run of nonempty lines a single record; RS is the input record separator.

    • Note: The implication is that this solution considers one or more empty lines as separating paragraphs (line blocks); empty means: no line-internal characters at all, not even whitespace.
  • -F '\n' tells awk to consider each line of an input paragraph its own field (breaks the multiline input record into fields by lines); -F sets FS, the input field separator.

  • -v OFS='\t' tells awk to separate fields with \t (tab chars.) on output; OFS is the output field separator.

  • $1=$1"" looks like a no-op, but, due to assigning to field variable $1 (the record's first field), tells awk to rebuild the input record, using OFS as the field separator, thereby effectively replacing the \n separators with \t.

    • The trailing "" is to guard against the edge case of the first line in a paragraph evaluating to 0 in a numeric context; appending "" forces treatment as a string, and any nonempty string - even if it contains "0" - is considered true in a Boolean context - see below.
  • Given that $1 is by definition nonempty and given that assignments in awk pass their value through, the result of assignment $1=$1"" is also a nonempty string; since the assignment is used as a pattern (a condition), and a nonempty string is considered true, and there is no associated action block ({ ... }), the implied action is to print the - rebuilt - input record, which now consists of the input lines separated with tabs, terminated by the default output record separator (ORS), \n.
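The two claims above (blocks of any size separated by any number of empty lines, and the `""` guard for a first line of `0`) can be checked with a short sketch. The wrapper name `paragraphs_to_tsv` and the sample inputs are assumptions made for illustration; the awk command itself is the one from this answer, reading standard input instead of a file.

```shell
# Join the lines of each paragraph with tabs.
paragraphs_to_tsv() {
  awk -v RS= -F '\n' -v OFS='\t' '$1=$1""'
}

# Blocks of different sizes, separated by one and by three empty lines.
printf 'xox\n91-233\nchicago\n\nkoko\n121-111\n\n\n\nalabama\n' | paragraphs_to_tsv

# A block whose first line is 0 is still printed, because the appended ""
# makes the pattern a nonempty string.
printf '0\nfoo\n\nbar\nbaz\n' | paragraphs_to_tsv
```

The first call should produce three output lines with three, two, and one field respectively; the second should produce two lines, the first starting with `0`.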

mklement0

another alternative,

$ sed '/^$/d' file | pr -3ats$'\t'

xox     91-233  chicago
koko    121-111 alabama

Remove empty lines with sed and print to 3 columns with a tab delimiter. For your real file, replace 3 with the number of lines in each block.

Note that this will only work if all your blocks are of the same size.
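For comparison, the same fixed-size idea can be sketched with `paste` in place of `pr`. This is a substituted technique, not the command from this answer, and the function name `blocks_of_three` is a hypothetical chosen for the example. The same constraint applies: every block must have exactly three lines.

```shell
# Drop empty lines, then let paste read three lines per output row.
# Each "-" stands for standard input; paste joins with a tab by default.
blocks_of_three() {
  sed '/^$/d' | paste - - -
}

printf 'xox\n91-233\nchicago\n\nkoko\n121-111\nalabama\n' | blocks_of_three
```

For a different block size, the number of `-` operands would have to match the number of lines per block.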

karakfa
  • ++, but I suggest an easier-to-understand reformulation of the `pr` command (`pr` is a complex utility for _pagination_ that is being repurposed here; `pr` has many options, and making sense of the `man` page is nontrivial): `pr -3 -l 1 -s`: read `3` lines of input at a time, and output them on a single line (`-l 1`), separated (`-s`) with the default separator, `\t`. (As an aside: this won't generally work with _BSD_ `pr` (as found on OS X), because it will insert a _space_ if an input line's length is a _multiple of 8 - 1_). – mklement0 Jun 29 '16 at 05:01
xargs -L3 < filename.log |tr ' ' '\t'
xox 91-233 chicago
koko 121-111 alabama
P....
    ++ for a simple solution; add `| tr ' ' '\t'` for tab-delimited output (as requested by the OP). There are caveats when it comes to generalizing this solution: (a) `xargs` will "eat" unescaped `\ ` chars. in the input and (b) lines with line-internal whitespace will have it normalized to a single space each. Quibble: use `-L3`, because the `-l` option is deprecated. – mklement0 Jun 29 '16 at 13:48

another version of awk to do this

 awk '{if(NF>0){a=a$1"\t";i++};if(i%3==0&&NF>0){print a;a=""}}' input_file
Shravan Yadav
    This works (though you may want to pass in the number of lines in a block as a _variable_), but (a) only processes the _first_ whitespace-separated field on each input line (which does work with the _sample_ data, but is worth pointing out as a general constraint) and (b) results in a _trailing tab_ on each output line - both of which may or may not be acceptable. – mklement0 Jun 29 '16 at 03:06
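The points raised in the comment can be addressed with a variant along these lines. It is a sketch, not the answer's original command: the function name `join_blocks` and the variables `n`, `line`, and `count` are assumptions introduced for the example. It takes the block size as a variable, keeps the whole input line rather than only its first field, and emits no trailing tab.

```shell
# Usage: join_blocks N < input_file
# Joins every N nonempty input lines into one tab-separated output line.
join_blocks() {
  awk -v n="$1" '
    # Skip empty and whitespace-only lines.
    NF == 0 { next }
    # Start a new output line every n lines, otherwise append with a tab.
    { line = (count % n == 0) ? $0 : (line "\t" $0); count++ }
    # Print once n lines have been collected.
    count % n == 0 { print line }
    # Flush an incomplete final block, if any.
    END { if (count % n != 0) print line }
  '
}

printf 'xox\n91-233\nchicago\n\nkoko\n121-111\nalabama\n' | join_blocks 3
```

Like the original, it counts nonempty lines rather than detecting block boundaries, so it still assumes that all blocks have the same size.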