Fix line wraps in plaintext tables with Unix command-line tools

Question

I'm trying to process a tab-separated table in which some of the cells contain line wraps. The tables were extracted automatically from PDF files and look like this:

1   UNITED STATES OF    3797
    AMERICA
2   CANADA  3855
3   ISLAMIC REPUBLIC    636
    OF IRAN

where the left-hand column of a text line has an entry only if that line starts a new data entry; a line with an empty first column continues the entry above it. (I've used spaces to simulate the tabs because Stack Overflow won't let me input tabs.) I'd like a simple way to transform this table into the following, ideally with line-oriented Unix text-processing tools:

1   UNITED STATES OF AMERICA    3797
2   CANADA  3855
3   ISLAMIC REPUBLIC OF IRAN    636

Is there an easy way to do this with the standard Unix tools? I've experimented a bit and haven't found one.
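Because the tables above use spaces in place of tabs, a sample file with real tab characters is handy for trying out any answer. The sketch below recreates the input with printf; the file name infile is an arbitrary choice that simply matches the name used in the answer.

```shell
# Recreate the sample table with real tabs (\t). A continuation line has an
# empty first cell, so it starts with a tab. "infile" is an arbitrary name.
printf '1\tUNITED STATES OF\t3797\n\tAMERICA\n2\tCANADA\t3855\n3\tISLAMIC REPUBLIC\t636\n\tOF IRAN\n' > infile

# POSIX sed "l" prints tabs as \t and marks line ends with $,
# which confirms the separators are real tabs rather than spaces.
sed -n l infile
```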

ctac_ · Answer 1 · 2017-12-25T12:02:26.073


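You can try this awk script. It buffers each record and, whenever it meets a continuation line (only two tab-separated fields, because the first cell is empty), appends the wrapped text to the middle cell of the buffered record:

awk -F '\t' '
# Three fields: a new record starts. Print the buffered one and keep this line.
NF==3{
  if(b)
    print b
  b=$0
  }
# Two fields (empty first cell): a wrapped line. Append its text to the middle cell.
NF==2{
  split(b,a,FS)
  b=a[1] FS a[2] " " $2 FS a[3]
  }
# Print the last buffered record.
END{
  if(b)
    print b
  }
' infile

This assumes that only the middle column wraps and that every continuation line has exactly two fields.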
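A variant of the same buffering idea keys on whether the first cell is empty instead of on the field count, so a wrapped line that happens to carry a trailing tab (three fields) is still treated as a continuation. This is a sketch under the question's assumptions: cells are separated by single tabs and only the second (name) column wraps. The function name fix_wraps is hypothetical.

```shell
# fix_wraps (hypothetical name): reads a 3-column TSV on stdin and joins
# continuation lines (empty first cell) onto the name column of the record above.
fix_wraps() {
  awk -F '\t' -v OFS='\t' '
    # Non-empty first cell: a new record starts, so flush the buffered one.
    $1 != "" {
      if (have) print id, name, num
      id = $1; name = $2; num = $3; have = 1
      next
    }
    # Empty first cell: append the wrapped text to the buffered name.
    # Blank lines and continuations before any record are ignored.
    have && $2 != "" {
      name = name " " $2
    }
    # Flush the last buffered record.
    END {
      if (have) print id, name, num
    }'
}

# Sample input with real tabs, mirroring the table in the question.
printf '1\tUNITED STATES OF\t3797\n\tAMERICA\n2\tCANADA\t3855\n3\tISLAMIC REPUBLIC\t636\n\tOF IRAN\n' | fix_wraps
```

Keeping the id, name and number in separate variables avoids re-splitting the buffered line on every continuation, which the split() call above has to do.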

edited Dec 25 '17 at 12:02
answered Dec 25 '17 at 09:56 by ctac_
