
I start with a file containing two very long CSV lines of data. The first contains the header column names, the second contains the corresponding values:

header1,header2,header3,header4.........,header20
data1,data2,data3,data4............,data20

I can display these in a tabular format by using:

cat inputFile | column -t -s ','

Result:

header1 header2 header3 header4 .................... header20
data1   data2   data3   data4   .................... data20

This works fine, except that there are so many columns that I have to widen my terminal window beyond the width of 2 monitors to overcome the wrap and see them all lined up nicely.

Is there a way to break this into multiple rows of N columns? Something like:

header1 header2 header3 ........................ header10
data1   data2   data3   ........................ data10

header11  header12  header13    ......................  header20
data11    data12    data13      ......................  data20 
markp-fuso
Joe H.
  • Does this answer your question? [Split delimited file into smaller files by column](https://stackoverflow.com/questions/5265839/split-delimited-file-into-smaller-files-by-column); for your case I'm thinking `inc=2`; when I tested this solution with 10 input columns and 5 output columns it generated a 3rd output file with just `\n\n`, so you may need to tweak the `for` loop variables – markp-fuso Jul 27 '21 at 20:30
  • Redirect the output to a file and then open the file without word-wrap on. `less`, `vim`, `nano`, etc.. can all do this, along with any of the GUI editors, kate/kwrite, geany, etc... – David C. Rankin Jul 27 '21 at 20:34
  • Not your actual goal but `column -t -s ',' file.csv | less -Ss` is worth a try. – Jetchisel Jul 27 '21 at 20:36

2 Answers


Sample data:

$ cat my.csv
header1,header2,header3,header4,header5,header6,header7,header8,header9,header10
data1,data2,data3,data4,data5,data6,data7,data8,data9,data10

So, 10 columns, and I want to split on 5 columns ...

One awk idea that emulates the cut solution (see my other answer) but allows us to scan the input file just once:

rm -f my.csv.*        # the awk script appends (>>), so clear output files from earlier runs

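awk -v inc=5 -F',' '
{ for (start=1; start<=NF; start=start+inc)
      { pfx=""
        end=start + inc - 1
        if (end > NF) end=NF                  # clamp the last group to the final column
        out=FILENAME "." start "-" end        # e.g. my.csv.1-5; a variable avoids an unparenthesized
                                              # concatenation after >>, which not every awk parses the same way
        for (i=start; i<=end; i++)
            { printf "%s%s", pfx, $i >> out
              pfx=FS
            }
        printf "\n" >> out
      }
}' my.csv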

This generates the following files:

for f in my.csv.*
do
        echo "#################### $f"
        cat "$f"
done

#################### my.csv.1-5
header1,header2,header3,header4,header5
data1,data2,data3,data4,data5
#################### my.csv.6-10
header6,header7,header8,header9,header10
data6,data7,data8,data9,data10


And the same solution using -v inc=3 generates:

#################### my.csv.1-3
header1,header2,header3
data1,data2,data3
#################### my.csv.4-6
header4,header5,header6
data4,data5,data6
#################### my.csv.7-9
header7,header8,header9
data7,data8,data9
#################### my.csv.10-10
header10
data10
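
If the goal is only to view the data rather than to create files, the same column-grouping logic can print aligned blocks straight to the terminal. The following is a rough sketch, not a tested replacement for `column -t`: it assumes the whole input fits in memory and that no field contains a quoted comma, and the function name `view_chunks` is made up for illustration.

```shell
# view_chunks FILE N: print FILE in blocks of N comma-separated columns,
# padding every column to its widest value plus two spaces.
# Hypothetical helper; the body runs in a subshell, so nothing leaks out.
view_chunks() (
    awk -v inc="$2" -F',' '
    { if (NF > maxnf) maxnf = NF
      for (i = 1; i <= NF; i++)
          { cell[NR, i] = $i                          # remember every field
            if (length($i) > width[i]) width[i] = length($i)
          }
    }
    END {
      for (start = 1; start <= maxnf; start += inc)
          { end = start + inc - 1
            if (end > maxnf) end = maxnf              # clamp the last block
            for (r = 1; r <= NR; r++)
                { line = ""
                  for (i = start; i <= end; i++)
                      line = line sprintf("%-" (width[i] + 2) "s", cell[r, i])
                  sub(/ +$/, "", line)                # drop trailing padding
                  print line
                }
            if (end < maxnf) print ""                 # blank line between blocks
          }
    }' "$1"
)
```

Called as `view_chunks my.csv 5`, this reads the file once and writes nothing to disk.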
markp-fuso
  • Definitely the `awk` solution. Far greater efficiency than spawning a separate subshell for `cut` every iteration. (but for files of a few hundred lines and less than 100 columns the difference won't be that bad -- for 1M+ lines --- it will be on the order of minutes) – David C. Rankin Jul 27 '21 at 22:30

Sample data:

$ cat my.csv
header1,header2,header3,header4,header5,header6,header7,header8,header9,header10
data1,data2,data3,data4,data5,data6,data7,data8,data9,data10

So, 10 columns, and I want to split on 5 columns ...

Using this answer - Split delimited file into smaller files by column - as a starting point and tweaking the for loop logic a bit ...

infile=my.csv
ncol=$(awk -F',' 'NR==1{print NF}' "$infile")
echo "${ncol}"                                  # => 10

inc=5

for ((start=1; start<=ncol; start=start+inc))
do
    end=$((start+inc-1))
    [[ "${end}" -gt "${ncol}" ]] && end="${ncol}"
    echo "############ columns ${start} - ${end}"
    cut -d',' -f"${start}-${end}" "$infile"
done

This generates:

############ columns 1 - 5
header1,header2,header3,header4,header5
data1,data2,data3,data4,data5
############ columns 6 - 10
header6,header7,header8,header9,header10
data6,data7,data8,data9,data10

Once OP is happy with the output, the echo can be removed and the cut output of each pass through the loop can be redirected to its own output file, named as needed.
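
One way the per-pass output file could look, borrowing the `<file>.<start>-<end>` naming from the awk answer. This is only a sketch: the function name `split_cols` is hypothetical, it is written in plain POSIX sh rather than with the bash `for ((...))` loop, and any existing output files of the same name are overwritten.

```shell
# split_cols FILE N: write each block of N columns to FILE.<start>-<end>.
# Hypothetical helper for illustration; runs in a subshell so its
# variables do not leak into the calling shell.
split_cols() (
    infile=$1 inc=$2
    ncol=$(awk -F',' 'NR==1{print NF}' "$infile")    # column count from the header line
    start=1
    while [ "$start" -le "$ncol" ]
    do
        end=$((start + inc - 1))
        [ "$end" -gt "$ncol" ] && end=$ncol          # clamp the last block
        cut -d',' -f"${start}-${end}" "$infile" > "${infile}.${start}-${end}"
        start=$((start + inc))
    done
)
```

For example, `split_cols my.csv 5` would be expected to leave `my.csv.1-5` and `my.csv.6-10` next to the input file.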



What happens if we pick an inc that does not evenly divide ncol:

inc=3

for ((start=1; start<=ncol; start=start+inc))
do
    end=$((start+inc-1))
    [[ "${end}" -gt "${ncol}" ]] && end="${ncol}"
    echo "############ columns ${start} - ${end}"
    cut -d',' -f"${start}-${end}" "$infile"
done

This generates:

############ columns 1 - 3
header1,header2,header3
data1,data2,data3
############ columns 4 - 6
header4,header5,header6
data4,data5,data6
############ columns 7 - 9
header7,header8,header9
data7,data8,data9
############ columns 10 - 10
header10
data10
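
To get the aligned layout the question asks for, each chunk can be piped through the same `column -t -s ','` call the question already uses. A minimal sketch, assuming `column` is installed; the function name `show_chunks` is hypothetical and the loop is written in plain POSIX sh.

```shell
# show_chunks FILE N: print FILE in blocks of N comma-separated columns,
# aligning each block with column(1). Hypothetical helper for illustration.
show_chunks() (
    infile=$1 inc=$2
    ncol=$(awk -F',' 'NR==1{print NF}' "$infile")    # column count from the header line
    start=1
    while [ "$start" -le "$ncol" ]
    do
        end=$((start + inc - 1))
        [ "$end" -gt "$ncol" ] && end=$ncol          # clamp the last block
        cut -d',' -f"${start}-${end}" "$infile" | column -t -s ','
        if [ "$end" -lt "$ncol" ]; then echo; fi     # blank line between blocks
        start=$((start + inc))
    done
)
```

Running `show_chunks my.csv 5` aligns each block independently, so column widths can differ from one block to the next.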
markp-fuso