Imagine I have a directory containing many subdirectories each containing some number of CSV files with the same structure (same number of columns and all containing the same header).

I am aware that I can run from the parent folder something like

find ./ -name '*.csv' -exec cat {} \; > ~/Desktop/result.csv

This works fine, except that the header is repeated once for each file.

I'm also aware that I can do something like sed 1d <filename> or tail -n +2 <filename> to skip the first line of a file (more generally, tail -n +<N+1> <filename> skips the first N lines).

But in my case, it seems a bit more specialised. I want to preserve the header once for the first file and then skip the header for every file after that.

Is anyone aware of a way to achieve this using standard Unix tools (like find, head, tail, sed, awk etc.) and bash?

For example, given these input files:

   /folder1
            /file1.csv
            /file2.csv
   /folder2
            /file1.csv

Where each file has header:

A,B,C and each file has one data row 1,2,3

The desired output would be:

A,B,C
1,2,3
1,2,3
1,2,3

Marked As Duplicate

I feel this is different to other questions like this and this specifically because those solutions reference file1 and file2 in the solution. My question asks about a directory structure with an arbitrary number of files where I would not want to type out each file one by one.

David
  • `once for the first file`: which file is first? Or does it make no difference which file the header is taken from? – KamilCuk Nov 14 '18 at 19:21
  • Makes no difference in this case :) all files contain the same header and I don't mind which comes first. – David Nov 15 '18 at 09:22
  • None of the linked questions are exact dup of this problem hence reopening. – anubhava Nov 15 '18 at 10:38

2 Answers

You may use this find + xargs + awk:

find . -name '*.csv' -print0 | xargs -0 awk 'NR==1 || FNR>1'

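The condition NR==1 || FNR>1 is true for the very first line of the combined input (the header of the first file) and for every line that is not the first line of its own file. NR counts lines across all files, while FNR restarts at 1 for each file.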
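To check the behaviour, here is a minimal sketch that rebuilds the sample layout from the question in a scratch directory and runs the same pipeline over it. The function name merge_csv and the scratch directory are assumptions made for illustration; they are not part of the original answer.

```shell
# merge_csv is a hypothetical helper name used only for this sketch.
# It prints the header once, followed by the data rows of every CSV under "$1".
merge_csv() {
    # NR counts lines across all files; FNR restarts at 1 for each file.
    find "$1" -name '*.csv' -print0 | xargs -0 awk 'NR==1 || FNR>1'
}

# Rebuild the sample layout from the question in a scratch directory.
demo=$(mktemp -d)
mkdir -p "$demo/folder1" "$demo/folder2"
for f in folder1/file1.csv folder1/file2.csv folder2/file1.csv; do
    printf 'A,B,C\n1,2,3\n' > "$demo/$f"
done

# Prints A,B,C once, then three 1,2,3 rows.
merge_csv "$demo"
rm -rf "$demo"
```

One caveat: if the file list is long enough that xargs splits it across several awk invocations, NR restarts in each invocation, so the header would appear once per batch rather than once overall.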

anubhava
{
  cat real-daily-wages-in-pounds-engla.tsv
  tail -n +2 real-daily-wages-in-pounds-engla.tsv
} | cat

You can pipe the output of multiple commands through cat. tail -n+2 selects all lines from a file, except the first.
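The same grouping idea can be stretched to the directory tree from the question. The following is only a sketch under stated assumptions: the function name merge_csv_tail is hypothetical, and it relies on every file carrying an identical header, as the question says. It avoids listing files by hand by letting find run head and tail on each match.

```shell
# merge_csv_tail is a hypothetical helper name used only for this sketch.
# Assumes every CSV under "$1" starts with the same header line.
merge_csv_tail() {
    {
        # Header: the first line of whichever file find reports first.
        # The trailing head -n 1 keeps one copy and discards the rest.
        find "$1" -name '*.csv' -exec head -n 1 {} \; | head -n 1
        # Body: every file with its first line dropped.
        find "$1" -name '*.csv' -exec tail -n +2 {} \;
    }
}
```

Run from the parent folder, this could be used as merge_csv_tail . > ~/Desktop/result.csv. Writing the result outside the searched tree, as the question does, keeps find from picking up the output file itself.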

moo