I want the unique headers for a bunch of csv files whose names contain ABC or XYZ.

Within a single directory, I can sort of get what I need with:

head -n 1 *.csv > first.txt
cat -A first.txt | tr ',' '\n' | sort | uniq

Of course, this isn't recursive and it includes all csv files, not just the ones I want.

If I do the following, I get the recursive search, but also a bunch of junk:

find . -type f -name "ABC*.csv" -o -name "XYZ*.csv" | xargs head -n 1 | tr ',' '\n' | sort | uniq

I'm on Windows 10 with MinGW64. I suppose I could use Python, but I feel so close to having it!

Lorem Ipsum

1 Answer

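When head is given multiple files (which is what xargs does), it also prints a `==> name <==` banner before each file's output, separated by blank lines. Those banners are the junk you are seeing.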
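
To see the banners in isolation, here is a small sketch on two throwaway files. The file names and contents are made up for the demonstration, and `-q` is assumed to be available, as it is in GNU coreutils head (the one MinGW64 normally ships):

```shell
# Hypothetical sample files, created only for this demonstration.
tmp=$(mktemp -d)
printf 'id,name\n1,a\n' > "$tmp/ABC1.csv"
printf 'id,city\n2,b\n' > "$tmp/XYZ1.csv"

# With more than one file, head prints a "==> name <==" banner before each
# file's lines, with a blank line between files.
with_banners=$(head -n 1 "$tmp"/*.csv)

# GNU head's -q (--quiet) suppresses the banners and the blank lines.
without_banners=$(head -q -n 1 "$tmp"/*.csv)

printf '%s\n' "$with_banners"
printf '%s\n' "$without_banners"
rm -rf "$tmp"
```

So `xargs head -q -n 1` would be another way to get rid of the banners, provided your head supports `-q`.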

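Using find's -exec action you can obtain the desired result, but you have to force the precedence of -name 'ABC*.csv' -o -name 'XYZ*.csv' with escaped parentheses for it to work. Otherwise the implicit "and" binds tighter than -o, so -type f applies only to the first -name and -exec only to the second. uniq is also not required here; sort -u can do that on its own. As a side note, you should enclose literal strings in single quotes.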

find . -type f \( -name 'ABC*.csv' -o -name 'XYZ*.csv' \) -exec head -n 1 {} \; | tr ',' '\n' | sort -u
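
The effect of the parentheses can be checked with a quick sketch like the following. The two sample files are hypothetical and exist only to make the difference visible:

```shell
# Hypothetical sample files, created only for this demonstration.
tmp=$(mktemp -d)
printf 'a,b\n' > "$tmp/ABC1.csv"
printf 'c,d\n' > "$tmp/XYZ1.csv"

# Ungrouped: find parses this as
#   ( -type f -a -name 'ABC*.csv' ) -o ( -name 'XYZ*.csv' -a -exec ... )
# so head only runs for the XYZ file.
ungrouped=$(find "$tmp" -type f -name 'ABC*.csv' -o -name 'XYZ*.csv' -exec head -n 1 {} \;)

# Grouped: -type f and -exec now apply to both name patterns.
grouped=$(find "$tmp" -type f \( -name 'ABC*.csv' -o -name 'XYZ*.csv' \) -exec head -n 1 {} \; | sort)

printf 'ungrouped:\n%s\n' "$ungrouped"
printf 'grouped:\n%s\n' "$grouped"
rm -rf "$tmp"
```

The ungrouped form silently drops every ABC file, which would also explain headers going missing when both patterns are searched.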

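If your files have DOS line endings, the above command will not work as expected, because the last header on each line keeps its trailing carriage return and is treated as a different string. In that case you should delete carriage returns using tr or sed: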

find . -type f \( -name 'ABC*.csv' -o -name 'XYZ*.csv' \) -exec head -n 1 {} \; | tr -d '\r' | tr ',' '\n' | sort -u
# or
find . -type f \( -name 'ABC*.csv' -o -name 'XYZ*.csv' \) -exec head -n 1 {} \; | sed 's/\r//; s/,/\n/g' | sort -u
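
A minimal sketch of why the carriage returns matter, using two made-up files that share the same headers but differ in line endings:

```shell
# Hypothetical sample files: one with DOS (CRLF) and one with Unix (LF) endings.
tmp=$(mktemp -d)
printf 'Age,First Name\r\n' > "$tmp/ABC1.csv"
printf 'First Name,Age\n'   > "$tmp/XYZ1.csv"

# Without stripping \r, "First Name" and "First Name\r" count as different headers.
raw=$(find "$tmp" -type f \( -name 'ABC*.csv' -o -name 'XYZ*.csv' \) -exec head -n 1 {} \; \
        | tr ',' '\n' | sort -u | wc -l)

# Deleting \r first leaves only the two real headers.
clean=$(find "$tmp" -type f \( -name 'ABC*.csv' -o -name 'XYZ*.csv' \) -exec head -n 1 {} \; \
        | tr -d '\r' | tr ',' '\n' | sort -u | wc -l)

echo "unique headers without tr -d: $raw"
echo "unique headers with tr -d:    $clean"
rm -rf "$tmp"
```

This is the same effect that shows up as `First Name^M` in `cat -A` output.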
oguz ismail
  • Hmm, it seems to not return all results. Searching only `ABC*.csv` turns up different headers than when both `ABC` and `XYZ` are searched. – Lorem Ipsum Apr 25 '19 at 17:55
  • Different results as well. :/ – Lorem Ipsum Apr 25 '19 at 18:03
  • When using the `cat -A` approach, I noticed that some headers have junk in them. For example, `First Name` and `First Name^M`. – Lorem Ipsum Apr 25 '19 at 18:04
  • Still not returning some known headers. I do have `sed`, though. – Lorem Ipsum Apr 25 '19 at 18:17
  • Unfortunately, I can't provide the files :( I sincerely appreciate your assistance. I learned a few things and, if I'm losing my mind, at least I'm not alone! – Lorem Ipsum Apr 25 '19 at 18:20
  • A note for future me: if you need a more complex pattern, do something like `find . -type f -regextype posix-extended -regex '.*/(ABC|XYZ)[0-9]+\.csv'`. The leading `.*/` is needed because `-regex` matches against the whole path as `find` prints it (e.g. `./dir/ABC1.csv`), not just the file name. The different regextype syntaxes are described here: https://www.gnu.org/software/findutils/manual/html_node/find_html/Regular-Expressions.html – Lorem Ipsum Apr 25 '19 at 20:20
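
The `-regex` variant from the last comment can be tried out on throwaway files as sketched below. It assumes GNU find, since `-regextype` is a GNU extension, and the directory layout and file names are invented for the test:

```shell
# Hypothetical directory tree, created only for this demonstration.
tmp=$(mktemp -d)
mkdir "$tmp/sub"
touch "$tmp/sub/ABC12.csv" "$tmp/XYZ3.csv" "$tmp/ABCx.csv"

# -regex must match the whole path that find prints (./sub/ABC12.csv),
# hence the leading '.*/'. The dot before csv is escaped so it matches
# a literal dot only.
matches=$(cd "$tmp" && find . -type f -regextype posix-extended \
            -regex '.*/(ABC|XYZ)[0-9]+\.csv' | LC_ALL=C sort)

printf '%s\n' "$matches"
rm -rf "$tmp"
```

`ABCx.csv` is skipped because the pattern requires at least one digit after the prefix.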