Split a CSV if specific string in row is found (Python/Pandas/Bash)

Question

I've got one crappy CSV file with multiple headers inside it. It looks like this:

File1:
    #HEADER COL1 COL2
    data
    data
    data
    #HEADER COL1 COL2 COL3
    data
    data
    data
    data
    data
    #HEADER COL1 COL2 COL3 COL4
    data
    data
    ...

Since I can't load it with pandas due to the in-the-file-headers, I'm looking to split the data at each header to a separate file (or separate data frames in pandas). Is there a way to do this?

This CSV is generated by sensors. If a sensor is added, the header will get a new column. This could also happen in the in-the-file-headers. So it is NOT a solution to delete those headers. (Clean wrong header inside Dataframe with Python/Pandas)

It would be really nice to do it in python/pandas, but I would also be happy with a bash command/script solution.

Expected output:

File1:
        #HEADER COL1 COL2
        data
        data
        data
File2:
        #HEADER COL1 COL2 COL3
        data
        data
        data
        data
        data
File3:
        #HEADER COL1 COL2 COL3 COL4
        data
        data
        ...

Thank you!

Are you able to show your expected output as well as what you've tried ? — Umar.H, Nov 13 '19 at 15:37
Are those `HEADER` lines always the same? Can you post a realistic input? — RomanPerekhrest, Nov 13 '19 at 15:37
No the HEADER lines can change (but they all start with "#"). So if I could locate "#", I could split the files — vince, Nov 13 '19 at 15:44

karakfa · Answer 1 · 2019-11-13T15:57:34.463

1

awk to the rescue!

$ awk '/^#HEADER/{close(FILENAME "_" c); c++} {print > (FILENAME "_" c)}' file

will split input file into file_n parts, where n is the section counter.

edited Nov 13 '19 at 15:57

answered Nov 13 '19 at 15:42

karakfa

66,216
7
41
56

in my case this just worked beautifully! `awk '/^#/{close(FILENAME "_" c); c++} {print > (FILENAME "_" c)}' file` – vince Nov 13 '19 at 16:03
last question: how can I export the result file to another folder? – vince Nov 13 '19 at 16:09

score 1 · Answer 2 · answered Nov 13 '19 at 16:17

With convenient csplit command (to split a file into sections by pattern):

csplit -b %d -f file -z input_file '/#HEADER.*/' '{*}'

Viewing results:

$ head file[0-9]
==> file0 <==
#HEADER COL1 COL2
data
data
data

==> file1 <==
#HEADER COL1 COL2 COL3
data
data
data
data
data

==> file2 <==
#HEADER COL1 COL2 COL3 COL4
data
data
...

Split a CSV if specific string in row is found (Python/Pandas/Bash)

2 Answers2