Split CSV values on single row into individual rows

Question

I have a Python script that outputs a text file with thousands of random filenames in a comma-separated list, all on a single row.

randomFileName1, randomFileName2, randomFileName3, etc.

I want to take each value in the list and put it into its own row in a new CSV file.

randomFileName1
randomFileName2
randomFileName3

I've tried some variations of awk with no success. What's the best way to move these values into their own rows?

Doesn't python itself have the ability to split strings? – Ed Morton Mar 26 '20 at 16:46
`'\n'.join(row.replace(',', ' ').split()) + '\n'` – martineau Mar 26 '20 at 17:24

score 1 · Accepted Answer · answered Mar 26 '20 at 16:58

With GNU sed:

sed 's|, |\n|g' file

Or, for a portable alternative,

sed 's|, |\
|g' file

answered Mar 26 '20 at 16:58

Quasímodo


score 0 · Answer 2 · answered Mar 26 '20 at 17:09

(g)awk:

echo randomFileName1, randomFileName2, randomFileName3 | \
   awk  '{ split($0,a,/,[ ]*/); for (i in a) { print a[i] }}'

python:

import re
a="randomFileName1, randomFileName2, randomFileName3"
b=re.split(r',[ ]*',a)
for i in b:
   print(i)

(inspiration from: String splitting in Python using regex)
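
A minimal sketch of the same regex split wrapped in a function, so it can be applied to the contents of a file rather than a hard-coded string. The helper name `split_names` is hypothetical (it is not part of the answer above), and stripping whitespace and dropping empty entries is an assumption about how a trailing newline or trailing comma should be treated.

```python
import re


def split_names(text):
    """Split a comma-separated string into a list of names, in input order."""
    # Split on a comma followed by any amount of whitespace.
    parts = re.split(r',\s*', text.strip())
    # Drop empty strings left by a trailing comma or an empty input.
    return [p.strip() for p in parts if p.strip()]


if __name__ == "__main__":
    row = "randomFileName1, randomFileName2, randomFileName3\n"
    # One name per line, in the same order as the input.
    print("\n".join(split_names(row)))
```

Because `re.split` returns a list, the names come out in the order they appeared in the input.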

answered Mar 26 '20 at 17:09

Luuk


The awk one isn't specific to gawk, it'll work in any awk. You don't need to put the blank in a bracket expression, though, it's already literal - `/, */`. – Ed Morton Mar 26 '20 at 17:24
@Ed Morton, on a default installed `awk` on macOS this is not printing in the proper order, so his stating "(g)awk" is appropriate. – user3439894 Mar 26 '20 at 17:48
@user3439894 No, the order is "random" in any awk given the use of `for (i in a)` which will visit the array elements in hash order, not any specific order you might have in mind. If you got the order you wanted from gawk given a specific input set then that's just coincidence, it could've been any order and will be different orders for different data sets. – Ed Morton Mar 26 '20 at 18:06
@Ed Morton, If the output order is randomized relative to the input order, then Luuk's `awk` answer is not a good solution. On my Linux system the output order matches the input order (1 2 3) with `gawk 4.1.x`, but under macOS, with an old non-`gawk` version of `awk`, the output order is consistently 2 3 1. – user3439894 Mar 26 '20 at 19:11
Luuk's answer may be perfectly fine. The order of the output lines WILL be "random", but that doesn't mean it's not a good solution; it all depends on the OP's requirements. I'm not making this up - see `By default, when a for loop traverses an array, the order is undefined...` in [the man page](https://www.gnu.org/software/gawk/manual/gawk.html#Controlling-Scanning). It's a simple fix though - just change `for (i in a)` to `for (i=1; i in a; i++)`. – Ed Morton Mar 26 '20 at 22:01
@Ed Morton, Thanks for the info and the link. I now know why I've always used e.g. `for (i=1; i in a; i++)` , because I typically always want ordered output. :) – user3439894 Mar 26 '20 at 23:35
You're welcome. Just bear in mind that only works when you have contiguous numerical indices as in this case where the array was created by split(). Otherwise if you want output in a specific order then you need to write code to support that order, e.g. see that man page link and https://stackoverflow.com/a/60869348/1745001. – Ed Morton Mar 26 '20 at 23:51

score 0 · Answer 3 · answered Mar 26 '20 at 19:00

I was actually able to figure this out using the csv module in Python. I'm sure this could be cleaned up a bit, but it does what I need it to do.

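import csv

# Read the single comma-separated row from the input file.
with open('parse.txt', 'r', newline='') as csv_file:
    csv_reader = csv.reader(csv_file)

    with open('parse_write.csv', 'w', newline='') as new_file:
        csv_writer = csv.writer(new_file, delimiter='\t')

        for line in csv_reader:
            for file_name in line:
                # writerow() expects a sequence of fields; passing a bare
                # string would write each character as a separate field.
                csv_writer.writerow([file_name.strip()])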
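
A hedged sketch of the csv-module approach that works on in-memory text, so it can be tried without `parse.txt`. It assumes the names are separated by a comma and a space, which is why it passes `skipinitialspace=True` to `csv.reader`. The function name `one_per_row` and the use of `io.StringIO` are illustrative choices, not part of the original answer.

```python
import csv
import io


def one_per_row(text):
    """Return CSV text with each comma-separated value on its own row."""
    # skipinitialspace=True drops the blank that follows each comma.
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    out = io.StringIO()
    # lineterminator='\n' replaces the csv module's default '\r\n'.
    writer = csv.writer(out, lineterminator="\n")
    for line in reader:
        for file_name in line:
            # writerow() expects a sequence of fields, so wrap the name in a list.
            writer.writerow([file_name])
    return out.getvalue()


if __name__ == "__main__":
    print(one_per_row("randomFileName1, randomFileName2, randomFileName3\n"), end="")
```

To write to disk instead, the `out` buffer could be swapped for a file opened with `newline=''`, as the csv documentation recommends.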
