I have a Python script that outputs a text file with thousands of random filenames in a comma separated list, all on a single row.

randomFileName1, randomFileName2, randomFileName3, etc.

I want to take each value in the list and put it into its own row in a new CSV file.

randomFileName1
randomFileName2
randomFileName3

I've tried some variations of awk with no success. What's the best way to move these values into their own rows?

Ella C.

3 Answers

With GNU sed:

sed 's|, |\n|g' file

Or, for a portable alternative,

sed 's|, |\
|g' file
Quasímodo

(g)awk:

echo randomFileName1, randomFileName2, randomFileName3 | \
   awk  '{ split($0,a,/,[ ]*/); for (i in a) { print a[i] }}'

python:

import re
a="randomFileName1, randomFileName2, randomFileName3"
b=re.split(r',[ ]*',a)
for i in b:
   print(i)

(Inspiration from: String splitting in Python using regex)
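Since the question starts from a file rather than a literal string, the same regex split can be wrapped in a small file-to-file helper. This is only a sketch: the function names `split_names` and `convert` and both file paths are hypothetical, and it assumes the whole input fits in memory and that no filename itself contains a comma.

```python
import re


def split_names(text):
    """Split a comma separated line into trimmed, non-empty names (order preserved)."""
    parts = re.split(r',\s*', text.strip())
    return [part for part in parts if part]


def convert(src_path, dst_path):
    """Read one comma separated row from src_path, write one name per line to dst_path."""
    with open(src_path) as src:
        names = split_names(src.read())
    with open(dst_path, 'w') as dst:
        dst.writelines(name + '\n' for name in names)


if __name__ == '__main__':
    # Hypothetical usage: convert('parse.txt', 'parse_write.csv')
    print(split_names("randomFileName1, randomFileName2, randomFileName3"))
```

Because `re.split` returns a list, the names come out in input order, which sidesteps the array traversal question that applies to `for (i in a)` in awk.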

Luuk
  • The awk one isn't specific to gawk, it'll work in any awk. You don't need to put the blank in a bracket expression, though, it's already literal - `/, */`. – Ed Morton Mar 26 '20 at 17:24
  • @Ed Morton, on a default installed `awk` on macOS this is not printing in the proper order, so his stating "(g)awk" is appropriate. – user3439894 Mar 26 '20 at 17:48
  • @user3439894 No, the order is "random" in any awk given the use of `for (i in a)` which will visit the array elements in hash order, not any specific order you might have in mind. If you got the order you wanted from gawk given a specific input set then that's just coincidence, it could've been any order and will be different orders for different data sets. – Ed Morton Mar 26 '20 at 18:06
  • @Ed Morton, If the output order is randomized relative to the input order, then Luuk's `awk` answer is not a good solution. On my Linux system the output order matches the input order (1 2 3) with `gawk 4.1.x`, but under macOS, with an old version of `awk` rather than `gawk`, the output order is 2 3 1, consistently the same on each run. – user3439894 Mar 26 '20 at 19:11
  • Luuk's answer may be perfectly fine. The order of the output lines WILL be "random", but that doesn't mean it's not a good solution; it all depends on the OP's requirements. I'm not making this up - see `By default, when a for loop traverses an array, the order is undefined...` in [the man page](https://www.gnu.org/software/gawk/manual/gawk.html#Controlling-Scanning). It's a simple fix though - just change `for (i in a)` to `for (i=1; i in a; i++)`. – Ed Morton Mar 26 '20 at 22:01
  • @Ed Morton, Thanks for the info and the link. I now know why I've always used e.g. `for (i=1; i in a; i++)`: I typically want ordered output. :) – user3439894 Mar 26 '20 at 23:35
  • You're welcome. Just bear in mind that only works when you have contiguous numerical indices as in this case where the array was created by split(). Otherwise if you want output in a specific order then you need to write code to support that order, e.g. see that man page link and https://stackoverflow.com/a/60869348/1745001. – Ed Morton Mar 26 '20 at 23:51
I was actually able to figure this out using the csv module in Python. I'm sure this could be cleaned up a bit, but it does what I need it to do.

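import csv

# Read the single comma separated row; skipinitialspace drops the blank after each comma.
with open('parse.txt', 'r', newline='') as csv_file:
    csv_reader = csv.reader(csv_file, skipinitialspace=True)

    with open('parse_write.csv', 'w', newline='') as new_file:
        csv_writer = csv.writer(new_file, delimiter='\t')

        for line in csv_reader:
            for file_name in line:
                # writerow expects a sequence of fields; a bare string
                # would be written one character per field.
                csv_writer.writerow([file_name])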
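One point worth checking with `csv.writer`: `writerow` treats its argument as an iterable of fields, so a bare string is written one character per field, while a one-element list gives one value per row. The sketch below illustrates this with in-memory buffers instead of real files; the helper name `rows_from_single_line` is hypothetical, and the comma default delimiter is used for simplicity.

```python
import csv
import io


def rows_from_single_line(text):
    """Return CSV text with one value per row, given one comma separated line."""
    # skipinitialspace=True drops the blank that follows each comma.
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator='\n')
    for line in reader:
        for name in line:
            # Wrap the name in a list so it is written as a single field.
            writer.writerow([name])
    return out.getvalue()


def write_bare_string(value):
    """Show what writerow does when handed a string instead of a list."""
    out = io.StringIO()
    csv.writer(out, lineterminator='\n').writerow(value)
    return out.getvalue()


if __name__ == '__main__':
    print(rows_from_single_line("randomFileName1, randomFileName2, randomFileName3\n"), end='')
    print(write_bare_string("ab"), end='')  # each character becomes its own field
```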
Ella C.