0

Currently working on an EDI translation for a proprietary file format. The input file may have hundreds to thousands of records. Each line is a record.

INFILE:(Ignore beginning blank line, it's only there for visual representation)

"1","2","3","4","5","6","7","can also contain text, and commas","9","10","11","12","13"
"1","2","3","4","5","6","7","can also contain text, and commas","9","10","11","12","13"
"1","2","3","4","5","6","7","can also contain text, and commas","9","10","11","12","13"
"1","2","3","4","5","6","7","can also contain text, and commas","9","10","11","12","13"
"1","2","3","4","5","6","7","can also contain text, and commas","9","10","11","12","13"

I need a loop that can go line by line and copy data from one column in the original file, and paste it into a new file that holds the data in a different column. kind of like a vlookup without column headers. here is the relationship between column numbers.

InColumn-OutColumn 
1-1, 2-2, 3-3, 4-4, 5-5, 6-6, 7-7, 8-8, 12-9, 14-10, 25-11, 68-24

Thanks in advance, I can't seem to wrap my mind around this one.

EDIT: as requested here is the broken code that i couldn't get to work.

KEY = [1,2,3,4,5,6,7,8,12,14,25,68]
Body850 = open(TEMP, 'r+')

for line in Body850:
    for x in line.split(','):
        END += KEY[x]
    print line
AtomicDog
  • 79
  • 2
  • 2
  • 8

2 Answers2

1

As mentioned, the csv module can take care of the quoted commas. Create a csv reader and a writer and then the only trick is filtering the columns.

import csv

# todo: for testing...
open('testfile.csv', 'w').write('''"1","2","3","4","5","6","7","can also contain text, and commas","9","10","11","12","13"
"1","2","3","4","5","6","7","can also contain text, and commas","9","10","11","12","13"
"1","2","3","4","5","6","7","can also contain text, and commas","9","10","11","12","13"
"1","2","3","4","5","6","7","can also contain text, and commas","9","10","11","12","13"
"1","2","3","4","5","6","7","can also contain text, and commas","9","10","11","12","13"''')

# the columns wanted, in order they should appear in output (zero based)
# ...example doesn't go to 68 cols, so abreviated
column_order = (0,1,2,3,4,5,6,7,11)

with open('testfile.csv') as csvin:
    reader = csv.reader(csvin)
    with open('testfile_new.csv', 'w') as csvout:
        writer = csv.writer(csvout)
        for row in reader:
            writer.writerow([row[i] for i in column_order])

# todo: for testing...
print(open('testfile_new.csv').read())
tdelaney
  • 73,364
  • 6
  • 83
  • 116
0

If the text does not contain quotes, you can do:

for line in Body850:
    for x in line.split('","'):
        if x in KEY:
             END += KEY[x]
        else
             pass # do something here in case x is not in KEY
    print line
Valentin Lorentz
  • 9,556
  • 6
  • 47
  • 69