I have the following code to unzip and concatenate the .csv files of a directory into a new merged .csv file named base_flash.csv:
def merge(self):
    file_list = [self.unzipDir + '\\' + f for f in os.listdir(self.unzipDir) if f.startswith('relatorio')]
    csv_list = []
    for file in sorted(file_list):
        csv_list.append(
            pd.read_csv(file, sep=';', dtype=str, index_col=False,
                        encoding='ansi', on_bad_lines='warn')
              .assign(File_Name=os.path.basename(file))
        )
    csv_merged = pd.concat(csv_list, ignore_index=True)
    csv_merged.to_csv('base_flash.csv', index=False, sep=';', encoding='ansi')
    print(csv_merged)
Some files have bad lines with more fields than expected:
b'Skipping line 331: expected 68 fields, saw 70\nSkipping line 343: expected 68 fields, saw 70\nSkipping line 468: expected 68 fields, saw 70\nSkipping line 484: expected 68 fields, saw 70\n'
b'Skipping line 327: expected 68 fields, saw 70\nSkipping line 343: expected 68 fields, saw 70\nSkipping line 415: expected 68 fields, saw 70\n'
b'Skipping line 131: expected 68 fields, saw 70\n'
b'Skipping line 518: expected 68 fields, saw 70\nSkipping line 558: expected 68 fields, saw 70\n'
b'Skipping line 124: expected 68 fields, saw 69\nSkipping line 137: expected 68 fields, saw 69\n'
b'Skipping line 187: expected 68 fields, saw 70\nSkipping line 259: expected 68 fields, saw 70\n'
I have found the cause in the CSV files: some lines contain more ";;" pairs than they should, so those rows split into extra (empty) fields. Is there a way to collect these bad lines into a separate dataframe? Or, better, to remove the surplus pair of semicolons so the row fits the expected columns?