I have csv files spread across multiple directories, and each csv file has only one column of data. What I want to do is read all these files and bring each file's column into one csv file. The final csv file will have one column per input file, with the filename as the header and the data from the original file as that column's data.
This is my directory structure inside ~/csv_files/ (output of ls):
ab arc bat-smg bn cdo crh diq es fo gd haw ia iu ki ksh lez lv mo na no os pih rmy sah simple ss tet tr ur war zea
ace arz bcl bo ce cs dsb et fr gl he id ja kk ku lg map-bms mr nah nov pa pl rn sc sk st tg ts uz wo zh
af as
Each directory has two csv files. I thought of using the os.walk() function, but I think my understanding of os.walk is incorrect, and that's why my current code doesn't produce anything.
    import sys, os
    import csv

    root_path = os.path.expanduser('~/data/missing_files')

    def combine_csv_files(path):
        for root, dirs, files in os.walk(path):
            for dir in dirs:
                for name in files:
                    if name.endswith(".csv"):
                        csv_path = os.path.expanduser(root_path + name)
                        if os.path.exists(csv_path):
                            try:
                                with open(csv_path, 'rb') as f:
                                    t = f.read().splitlines()
                                    print t
                            except IOError, e:
                                print e

    def main():
        combine_csv_files(root_path)

    if __name__=="__main__":
        main()
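For context on where my understanding may be off: as far as I can tell, os.walk already pairs each visited directory (root) with the files directly inside it, so the extra loop over dirs may not be needed, and the full path would come from os.path.join(root, name) rather than concatenating onto root_path. A minimal sketch of what I think the traversal should look like (just my assumption, please correct me if this is wrong):

```python
import os

def list_csv_files(path):
    """Collect the full path of every .csv file under path.

    os.walk yields (root, dirs, files) for each directory it visits;
    files are the names directly inside root, so the full path is
    os.path.join(root, name) -- no separate loop over dirs needed.
    """
    found = []
    for root, dirs, files in os.walk(path):
        for name in files:
            if name.endswith(".csv"):
                found.append(os.path.join(root, name))
    return found
```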
My questions are:
- What am I doing wrong here?
- Can I read one csv column from a file and add that data as a new column to another file? csv files are normally row-oriented, but here there is no dependency between the rows.
At the end I am trying to get a csv file like this (here are the potential headers):
ab_csv_data_file1, ab_csv_data_file2, arc_csv_data_file1, arc_csv_data_file2
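To make the target concrete, here is a minimal sketch of the combining step I have in mind. The header scheme (parent directory name plus file stem, e.g. ab_file1 for ab/file1.csv) is just my assumption, and since the rows of different files are independent, shorter columns are padded with empty strings via itertools.zip_longest:

```python
import csv
import os
from itertools import zip_longest

def combine_columns(csv_paths, out_path):
    """Write one csv whose columns are the single columns of the
    input files, with one header per file.

    Header format (an assumption): parent directory name + '_' +
    file name without its extension.
    """
    headers = []
    columns = []
    for path in csv_paths:
        with open(path, newline='') as f:
            columns.append([row[0] for row in csv.reader(f) if row])
        parent = os.path.basename(os.path.dirname(path))
        stem = os.path.splitext(os.path.basename(path))[0]
        headers.append(parent + '_' + stem)
    with open(out_path, 'w', newline='') as out:
        writer = csv.writer(out)
        writer.writerow(headers)
        # zip_longest pads shorter columns with '', since the rows
        # of different files have no dependency on each other
        for row in zip_longest(*columns, fillvalue=''):
            writer.writerow(row)
```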