I am using Pandas module in python 3.5 to read crosstabs recursively from sub-directories and I want to concatenate the crosstabs inside the for loop after I call pd.crosstab() and after the for loop write the output to an excel file. I tried copying table1 into table3 (see code below) after I call pd.crosstab(), but if some values are not present in the latter data files then table3 shows NaN for those entries. I looked at pd.concat, but can't find an example of how to use it in a for loop.
Data files look like (there are 100s of files and with many columns, but here showing only the columns I am interested in):
First Data File
StudentID Grade
3 A
2 B
1 A
Second Data File
StudentID Grade
1 B
2 A
3 A
Third Data File
StudentID Grade
2 C
1 B
3 A
and so on ....
At the end the output should be like:
Grade A B C
StudentID
1 1 2 0
2 1 1 1
3 3 0 0
My python program looks like (removing the imports from the top of the file)
.....
fields = ['StudentID', 'Grade']
path= 'C:/script_testing/'
i=0
for filename in glob.glob('C:/script_testing/**/*.txt', recursive=True):
temp = pd.read_csv(filename, sep=',', usecols=fields)
table1 = pd.crosstab(temp.StudentID, temp.Grade)
# Note the if condition is executed only once to initlialize table3
if(i==0):
table3 = table1
i = i + 1
table3 = table3 + table1
writer = pd.ExcelWriter('Report.xlsx', engine='xlsxwriter')
table3.to_excel(writer, sheet_name='StudentID_vs_Grade')
writer.save()