0

I have extracted Table data from an image having multiple tables using Amazon textract and trying to map all the extracted data into a Output template CSV,

However there are multiple tables in the Extracted Input CSV file which are listed one below another. There are approx 7 tables which are listed one below other in each CSV.

Please suggest how to map the values from Input CSV to Output.

Input CSV file:

S.No    Item        Item_code      1st     2nd    3rd    4th    Avg
1      Math_book    BK001           27      36    35     23      30
2      Phy_book     BJ008           30      40    40     30      35
3      Hin_book     NK103           50      50    30     30      40
4      Che_book     CH001           40      40    40     20      35


S.No   Item_Name    Item_code      1st     2nd    3rd    4th    Avg
1      Math_book    BK001           27      36    35     23      30
2      Phy_book     BJ008           30      40    40     30      35
3      Hin_book     NK103           50      50    30     30      40


S.No   Product        Item_code      1st     2nd    3rd    4th    Avg
1      Phy_book     BJ008           30      40    40     30      35
2      Hin_book     NK103           50      50    30     30      40
3      Che_book     CH001           40      40    40     20      35
4      Bio_book     BI005           50      30    40     60      45

Expected output:

S.No   Product        Item_code      1st     2nd    3rd    4th
1      Math_book    BK001           54      72    70     46  
2      Phy_book     BJ008           90      120  120     90 
3      Hin_book     NK103          150      150   90     90 
4      Che_book     CH001           80      80    80     60 
5      Bio_book     BI005           50      30    40     60  

Code i have been trying to use :

df = pd.read_csv(r'input.csv')
df2 = pd.read_csv(r'output.csv')

How i Can add all the values by groupBy considering (Item,Item_Name,Product) columns submit the values in df2

Please Suggest

NKJ
  • 457
  • 1
  • 4
  • 11

1 Answers1

0

Just use grouby -

df  = df[df['S.No']!='S.No'].drop('S.No',axis=1) # drop header rows
df[df.columns.values[2:]]  = df[df.columns.values[2:]].apply(lambda x: x.astype(int)) #convert data to int type
df = df.groupby(['Item','Item_code'],as_index=False).sum()
df.to_csv('out.csv',index_label='S.No', sep= '\t') # specify the name of output file here
Nk03
  • 14,699
  • 2
  • 8
  • 22