I have a folder structure in S3 that looks like this:
```
root/
├── partner-1/
│   ├── config/
│   │   ├── config.json
│   │   └── feature.json
│   ├── customer-1/
│   │   ├── config/
│   │   │   ├── config.json
│   │   │   └── feature.json
│   │   └── data/
│   │       ├── model-1/
│   │       │   ├── input/
│   │       │   │   ├── current/
│   │       │   │   │   ├── tbl1.csv
│   │       │   │   │   └── tbl2.csv
│   │       │   │   └── archive/
│   │       │   │       ├── aod=20211012/
│   │       │   │       │   ├── tbl1.csv
│   │       │   │       │   └── tbl2.csv
│   │       │   │       └── aod=20211210/
│   │       │   │           ├── tbl1.csv
│   │       │   │           └── tbl2.csv
│   │       │   └── output/
│   │       │       └── (Same as input)
│   │       ├── model-2/
│   │       │   └── (Same as model-1)
│   │       └── input.zip
│   ├── customer-2/
│   │   └── (Same as customer-1)
│   ├── ...
│   └── customer-n/
│       └── (Same as customer-1)
├── ...
└── partner-n/
    └── (Same as partner-1)
```
Now I need to create Athena tables in AWS for tbl1, tbl2, and so on. All files named tbl1.csv share the same schema (columns), and the same is true for tbl2 and the other tables. I need to completely ignore the config folders, the zip file, and any JSON files in the directory tree.
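The filtering rule described above (keep the per-table CSV files, skip config folders, the zip file, and JSON files) can be sketched as a small key classifier. This is only a minimal illustration, assuming object keys start with a literal `root/` prefix as drawn in the tree; the function name `table_for_key` is hypothetical and not part of any AWS API.

```python
from pathlib import PurePosixPath
from typing import Optional


def table_for_key(key: str) -> Optional[str]:
    """Hypothetical helper: return the table name an S3 key belongs to,
    or None if the key should be ignored."""
    path = PurePosixPath(key)
    # Skip anything that sits inside a config/ folder at any level.
    if "config" in path.parts[:-1]:
        return None
    # Only CSV files carry table data; this also drops .zip and .json files.
    if path.suffix.lower() != ".csv":
        return None
    # tbl1.csv -> "tbl1", tbl2.csv -> "tbl2", ...
    return path.stem


if __name__ == "__main__":
    # Example keys assumed from the folder tree above.
    keys = [
        "root/partner-1/config/config.json",
        "root/partner-1/customer-1/config/feature.json",
        "root/partner-1/customer-1/data/input.zip",
        "root/partner-1/customer-1/data/model-1/input/current/tbl1.csv",
        "root/partner-1/customer-1/data/model-1/input/archive/aod=20211012/tbl2.csv",
    ]
    for k in keys:
        print(k, "->", table_for_key(k))
```

Grouping keys by the returned name gives one file set per table, which matches the "same file name, same schema" rule.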
The final table should look something like this:
**tbl1**
| col_1 | col_2 | col_3 | partner | customer | model |
|-------|-------|-------|---------|----------|-------|
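To fill the `partner`, `customer`, and `model` columns, each row's values have to come from the folder names in its file's key. A minimal sketch of that mapping is below, assuming keys follow `root/<partner>/<customer>/data/<model>/.../<table>.csv` exactly as in the tree. `Partition` and `partition_for_key` are hypothetical names, and the input/output, current/archive, and `aod=` levels are deliberately ignored because the desired table does not list them.

```python
from pathlib import PurePosixPath
from typing import NamedTuple, Optional


class Partition(NamedTuple):
    """Hypothetical record of the values a row of tblN should carry."""
    table: str
    partner: str
    customer: str
    model: str


def partition_for_key(key: str) -> Optional[Partition]:
    parts = PurePosixPath(key).parts
    # Assumed layout: root/<partner>/<customer>/data/<model>/.../<table>.csv
    # Anything else (config folders, input.zip, JSON files) is rejected.
    if len(parts) < 6 or parts[3] != "data":
        return None
    if not parts[-1].lower().endswith(".csv"):
        return None
    return Partition(
        table=PurePosixPath(parts[-1]).stem,
        partner=parts[1],
        customer=parts[2],
        model=parts[4],  # input/output, current/archive, aod=... are ignored
    )


if __name__ == "__main__":
    key = "root/partner-1/customer-1/data/model-2/output/archive/aod=20211210/tbl2.csv"
    print(partition_for_key(key))
```

Only the `aod=` folders use Hive-style `key=value` naming; `partner-1`, `customer-1`, and `model-1` are plain folder names, which is why this sketch reads them by position rather than by name.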