Automating an ETL process from a flat file to SQL with Python

Question

I need to build an ETL process for the following file:

LIST OF TRANSACTIONS
COD. SE      ,COMERCIAL NAME                                    ,TYPE          ,DATE OP        ,TIME OP       ,DATE TX            ,ID UNIQUE              ,
1010101      ,CARL GAME                                         ,1244          ,09/12/2020     ,190047        ,201207             ,73777777777777777777777,
2020202      ,UNIQUE KINGDOM                                    ,1244          ,08/12/2020     ,84943         ,201208             ,73777888888888888888888,
Cantidad de Registros :     2
                         Cantidad                            Importe
Soles                          3                             0000000.00
Dolares                        0                              000000.00

I download this kind of file from a data server, and there is no way I can change its format.

I need the following process in Python:

1. Extract the zip archive.
2. Take the .csv file and delete the first line and the last 4 lines, because they contain no useful information.
3. Parse the data and apply the right types, because it contains numbers, strings, and dates.
4. Load the data, with two options:
   4.1. Append the data to a SQL table.
   4.2. Replace the data in a SQL table.
5. Create a script that runs every day so the process is automated.
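
Steps 1 to 3 above could be sketched roughly as follows. This is only an illustration under several assumptions that the question does not confirm: the archive contains a single .csv member, the file is UTF-8 encoded, `DATE OP` is day/month/year, and `TIME OP` is HHMMSS with the leading zero dropped. The function name `read_transactions` and its parameters are hypothetical.

```python
import io
import zipfile

import pandas as pd


def read_transactions(zip_source, header_lines=1, footer_lines=4):
    """Read the first .csv member of a zip archive into a typed DataFrame."""
    with zipfile.ZipFile(zip_source) as archive:
        # Assumption: the first member ending in .csv is the one we want.
        name = next(n for n in archive.namelist() if n.lower().endswith(".csv"))
        # Assumption: the file is UTF-8 encoded.
        text = archive.read(name).decode("utf-8")

    # Drop the title line at the top and the summary lines at the bottom.
    lines = text.splitlines()[header_lines:-footer_lines]

    # Read everything as text first so long IDs and padded values survive.
    df = pd.read_csv(io.StringIO("\n".join(lines)), dtype=str)

    # The trailing comma on every line produces an empty, unnamed last column.
    df = df.loc[:, ~df.columns.str.startswith("Unnamed")]

    # Remove the fixed-width padding from column names and values.
    df.columns = df.columns.str.strip()
    df = df.apply(lambda col: col.str.strip())

    # Convert only the columns whose meaning is reasonably clear.
    df["TYPE"] = df["TYPE"].astype(int)
    # Assumption: DATE OP is day/month/year.
    df["DATE OP"] = pd.to_datetime(df["DATE OP"], format="%d/%m/%Y")
    # Assumption: TIME OP is HHMMSS without its leading zero (84943 is 08:49:43).
    df["TIME OP"] = df["TIME OP"].str.zfill(6)
    return df
```

This version reads the whole member into memory, which is probably acceptable for small files but not for the multi-gigabyte one; that case would need a streaming approach instead.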

There are two kinds of files:

One that appends data, which can be about 320 MB.
Another that replaces data, which is about 3 GB.
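
Given these sizes, one possible approach is to stream the file line by line and insert it in batches instead of loading it all at once. The sketch below uses only the standard library and takes SQLite as a stand-in, since the question does not say which database is involved. The names `data_lines` and `load`, the batch size, and storing every value as text are all assumptions made for illustration.

```python
import csv
import sqlite3
from collections import deque
from itertools import islice


def data_lines(lines, skip_head=1, skip_tail=4):
    """Yield lines lazily, omitting the first skip_head and last skip_tail."""
    held = deque()
    for line in islice(lines, skip_head, None):
        held.append(line)
        # A line is released only once skip_tail newer lines have been seen,
        # so the final skip_tail lines are never yielded.
        if len(held) > skip_tail:
            yield held.popleft()


def load(lines, conn, table, replace=False, batch_size=50_000):
    """Insert the data rows into `table`; return the number of rows inserted."""
    reader = csv.reader(data_lines(lines))
    # The trailing comma leaves an empty last field, dropped with [:-1].
    header = [h.strip() for h in next(reader)][:-1]
    cols = ", ".join(f'"{h}"' for h in header)
    marks = ", ".join("?" for _ in header)
    insert = f'INSERT INTO "{table}" VALUES ({marks})'

    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
    rows = ([v.strip() for v in r][: len(header)] for r in reader)
    total = 0
    # `with conn` commits on success and rolls back if an error is raised.
    with conn:
        if replace:
            conn.execute(f'DELETE FROM "{table}"')  # option 4.2
        # Without replace, rows are simply appended (option 4.1).
        while batch := list(islice(rows, batch_size)):
            conn.executemany(insert, batch)
            total += len(batch)
    return total
```

With a connection from `sqlite3.connect(...)`, a call such as `load(open(path, newline=""), conn, "transactions", replace=True)` would cover the 3 GB replace case, where `path` and the table name are placeholders. Running the script daily is then a job for the operating system's scheduler, such as cron or Windows Task Scheduler.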

Let me know what I can do. So far I have only tried this:

import pandas as pd

# skiprows=1 skips the title line, which has a single field and would otherwise break parsing.
# dtype=str keeps every value as text, since the column-name row is still part of the data here.
df = pd.read_csv(r'C:\\Users\....\OP00077.csv', header=None, skiprows=1, dtype=str)
filas = len(df.index)
print("Filas: ", filas)
# Drop the last 4 rows (the record count and the totals).
df = df.iloc[:-4]
filas = len(df)
print("Filas: ", filas)
print(df)

Please do not say *doesn't work*, because it tells us nothing. What error do you get? You can edit your question and add it with the traceback (as text, not a screenshot). You should also be more specific and remove details irrelevant to the problem: you can focus on just the CSV processing if you have already downloaded and extracted your file. - astentx, Dec 16 '20 at 20:52
To remove the last N rows and the first row, you can use `df = df[1:-N]`. - astentx, Dec 16 '20 at 21:04
What should I see? The file is not so large that it has to be posted as an archive on a file-sharing site; you can save it as a text file on pastebin, for example, or just embed the content in the question. In any case, downloading an archive from a file-sharing site is not a good idea. But you still have not described what **your** issue with it is. - astentx, Dec 16 '20 at 23:08
Is it better now? With this kind of file I can't use CSV libraries because they don't recognize the format. Try copying that data into a text file and saving it as .csv. - Darth_Kenobi, Dec 17 '20 at 13:23
You could use `pd.read_csv(..., skiprows=1)`. This skips the first row and then picks up the column names. You can then do `df[:-4]` to drop the last 4 rows. - Babu Arunachalam, Dec 18 '20 at 08:25
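
The two suggestions in the comments can be tried on the sample from the question. The snippet below is a small sketch, with the column padding shortened for readability; `skipfooter` is a real `read_csv` parameter but only works with `engine="python"`, and the variable names are made up for this example.

```python
import io

import pandas as pd

# The sample from the question, with the column padding shortened.
sample = """LIST OF TRANSACTIONS
COD. SE ,COMERCIAL NAME ,TYPE ,DATE OP    ,TIME OP ,DATE TX ,ID UNIQUE              ,
1010101 ,CARL GAME      ,1244 ,09/12/2020 ,190047  ,201207  ,73777777777777777777777,
2020202 ,UNIQUE KINGDOM ,1244 ,08/12/2020 ,84943   ,201208  ,73777888888888888888888,
Cantidad de Registros :     2
                         Cantidad                            Importe
Soles                          3                             0000000.00
Dolares                        0                              000000.00
"""

# Variant 1 (from the comments): skip the title line, then slice off the footer.
df_slice = pd.read_csv(io.StringIO(sample), skiprows=1, dtype=str)[:-4]

# Variant 2: let the parser drop the footer; skipfooter needs the Python engine.
df_skip = pd.read_csv(
    io.StringIO(sample), skiprows=1, skipfooter=4, engine="python", dtype=str
)

print(df_slice.shape, df_skip.shape)
```

Both variants keep an extra unnamed column caused by the trailing comma on each line, and the values still carry their padding, so a `str.strip()` pass is likely needed afterwards.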
