
I need to build an ETL process for the following file:

LIST OF TRANSACTIONS
COD. SE      ,COMERCIAL NAME                                    ,TYPE          ,DATE OP        ,TIME OP       ,DATE TX            ,ID UNIQUE              ,
1010101      ,CARL GAME                                         ,1244          ,09/12/2020     ,190047        ,201207             ,73777777777777777777777,
2020202      ,UNIQUE KINGDOM                                    ,1244          ,08/12/2020     ,84943         ,201208             ,73777888888888888888888,
Cantidad de Registros :     2
                         Cantidad                            Importe
Soles                          3                             0000000.00
Dolares                        0                              000000.00

I download this kind of file from a data server, and there is no way I can change its format.

I need the following process in Python:

  1. Extract the contents of the zip file.
  2. Take the .csv file and delete the first line and the last 4 lines, because they contain no useful information.
  3. Parse the data and give each column the right format, because there are numbers, strings, and dates.
  4. Load the data, with two options:
    4.1. Append the data to a SQL table.
    4.2. Replace the data in a SQL table.
  5. Create a script that runs every day so the process is automated.
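
Steps 1 and 2 could be sketched with the standard library alone, streaming the file so that it never has to fit in memory. This is a minimal sketch, not a tested solution for the real files: the function names, the `latin-1` encoding, and the idea that the archive member name is known in advance are all assumptions, and it also assumes the file ends with exactly 4 footer lines and no trailing blank line.

```python
import io
import zipfile
from collections import deque


def trim_lines(lines, skip_head=1, skip_tail=4):
    """Yield lines, dropping the first `skip_head` and the last `skip_tail`.

    Only `skip_tail` lines are held back at a time, so this works on
    files of any size.
    """
    held_back = deque()
    for i, line in enumerate(lines):
        if i < skip_head:
            continue  # drop the title line(s)
        held_back.append(line)
        if len(held_back) > skip_tail:
            yield held_back.popleft()
    # Whatever is still held back is the footer and is discarded.


def extract_clean_csv(zip_path, member, out_path, encoding="latin-1"):
    """Read `member` from the zip and write a trimmed copy to `out_path`.

    `member` (the .csv name inside the archive) and `encoding` are
    assumptions; adjust them to the real files.
    """
    with zipfile.ZipFile(zip_path) as zf, zf.open(member) as raw:
        text = io.TextIOWrapper(raw, encoding=encoding, newline="")
        with open(out_path, "w", encoding=encoding, newline="") as out:
            out.writelines(trim_lines(text))
```

The trimmed file starts at the column header line, so any CSV reader can take it from there.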

There are two kinds of files:

  • One whose data is appended, with a size of about 320 MB.
  • Another whose data replaces the existing data, with a size of about 3 GB.
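
For files of this size, one possible approach to step 4 is to read the cleaned CSV in chunks and write each chunk with `DataFrame.to_sql`. The sketch below uses SQLite only so that it is self-contained; the function name, table name, and chunk size are hypothetical, and a different database would need its own connection (for example through SQLAlchemy). It assumes the title and footer lines have already been removed.

```python
import sqlite3

import pandas as pd


def load_csv_to_sql(source, conn, table, mode="append", chunksize=100_000):
    """Load a cleaned CSV into `table` in chunks and return the row count.

    mode="append"  adds the rows to the existing table.
    mode="replace" drops the table on the first chunk, then appends the rest.
    """
    if mode not in ("append", "replace"):
        raise ValueError("mode must be 'append' or 'replace'")
    total = 0
    # dtype=str keeps every value as text; convert types separately if needed.
    reader = pd.read_csv(source, dtype=str, chunksize=chunksize)
    for i, chunk in enumerate(reader):
        first_replace = mode == "replace" and i == 0
        chunk.to_sql(
            table,
            conn,
            if_exists="replace" if first_replace else "append",
            index=False,
        )
        total += len(chunk)
    return total
```

A call such as `load_csv_to_sql("clean.csv", sqlite3.connect("etl.db"), "transactions", mode="replace")` would cover option 4.2, with `clean.csv` and `etl.db` as placeholder names. Note that in replace mode the old table is gone as soon as the first chunk is written, so loading into a staging table first may be safer for the 3 GB file.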

Please let me know what I can do. So far I have only tried this:

import pandas as pd

# header=None: treat every line as data (no header row)
df = pd.read_csv(r'C:\\Users\....\OP00077.csv', header=None)
filas = len(df.index)
print("Filas: ", filas)
# Drop the first row and the last 4 rows in one step
df = df.iloc[1:-4]
filas = len(df)
print("Filas: ", filas)
print(df)
  • Please do not say *doesn't work*, because it means nothing to us. What error do you get? You can edit your question and add it with the traceback (as text, not a screenshot). You should also be more specific and remove details irrelevant to the problem: you can focus on just the CSV processing if you have successfully downloaded and extracted your file. – astentx Dec 16 '20 at 20:52
  • To remove the last N rows and the first row you can use `df = df[1:-N]` – astentx Dec 16 '20 at 21:04
  • Did you check the .csv file? – Darth_Kenobi Dec 16 '20 at 21:23
  • What should I see? The file is not so large that it has to be posted as an archive on a file-sharing site; you can save it as a text file on pastebin, for example, or just embed the content in the question. Anyway, downloading an archive from a file-sharing site is not a good idea. But you didn't describe what **your** issue with it is. – astentx Dec 16 '20 at 23:08
  • Is it better now? With that kind of file I can't use CSV libraries because they don't recognize it; try copying that data into a text file and saving it as .csv. – Darth_Kenobi Dec 17 '20 at 13:23
  • You could use `pd.read_csv(, skiprows=1)`. This will skip the first row and then give you the column names. You can then do `df[:-4]` to drop the last 4 rows. – Babu Arunachalam Dec 18 '20 at 08:25
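
Building on the `skiprows` suggestion in the comments, a minimal sketch of the read-and-format step (step 3 of the question) might look like the code below. Several things here are assumptions rather than facts from the question: that `DATE OP` is day/month/year, that `DATE TX` is `yymmdd`, that `TIME OP` is `hhmmss` with a dropped leading zero, and the function name `read_transactions`. `skipfooter` requires `engine="python"`, which is slower than the default parser and loads the whole file, so it may suit the smaller file better than the 3 GB one.

```python
import io

import pandas as pd

# The sample from the question, ending right after the 4 footer lines.
SAMPLE = (
    "LIST OF TRANSACTIONS\n"
    "COD. SE      ,COMERCIAL NAME            ,TYPE          ,DATE OP        ,TIME OP       ,DATE TX            ,ID UNIQUE              ,\n"
    "1010101      ,CARL GAME                 ,1244          ,09/12/2020     ,190047        ,201207             ,73777777777777777777777,\n"
    "2020202      ,UNIQUE KINGDOM            ,1244          ,08/12/2020     ,84943         ,201208             ,73777888888888888888888,\n"
    "Cantidad de Registros :     2\n"
    "                         Cantidad                            Importe\n"
    "Soles                          3                             0000000.00\n"
    "Dolares                        0                              000000.00\n"
)


def read_transactions(source):
    """Read the report, skipping the title line and the 4 footer lines."""
    df = pd.read_csv(source, skiprows=1, skipfooter=4, engine="python", dtype=str)
    # The trailing comma on every line creates an empty last column.
    df = df.dropna(axis=1, how="all")
    # Remove the fixed-width padding from column names and values.
    df.columns = df.columns.str.strip()
    df = df.apply(lambda col: col.str.strip())
    # Assumed formats: DATE OP is dd/mm/yyyy, DATE TX is yymmdd.
    df["DATE OP"] = pd.to_datetime(df["DATE OP"], format="%d/%m/%Y")
    df["DATE TX"] = pd.to_datetime(df["DATE TX"], format="%y%m%d")
    # Assumed hhmmss with a missing leading zero (84943 -> 084943).
    df["TIME OP"] = df["TIME OP"].str.zfill(6)
    df["TYPE"] = df["TYPE"].astype(int)
    # ID UNIQUE has 23 digits, too many for a 64-bit integer, so it stays text.
    return df


df = read_transactions(io.StringIO(SAMPLE))
print(df.dtypes)
```

Reading everything as text first (`dtype=str`) avoids pandas guessing a numeric type for `ID UNIQUE` and losing digits.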
