
I have imported a CSV file into a list of lists, i.e. a normal database table.

The actions I need to perform on the database are: delete a few records based on field data, and delete duplicates.

I figured converting the list of lists into a Pandas DataFrame would make it easier to work with; however, deleting records from a DataFrame seems to be difficult.

Is there a Pandas method for searching and deleting records? I've had to iterate with a for loop, which is apparently NOT good Pandas methodology. I got DF.append() working to fill a second DataFrame with valid records, but I received the "deprecated" warning. I have NOT been able to get pandas.concat() to work correctly. It seems to me that pandas.concat() would prefer to be fed a list of lists again?

The code segment that I'm building (it contains some debug print statements):

    SummaryTable = Read_Control_File(AlarmsCSV)

    # Convert List of Lists to Dataframe
    Summary_DF = pandas.DataFrame(SummaryTable,
                                  columns=['pair','price','direction',
                                           'exch','desc','telegram','discord','email'])

    print ('Summary DF')
    print (Summary_DF)

    # 2nd DF to populate with valid records
    Temp_DF = pandas.DataFrame(columns=['pair','price','direction',
                                        'exch','desc','telegram','discord','email'])

    for i in range (0, len(Summary_DF)) :

        TestPair = Summary_DF.iloc[i]['pair']
        if TestPair[:1] != '#' :

            print ('Pair without #')
            print (Summary_DF.iloc[i]['pair'])
            print ('DF Row')
            print (Summary_DF.iloc[i])
            Temp_DF = pandas.concat([Temp_DF,Summary_DF.iloc[i]], \
                    ignore_index = True)

            print ('Temp DF AFTER Concat')
            print (Temp_DF)

    print ('Temp DF')
    print (Temp_DF)

    Summary_DF = Temp_DF

    print ('Summary DF after DROP')
    print (Summary_DF)
    sys.exit()

Database (small enough for inclusion)

    Summary DF
        pair        price       direction  ...  telegram  discord  email
    0   BTC-USDT    30835       >          ...  x
    1   BTC-USDT    29660       >          ...  x
    2   BTC-USDT    23390       <          ...  x
    3   BTC-USDT    20000       <          ...  x
    4   BTC-USDT    15000       <          ...  x
    5   BTC-USDT    12000       <          ...  x
    6   CRO_USDT    0.073       >          ...  x
    7   CRO_USDT    0.054       <          ...  x
    8   CRO_USDT    0.043       <          ...  x
    9   VVS_USDT    0.00000377  >          ...  x
    10  #VVS_USDT   0.00000335  <          ...  x
    11  VVS_USDT    0.00000290  <          ...  x
    12  #LOKI-USDT  0.11        <          ...  x
    13  #LOKI-USDT  0.05        <          ...  x
    14  SCRT-USDT   0.69        >          ...  x
    15  SCRT-USDT   0.608       >          ...  x
    16  #SCRT-USDT  0.5         <          ...  x
    17  SCRT-USDT   0.3         <          ...  x
    18  HEX-USDT    0.01        <          ...  x
    19  #PRV-BUSD   0.50        <          ...  x

Even a single-record concat in the for loop looks bizarre:

    Temp DF AFTER Concat
        pair  price  direction  exch  desc  telegram  discord  email  0
    0   NaN   NaN    NaN        NaN   NaN   NaN       NaN      NaN    BTC-USDT
    1   NaN   NaN    NaN        NaN   NaN   NaN       NaN      NaN    30835
    2   NaN   NaN    NaN        NaN   NaN   NaN       NaN      NaN    >
    3   NaN   NaN    NaN        NaN   NaN   NaN       NaN      NaN    binance
    4   NaN   NaN    NaN        NaN   NaN   NaN       NaN      NaN    BTC new High
    ..  ...   ...    ...        ...   ...   ...       ...      ...    ...
    75  NaN   NaN    NaN        NaN   NaN   NaN       NaN      NaN    cdc
    76  NaN   NaN    NaN        NaN   NaN   NaN       NaN      NaN    ABOVE Bottom Band
    77  NaN   NaN    NaN        NaN   NaN   NaN       NaN      NaN    x
    78  NaN   NaN    NaN        NaN   NaN   NaN       NaN      NaN
    79  NaN   NaN    NaN        NaN   NaN   NaN       NaN      NaN

    [80 rows x 9 columns]
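
One likely reading of this output, offered as a sketch rather than a confirmed diagnosis: `Summary_DF.iloc[i]` returns a Series indexed by the column names, and `pandas.concat()` along the default axis stacks each field of that Series as its own row under a new column named `0`. Selecting the row with a list, `iloc[[i]]`, returns a one-row DataFrame instead. The two-row `summary_df` below is hypothetical stand-in data with the columns trimmed for brevity.

```python
import pandas as pd

# Hypothetical stand-in for Summary_DF (assumption: only three columns shown).
summary_df = pd.DataFrame(
    [["BTC-USDT", "30835", ">"], ["#VVS_USDT", "0.00000335", "<"]],
    columns=["pair", "price", "direction"],
)

row_as_series = summary_df.iloc[0]    # Series: its index holds the column names
row_as_frame = summary_df.iloc[[0]]   # DataFrame with exactly one row

# Series under a DataFrame: every field becomes a separate row in an extra column.
stacked_wrong = pd.concat([summary_df.iloc[[1]], row_as_series], ignore_index=True)

# One-row DataFrame under a DataFrame: the row is appended as expected.
stacked_right = pd.concat([summary_df.iloc[[1]], row_as_frame], ignore_index=True)

print(stacked_wrong)
print(stacked_right)
```

Growing a DataFrame one row at a time is still slow even when it works, so this is mainly useful for understanding the output rather than as a recommended pattern.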

I think all is explained previously....

Rupert
  • You shouldn't iterate over the rows in place but use vectorization. For instance, your loop could be replaced by something like: `out = Summary_DF[~Summary_DF['pair'].str.startswith('#')]` – mozway Jun 09 '23 at 04:08
  • I provided this exact logic as comment above, give it a try ;) – mozway Jun 09 '23 at 08:48
  • @mozway OK, I need to learn about vectorization. However, what I'm trying to do is delete the rows where Summary_DF['pair'] starts with '#'. You can see that in my code, in the line `if TestPair[:1] != '#' :`. Is there a simple way of finding and deleting rows USING vectorization? Could I use `Summary_DF = Summary_DF.drop(out)`? – Rupert Jun 09 '23 at 08:52
  • Thanks, I've tried your code. I see my idea of using DF.drop() doesn't work. However, your code DOES work. Also fascinating/weird that '~' means 'NOT', rather than '!'. – Rupert Jun 09 '23 at 09:27
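
The approach from the comments can be sketched end to end as follows. This is a minimal illustration on hypothetical sample data, not the original alarms file. It builds a boolean mask with `str.startswith`, inverts it with `~` (Python's bitwise NOT operator, which pandas applies element-wise), and then removes duplicates with `drop_duplicates()`. It also shows that `drop()` expects index labels, which is probably why passing a filtered DataFrame to it did not work.

```python
import pandas as pd

# Hypothetical sample data (assumption: three of the eight columns, four rows).
summary_df = pd.DataFrame(
    {
        "pair": ["BTC-USDT", "#VVS_USDT", "BTC-USDT", "CRO_USDT"],
        "price": ["30835", "0.00000335", "30835", "0.073"],
        "direction": [">", "<", ">", ">"],
    }
)

# Boolean Series: True where the pair is commented out with a leading '#'.
is_commented = summary_df["pair"].str.startswith("#")

# Keep the rows where the mask is False.
kept = summary_df[~is_commented]

# Equivalent with drop(): pass the index labels of the rows to remove.
same_via_drop = summary_df.drop(summary_df[is_commented].index)

# Remove fully identical rows and renumber the index.
deduped = kept.drop_duplicates().reset_index(drop=True)

print(deduped)
```

`drop_duplicates()` also accepts a `subset` argument if only some columns should decide what counts as a duplicate.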
