I keep seeing itertools, glob, pathlib and os used for this, and I'm getting lost on which one to use.
Is there a best (most efficient) way to read what will probably be a million rows in each CSV, so I can run numbers on the price data?
I still don't have working code that outputs what I want. I've spent a few days scrolling, copy/pasting and reviewing code, but still can't work it out.
At the moment, each CSV sits in its own subfolder for that day, and the CSV has the same name as the folder it's in. I'm trying to grab all the price data, which should be one entry per second, group it into 1-minute buckets, and then go from there.
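For the folder layout described above, a minimal sketch using `pathlib` alone (no `os` or `glob` needed) could look like this. It assumes each day folder sits directly under one root folder and holds a CSV with the same name as the folder; `find_day_csvs` is a hypothetical helper name, and `'gangshit'` is just the folder used in the code further down.

```python
from pathlib import Path


def find_day_csvs(root):
    """Return the per-day CSV files under `root`, sorted by path.

    Assumed layout (hypothetical): root/<day>/<day>.csv, i.e. each CSV
    sits in its own day folder and shares that folder's name.
    """
    root = Path(root)
    # '*/*.csv' looks exactly one folder level below root.
    return sorted(
        p for p in root.glob('*/*.csv')
        if p.stem == p.parent.name  # keep only <day>/<day>.csv
    )
```

Usage would be something like `for path in find_day_csvs('gangshit'): print(path)`. Sorting matters later: if the day folders sort by date, rows read from them in that order stay in time order.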
Any help appreciated. Cheers.
Sample CSV (10 headers).
Note: the code further down isn't in the right order; it's just the two parts of what I think I'm trying to do.
```
timestamp,symbol,side,size,price,tickDirection,trdMatchID,grossValue,homeNotional,foreignNotional
2020-05-22D00:00:52.012073000,ADAM20,Sell,20000,5.76e-06,ZeroPlusTick,face0e84-701d-9870-d90b-d6f8e61149af,11520000,20000,0.1152
2020-05-22D00:02:06.020714000,ADAM20,Sell,1873,5.76e-06,ZeroPlusTick,8179be8a-4256-c2dc-ff21-d5f96171e4c4,1078848,1873,0.01078848
2020-05-22D00:02:21.342170000,ADAM20,Sell,1816,5.76e-06,ZeroPlusTick,43c1cb71-6dfa-0c36-2445-1d2b08b8621d,1046016,1816,0.01046016
2020-05-22D00:10:53.716072000,ADAM20,Sell,190,5.75e-06,MinusTick,0dc7fb59-1dcb-8aa0-bb9d-5cab39a44b63,109250,190,0.0010925
```
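The timestamps in this sample use a `D` between date and time and carry nanoseconds, which the standard library's `%f` (at most 6 digits) cannot parse directly. A minimal stdlib-only sketch, assuming every value looks exactly like the sample (`D` separator, 9 fractional digits); `parse_timestamp` and `minute_bucket` are hypothetical names:

```python
from datetime import datetime


def parse_timestamp(ts):
    """Parse a value such as '2020-05-22D00:00:52.012073000'.

    Assumes the 'D' separator and 9 fractional digits, as in the sample.
    %f accepts at most 6 digits, so the last 3 (nanoseconds) are dropped
    by keeping only the first 26 characters.
    """
    return datetime.strptime(ts[:26], '%Y-%m-%dD%H:%M:%S.%f')


def minute_bucket(ts):
    """Floor a timestamp string to the start of its minute."""
    return parse_timestamp(ts).replace(second=0, microsecond=0)
```

If only a grouping key is needed, slicing the string (as `to_minute` does in the code below) is cheaper than parsing; a real `datetime` matters once you need time arithmetic.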
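```python
import pandas as pd
from glob import glob

# '**' with recursive=True also descends into the per-day subfolders.
# Forward slashes work on Windows too. In 'gangshit\trade_*.csv' the '\t'
# is a tab character, so that pattern could never match.
stock_files = sorted(glob('gangshit/**/trade_*.csv', recursive=True))
print(stock_files)
```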
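On the "million rows" concern: the `csv` module already reads lazily, so a file never has to be loaded all at once. A minimal sketch, assuming a list of paths such as `stock_files` above, that streams every file's rows one after another with `itertools.chain`; `iter_rows` is a hypothetical helper name, and each file is assumed to start with exactly one header line.

```python
import csv
import itertools


def iter_rows(paths):
    """Yield the data rows of every CSV in `paths`, file after file.

    Rows are read lazily, so no whole file is held in memory.
    Assumes each file starts with one header line, which is skipped.
    """
    def rows_of(path):
        # newline='' is what the csv module docs recommend for open().
        with open(path, newline='') as f:
            reader = csv.reader(f)
            next(reader, None)  # skip the header (no error if file is empty)
            yield from reader

    # chain.from_iterable opens each file only when the previous one is done.
    return itertools.chain.from_iterable(rows_of(p) for p in paths)
```

Because `stock_files` is sorted, `iter_rows(stock_files)` should come out in time order as long as the file names sort by date, which is what `itertools.groupby` relies on.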
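```python
import csv
import itertools as it


def to_minute(row):
    # '2020-05-22D00:00:52.012073000'[:16] -> '2020-05-22D00:00'
    return row[0][:16]


with open('data.csv', newline='') as src:
    reader = csv.reader(src)
    next(reader)  # skip the header row
    # groupby only merges *consecutive* rows with the same key, so this
    # relies on the file already being sorted by timestamp.
    for key, group in it.groupby(reader, key=to_minute):
        print(f'* {key}')
        for row in group:
            print(row)
        print()
```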
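To go from the per-minute groups to actual numbers, each group can be collapsed into an open/high/low/close bar plus total traded size. A minimal sketch, assuming the column order of the sample header (size in column 3, price in column 4) and rows already sorted by time; `minute_bars` is a hypothetical name, and OHLC plus volume is only an assumption about what "run numbers" means here.

```python
import itertools


def to_minute(row):
    # '2020-05-22D00:00:52.012073000'[:16] -> '2020-05-22D00:00'
    return row[0][:16]


def minute_bars(rows, price_col=4, size_col=3):
    """Collapse time-sorted trade rows into one bar per minute.

    Yields (minute, open, high, low, close, volume). Default column
    positions follow the sample header: size is 3, price is 4.
    Like itertools.groupby itself, this assumes rows are in time order.
    """
    for minute, group in itertools.groupby(rows, key=to_minute):
        prices = []
        volume = 0
        for row in group:
            prices.append(float(row[price_col]))
            volume += int(row[size_col])
        yield minute, prices[0], max(prices), min(prices), prices[-1], volume
```

It can consume the same `csv.reader` as the loop above, in place of that loop: `for bar in minute_bars(reader): print(bar)`. Minutes with no trades produce no bar at all (the sample jumps from 00:02 to 00:10), so gaps would need filling separately if every minute must appear.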