I keep running into itertools, glob, pathlib, and os, and I'm getting lost on which one to use.

Is there a most efficient way to read roughly a million rows from each CSV so I can run calculations on the price data?

I still don't have working code that produces what I want. I've spent a few days scrolling, copy/pasting, and reviewing code, but I still can't work it out.

At the moment, each CSV sits in its own per-day subfolder, and the CSV has the same name as its folder. I'm trying to grab all the price data, which should arrive about every second, group it into 1-minute buckets, and go from there.

Any help appreciated. Cheers.

Sample CSV (10 columns) is below, followed by my code. The code isn't in the right order; it's just two separate parts of what I'm trying to do.

timestamp,symbol,side,size,price,tickDirection,trdMatchID,grossValue,homeNotional,foreignNotional
2020-05-22D00:00:52.012073000,ADAM20,Sell,20000,5.76e-06,ZeroPlusTick,face0e84-701d-9870-d90b-d6f8e61149af,11520000,20000,0.1152
2020-05-22D00:02:06.020714000,ADAM20,Sell,1873,5.76e-06,ZeroPlusTick,8179be8a-4256-c2dc-ff21-d5f96171e4c4,1078848,1873,0.01078848
2020-05-22D00:02:21.342170000,ADAM20,Sell,1816,5.76e-06,ZeroPlusTick,43c1cb71-6dfa-0c36-2445-1d2b08b8621d,1046016,1816,0.01046016
2020-05-22D00:10:53.716072000,ADAM20,Sell,190,5.75e-06,MinusTick,0dc7fb59-1dcb-8aa0-bb9d-5cab39a44b63,109250,190,0.0010925
# Part 1: list the trade CSV files
import pandas as pd
from glob import glob

# Forward slashes avoid '\t' being read as a tab character
stock_files = sorted(glob('gangshit/trade_*.csv'))
print(stock_files)
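Since glob and pathlib overlap here, a minimal sketch of finding the files with pathlib alone may help compare them. It assumes the layout described earlier (one subfolder per day, each holding a CSV named after the folder). The helper name `find_daily_csvs` and the day-folder names in the demo are made up for illustration and are not taken from the real data.

```python
from pathlib import Path
import tempfile


def find_daily_csvs(root):
    """Return every CSV found anywhere below root, sorted by path."""
    # rglob searches subfolders recursively, so per-day folders are covered
    return sorted(Path(root).rglob("*.csv"))


# Demo on a throwaway directory tree so the sketch runs anywhere.
# The folder names below are hypothetical stand-ins for the real day folders.
with tempfile.TemporaryDirectory() as tmp:
    for day in ("20200522", "20200523"):
        day_dir = Path(tmp) / day
        day_dir.mkdir()
        (day_dir / f"{day}.csv").write_text("timestamp,price\n")

    found = [p.relative_to(tmp).as_posix() for p in find_daily_csvs(tmp)]

print(found)
```

Sorting the paths keeps the days in order as long as the folder names sort chronologically, which is an assumption about the naming scheme.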

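# Part 2: group rows into 1-minute buckets
import csv
import itertools as it


def to_minute(row):
    # First 16 characters of the timestamp, e.g. '2020-05-22D00:00'
    return row[0][:16]


with open('data.csv', newline='') as src:
    reader = csv.reader(src)
    next(reader)  # skip the header row
    # groupby only merges consecutive rows, so the file must be sorted by time
    for key, group in it.groupby(reader, key=to_minute):
        print(f'* {key}')
        for row in group:
            print(row)
        print()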
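For comparison, here is a rough pandas sketch of the same 1-minute bucketing using `resample`. It assumes the `D` between date and time in the sample rows is a literal separator, that each file holds a single symbol, and that open/high/low/close plus summed size is the kind of summary wanted. `minute_bars` is a hypothetical helper name, and the inline rows are copied from the sample CSV in this post.

```python
import io

import pandas as pd

# Rows copied from the sample CSV in the question
SAMPLE = """timestamp,symbol,side,size,price,tickDirection,trdMatchID,grossValue,homeNotional,foreignNotional
2020-05-22D00:00:52.012073000,ADAM20,Sell,20000,5.76e-06,ZeroPlusTick,face0e84-701d-9870-d90b-d6f8e61149af,11520000,20000,0.1152
2020-05-22D00:02:06.020714000,ADAM20,Sell,1873,5.76e-06,ZeroPlusTick,8179be8a-4256-c2dc-ff21-d5f96171e4c4,1078848,1873,0.01078848
2020-05-22D00:02:21.342170000,ADAM20,Sell,1816,5.76e-06,ZeroPlusTick,43c1cb71-6dfa-0c36-2445-1d2b08b8621d,1046016,1816,0.01046016
2020-05-22D00:10:53.716072000,ADAM20,Sell,190,5.75e-06,MinusTick,0dc7fb59-1dcb-8aa0-bb9d-5cab39a44b63,109250,190,0.0010925
"""


def minute_bars(csv_source):
    """Summarise trades into 1-minute open/high/low/close bars with volume."""
    # Loading only the needed columns keeps memory down on large files
    df = pd.read_csv(csv_source, usecols=["timestamp", "size", "price"])
    # Assumption: 'D' is a literal separator; pandas' %f accepts nanoseconds
    df["timestamp"] = pd.to_datetime(
        df["timestamp"], format="%Y-%m-%dD%H:%M:%S.%f"
    )
    df = df.set_index("timestamp").sort_index()

    bars = df["price"].resample("1min").ohlc()
    bars["volume"] = df["size"].resample("1min").sum()
    # Minutes with no trades have NaN prices; drop them
    return bars.dropna(subset=["open"])


bars = minute_bars(io.StringIO(SAMPLE))
print(bars)
```

For a real file, passing the path instead of the `StringIO` object should work the same way, since `read_csv` accepts both.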




  • Pandas is faster than the CSV module for importing larger data files (i.e. 1M rows) as [illustrated by this answer](https://stackoverflow.com/questions/62139040/python-csv-module-vs-pandas/62139456#62139456) – DarrylG Jun 03 '20 at 13:47
  • Isn't pandas the main one out of all the others, though, like itertools? – igotbags Jun 04 '20 at 07:41

0 Answers