I'd probably use the pandas library for this. It has lots of nice features for working with time series in general and OHLC data in particular, but we won't use any here.
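import glob

import numpy as np
import pandas as pd

stocks = glob.glob("stock*.csv")

total_tick = 0
for stock in stocks:
    # The files have no header row, so supply the column names ourselves.
    df = pd.read_csv(stock,
                     names=["time", "open", "high", "low", "close", "volume"],
                     parse_dates=[0], index_col="time")
    # +1 for an uptick, -1 for a downtick, 0 for no change or the first row.
    tick = df["close"].diff().apply(np.sign).fillna(0.0)
    total_tick += tick

# Write the summed tick once, after every stock has been added.
total_tick.to_csv("tick.csv")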
which produces an output looking something like
2013-09-16 09:30:00,0.0
2013-09-16 09:31:00,3.0
2013-09-16 15:59:00,-5.0
2013-09-16 16:00:00,-3.0
2013-09-17 09:30:00,1.0
2013-09-17 09:31:00,-1.0
where I've made up sample data looking like yours.
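If you want to check the summation without any csv files on disk, here's a minimal sketch of the same loop on two in-memory series. The names `stock_a` and `stock_b` and their prices are made up for illustration; they're not taken from your data.

```python
import numpy as np
import pandas as pd

# Hypothetical one-minute closes for two made-up stocks on a shared index.
index = pd.to_datetime([
    "2013-09-16 09:30:00",
    "2013-09-16 09:31:00",
    "2013-09-16 09:32:00",
])
closes = {
    "stock_a": pd.Series([10.0, 10.5, 10.2], index=index),
    "stock_b": pd.Series([20.0, 20.3, 20.3], index=index),
}

total_tick = 0
for name, close in closes.items():
    # +1 for an uptick, -1 for a downtick, 0 for unchanged or the first row.
    tick = close.diff().apply(np.sign).fillna(0.0)
    total_tick = total_tick + tick

print(total_tick)
```

At 09:31 both stocks ticked up, so the total is 2; at 09:32 one fell and the other was unchanged, so the total is -1.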
The basic idea is that you can read a csv file into an object called a DataFrame:
>>> df
                         open      high     low       close  volume
time
2013-09-16 09:30:00  461.0100  461.4900  461.00  453.484089  183507
2013-09-16 09:31:00  460.8200  461.6099  460.39  474.727508  212774
2013-09-16 15:59:00  449.7200  450.0774  449.59  436.010403  146399
2013-09-16 16:00:00  450.1200  450.1200  449.65  455.296584  444594
2013-09-17 09:30:00  448.0000  448.0000  447.50  447.465545  173624
2013-09-17 09:31:00  449.2628  449.6800  447.50  477.785506  193186
We can select a column:
>>> df["close"]
time
2013-09-16 09:30:00 453.484089
2013-09-16 09:31:00 474.727508
2013-09-16 15:59:00 436.010403
2013-09-16 16:00:00 455.296584
2013-09-17 09:30:00 447.465545
2013-09-17 09:31:00 477.785506
Name: close, dtype: float64
Take the difference, noting that if we're subtracting from the previous value, then the initial value is undefined:
>>> df["close"].diff()
time
2013-09-16 09:30:00 NaN
2013-09-16 09:31:00 21.243419
2013-09-16 15:59:00 -38.717105
2013-09-16 16:00:00 19.286181
2013-09-17 09:30:00 -7.831039
2013-09-17 09:31:00 30.319961
Name: close, dtype: float64
Make this either positive or negative, depending on its sign:
>>> df["close"].diff().apply(np.sign)
time
2013-09-16 09:30:00 NaN
2013-09-16 09:31:00 1
2013-09-16 15:59:00 -1
2013-09-16 16:00:00 1
2013-09-17 09:30:00 -1
2013-09-17 09:31:00 1
Name: close, dtype: float64
And fill the NaN with a 0:
>>> df["close"].diff().apply(np.sign).fillna(0)
time
2013-09-16 09:30:00 0
2013-09-16 09:31:00 1
2013-09-16 15:59:00 -1
2013-09-16 16:00:00 1
2013-09-17 09:30:00 -1
2013-09-17 09:31:00 1
dtype: float64
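The whole chain can be wrapped in a small helper. This is only a sketch: `tick_direction` is a name I've made up, and it calls `np.sign` on the Series directly instead of going through `.apply`, which should give the same result because `np.sign` is a NumPy ufunc and works elementwise on a Series.

```python
import numpy as np
import pandas as pd


def tick_direction(close: pd.Series) -> pd.Series:
    """Return +1.0, -1.0 or 0.0 per row, with 0.0 for the first row."""
    # np.sign accepts the whole Series and keeps its index.
    return np.sign(close.diff()).fillna(0.0)


# The made-up close column from the walkthrough above.
close = pd.Series(
    [453.484089, 474.727508, 436.010403, 455.296584, 447.465545, 477.785506],
    index=pd.to_datetime([
        "2013-09-16 09:30:00",
        "2013-09-16 09:31:00",
        "2013-09-16 15:59:00",
        "2013-09-16 16:00:00",
        "2013-09-17 09:30:00",
        "2013-09-17 09:31:00",
    ]),
    name="close",
)

ticks = tick_direction(close)

# Same values as the .apply(np.sign) version used in the loop.
assert ticks.equals(close.diff().apply(np.sign).fillna(0.0))
print(ticks)
```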
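This assumes that the recording times match across all stocks. If they don't, adding the Series together leaves a NaN at every timestamp that is missing from any one of them; pandas has powerful resampling tools (for example `resample`) that can align them first.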
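To show what goes wrong when the times don't match, and one way around it, here's a hedged sketch. It doesn't use resampling: it outer-joins the closes onto a common index with `pd.concat` and forward-fills the gaps, which assumes that a missing bar means the price was unchanged. That assumption is mine, so check it against how your data was recorded. The series names and prices are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical closes: stock_b has no bar at 09:31.
close_a = pd.Series(
    [10.0, 10.5, 10.2],
    index=pd.to_datetime(["2013-09-16 09:30", "2013-09-16 09:31",
                          "2013-09-16 09:32"]),
)
close_b = pd.Series(
    [20.0, 20.3],
    index=pd.to_datetime(["2013-09-16 09:30", "2013-09-16 09:32"]),
)

# Naive approach: compute per-stock ticks, then add them.
tick_a = np.sign(close_a.diff()).fillna(0.0)
tick_b = np.sign(close_b.diff()).fillna(0.0)
naive_total = tick_a + tick_b  # NaN at 09:31, where stock_b has no row

# Align first: outer-join onto the union of timestamps, then carry the
# last known close forward (assumption: missing bar == unchanged price).
closes = pd.concat({"stock_a": close_a, "stock_b": close_b}, axis=1).ffill()

# Sign of the change per stock, summed across stocks for each timestamp.
total_tick = np.sign(closes.diff()).fillna(0.0).sum(axis=1)

print(naive_total)
print(total_tick)
```

Note that the two approaches also date the tick differently: in the aligned version stock_b's move is measured against its forward-filled 09:31 close, not against its previous recorded row.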