0

I have a .txt file with three columns: Time, ticker, price. The time is spaced in 15 second intervals. It looks like this uploaded to jupyter notebook and put into a Pandas DF.

time          ticker price
0   09:30:35    EV  33.860
1   00:00:00    AMG 60.430
2   09:30:35    AMG 60.750
3   00:00:00    BLK 455.350
4   09:30:35    BLK 451.514
 ...    ... ... ...
502596  13:00:55    TLT 166.450
502597  13:00:55    VXX 47.150
502598  13:00:55    TSLA    529.800
502599  13:00:55    BIDU    103.500
502600  13:00:55    ON  12.700

# NOTE: the first set of data has the data at market open for -
# every other time point, so that's what the 00:00:00 is. 
#It is only limited to the 09:30:35 data.

I need to create a function that takes an input (a ticker) and then creates a bar chart that displays the data with 5 minute ticks ( the data is every 20 seconds, so for every 15 points in time).

So far I've thought about separating the "mm" part of the hh:mm:ss to just get the minutes in another column and then right a for loop that looks something like this:

for num in df['mm']:
    if num %5 == 0:
       print('tick')

then somehow appending the "tick" to the "time" column for every 5 minutes of data (I'm not sure how I would do this), then using the time column as the index and only using data with the "tick" index in it (some kind of if statement). I'm not sure if this makes sense but I'm drawing a blank on this.

Jonathan Hall
  • 75,165
  • 16
  • 143
  • 189
dergky
  • 105
  • 1
  • 9

1 Answers1

1

You should have a look at the built-in functions in pandas. In the following example I'm using a date + time format but it shouldn't be hard to convert one to the other.

Generate data

%matplotlib inline
import pandas as pd
import numpy as np

dates = pd.date_range(start="2020-04-01", periods=150, freq="20S")
df1 = pd.DataFrame({"date":dates,
                    "price":np.random.rand(len(dates))})
df2 = df1.copy()
df1["ticker"] = "a"
df2["ticker"] = "b"

df =  pd.concat([df1,df2], ignore_index=True)
df = df.sample(frac=1).reset_index(drop=True)

Resample Timeseries every 5 minutes

Here you can try to see the output of

df1.set_index("date")\
   .resample("5T")\
   .first()\
   .reset_index()

Where we are considering just the first element at 05:00, 10:00 and so on. In general to do the same for every ticker we need a groupby

out = df.groupby("ticker")\
        .apply(lambda x: x.set_index("date")\
                          .resample("5T")\
                          .first()\
                          .reset_index())\
        .reset_index(drop=True)

Plot function

def plot_tick(data, ticker):
    ts = data[data["ticker"]==ticker].reset_index(drop=True)
    ts.plot(x="date", y="price", kind="bar", title=ticker);

plot_tick(out, "a")

enter image description here

Then you can improve the plot or, eventually, try to use plotly.

rpanai
  • 12,515
  • 2
  • 42
  • 64