I have some stock price data in InfluxDB with columns Ticker
and Price
. For example:
Time Ticker Price
------ ------ ------
12:02 IBM 100.12
12:02 MSFT 50.15
12:03 IBM 100.15
12:04 MSFT 51.00
12:05 AMZN 200.00
I would like to extract the latest prices for each stock, even though they may be on different times. So the final selection should look like
Time Ticker Price
------ ------ ------
12:03 IBM 100.15
12:04 MSFT 51.00
12:05 AMZN 200.00
In regular SQL, one would usually do it like this:
SELECT values.*
FROM (SELECT Ticker, MAX(Time) AS MaxTime
FROM StockHistory
GROUP BY Ticker) as keys
INNER JOIN StockHistory as values
ON keys.Ticker = values.Ticker
AND keys.MaxTime = values.Time
The problem is, Influx does not seem to support INNER JOIN
or any other kind of join. I am just starting to learn it, and for a time series db, this type of a problem must be one of the most frequent ones it is built for. How do you do this in a fast way?
Speed is of concern for me, since I am looking at roughly 5-15 million rows in the table, and 150,000 different tickers (not all are stocks, there are many instruments I am storing).
Thank you very much.
PS If it matters, I will be accessing Influx through the Python API and can do this filtering in the code, but would strongly prefer to do it on the DB side to minimize transmission of a huge number of rows over the network...
UPDATE
I saw this question about a left join, but it seems to be outdated, and i need an inner one on time, something the answer specifically implies would be supported...
Possible Approach
Will this work?
SELECT LAST(Time), Price
FROM StockHistory
GROUP BY Ticker
If yes, how fast would it be on a large table (see above for measurements)?