1

I have some stock price data in InfluxDB with columns Ticker and Price. For example:

 Time  Ticker Price
------ ------ ------
 12:02  IBM   100.12
 12:02  MSFT   50.15
 12:03  IBM   100.15
 12:04  MSFT   51.00
 12:05  AMZN  200.00

I would like to extract the latest prices for each stock, even though they may be on different times. So the final selection should look like

 Time  Ticker Price
------ ------ ------
 12:03  IBM   100.15
 12:04  MSFT   51.00
 12:05  AMZN  200.00

In regular SQL, one would usually do it like this:

SELECT values.*
  FROM (SELECT Ticker, MAX(Time) AS MaxTime
          FROM StockHistory
         GROUP BY Ticker) as keys
       INNER JOIN StockHistory as values
               ON keys.Ticker  = values.Ticker
              AND keys.MaxTime = values.Time

The problem is, Influx does not seem to support INNER JOIN or any other kind of join. I am just starting to learn it, and for a time series db, this type of a problem must be one of the most frequent ones it is built for. How do you do this in a fast way?

Speed is of concern for me, since I am looking at roughly 5-15 million rows in the table, and 150,000 different tickers (not all are stocks, there are many instruments I am storing).

Thank you very much.

PS If it matters, I will be accessing Influx through the Python API and can do this filtering in the code, but would strongly prefer to do it on the DB side to minimize transmission of a huge number of rows over the network...

UPDATE

I saw this question about a left join, but it seems to be outdated, and i need an inner one on time, something the answer specifically implies would be supported...

Possible Approach

Will this work?

SELECT LAST(Time), Price
  FROM StockHistory
 GROUP BY Ticker

If yes, how fast would it be on a large table (see above for measurements)?

gt6989b
  • 4,125
  • 8
  • 46
  • 64

0 Answers0