I'm trying to implement a nested query to pull no more than one sample per log, and I think I know how to implement its components separately:
- Query a set of logs that contain data relevant to my analysis:
SELECT
runs.object_type as object_type,
runs.name as log_name,
from project.runs.latest_runs
WHERE object_type = "ROCKET"
group by object_type, log_name
This results in a list of log names, e.g. "log_name_2021_09_01", "log_name_2021_09_03" etc.
- Query no more than one event with a specific condition from a single known log:
SELECT
object.path_meters as pos,
object_speed as speed,
log.run as run_name,
FROM project.events.last30days
WHERE log.run = "log_name_2021_10_01"
AND object.speed > 0.0
LIMIT 1
The above query returns no more than one sample for the specified log.
How can I combine these queries to pull samples from a set of logs that is returned by Query 1, and at the same time there should be no more than one sample per log?
Update:
Let's say a DB contains three logs:
- log_name_2021_09_01. The associated object_type to the log is ROCKET. The log contains 100k data samples: 90k of them have object.speed = 0.0, 10k of them have speed > 0.0.
- log_name_2021_09_02. The associated object_type to the log is CAR. The log also contains 100k samples with a similar proportion to log 1.
- log_name_2021_09_03. The associated object_type to the log is ROCKET. The log also contains 100k samples with a similar proportion to log 1.
I'm only interested in logs with the object type ROCKET. Two logs correspond to this condition: log_name_2021_09_01 and log_name_2021_09_03. These log names can be obtained by query 1 depicted above. I'd like to pull only one sample point (with speed > 0) from each of the two logs. That is, in the end I'd like to have a query that returns two samples: one from log_name_2021_09_01 and one from log_name_2021_09_03.