Is using a timestamp field with concat(to_date) the most efficient way to query previous day in Impala?

Asked Jan 21 '19 at 12:43

Active Jan 21 '19 at 12:43

Viewed 109 times

I am querying data from HDFS using Impala in a python script using the python library Impyla. The specific data is proxy data and there is tons of it. I have a script that runs daily to pull the previous day and runs statistics. Currently I am using the devicereceipttime field for this query which is stored as a timestamp.

from impala.dbapi import connect
from impala.util import as_pandas
import pandas as pd

#Pull desired features from the proxy_realtime_p table
cursor.execute('select request, count(*) as count \
from default.proxy_realtime_p \
where devicereceipttime BETWEEN concat(to_date(now() - interval 1 days), " 00:00:00") and concat(to_date(now() - interval 1 days), " 23:59:59") \
group by request \
order by count desc')

This query takes a little bit and would like to speed this up if possible. From the given fields below is my query the most efficient?

devicereceipttime (timestamp)
year (int)
month (int)
day (int)
hour (int)
minute (int)
seconds (int)

asked Jan 21 '19 at 12:43

sectechguy

2,037
4
28
61

Is using a timestamp field with concat(to_date) the most efficient way to query previous day in Impala?

0 Answers0