I'm getting "Read timed out" when running a query on a 1,3b row db.
It is not a particular advanced query that groups together hashtags in tweets:
SELECT case when match(hashtag,
'[Cc]orona.*|COVID.*|[Cc]ovid.*|[Cc]oVID_19.*|[Cc]orvid19.*|COVD19.*|CORONA.*|KILLTHEVI.*|SARSCoV.*|ChineseVi.*|WuhanVir.*|ChinaVir.*|[Vv]irus.*|
[Qq]uarantine|[Pp]andemic.*|[Cc]linical[Tt]rial.*|FlattenTheCurve.*|SocialDistancing.*|StayHome.*|StayTheFHome.*|StayAtHome.*|stopthespread.*|
SafeHands.*|WashYourHands.*|SelfIsolation.*') then 'COVID19'
when match(hashtag, '[Jj]anta[Cc]urfew.*|[Jj]anata[Cc]urfew.*') then 'JantaCurfew'
when match(hashtag, 'Bhula.*') then 'Bhula'
when match(hashtag, '[Ss]t[Pp]atrick.*|HappyStPatrick') then 'StPatricks day'
when match(hashtag, '[Cc]hina.*') then 'China'
when match(hashtag, '[Ii]taly.*') then 'Italy'
when match(hashtag, '[Ii]ran.*') then 'Iran'
when match(hashtag, '[Ii]ndia.*') then 'India'
when match(hashtag, '[Hh]appy[Mm]others[Dd]ay.*|[Mm]others[Dd]ay.*') then 'MothersDay'
else hashtag END
as Hashtag,
SUM(CASE WHEN created >= '2020-05-14 00:00:00' AND created <= '2020-03-14 23:59:59' THEN 1 END) "May 14th'20",
SUM(CASE WHEN created >= '2020-05-13 00:00:00' AND created <= '2020-03-13 23:59:59' THEN 1 END) "May 13th'20",
SUM(CASE WHEN created >= '2020-05-12 00:00:00' AND created <= '2020-03-12 23:59:59' THEN 1 END) "May 12th'20"
FROM twitterDBhashtags
group by Hashtag
order by "May 12th'20" DESC limit 20;
Clickhouse is running on a striped hdd and accessed through GB network.
How can the timeout, if that is the challenge, be changed to allow for more time?
I would very much want to be able to run multi minutes queries without getting the "Read timed out" message, if possible.