6

I'm a little unfamiliar with ClickHouse and still study it by trial and error. Got a question about it.

Talking about the star scheme of data representations, with dimensions and facts. Currently, I keep everything in PostgreSQL, but OLAP queries with aggregations start to show bad timing, so I'm going to move some fact tables to ClickHouse. Initial tests of CH show incredible performance, however, in real life the queries should include joins to dimension tables from PostgreSQL. I know I can connect them as dictionaries.

Question: I found that using dictionaries I can make requests similar to LEFT JOINs in good old RDBMS, ie values from resultset could be joined with corresponding values from the dictionary. But can they be filtered by some restrictions on dictionary keys (as in INNER JOIN)? For example, in PostgreSQL I have a table users (id, name, ...) and in ClickHouse I have table visits (user_id, source, medium, session_time, timestamp, ...) with metrics about their visits to the site. Can I make a query to CH to fetch aggregated metrics (number of daily visits for given date range) of users which name matches some condition (LIKE "EVE%" for example)?

Ilya Konyukhov
  • 2,666
  • 1
  • 12
  • 21

2 Answers2

2

It sounds like ODBC table function is what you're looking for. ClickHouse have a bunch of table functions which work like Postgres foreign tables. The setup is similar to Dictionaries but you gain the traditional JOIN behavior. It currently doesn't show up in the official document. You can refer to this https://github.com/yandex/ClickHouse/blob/master/dbms/tests/integration/test_odbc_interaction/test.py#L84 . And in near future (this year), ClickHouse will have standard JOIN statement supported.

Amos
  • 3,238
  • 4
  • 19
  • 41
1

The dictionary will basically replace the value first. As I understand it your dictionary would be based off your users table.

Here is an example. Hopefully I am understanding your question.

select dictGetString('accountidmap', 'domain', tuple(toString(account_id))) AS domain, sum(session) as sessions from session_distributed where date = '2018-10-15' and like(domain, '%cats%') group by domain

This is a real query on our database so If there is something you want to try/confirm let me know

ozzieisaacs
  • 833
  • 2
  • 11
  • 23
  • Could you provide more details? As I understand, `session_distributed` is ClickHouse table, and `accountidmap` is PG's? What's `dictGetString`? – Ilya Konyukhov Nov 03 '18 at 02:09
  • accountidmap is the dictionary name. dictGetString is the clickhouse function accesses the dictionary ifself and performs the lookup. The accountidmap lets me swap an account id for all sorts of information about the account without having to do any expensive joins. I had several tables I used to join to that were all able to be replaced by dictionaries, – ozzieisaacs Nov 03 '18 at 17:11