0

For example there are two dimensions: [country, website] and one metric: [PV].

I want to know the average PV of website for each country.

To make it, it's easy to get the total PV in each country, however it's difficult to get the count of website in each country, furthermore the expect result is the total PV(in each country) divided by the count of website(in each country).

What I can do is apply "groupBy" query by country & website as below, and then group the result by country outside in my application. It's very very very slow, because the query extract lots of data from Druid and most of them is meaningless just for a sum.

{
    "queryType": "groupBy",
    "dataSource": "--",
    "dimensions": [
        "country",
        "website"
    ],
    "granularity": "all",
    "intervals": [
        "--"
    ],
    "aggregations": [
        {
            "type": "longSum",
            "name": "PV",
            "fieldName": "PV"
        }
    ]
}

Any one can help with this? I'm wondering it's impossible such a common query is not supported by Druid.

Thanks in advance.



To be clear, I describe my expected result by SQL, if you have known what I want to do or not familiar to SQL, please ignore the following part.

SELECT country, sum(a.PV_all) / count(a.website) as PV_AVG FROM
(SELECT country, website, SUM(PV) as PV_all FROM DB GROUP BY country, website  ) a
GROUP BY country
Junjie
  • 1,145
  • 3
  • 21
  • 37

1 Answers1

0

Have you tried using a nested groupBy query ? druid support that. In nutshell you can have something like

{
  "queryType": "groupBy",
  "dataSource":{
    "type": "query",
    "query": {
      "queryType": "groupBy",
      "dataSource": "yourDataSource",
      "granularity": "--",
      "dimensions": ["country", "website"],
      "aggregations": [
        {
            "type": "longSum",
            "name": "PV",
            "fieldName": "PV"
        }
      ],
      "intervals": [ "2012-01-01T00:00:00.000/2020-01-03T00:00:00.000" ]
    }
  },
  "granularity": "all",
  "dimensions": ["country"],
  "aggregations": [
    ----
  ],
  "intervals": [ "2012-01-01T00:00:00.000/2020-01-03T00:00:00.000" ]
} 
Slim Bouguerra
  • 359
  • 1
  • 8