There is the need to cast and split the fields first.
The date
is a value formated as "yyyymmdd....". Therefore, I suggest casting the value as a string and consider the first four characters as the year.
There is no V1.5TONE
column, but V2Tone
. It consists of a string with several digital numbers seperated by a comma. The string has to be split first. Then each component needs to be casted a decimal number.
The table gdelt-bq.extra.sourcesbycountry
should map an url to a single country. It has duplicated countries to one url! To eliminate at least some duplicated value the inner select
with the group by is used.
To obtain the values grouped by Year,Country all other dimensional columns need to be commented out.
SELECT
#DATE,
substr(cast(date as string),0,4) AS year,
FIPS as Country,
#LOCATIONS,
AVG(cast(split(V2Tone,",")[safe_offset(0)] as decimal )) as Avg_Tone,
AVG(cast(split(V2Tone,",")[safe_offset(1)] as decimal )) as PositiveS,
AVG(cast(split(V2Tone,",")[safe_offset(2)] as decimal )) as NegativeS,
COUNT(*) as counts,
From `gdelt-bq.gdeltv2.gkg_partitioned`
left join
(SELECT Domain, FIPS
from `gdelt-bq.extra.sourcesbycountry`
group by 1,2) country
on SourceCommonName=country.Domain
Where
DATE(_PARTITIONTIME) BETWEEN DATE('2020-01-01') AND DATE('2020-01-31')
AND Locations like '%CH%'
GROUP BY Year,Country
ORDER BY Year,Country
Also in where
segment: Do not mix DATE and TIMESTAMP.