Add a HAVING
clause to put a condition on the group:
SELECT repository_name
FROM [githubarchive:github.timeline]
WHERE repository_language = 'Java'
AND PARSE_UTC_USEC(repository_created_at) BETWEEN PARSE_UTC_USEC('1996-01-01 00:00:00') AND PARSE_UTC_USEC('2015-05-30 00:00:00')
GROUP BY repository_name
HAVING COUNT(*) < 100 -- Add this clause
ORDER BY COUNT(*) DESC -- Add this to make the LIMIT meaningful
LIMIT 100
I don't know if the 100
of the LIMIT
is related to the 100
maximum commit count. If so, it's not needed. If not (and you actually want to limit the rows to 100) then add an ORDER BY to chose which 100 rows you want (rather than 100 random ones).
I also simplified the date range condition to a BETWEEN
.
Although the above is valid SQL (tested OK in MySQL, Postgres, SQLServer and Oracle), in case bigquery can't handle order by COUNT(*)
(as reported by OP), use a subquery:
SELECT * FROM (
SELECT repository_name, COUNT(*) commit_count
FROM [githubarchive:github.timeline]
WHERE repository_language = 'Java'
AND PARSE_UTC_USEC(repository_created_at) BETWEEN PARSE_UTC_USEC('1996-01-01 00:00:00') AND PARSE_UTC_USEC('2015-05-30 00:00:00')
GROUP BY repository_name
HAVING COUNT(*) < 100
) x
ORDER BY commit_count DESC
LIMIT 100