I'm trying to process some IP data that we store on our clickhouse database. We have some users who have IPv6 addresses logged, and some people have multiple IP addresses logged, so what I am trying to achieve is to get only the IPv4 addresses, and if there are multiple IP addresses listed, then I choose the first one logged.
Here is the query that I made to filter them:
SELECT IF(ip LIKE '%,%', arrayElement(splitByChar(',', assumeNotNull(ip)), 1), ip) AS ip
FROM usage_analytics.users
WHERE ip NOT LIKE '%:%'
the results are not consistent. Sometimes it works fine, and gets all IPv4 addresses. However sometimes it returns null rows always at around 70 rows into the results. This happens around 4/5 times when you run this query.
What's going on? Is this a clickhouse issue, or a logic issue, or something else I'm not considering?