-1

So, let's say I have this data:

store_name               | latitude    | longitude | store_id | same_location_count | same_location_store_id
SR Restaurant and Cafe   | -41.575449  | 147.16824 | 1112     | 0           | null
Big Bite Burgers         | -41.575449  | 147.16824 | 1113     | 0           | null
Amigos                   | -41.575449  | 147.16824 | 1114     | 0           | null
Domino's                 | -38.33983   | 143.58384 | 1115     | 0           | null

I want to identity the stores that are in same coordinates.

The expected result looks like this:

store_name               | latitude    | longitude | store_id | same_location_count | same_location_store_id
SR Restaurant and Cafe   | -41.575449  | 147.16824 | 1112     | 2           | 1113:1114
Big Bite Burgers         | -41.575449  | 147.16824 | 1113     | 2           | 1112:1114
Amigos                   | -41.575449  | 147.16824 | 1114     | 2           | 1112:1113
Domino's                 | -38.33983   | 143.58384 | 1115     | 0           | null

If the stores are in same coordinates, we increment the same_location_count and get the ID of the same store location and concatenate it to same_location_store_id.

Tenserflu
  • 520
  • 5
  • 20
  • what the data type of following columns: `latitude`, `longitude`, `store_id` - usually those are `FLOAT64`, `FLOAT64`, `INT64` - please clarify! – Mikhail Berlyant Nov 17 '21 at 21:59

3 Answers3

1

Maybe something similar to the following. This works for Postgres DB, I think it should work in BigQuery also?

SELECT
  stores.*,
  COUNT(store_id) OVER (PARTITION BY latitude,longitude) - 1 AS same_location_count,
  NULLIF(
    TRIM(REPLACE(
      CONCAT(':', STRING_AGG(CAST(store_id AS TEXT), ':') OVER (PARTITION BY latitude,longitude), ':'),
      CONCAT(':', CAST(store_id AS TEXT), ':'),
      ':'
    ), ':'),
    ''
  ) AS same_location_store_id
FROM stores

Fiddle link: https://www.db-fiddle.com/f/gUfJm7X75Fdx1nuy9hSrsD/1

Edit: Updated from comment.

Count is the easy part; just subtract 1 from value. Same location stores you can see has become a horrible mess...

Basically the logic is:

  1. Collect all store_ids for same location (now using string_agg instead of array_agg
  2. Make a string of :idA:idB:...:idN: with leading,trailing :
  3. Remove the store_id of same store by replace ":id:" with just ":"
  4. Tidy up afterwards to make sure null for empty string, and remove leading,trailing : again

Step 3 works only because store_id is integer, and so guaranteed not to contain separator char :

Hitobat
  • 2,847
  • 1
  • 16
  • 12
1

You can use count and string_agg function for achieving this.

Test code:

with test as (
  select 'id1' id, '-41.575449' lat, '1112' long
  union all
  select 'id2' id, '-41.575449' lat, '1112' long
  union all
  select 'id3' id, '-41.575449' lat, '1112' long
  union all
  select 'id4' id, '-38.33983' lat, '1115' long
)
SELECT
  *,
  (COUNT(id) OVER (PARTITION BY lat, long))-1 AS same_location_count,
  REPLACE( STRING_AGG(CONCAT(id,':'),'') OVER (PARTITION BY lat, long), CONCAT(id,':'), '') AS same_location_store_id
FROM
  test

Edit: match the exact scenario without counting the id of each row and excluding it from the concatenation.

Gumaz
  • 239
  • 1
  • 4
  • Almost, but each store with same coordinates shouldn't included itself in the incrementation and concatenation of store Ids. :) – Tenserflu Nov 17 '21 at 15:46
  • 1
    This is a very beautiful answer @Gumaz. I just subtract the `same_location_count` by 1 and replace the each store id from `same_location_store_id` to "". Can you please edit the answer so it will match the answer of my question so I can tag it as correct answer? For the future explorer also... – Tenserflu Nov 17 '21 at 15:59
  • I just added a little bit of string processing for achieving exact values for same_location_store_id field and delimiters – Gumaz Nov 17 '21 at 16:56
1

Consider below approach (BigQuery)

select * except(x), 
  (( select as struct count(*) as same_location_count,
      string_agg('' || id, ':' order by id) as same_location_store_id
    from t.x id
    where id != store_id
  )).*
from (
  select *, array_agg(store_id) over (partition by '' || latitude, '' || longitude) as x, 
  from your_table
) t      

if applied to sample data in your question - output is

enter image description here

Note: I am assuming the data type of latitude, longitude, store_id columns are respectively FLOAT64, FLOAT64, INT64 and that is why you see in few places something like '' || latitude - this is to cast them to STRING. So if those columns already STRINGs - you can remove '' || part

Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230