1

We have a program that gets data for GPS coordinates. We can get a dataset based on a region id in our database, which would look something like this:

gps_coords | year      | value
105        | 2010      |  5.63
102        | 1990      |  3.2
103        | 2000      |  13.23
...

Now, we want to combine that and another query set, with a sql like a.value + b.value, or (a.value+50)*b.value/100. We also filter our query by metrics (what dataset they want to query).

The problem is how to retrieve both gps_coords as one column. I figured we have to do a JOIN on the same table, but I don't know how to get both a.gps_coords and b.gps_coords in the same column.

My query (below) executes in 100 ms with zero rows. So, I'm not sure what's going wrong. Does anyone know how I can get both a's and b's gps_coords in the same column? I'm using Postgresql, but anything would help. Thanks!

Schema

data:

gps_coords
year
value
metric

regions:

gps_coords
region_id

Sample Data:

Data

|  gps_coords  |  year  |  value  |  metric  |
|  506         |  2010  |  23.23  |  5       |
|  507         |  2010  |  10.32  |  5       |
|  508         |  2010  |  28.5   |  5       |
|  509         |  2010  |  45.24  |  5       |
|  506         |  2010  |  213.53 |  4       |
|  507         |  2010  |  0      |  4       |
|  508         |  2010  |  434.4  |  4       |
|  509         |  2010  |  381.1  |  4       |

Regions

|  gps_coords  |  region_id  |
|  506         |  1          |
|  506         |  2          |
|  506         |  3          |
|  507         |  1          |
|  508         |  1          |
|  508         |  3          |
|  509         |  1          |
|  509         |  2          |

Desired Output:

Supposing I want coordinates for metric 5 in region 1, added with metric 4 in region 3 (which overlap on gps_coords 506), I want to return all gps_coords (no matter the region), and then the specified values (added where they intersect):

|  gps_coords  |  year  |  value  |
|  506         |  2010  |  233.76 |
|  507         |  2010  |  0      |
|  508         |  2010  |  434.4  |
|  509         |  2010  |  45.24  |

Sample (incorrect) SQL:

SELECT DISTINCT init.gps_coords, init.year, a.value + b.value as value

FROM data as init

INNER JOIN data as a USING (metric, value)
INNER JOIN data as b USING (metric, value)

INNER JOIN regions as r
ON (init.gps_coords = r.gps_coords)
AND r.region_id = 1

INNER JOIN regions as ra
ON (a.gps_coords = ra.gps_coords)
AND ra.region_id = 2

INNER JOIN regions as rb
ON (init.gps_coords = rb.gps_coords)
AND rb.region_id = 3

WHERE a.metric = 5
AND b.metric = 4
ORDER BY init.gps_coords

Above would be all coordinates, for every region (region 1), and then the values added where they intersect (ra.region 2 would include coords 506 and 509, and would add with rb.region 3's coords: 506 and 508, adding at coords 506). 507 doesn't appear in either either region id, so it is 0, or null, whichever.

bozdoz
  • 12,550
  • 7
  • 67
  • 96
  • 1
    Can you post table schema, sample data and desired output that correlates with your query. – peterm Aug 25 '13 at 22:20
  • Added @peterm. A lot of stuff, but I wanted to be thorough. Really it can be a simple, generalized answer. How to get both sets of coordinates in one query set column. Each in the examples returns two rows, but how to turn that into 3 (all four grouped would equal three where they share one) or 4 (a row for every coordinate). – bozdoz Aug 26 '13 at 04:38
  • @bozdos Can you explain why in desired output 507=0, and 509=45.24? They both are in region 1 and metric 5. May be I'm missing something but IMHO given your sample data result should be `506=236.76, 507=10.32, 508=462.9, 509=45.24`. – peterm Aug 26 '13 at 05:32
  • @peterm 507 is not in region 2 or 3, so it should be zero or null. – bozdoz Aug 26 '13 at 15:47
  • When you describe your desired output you say *...I want coordinates for metric 5 in region 1, added with metric 4 in region 3 ...*. There is no mention of region 2. Is it just a mistake? – peterm Aug 26 '13 at 15:51
  • Sorry, yeah. typo. Region 1 is all of the region gps coordinates. that's why it's the `init` table. So I can get all of them @peterm – bozdoz Aug 26 '13 at 17:58

2 Answers2

1

If understand correctly (in which I'm not sure) your query might look like

SELECT COALESCE(b.gps_coords, c.gps_coords) AS gps_coords,
       COALESCE(b.year, c.year) AS year,
       COALESCE(b.value, 0) + COALESCE(c.value, 0) AS value
  FROM
(
  SELECT d.gps_coords, d.year, SUM(d.value) AS value
    FROM data d JOIN regions r
      ON d.gps_coords = r.gps_coords
   WHERE d.metric = 5 AND r.region_id = 1
   GROUP BY d.gps_coords, d.year
) b FULL JOIN
(
  SELECT d.gps_coords, d.year, SUM(d.value) AS value
    FROM data d JOIN regions r
      ON d.gps_coords = r.gps_coords
   WHERE (d.metric = 4 AND r.region_id = 3)
   GROUP BY d.gps_coords, d.year
) c
    ON b.gps_coords = c.gps_coords
   AND b.year = c.year
 ORDER BY gps_coords

Sample output:

| GPS_COORDS | YEAR |  VALUE |
-------------|------|--------|
|        506 | 2010 | 236.76 |
|        507 | 2010 |  10.32 |
|        508 | 2010 |  462.9 |
|        509 | 2010 |  45.24 |

Here is SQLFiddle demo

peterm
  • 91,357
  • 15
  • 148
  • 157
  • Coalesce might be the answer. I'll take a look when I get to a computer. Thanks! – bozdoz Aug 26 '13 at 15:48
  • Is there no way to add a third table to that FULL JOIN @peterm? Suppose I need another query to add or subtract the simulation values? I haven't figured out how to add a third table. – bozdoz Sep 02 '13 at 02:18
  • The same way. Chain it after subquery `c`. And then add necessary columns to outer select `COALESCE()`s as a parameter. – peterm Sep 02 '13 at 02:31
  • Wasn't able to. Read up on a lot of hacks to get around it. Instead, I think I'm going to use the SQL I found here: http://stackoverflow.com/questions/441939/joining-3-tables-and-retrieve-all-the-records-from-all-the-tables – bozdoz Sep 02 '13 at 02:39
  • That's also a possibility. But unfortunately it's impossible to tell anything conclusive without all specifics: schema, sample data, desired output and detailed explanation of requirements. – peterm Sep 02 '13 at 02:46
0

This SQL gives me exactly what I need:

SELECT a.gps_coords, 
  a.year, 
  COALESCE(AVG(b.v1), 0) + COALESCE(AVG(b.v2), 0)
FROM data a
LEFT JOIN (
  SELECT d.gps_coords, d.year, d.value as v1, NULL v2
    FROM data d JOIN regions r
      ON d.gps_coords = r.gps_coords
   WHERE d.metric = 4 AND r.region_id = 3
UNION
  SELECT d.gps_coords, d.year, NULL, d.value
    FROM data d JOIN regions r
      ON d.gps_coords = r.gps_coords
   WHERE d.metric = 5 AND r.region_id = 2
) b
ON b.gps_coords = a.gps_coords
AND a.year = b.year
GROUP BY a.gps_coords, a.year
ORDER BY a.gps_coords

OUTPUT

|  gps_coords  |  year  |  value  |
|  506         |  2010  |  236.76 |
|  507         |  2010  |  0      |
|  508         |  2010  |  434.4  |
|  509         |  2010  |  45.24  |
bozdoz
  • 12,550
  • 7
  • 67
  • 96