2

hope someone can figure out what I'm doing wrong here. The task seems quite simple, but is apparently beyond me.

I have two tables and I am trying to append precipitation data. Three conditions need to match in the two tables to get the correct precipitation data for each field site, e.g. year, latitude, and longitude. I tried by using the following queries (among other failed queries that I can't even remember any more):

SELECT f.*, g.* 
  FROM fieldSites f  LEFT OUTER JOIN gpcp_precipitation2 g
    ON f.date = g.year 
   AND f.d_lat = g.lat
 WHERE f.d_lon = g.lon; 

this one timed out

and:

SELECT *
FROM fieldSites
INNER JOIN gpcp_precipitation2 
  ON (fieldSites.d_lon = gpcp_precipitation2.lon 
  AND fieldSites.d_lat = gpcp_precipitation2.lat 
  AND fieldSites.date = gpcp_precipitation2.year);

this one also timed out on me.

I'd like to dump to a .csv file, but right now I just want to execute a successful query.

Here are my tables:

left table: fieldSites

siteId  d_lat  d_lon   year  data1  data2  country      
  1     -13.75  18.75   2009  0.598  0.351  Angola       
  1     -13.75  18.75   2008  0.654  0.330  Angola       
  1     -13.75  18.75   2007  0.489  0.381  Angola       
  1     -13.75  18.75   2006  0.554  0.389  Angola       
  1     -13.75  18.75   2005  0.321  0.321  Angola       
  1     -13.75  18.75   2004  0.598  0.351  Angola       
  1     -13.75  18.75   2003  0.654  0.330  Angola       
  1     -13.75  18.75   2002  0.489  0.381  Angola       
  1     -13.75  18.75   2001  0.554  0.389  Angola       
  2     -78.75  163.75  2009  0.285  0.155  Antarctica   
  2     -78.75  163.75  2008  0.285  0.155  Antarctica   
  2     -78.75  163.75  2007  0.285  0.155  Antarctica   
  2     -78.75  163.75  2006  0.285  0.155  Antarctica   
  2     -78.75  163.75  2005  0.285  0.155  Antarctica   
...1052 sites, 11 years, 11496 rows

right table: gpcp_precipitation2

siteId   lat    lon   year  precipitation
1        81.5   1.25  2009  93.36571912   
1        81.5   1.25  2008  93.36571912   
1        81.5   1.25  2007  93.36571912   
1        81.5   1.25  2006  93.36571912   
1        81.5   1.25  2005  93.36571912   
1        81.5   1.25  2004  93.36571912   
1        81.5   1.25  2003  93.36571912   
1        81.5   1.25  2002  93.36571912   
1        81.5   1.25  2001  93.36571912   
1        81.5   1.25  2000  93.36571912   
1        81.5   3.75  2009  93.36571912 
1        81.5   3.75  2008  93.36571912   
1        81.5   3.75  2007  93.36571912

... 92300 rows  

What I want is this:

siteId  d_lat  d_lon   year  data1  data2  country      precipitation  
  1     13.75  18.75   2009  0.598  0.351  Angola       144.286
  1     13.75  18.75   2008  0.654  0.330  Angola       114.970
  1     13.75  18.75   2007  0.489  0.381  Angola       70.000
  1     13.75  18.75   2006  0.554  0.389  Angola       174.179
  1     13.75  18.75   2005  0.321  0.321  Angola       174.743
  1     13.75  18.75   2004  0.598  0.351  Angola       70.506
  1     13.75  18.75   2003  0.654  0.330  Angola       173.716
  1     13.75  18.75   2002  0.489  0.381  Angola       74.162
  1     13.75  18.75   2001  0.554  0.389  Angola       139.445
  2     78.75  163.75  2009  0.285  0.155  Antarctica   0
  2     78.75  163.75  2008  0.285  0.155  Antarctica   0
  2     78.75  163.75  2007  0.285  0.155  Antarctica   0
  2     78.75  163.75  2006  0.285  0.155  Antarctica   0

Am I doing something completely stupid? I am stumped. Thanks so much for any advice.

Raidri
  • 17,258
  • 9
  • 62
  • 65
velvetmonster
  • 87
  • 3
  • 10
  • 1
    I guess you mean `AND fieldSites.year = gpcp_precipitation2.year` – ypercubeᵀᴹ Jul 20 '11 at 05:47
  • I don't understand what your desired output is supposed to be. For example, in the eighth row, what is the 13.75 in the second, unnamed column? The small amount of data which you have given has no matches for latitude between the two tables. – No'am Newman Jul 20 '11 at 05:48
  • If the queries timed out, then probably your machine is too slow? You could try with some smaller temporary tables to test the queries (I would go with the second one). To get around the timeout: Increase the timeout, check if there are indexes on the columns you use for the join, etc.. – hage Jul 20 '11 at 05:49
  • It should have just thrown an error around fieldSites.date. You could also try bringing back just the top 100 records to speed it up for testing. – Matt Jul 20 '11 at 05:53

3 Answers3

5
Select fieldSites.*, precipitation.*
From fieldSites
Inner Join gpcp_precipitation2 As precipitation On precipitation.siteId = fieldSites.siteId
Where
    fieldSites.d_year = precipitation.year And
    fieldSites.d_lat = precipitation.lat And
    fieldSites.d_lon = precipitation.lon

If that query is timing out, you have an indexing problem, not necessarily a query problem. This gives you several predicates in the where clause to filter on, so it should reduce your joins quite a bit, but you may need an index that includes siteId, year, lat, and lon on both tables.

Jordan
  • 31,971
  • 6
  • 56
  • 67
  • THANK YOU JORDAN!!!! That did it, with a little tweaking. Since the siteId for precipitation are not the same as the fieldSite siteIds, I removed that bit.Here's my code: Select fieldSites.*, precipitation.* From fieldSites Inner Join gpcp_precipitation2 As precipitation On fieldSites.date = precipitation.year Where fieldSites.d_lat = precipitation.lat And fieldSites.d_lon = precipitation.lon – velvetmonster Jul 20 '11 at 19:04
  • Well, this is pathetic, I just figured out how to select an answer. Sorry for the incredibly long delay in giving you your credit. – velvetmonster Jul 04 '13 at 18:09
3

Add an index on (date, d_lat, d_lon) to the first table and a (year, lat, lon) index to the second table. Then, try the joins.

From your comments, I suggest you use the second query:

SELECT *
FROM fieldSites
INNER JOIN gpcp_precipitation2 
  ON  fieldSites.d_lon = gpcp_precipitation2.lon 
  AND fieldSites.d_lat = gpcp_precipitation2.lat 
  AND fieldSites.date = gpcp_precipitation2.year ;

Can you also post the query plan for the above, now that you have added some indexes? (use EXPLAIN SELECT ... )

ypercubeᵀᴹ
  • 113,259
  • 19
  • 174
  • 235
  • I think the index really helped. I used this code for the precip data table: CREATE INDEX gpcpIndex ON gpcp_precipitation2 (lat,lon,year,precip) and this code for the fieldSite table: CREATE INDEX fieldSitesIndex ON fieldSites (siteId, DATE, d_lon, d_lat) (Sorry, I don't know how to format code in the comments) – velvetmonster Jul 20 '11 at 19:16
2
SELECT *
FROM fieldSites
INNER JOIN gpcp_precipitation2 
  ON (fieldSites.d_lon = gpcp_precipitation2.lon 
  AND fieldSites.d_lat = gpcp_precipitation2.lat 
  AND fieldSites.date = gpcp_precipitation2.year);

The last line:

AND fieldSites.date = gpcp_precipitation2.year);

According to your tables this should be fieldSites.year Is that a typo or the error?

Edgar Velasquez Lim
  • 2,426
  • 18
  • 15
  • Good eye, Edgar, it was a typo - I changed the name in that table to make it discreet from the other table, and missed it in the code here. – velvetmonster Jul 20 '11 at 18:26