
I'm trying to plot some Turkish data using R.

The problem I'm having is that when I merge my data with the shapefile (a SpatialPolygonsDataFrame), the data no longer matches the correct polygons. What am I doing wrong?

Below is some reproducible code. The shapefile is Natural Earth data (so public domain), and I've put it on my Google Drive, zipped together with a simple Excel data file. The code produces two plots with the province names labelled, one before and one after the merge. You can see that the second plot has "jumbled" the data: Turkey.map@data no longer matches the correct polygons.

Before-merge plot, with the correct province names: [image: before merge]
After-merge plot: [image: after merge]

library(maptools)
library(readxl)

temp <- "TurkeyShapefile.zip"
URL      <- "https://docs.google.com/ucid=0B0TyKM0aACIONUxfNTJwWGhrR0k&export=download"
download.file(URL,temp, mode="wb")
unzip(temp)

trtr <- readShapeSpatial("Natural_earth_admin_RMS150518_TR")

#read excel file
fname <- "TR_data.xlsx"
TRdata <- read_excel(fname, sheet = "pcnt")

Turkey.map <- trtr       #create copy of trtr

#a plot of the map before the merge
plot(Turkey.map)
invisible(text(getSpPPolygonsLabptSlots(Turkey.map), labels=as.character(Turkey.map@data$Admin1Name), cex=0.5))


#merge (join data)
Turkey.map@data <- merge(Turkey.map@data,TRdata,by.x="Admin1Name",by.y="Province", all.x=TRUE)

#a plot of the map after the merge
plot(Turkey.map)
invisible(text(getSpPPolygonsLabptSlots(Turkey.map), labels=as.character(Turkey.map@data$Admin1Name), cex=0.5))

Many thanks!

simon77

1 Answer


You're in for a world of pain if you do anything to the @data slot of a spatial object that could in any way reorder its rows. The polygons are tied to the rows of @data by position, and merge() on a plain data.frame sorts its result by the key column by default (sort=TRUE). In general you should match rows explicitly on an ID field between the two data sets (e.g. with manual calls to which()), or, in your case, simply call merge() on the SpatialPolygonsDataFrame object itself:

## Merge on the Spatial* object itself (sp's merge method), not on its @data slot
Turkey.map <- merge(
    Turkey.map, TRdata, 
    by.x="Admin1Name", by.y="Province", 
    all.x=TRUE
)
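To see why touching the @data slot breaks things, here is a base-R sketch with made-up province names and values (the data frames `attrs` and `vals` are hypothetical stand-ins, not the OP's files). Because merge() sorts its result by the key while the polygons keep their original order, row i of the merged table would no longer describe polygon i if it were assigned back into @data.

```r
## Hypothetical attribute table, in the same order as the polygons
attrs <- data.frame(Admin1Name = c("Izmir", "Ankara", "Konya"),
                    stringsAsFactors = FALSE)

## Hypothetical values to join, listed in a different order
vals <- data.frame(Province = c("Konya", "Izmir", "Ankara"),
                   pcnt = c(30, 10, 20),
                   stringsAsFactors = FALSE)

## merge() on plain data.frames sorts by the key (sort = TRUE is the default)
merged <- merge(attrs, vals, by.x = "Admin1Name", by.y = "Province",
                all.x = TRUE)

## Alphabetical order ("Ankara", "Izmir", "Konya"), not the polygon order
merged$Admin1Name

## Row-by-row comparison with the original order: only the last row lines up
attrs$Admin1Name == merged$Admin1Name
```

Assigning `merged` back into the @data slot would silently attach Ankara's value to the first polygon (Izmir), which is exactly the "jumbled" labelling seen in the after-merge plot.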

Curious as to why the OP wasn't seeing the correct output from merging a Spatial* object with a data.frame, I put together this fully reproducible example showing the correct behavior:

library(sp)

##  Reproducible 10x10 grid of polygons:
set.seed(2002)
grd <- GridTopology(c(1,1), c(1,1), c(10,10))
polys <- as(grd, "SpatialPolygons")
centroids <- coordinates(polys)
x <- centroids[,1]
y <- centroids[,2]
z <- 1.4 + 0.1*x + 0.2*y + 0.002*x*x
d <- SpatialPolygonsDataFrame(
  polys,
  data=data.frame(
    x=x, y=y, z=z, ID=1:length(x), 
    row.names=row.names(polys)
  )
)

df <- data.frame("ID"=1:10, color="black")

class(d)
class(df)

Yields:

class(d)
[1] "SpatialPolygonsDataFrame"
attr(,"package")
[1] "sp"
class(df)
[1] "data.frame"

And following on to merge the two:

##  The merge of a SpatialPolygonsDataFrame and a data.frame:
dm <- merge(d, df, by.x="ID", by.y="ID", all.x=TRUE)

##  Verify we still have a Spatial* object:
class(dm)
names(dm)

Yields:

class(dm)
[1] "SpatialPolygonsDataFrame"
attr(,"package")
[1] "sp"
names(dm)
[1] "ID"    "x"     "y"     "z"     "color"

plot(dm, col=dm$color)

[image: output of plot(dm, col=dm$color)]

> sessionInfo()
R version 3.2.1 (2015-06-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices
[4] utils     datasets  methods  
[7] base     

other attached packages:
[1] sp_1.1-1

loaded via a namespace (and not attached):
[1] tools_3.2.1     grid_3.2.1     
[3] lattice_0.20-31
Forrest R. Stevens
  • That would just create a data.frame object though, replacing the SpatialPolygonsDataFrame of the same name, would it not? – simon77 Aug 13 '15 at 09:18
  • No, it returns the same `Spatial*` class object you supply the merge function with. – Forrest R. Stevens Aug 13 '15 at 14:20
  • I did try it first, and it returned a data.frame object! I tried adding sort=FALSE to the merge, which I think may have fixed it in this case. Thanks – simon77 Aug 14 '15 at 13:05
  • If you are indeed still working with a `data.frame` object (which you shouldn't be as long as you're merging the actual `Spatial*` object and not its `@data` slot) you shouldn't rely only on `sort=F` to return a `data.frame` suitable for sticking back into the `Spatial*` object. The reason is that with `all.x=T` specified any non-matches still end up reordered at the bottom of the resulting merged `data.frame` object. This may be a comment for future readers, if indeed the merge is giving you back your `Spatial*` object as it should. – Forrest R. Stevens Aug 14 '15 at 14:27
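The point about sort=F and all.x=T can be checked with a small base-R sketch (the data frames `x` and `y` and the key values are made up for illustration). The ?merge help page says that with sort = FALSE the rows come back in an unspecified order. In base R's implementation, rows of x with no match in y are appended after the matched rows, so the original row order is still not preserved.

```r
## Hypothetical keys: "z" has no match in y
x <- data.frame(key = c("b", "z", "a"), stringsAsFactors = FALSE)
y <- data.frame(key = c("a", "b"), val = c(1, 2), stringsAsFactors = FALSE)

## Even with sort = FALSE, all.x = TRUE puts the unmatched rows last
m <- merge(x, y, by = "key", all.x = TRUE, sort = FALSE)

## The unmatched key ends up in the last row ...
m$key[nrow(m)]

## ... so the row order of x is not preserved
identical(m$key, x$key)
```

So a data.frame produced this way is still not safe to put back into the @data slot whenever some keys fail to match.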
  • Yep, I used your exact code in place of the merge line in mine, and the result was no longer a spatial object, so I went back to `Turkey.map@data <- merge(Turkey.map@data,TRdata,by.x="Admin1Name",by.y="Province", all.x=TRUE, sort=FALSE)` – simon77 Aug 14 '15 at 18:02
  • I have since tried avoiding merge and using match instead, which seems to work fine. `Turkey.map@data = data.frame(Turkey.map@data, TRdata[match(Turkey.map@data[, "Admin1Name"], TRdata[,"Province"]),])` – simon77 Aug 14 '15 at 18:09
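The match() approach works because it never reorders the target table. It looks up each polygon's key in the other table, so the result has exactly one row per original row, in the original order. Below is a minimal base-R sketch of that idea, wrapped in a hypothetical helper `join_by_match()` (not a function from sp or any other package). It assumes plain data.frames; a tibble returned by read_excel() may need as.data.frame() first. It also copies the target's row names, since a SpatialPolygonsDataFrame ties its data rows to polygon IDs through them.

```r
## Hypothetical helper: join the columns of `lookup` onto `target` by key,
## keeping target's row order and row names.
join_by_match <- function(target, lookup, target_key, lookup_key) {
  ## Position in `lookup` of each target key (NA where there is no match)
  idx <- match(target[[target_key]], lookup[[lookup_key]])
  ## Pull the matching rows, dropping the lookup key column to avoid a duplicate
  extra <- lookup[idx, setdiff(names(lookup), lookup_key), drop = FALSE]
  ## Reuse target's row names so each row stays tied to the same polygon ID
  rownames(extra) <- rownames(target)
  cbind(target, extra)
}

## Made-up example: "Bursa" has no value in `vals`
attrs <- data.frame(Admin1Name = c("Izmir", "Ankara", "Konya", "Bursa"),
                    row.names = c("0", "1", "2", "3"),
                    stringsAsFactors = FALSE)
vals <- data.frame(Province = c("Konya", "Izmir", "Ankara"),
                   pcnt = c(30, 10, 20),
                   stringsAsFactors = FALSE)

joined <- join_by_match(attrs, vals, "Admin1Name", "Province")
joined
```

Unmatched provinces get NA instead of being moved to the bottom, which is why this result lines up with the polygons where merge(..., sort=FALSE) may not.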
  • I'm genuinely curious as to why you aren't seeing the appropriate behavior. I've added a reproducible example to my answer to show how this should work with a current version of `sp` and the `sp::merge()` functionality. – Forrest R. Stevens Aug 15 '15 at 02:01