I have a question regarding data handling in R. I have two datasets, both originally .csv files. I've prepared two example datasets:

Table A - Persons
http://pastebin.com/HbaeqACi

Table B - City
http://pastebin.com/Fyj66ahq

To keep the effort as low as possible, here is the corresponding R code for loading and visualizing the data.

# Read csv files
# check pastebin links and save content to persons.csv and city.csv.
persons_dataframe = read.csv("persons.csv", header = TRUE)
city_dataframe = read.csv("city.csv", header = TRUE)
# plot them on a map
# load used packages
library(RgoogleMaps)
library(ggplot2)
library(ggmap)
library(sp)

persons_ggplot2 <- persons_dataframe
city_ggplot2 <- city_dataframe
gc <- geocode('new york, usa')
center <- as.numeric(gc)  
G <- ggmap(get_googlemap(center = center, color = 'color', scale = 4, zoom = 10,
                         maptype = "terrain", frame = TRUE),
           extent = "panel")
# cities as yellow squares, persons as red stars
G1 <- G +
    geom_point(aes(x = POINT_X, y = POINT_Y), data = city_dataframe,
               shape = 22, color = "black", fill = "yellow", size = 4) +
    geom_point(aes(x = POINT_X, y = POINT_Y), data = persons_dataframe,
               shape = 8, color = "red", size = 2.5)
plot(G1)

As a result I have a map that visualizes all cities and persons.
My problem: all persons are located in just these three cities.

My questions:

  1. A more general question: Is this a problem for R?
  2. I want to create something like a bubble map that visualizes the number of persons at each position. For example: in City A there are 20 persons, in City B there are 5 persons. The position of City A should get a bigger bubble than that of City B.
  3. I want to create a label that states the number of persons at a certain position. I've already tried to do this with the ggplot2 geom_text options, but I can't figure out how to sum up all points at a certain position and write the result to a label.
  4. A more theoretical question (maybe I'll come back to this later): I want to create something like a density map / cluster map that shows the area with the highest number of persons. I've already searched for packages I could use. Suggested ones were SpatialEpi, spatstat and DCluster. My question: Do I need the distance from the persons to a certain object (let's say a supermarket) to perform a cluster analysis?

Hopefully, these were not too many questions.
Any help is much appreciated. Thanks in advance!

Btw: Is there a better way to prepare a question containing example datasets? Should I upload a file somewhere, or is the pastebin way okay?

schlomm
  • Argh... I've forgotten one line. I've edited the question and pasted the missing line. – schlomm Mar 01 '14 at 12:25
  • I'm getting `Error: ggplot2 doesn't know how to deal with data of class SpatialPointsDataFrame` when it gets to the line that defines `G1` – Jake Burkhead Mar 01 '14 at 12:31
  • You're absolutely right. Sorry. I missed the point that ggplot does not support SpatialPointsDataFrames (at least I don't know how to do it) but only data.frames. Please clear your workspace and/or run the code from my question again. I've deleted the line where the datasets are transformed to SpatialPointsDataFrames. Sorry for that (I had updated the lines only in RStudio, not in the browser :-/). Anyway... now it should work! – schlomm Mar 01 '14 at 12:41

1 Answer

You can create the bubble chart by counting the number in each city and mapping the size of the points to the counts:

library(plyr)
persons_count <- count(persons_dataframe, vars = c("city", "POINT_X", "POINT_Y"))

G + geom_point(aes(x=POINT_X, y=POINT_Y, size=freq),data=persons_count, color="red")
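For readers who want to see what the counting step produces without plyr, here is a minimal base R sketch. It assumes the same column names as above (`city`, `POINT_X`, `POINT_Y`); the data frame `persons_demo` and its values are made up for illustration and are not the pastebin data.

```r
# Hypothetical stand-in for persons.csv: one row per person, carrying the
# coordinates of the city the person belongs to (values are made up).
persons_demo <- data.frame(
  city    = c("A", "A", "A", "B", "B", "C"),
  POINT_X = c(-74.00, -74.00, -74.00, -73.95, -73.95, -73.90),
  POINT_Y = c(40.71, 40.71, 40.71, 40.65, 40.65, 40.80)
)

# Give every person a weight of 1 and sum the weights per unique
# city/position. The column is called `freq` to mirror plyr::count().
persons_demo$freq <- 1
persons_count_demo <- aggregate(freq ~ city + POINT_X + POINT_Y,
                                data = persons_demo, FUN = sum)

# Sort by city so the table is easy to read.
persons_count_demo <- persons_count_demo[order(persons_count_demo$city), ]
print(persons_count_demo)
```

The result has one row per position and can be passed to `geom_point` in place of `persons_count`.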

You can map the counts to the area of the points, which perhaps gives a better sense of the relative sizes:

G + geom_point(aes(x=POINT_X, y=POINT_Y, size=freq),data=persons_count, color="red") +
    scale_size_area(breaks = unique(persons_count$freq))

You can add the frequency labels, though this is somewhat redundant with the size scale legend:

G + geom_point(aes(x=POINT_X, y=POINT_Y, size=freq),data=persons_count, color="red") +
    geom_text(aes(x = POINT_X, y=POINT_Y, label = freq), data=persons_count) +
    scale_size_area(breaks = unique(persons_count$freq))

You can't really plot densities with your example data because you only have three points. But if you had more fine-grained location information you could calculate and plot the densities using the stat_density2d function in ggplot2.
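As a rough sketch of that last point, the code below draws density contours for simulated data. The data frame `persons_fine` is hypothetical: it stands in for finer-grained person locations, which the example data do not contain, and the coordinates are random values around New York.

```r
library(ggplot2)

set.seed(42)
# Hypothetical fine-grained locations (simulated, not from the question).
persons_fine <- data.frame(
  POINT_X = rnorm(200, mean = -74.00, sd = 0.05),
  POINT_Y = rnorm(200, mean = 40.71, sd = 0.05)
)

# Plot the individual points and overlay 2D kernel density contours.
density_plot <- ggplot(persons_fine, aes(x = POINT_X, y = POINT_Y)) +
  geom_point(color = "red", size = 1) +
  stat_density2d(color = "blue")
print(density_plot)
```

With real data, the same layer should be addable to the map as `G + stat_density2d(aes(x = POINT_X, y = POINT_Y), data = persons_fine)`.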

Ista
  • One thing, which I've forgotten to ask: Is it possible to sort the legend descending or ascending regarding the frequency? – schlomm Mar 01 '14 at 17:11
  • You can control the breaks with the breaks argument of `scale_size_area`, e.g., `scale_size_area(breaks = rev(unique(persons_count$freq)))` – Ista Mar 01 '14 at 18:28
  • Hm... I did not get the point. Is there any resource that describes the use of the `breaks` parameter? I mean, what is the difference between the code in your comment and the last line of your answer? Using the line from your comment, it looks like http://imgur.com/QxXgt4f What I want is for the counts to be sorted in ascending or descending order. I've found out that I can set the `breaks` parameter to "pretty", which looks like http://imgur.com/QoZUs31 This is of course pretty (the counts are sorted), but the optimal way would be a combination of both: all breaks, sorted ascending or descending by count. – schlomm Mar 01 '14 at 18:49
  • You are apparently now using different data than you showed in your original post, which makes it hard to keep track of what you're doing. Breaks are described in `?continuous_scale`. – Ista Mar 01 '14 at 21:55
  • You're right. I've added some more persons and cities to make clear what I mean. I've updated the pastebin, so you can reproduce my plots. The first plot uses `G + geom_point(aes(x=POINT_X, y=POINT_Y, size=freq),data=persons_count, color="red") + geom_text(aes(x = POINT_X, y=POINT_Y, label = freq), data=persons_count) + scale_size_area(breaks = unique(persons_count$freq))`, the second plot uses the same but `unique` is changed to `pretty`. – schlomm Mar 01 '14 at 22:34
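
A possible way to get every count into the legend in sorted order is to sort the unique values before passing them as breaks. This is only a sketch: it assumes, as the comments above suggest, that the legend follows the order of the break values, and the vector `freq_demo` is made up for illustration.

```r
# Hypothetical counts per position, in the order they appear in the data.
freq_demo <- c(5, 20, 1, 20, 12)

# unique() keeps the order of first appearance, which is why the legend
# can look unsorted; sort() puts the breaks in a defined order.
size_breaks_asc  <- sort(unique(freq_demo))
size_breaks_desc <- sort(unique(freq_demo), decreasing = TRUE)
print(size_breaks_desc)

# Assumed usage with the plot from the answer:
# scale_size_area(breaks = size_breaks_desc)
```

If the legend order does not change, `guide_legend(reverse = TRUE)` is another option for flipping it.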