I have a question regarding data handling in R. I have two datasets, both originally .csv files. I've prepared two example datasets:
Table A - Persons
http://pastebin.com/HbaeqACi
Table B - City
http://pastebin.com/Fyj66ahq
To keep the effort as low as possible, here is the corresponding R code for loading and visualizing the data:
# Read csv files
# check pastebin links and save content to persons.csv and city.csv.
persons_dataframe = read.csv("persons.csv", header = TRUE)
city_dataframe = read.csv("city.csv", header = TRUE)
# plot them on a map
# load used packages
library(RgoogleMaps)
library(ggplot2)
library(ggmap)
library(sp)
persons_ggplot2 <- persons_dataframe
city_ggplot2 <- city_dataframe
gc <- geocode('new york, usa')
center <- as.numeric(gc)
G <- ggmap(
  get_googlemap(center = center, color = 'color', scale = 4, zoom = 10,
                maptype = "terrain", frame = TRUE),
  extent = "panel"
)
# cities as yellow squares, persons as red asterisks
G1 <- G +
  geom_point(aes(x = POINT_X, y = POINT_Y), data = city_dataframe,
             shape = 22, color = "black", fill = "yellow", size = 4) +
  geom_point(aes(x = POINT_X, y = POINT_Y), data = persons_dataframe,
             shape = 8, color = "red", size = 2.5)
plot(G1)
As a result I get a map which visualizes all cities and persons.
My problem: All persons are located in only these three cities, so many of the person points lie on top of each other.
My questions:
- A more general question: Is this a problem for R?
- I want to create something like a bubble map, which visualizes the number of persons at one position. For example: in City A there are 20 persons, in City B there are 5 persons. The position of City A should get a bigger bubble than that of City B.
- I want to create a label which states the number of persons at a certain position. I've already tried to realize this with the ggplot2 geom_text options, but I can't figure out how to sum up all points at a certain position and write the result to a label.
- A more theoretical approach (maybe I'll come back to this later on): I want to create something like a density map / cluster map, which shows the area with the highest number of persons. I've already searched for some packages which I could use. Suggested ones were SpatialEpi, spatstat and DCluster. My question: Do I need the distance from the persons to a certain object (let's say a supermarket) to perform a cluster analysis?
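To illustrate what I mean by summing up the points per position, here is a minimal sketch with made-up coordinates. The data frame persons_example and its values are hypothetical; only the column names POINT_X and POINT_Y come from my data. It assumes that persons in the same city share exactly identical coordinates, and it uses base R aggregate() for the counting, which may not be the best way to do it.

```r
# Hypothetical stand-in for persons.csv (values are invented for illustration)
persons_example <- data.frame(
  POINT_X = c(-74.00, -74.00, -74.00, -73.95, -73.95, -73.90),
  POINT_Y = c(40.71, 40.71, 40.71, 40.65, 40.65, 40.85)
)

# Give every person a weight of 1, then sum the weights per unique coordinate pair
persons_example$n <- 1
person_counts <- aggregate(n ~ POINT_X + POINT_Y, data = persons_example, FUN = sum)
print(person_counts)

# Draw one bubble per position, sized and labelled by the count
# (only runs if ggplot2 is installed)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  bubble_plot <- ggplot(person_counts, aes(x = POINT_X, y = POINT_Y)) +
    geom_point(aes(size = n), shape = 21, fill = "red", alpha = 0.5) +
    geom_text(aes(label = n), vjust = -1.5) +
    scale_size_area(max_size = 15)
  print(bubble_plot)
}
```

Presumably the same two layers could be added to the map object G from above by passing data = person_counts to geom_point and geom_text, but I have not verified that.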
Hopefully, these were not too many questions.
Any help is much appreciated. Thanks in advance!
Btw: Is there a better way to prepare a question containing example datasets? Should I upload a file somewhere, or is the pastebin way okay?