I am fairly new to R and its various mapping packages, and I haven't fully fleshed out their capabilities or discovered all of the efficient ways to do things, so I apologize if there is an obvious answer. Any help is appreciated.
The problem I am trying to solve has the following components.
- I want to create a heat map over a map of the United States based only on city and state data. I do not have ZIP code or geocoding data.
- Each day I generate about half a million records, each of which usually contains some form of city and state in a single field. The formatting is inconsistent, and the field frequently contains errors or unusable data. I am generally OK with that, but it may affect other functions.
- I want to view the heat map (sum or count) at daily, weekly, monthly, or ad hoc intervals. Manually preparing the data for every interval would be inefficient, and re-geocoding the same records each time would likely be too slow or exceed what public APIs allow.
- Many records share the same location. A quick check showed that a single value consistently makes up over 10% of the data, and the top 100 values (once correctly formatted) easily make up 25% or more.
- The program generating the data is vendor-supplied and cannot be programmed to retrieve the geocode data itself, so I am basically left to do it in R.
- I only have read-only access to the data, so I cannot add fields to the transaction records to store information such as geocodes, and at this time I cannot create another table in the same database to hold it either.
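Because one value alone is over 10% of the records, normalizing the free-form field into a canonical key and deduplicating it could shrink the number of distinct locations that need coordinates considerably. A minimal base-R sketch, assuming the field ends in a two-letter state abbreviation (full state names such as "Alaska" are not handled and come back as `NA`); `normalize_city_state` is a hypothetical helper name:

```r
# Hypothetical helper: map messy "City, ST" / "city st" / " City,ST " strings
# to one canonical key such as "ANCHORAGE AK". Values that do not end in a
# valid two-letter state (or DC) abbreviation become NA.
normalize_city_state <- function(x) {
  x <- toupper(trimws(x))
  x <- gsub("[[:punct:]]", " ", x)    # commas, periods, hyphens -> spaces
  x <- gsub("[[:space:]]+", " ", x)   # collapse runs of whitespace
  x <- trimws(x)
  state <- substr(x, nchar(x) - 1, nchar(x))
  ok <- grepl("^[A-Z ]+ [A-Z]{2}$", x) & state %in% c(state.abb, "DC")
  x[!ok] <- NA_character_
  x
}

raw  <- c("Anchorage, AK", "anchorage ak", " Anchorage,AK ", "Boise ID", "???")
keys <- normalize_city_state(raw)
table(keys, useNA = "ifany")        # frequency of each canonical key
unique(keys[!is.na(keys)])          # only these need coordinates
```

Running the same function over a day's records and calling `unique()` on the result gives the (much shorter) list of locations to look up.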
I am aware of the `geocode` function and of this similar question: Plot on ggmap by using city and state. I also know about `getGeoCode` in RgoogleMaps, which handles the varied spacing of city/state strings well:
```r
getGeoCode("Anchorage AK")
#       lat        lon
#  61.21806 -149.90028
```
I am looking for guidance on the following:
A) Is there a city library that contains the basic map plot information (coordinates), or a basic map package with this built in, that does not need to go out to the internet for each record?
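One candidate offline source is the `maps` package's `us.cities` data set, which as far as I know stores a `name` column such as "Anchorage AK" together with `lat` and `long`, but covers only larger cities, so small towns would still need another source. Whatever the source, matching records to coordinates is then an ordinary join with no network access. A minimal base-R sketch, assuming a reference table with hypothetical columns `key`, `lat`, `lon` whose keys were normalized the same way as the records (the Boise coordinates are approximate, for illustration only):

```r
# Hypothetical offline reference table (key, lat, lon). Anchorage is the
# getGeoCode result quoted above; Boise is approximate and purely illustrative.
city_ref <- data.frame(
  key = c("ANCHORAGE AK", "BOISE ID"),
  lat = c(61.21806, 43.6),
  lon = c(-149.90028, -116.2),
  stringsAsFactors = FALSE
)

# Example records whose city/state field is already normalized to `key`.
records <- data.frame(
  key   = c("ANCHORAGE AK", "ANCHORAGE AK", "BOISE ID", "NOWHERE ZZ", NA),
  value = c(10, 5, 3, 7, 1),
  stringsAsFactors = FALSE
)

# Aggregate first so the join touches each distinct city once, not every row.
# (The formula interface drops rows whose key is NA.)
totals <- aggregate(value ~ key, data = records, FUN = sum)

# Left join against the local table: no web request involved.
mapped  <- merge(totals, city_ref, by = "key", all.x = TRUE)
missing <- mapped$key[is.na(mapped$lat)]   # keys the local table does not know
mapped  <- mapped[!is.na(mapped$lat), ]
```

Only the keys left in `missing` would ever need an online lookup.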
B) Is there a simple heat-map plot function, something as straightforward as the following? Most examples of map plots I have seen so far take fairly large pieces of code just to call the map function.
```r
plot(DataSource, CityCol, ValueCol)
```
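I don't know of a built-in city heat-map function in base R, but a thin wrapper can give roughly the one-call interface above. The sketch below is a hypothetical helper, `plot_city_heat`, assuming a lookup table with columns `key`, `lat`, `lon` and a city column already normalized to those keys. It draws points colored by the aggregated value rather than a smoothed density surface, and overlays state outlines via `maps::map("state", add = TRUE)` only when the `maps` package is installed:

```r
# Hypothetical helper approximating plot(DataSource, CityCol, ValueCol).
# Assumes `lookup` has columns key, lat, lon and that data[[city_col]] already
# holds keys normalized the same way as lookup$key.
plot_city_heat <- function(data, city_col, value_col, lookup, fun = sum) {
  df <- data.frame(key = as.character(data[[city_col]]),
                   value = data[[value_col]],
                   stringsAsFactors = FALSE)
  agg <- aggregate(value ~ key, data = df, FUN = fun)  # one row per city
  pts <- merge(agg, lookup, by = "key")                # unmatched cities dropped
  if (nrow(pts) == 0) stop("no cities matched the lookup table")

  # Bin the aggregated values into 10 colors from pale yellow (low) to red (high).
  pal  <- rev(heat.colors(10))
  cols <- pal[cut(pts$value, breaks = 10, labels = FALSE)]

  plot(pts$lon, pts$lat, pch = 19, cex = 1.5, col = cols,
       xlab = "Longitude", ylab = "Latitude")
  if (requireNamespace("maps", quietly = TRUE)) {
    maps::map("state", add = TRUE)   # state outlines, only if maps is installed
  }
  invisible(pts)   # return the plotted per-city totals for inspection
}

# Usage with toy data (hypothetical column names mirroring the question).
lookup <- data.frame(key = c("ANCHORAGE AK", "BOISE ID"),
                     lat = c(61.21806, 43.6), lon = c(-149.90028, -116.2))
txns <- data.frame(CityCol  = c("ANCHORAGE AK", "BOISE ID", "ANCHORAGE AK"),
                   ValueCol = c(2, 5, 4))
plot_city_heat(txns, "CityCol", "ValueCol", lookup)                # sum per city
plot_city_heat(txns, "CityCol", "ValueCol", lookup, fun = length)  # count per city
```

Filtering `txns` to a date range before the call would give the daily, weekly, or monthly views.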
C) And, really, the primary question: is there a way to build a common library or array of already-identified locations in R that I could load at run time, so that a location is geocoded using a web resource only if it is not already in that library? Essentially a lookup table that is really just a custom data array.
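Since the database is read-only, one option is to keep the lookup table in a local file that R reads at start-up, appends new results to, and writes back. A minimal sketch using base R's `readRDS`/`saveRDS`; `lookup_with_cache` and its `geocode_fun` argument are hypothetical names, and `geocode_fun` stands for any function that takes one key and returns latitude and longitude (for example a small wrapper around `getGeoCode`), returning `NA`s when the lookup fails. Failed lookups are cached too, so unusable values are not re-sent to the web service every day; deleting the file resets the cache:

```r
# Read the cache file if it exists; otherwise start with an empty table.
load_cache <- function(path) {
  if (file.exists(path)) {
    readRDS(path)
  } else {
    data.frame(key = character(), lat = numeric(), lon = numeric(),
               stringsAsFactors = FALSE)
  }
}

# Hypothetical helper: return coordinates for `keys`, calling geocode_fun
# only for keys that have never been looked up before.
lookup_with_cache <- function(keys, path, geocode_fun) {
  cache <- load_cache(path)
  keys  <- unique(keys[!is.na(keys)])
  todo  <- setdiff(keys, cache$key)       # keys not yet in the cache file
  if (length(todo) > 0) {
    # One web lookup per new key; each call must return 2 numbers (lat, lon).
    found <- vapply(todo, geocode_fun, c(lat = 0, lon = 0))
    new_rows <- data.frame(key = todo,
                           lat = unname(found["lat", ]),
                           lon = unname(found["lon", ]),
                           stringsAsFactors = FALSE)
    cache <- rbind(cache, new_rows)       # failures (NA) are cached as well
    saveRDS(cache, path)
  }
  cache[cache$key %in% keys, ]
}

# Typical daily run (path and geocoder are placeholders; keep the file
# somewhere persistent, not in tempdir()):
# coords <- lookup_with_cache(normalized_keys, "city_cache.rds", my_geocoder)
```

A CSV via `write.csv`/`read.csv` would work the same way if a human-editable file is preferred, for example to hand-correct a few bad geocodes.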