I imported a json file using rjson
and converted it to a data.frame but all of the data is spread widthwise and the column names contain the key information.
stations <- fromJSON(file = "station_information.json")
test <- as.data.frame(stations[3])
What this looks like is:
> dim(test)
[1] 2 5985
> test[1:27]
data.stations.station_id data.stations.name data.stations.short_name
1 72 W 52 St & 11 Ave 6926.01
2 72 W 52 St & 11 Ave 6926.01
data.stations.lat data.stations.lon data.stations.region_id
1 40.76727 -73.99393 71
2 40.76727 -73.99393 71
data.stations.rental_methods data.stations.capacity
1 KEY 39
2 CREDITCARD 39
data.stations.eightd_has_key_dispenser data.stations.station_id.1
1 FALSE 79
2 FALSE 79
data.stations.name.1 data.stations.short_name.1 data.stations.lat.1
1 Franklin St & W Broadway 5430.08 40.71912
2 Franklin St & W Broadway 5430.08 40.71912
data.stations.lon.1 data.stations.region_id.1 data.stations.rental_methods.1
1 -74.00667 71 KEY
2 -74.00667 71 CREDITCARD
data.stations.capacity.1 data.stations.eightd_has_key_dispenser.1
1 33 FALSE
2 33 FALSE
data.stations.station_id.2 data.stations.name.2 data.stations.short_name.2
1 82 St James Pl & Pearl St 5167.06
2 82 St James Pl & Pearl St 5167.06
data.stations.lat.2 data.stations.lon.2 data.stations.region_id.2
1 40.71117 -74.00017 71
2 40.71117 -74.00017 71
data.stations.rental_methods.2 data.stations.capacity.2
1 KEY 27
2 CREDITCARD 27
data.stations.eightd_has_key_dispenser.2
1 FALSE
2 FALSE
As you can see, this cannot be fixed with a simple transpose t()
or melt()
solution. I am wondering what I am doing wrong on import or conversion to data.frame which leaves me with a dataframe stretched with the index of what should be rows appended to the column names.
I have tried both of these methods but I am left with the same stretched data:
plyr::ldply(stations, data.frame)
do.call(rbind, lapply(stations, data.frame, stringsAsFactors=FALSE))
In the end I would like my output to look like every 9 columns were "cut" and stacked on to the first 9 - so that I am left with 655 rows and 9 columns Any suggestions would be appreciated.
NOTE: I am taking the JSON directly from this link (it is not a large file)
Here is a reproducible example of the first 27 columns, which should be reshaped to a 9 x 3 dataframe:
> dput(df)
structure(list(data.stations.station_id = structure(c(1L, 1L), class = "factor", .Label = "72"),
data.stations.name = structure(c(1L, 1L), class = "factor", .Label = "W 52 St & 11 Ave"),
data.stations.short_name = structure(c(1L, 1L), class = "factor", .Label = "6926.01"),
data.stations.lat = c(40.76727216, 40.76727216), data.stations.lon = c(-73.99392888,
-73.99392888), data.stations.region_id = c(71, 71), data.stations.rental_methods = structure(c(2L,
1L), .Label = c("CREDITCARD", "KEY"), class = "factor"),
data.stations.capacity = c(39, 39), data.stations.eightd_has_key_dispenser = c(FALSE,
FALSE), data.stations.station_id.1 = structure(c(1L, 1L), class = "factor", .Label = "79"),
data.stations.name.1 = structure(c(1L, 1L), class = "factor", .Label = "Franklin St & W Broadway"),
data.stations.short_name.1 = structure(c(1L, 1L), class = "factor", .Label = "5430.08"),
data.stations.lat.1 = c(40.71911552, 40.71911552), data.stations.lon.1 = c(-74.00666661,
-74.00666661), data.stations.region_id.1 = c(71, 71), data.stations.rental_methods.1 = structure(c(2L,
1L), .Label = c("CREDITCARD", "KEY"), class = "factor"),
data.stations.capacity.1 = c(33, 33), data.stations.eightd_has_key_dispenser.1 = c(FALSE,
FALSE), data.stations.station_id.2 = structure(c(1L, 1L), class = "factor", .Label = "82"),
data.stations.name.2 = structure(c(1L, 1L), class = "factor", .Label = "St James Pl & Pearl St"),
data.stations.short_name.2 = structure(c(1L, 1L), class = "factor", .Label = "5167.06"),
data.stations.lat.2 = c(40.71117416, 40.71117416), data.stations.lon.2 = c(-74.00016545,
-74.00016545), data.stations.region_id.2 = c(71, 71), data.stations.rental_methods.2 = structure(c(2L,
1L), .Label = c("CREDITCARD", "KEY"), class = "factor"),
data.stations.capacity.2 = c(27, 27), data.stations.eightd_has_key_dispenser.2 = c(FALSE,
FALSE)), .Names = c("data.stations.station_id", "data.stations.name",
"data.stations.short_name", "data.stations.lat", "data.stations.lon",
"data.stations.region_id", "data.stations.rental_methods", "data.stations.capacity",
"data.stations.eightd_has_key_dispenser", "data.stations.station_id.1",
"data.stations.name.1", "data.stations.short_name.1", "data.stations.lat.1",
"data.stations.lon.1", "data.stations.region_id.1", "data.stations.rental_methods.1",
"data.stations.capacity.1", "data.stations.eightd_has_key_dispenser.1",
"data.stations.station_id.2", "data.stations.name.2", "data.stations.short_name.2",
"data.stations.lat.2", "data.stations.lon.2", "data.stations.region_id.2",
"data.stations.rental_methods.2", "data.stations.capacity.2",
"data.stations.eightd_has_key_dispenser.2"), row.names = c(NA,
-2L), class = "data.frame")
So the output structure should look like this (obviously with the values not NA). Each row represents the appended index number of the original dataframe's column names
> output
data.stations.station_id data.stations.name data.stations.short_name
1 NA NA NA
2 NA NA NA
3 NA NA NA
data.stations.lat data.stations.lon data.stations.region_id
1 NA NA NA
2 NA NA NA
3 NA NA NA
data.stations.rental_methods data.stations.capacity
1 NA NA
2 NA NA
3 NA NA
data.stations.eightd_has_key_dispenser
1 NA
2 NA
3 NA