I could really use some help with a problem in RStudio.
I am working through an analysis and am having trouble converting the data type of certain variables.
library(tidyverse)
library(lubridate)
library(ggplot2)
library(magrittr)
Nov2020 <- read_csv("202011-divvy-tripdata.csv")
str(Nov2020)
The output is as follows:
spec_tbl_df [259,716 x 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ ride_id : chr [1:259716] "BD0A6FF6FFF9B921" "96A7A7A4BDE4F82D" "C61526D06582BDC5" "E533E89C32080B9E" ...
$ rideable_type : chr [1:259716] "electric_bike" "electric_bike" "electric_bike" "electric_bike" ...
$ started_at : POSIXct[1:259716], format: "2020-11-01 13:36:00" "2020-11-01 10:03:26" "2020-11-01 00:34:05" "2020-11-01 00:45:16" ...
$ ended_at : POSIXct[1:259716], format: "2020-11-01 13:45:40" "2020-11-01 10:14:45" "2020-11-01 01:03:06" "2020-11-01 00:54:31" ...
$ start_station_name: chr [1:259716] "Dearborn St & Erie St" "Franklin St & Illinois St" "Lake Shore Dr & Monroe St" "Leavitt St & Chicago Ave" ...
$ start_station_id : num [1:259716] 110 672 76 659 2 72 76 NA 58 394 ...
$ end_station_name : chr [1:259716] "St. Clair St & Erie St" "Noble St & Milwaukee Ave" "Federal St & Polk St" "Stave St & Armitage Ave" ...
$ end_station_id : num [1:259716] 211 29 41 185 2 76 72 NA 288 273 ...
$ start_lat : num [1:259716] 41.9 41.9 41.9 41.9 41.9 ...
$ start_lng : num [1:259716] -87.6 -87.6 -87.6 -87.7 -87.6 ...
$ end_lat : num [1:259716] 41.9 41.9 41.9 41.9 41.9 ...
$ end_lng : num [1:259716] -87.6 -87.7 -87.6 -87.7 -87.6 ...
$ member_casual : chr [1:259716] "casual" "casual" "casual" "casual" ...
- attr(*, "spec")=
.. cols(
.. ride_id = col_character(),
.. rideable_type = col_character(),
.. started_at = col_datetime(format = ""),
.. ended_at = col_datetime(format = ""),
.. start_station_name = col_character(),
.. start_station_id = col_double(),
.. end_station_name = col_character(),
.. end_station_id = col_double(),
.. start_lat = col_double(),
.. start_lng = col_double(),
.. end_lat = col_double(),
.. end_lng = col_double(),
.. member_casual = col_character()
.. )
- attr(*, "problems")=<externalptr>
As you can see, 'start_station_id' and 'end_station_id' are both read in as doubles (col_double()). I need to convert them to character type so I can stack this dataset with the other months' data.
Nov2020 %>%
  mutate(start_station_id = as.character(start_station_id),
         end_station_id = as.character(end_station_id))
After applying that step, the output is as follows:
# A tibble: 259,716 x 13
ride_id rideable_type started_at ended_at start_station_na~ start_station_id end_station_name
<chr> <chr> <dttm> <dttm> <chr> <chr> <chr>
1 BD0A6FF6~ electric_bike 2020-11-01 13:36:00 2020-11-01 13:45:40 Dearborn St & Er~ 110 St. Clair St & E~
2 96A7A7A4~ electric_bike 2020-11-01 10:03:26 2020-11-01 10:14:45 Franklin St & Il~ 672 Noble St & Milwa~
3 C61526D0~ electric_bike 2020-11-01 00:34:05 2020-11-01 01:03:06 Lake Shore Dr & ~ 76 Federal St & Pol~
4 E533E89C~ electric_bike 2020-11-01 00:45:16 2020-11-01 00:54:31 Leavitt St & Chi~ 659 Stave St & Armit~
5 1C9F4EF1~ electric_bike 2020-11-01 15:43:25 2020-11-01 16:16:52 Buckingham Fount~ 2 Buckingham Fount~
6 7259585D~ electric_bike 2020-11-14 15:55:17 2020-11-14 16:44:38 Wabash Ave & 16t~ 72 Lake Shore Dr & ~
7 91FE5C8F~ electric_bike 2020-11-14 16:47:29 2020-11-14 17:03:03 Lake Shore Dr & ~ 76 Wabash Ave & 16t~
8 9E7A79AD~ electric_bike 2020-11-14 16:04:15 2020-11-14 16:19:33 NA NA NA
9 A5B02C0D~ electric_bike 2020-11-14 16:24:09 2020-11-14 16:51:34 Marshfield Ave &~ 58 Larrabee St & Ar~
10 8234407C~ electric_bike 2020-11-14 01:24:22 2020-11-14 01:31:42 Clark St & 9th S~ 394 Michigan Ave & 1~
# ... with 259,706 more rows, and 6 more variables: end_station_id <chr>, start_lat <dbl>, start_lng <dbl>,
# end_lat <dbl>, end_lng <dbl>, member_casual <chr>
You can see both fields are now of type <chr>, which is what I want.
However, when I run str() again, the data type is unchanged from the original: num, with col_double() in the spec.
str(Nov2020)
spec_tbl_df [259,716 x 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ ride_id : chr [1:259716] "BD0A6FF6FFF9B921" "96A7A7A4BDE4F82D" "C61526D06582BDC5" "E533E89C32080B9E" ...
$ rideable_type : chr [1:259716] "electric_bike" "electric_bike" "electric_bike" "electric_bike" ...
$ started_at : POSIXct[1:259716], format: "2020-11-01 13:36:00" "2020-11-01 10:03:26" "2020-11-01 00:34:05" "2020-11-01 00:45:16" ...
$ ended_at : POSIXct[1:259716], format: "2020-11-01 13:45:40" "2020-11-01 10:14:45" "2020-11-01 01:03:06" "2020-11-01 00:54:31" ...
$ start_station_name: chr [1:259716] "Dearborn St & Erie St" "Franklin St & Illinois St" "Lake Shore Dr & Monroe St" "Leavitt St & Chicago Ave" ...
$ start_station_id : num [1:259716] 110 672 76 659 2 72 76 NA 58 394 ...
$ end_station_name : chr [1:259716] "St. Clair St & Erie St" "Noble St & Milwaukee Ave" "Federal St & Polk St" "Stave St & Armitage Ave" ...
$ end_station_id : num [1:259716] 211 29 41 185 2 76 72 NA 288 273 ...
$ start_lat : num [1:259716] 41.9 41.9 41.9 41.9 41.9 ...
$ start_lng : num [1:259716] -87.6 -87.6 -87.6 -87.7 -87.6 ...
$ end_lat : num [1:259716] 41.9 41.9 41.9 41.9 41.9 ...
$ end_lng : num [1:259716] -87.6 -87.7 -87.6 -87.7 -87.6 ...
$ member_casual : chr [1:259716] "casual" "casual" "casual" "casual" ...
- attr(*, "spec")=
.. cols(
.. ride_id = col_character(),
.. rideable_type = col_character(),
.. started_at = col_datetime(format = ""),
.. ended_at = col_datetime(format = ""),
.. start_station_name = col_character(),
.. start_station_id = col_double(),
.. end_station_name = col_character(),
.. end_station_id = col_double(),
.. start_lat = col_double(),
.. start_lng = col_double(),
.. end_lat = col_double(),
.. end_lng = col_double(),
.. member_casual = col_character()
.. )
- attr(*, "problems")=<externalptr>
Am I missing something here? I also tried giving the dataset a new name after mutating, 'Nov2020_v2' for example, but the result is the same.
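For reference, here is a small sketch of the same behaviour on a made-up tibble. The names toy and toy_chr and all the values are hypothetical and are not taken from the Divvy file. As far as I understand, mutate() returns a new tibble rather than changing the original in place, so the sketch compares the piped call with and without assigning the result:

```r
library(dplyr)

# Hypothetical toy data standing in for the Divvy file
toy <- tibble(
  ride_id          = c("A1", "B2", "C3"),
  start_station_id = c(110, 672, NA),
  end_station_id   = c(211, 29, NA)
)

# Without assignment: the converted tibble is only printed, toy itself is untouched
toy %>%
  mutate(start_station_id = as.character(start_station_id))
class(toy$start_station_id)      # still numeric

# With assignment: the converted tibble is stored under a new name
toy_chr <- toy %>%
  mutate(
    start_station_id = as.character(start_station_id),
    end_station_id   = as.character(end_station_id)
  )
class(toy_chr$start_station_id)  # character
```

If this is what is happening in my case, checking str() on the object that actually received the assignment (rather than on the original Nov2020) should show the character columns.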
Because of this issue I can't proceed with my analysis, which requires stacking this dataset with the other months' data, where these two variables are of character type.
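To make the goal concrete, this is a rough sketch of the stacking step I am aiming for, again on made-up data. The tibbles nov_toy and dec_toy and the station IDs in them are hypothetical. It assumes the other months store the IDs as character, and that the converted result is assigned back before calling bind_rows():

```r
library(dplyr)

# Hypothetical stand-ins: one month with numeric IDs, one with character IDs
nov_toy <- tibble(ride_id = c("A1", "B2"), start_station_id = c(110, 672))
dec_toy <- tibble(ride_id = c("C3", "D4"), start_station_id = c("X001", "X002"))

# In recent dplyr versions, bind_rows(nov_toy, dec_toy) at this point fails
# because a double column cannot be combined with a character column.

# Convert and assign the result back, then stack
nov_toy <- nov_toy %>%
  mutate(start_station_id = as.character(start_station_id))

stacked <- bind_rows(nov_toy, dec_toy)
str(stacked)
```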
Any help will be greatly appreciated! Thanks!