
Does anyone know how to filter a Socrata dataset on date_of_incident during the import step in R, so the read is faster?

this is what I have so far

library(RSocrata)
library(dplyr)

token <- "n15hFiXqJU6DBItiSjA4jWD2U"
PoliceIncidents <- read.socrata("https://www.dallasopendata.com/resource/qv6i-rri7.csv", app_token = token)

#filter police incident data to 2019 to present

PoliceIncidents2019to2020 <- PoliceIncidents %>% filter(servyr > 2018)

here is the source data https://www.dallasopendata.com/Public-Safety/Police-Incidents/qv6i-rri7/data

2 Answers

You can apply filters in the original query so the server only returns incidents since 2019. This speeds up the read, mostly because the server response doesn't have to transfer as much data. You'll need to use the "API field name" (here, servyr) to construct the query.

In this case:

PoliceIncidents <- read.socrata("https://www.dallasopendata.com/resource/qv6i-rri7.csv?$where=servyr > 2018", app_token = token)
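Since the question asks about filtering on the incident date (and the comment below mentions pulling only the last 3 months), a sketch of the same idea with a SoQL date comparison follows. The field name date_of_incident is taken from the question; check the dataset's "API field name" list, as the actual column may differ:

```r
library(RSocrata)

token <- "n15hFiXqJU6DBItiSjA4jWD2E"  # your own app token

# Build a $where clause for "last 3 months"; Socrata floating timestamps
# compare against ISO-8601 strings. date_of_incident is an assumption --
# verify it against the dataset's API field names.
cutoff <- format(Sys.Date() - 90, "%Y-%m-%dT00:00:00")
url <- paste0(
  "https://www.dallasopendata.com/resource/qv6i-rri7.csv",
  "?$where=date_of_incident > '", cutoff, "'"
)

RecentIncidents <- read.socrata(url, app_token = token)
```

Because the filter runs server-side, this also works well inside a Shiny app: each run re-queries the API and picks up new rows without re-downloading the full dataset.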
Tom Schenk Jr

0

For big CSVs, I like the vroom package from the tidyverse family. It's a lot faster than read_csv. With vroom, it's often easier to read the whole file and then filter.

library(vroom)
library(tidyverse)

df_raw <- vroom("Police_Incidents.csv")

occurrence_2019 <- df_raw %>%
  filter(`Year1 of Occurrence` >= 2019)

This only took like 10 seconds.
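If memory is a concern, vroom can also limit which columns get materialized via its col_select argument (vroom reads lazily, so unselected columns are never fully parsed). A minimal sketch; the column names are assumptions and should be matched to the CSV header:

```r
library(vroom)
library(dplyr)

# Read only the columns needed for the filter and downstream work.
# `Incident Number` is a hypothetical column name for illustration.
df_slim <- vroom(
  "Police_Incidents.csv",
  col_select = c(`Year1 of Occurrence`, `Incident Number`)
)

occurrence_2019 <- df_slim %>%
  filter(`Year1 of Occurrence` >= 2019)
```

This still downloads nothing from the API, though, so it doesn't address the server-side-update requirement in the comment below.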

Joe Erinjeri
  • I want it to pull directly from the API, though, so it can update server-side each time I run it in R Shiny, rather than uploading a CSV. It's too large right now, so I wanted to pull just the last 3 months in the import step instead of importing and then filtering. – Kristina Paterson Nov 30 '20 at 15:29