You don't need the stringi package for this. The fastest way to query the data is to use a data.table with a key on country, and use grepl() to subset the data.
Example using the GNI2014 data from the treemap package:
library(treemap)
library(data.table)
data(GNI2014)
gni2014table <- data.table(GNI2014)
setkey(gni2014table,"country")
searchText <- "berm"
gni2014table[grepl(searchText,gni2014table$country,ignore.case=TRUE),]
searchText <- "United"
gni2014table[grepl(searchText,gni2014table$country,ignore.case=TRUE),]
...and the output.
> library(treemap)
> library(data.table)
> data(GNI2014)
> gni2014table <- data.table(GNI2014)
> setkey(gni2014table,"country")
> searchText <- "berm"
> gni2014table[grepl(searchText,gni2014table$country,ignore.case=TRUE),]
   iso3 country     continent population    GNI
1:  BMU Bermuda North America      67837 106140
>
> searchText <- "United"
> gni2014table[grepl(searchText,gni2014table$country,ignore.case=TRUE),]
   iso3              country     continent population   GNI
1:  ARE United Arab Emirates          Asia    4798491 44600
2:  GBR       United Kingdom        Europe   62262000 43430
3:  USA        United States North America  313973000 55200
>
Returning only the column that you want to populate the field on the UI looks like this.
searchText <- "United Arab"
gni2014table[grepl(searchText,gni2014table$country,ignore.case=TRUE),country]
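If the lookup feeds a UI field, it may be convenient to wrap it in a small function. The following is only a sketch under a few assumptions: `searchCountries` is a hypothetical helper name, the toy table stands in for GNI2014 so the example runs without the treemap package, and returning nothing for an empty search string is a design choice rather than a requirement.

```r
library(data.table)

# Toy stand-in for the GNI2014 table (assumption: only `country` matters here).
countries <- data.table(
  iso3    = c("BMU", "ARE", "GBR", "USA"),
  country = c("Bermuda", "United Arab Emirates",
              "United Kingdom", "United States")
)
setkey(countries, "country")

# Hypothetical helper: return matching country names as a character vector.
searchCountries <- function(dt, searchText) {
  # An empty pattern matches every row, so return nothing instead.
  if (!nzchar(searchText)) return(character(0))
  as.character(dt[grepl(searchText, country, ignore.case = TRUE), country])
}

searchCountries(countries, "united")
searchCountries(countries, "berm")
```

Note that grepl() treats the search text as a regular expression here, so user input containing characters such as `(` or `[` would need escaping before it is passed in.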
UPDATE 20 Dec 2017: Added code to run microbenchmarks. In the first test case grepl() is about 20 microseconds faster on average than stri_detect_fixed(), and in the second case stri_detect_fixed() is about 60 microseconds faster on average than grepl(), over 100 iterations of each request.
library(treemap)
library(data.table)
library(microbenchmark)
data(GNI2014)
gni2014table <- data.table(GNI2014)
setkey(gni2014table,"country")
searchText <- "berm"
microbenchmark(gni2014table[grepl(searchText,gni2014table$country,
ignore.case=TRUE),])
searchText <- "United Arab"
microbenchmark(gni2014table[grepl(searchText,gni2014table$country,
ignore.case=TRUE),country])
library(stringi)
searchText <- "berm"
# stri_detect_fixed() takes the strings to search first, then the pattern
microbenchmark(gni2014table[stri_detect_fixed(gni2014table$country,
    searchText,
    case_insensitive=TRUE),])
searchText <- "United Arab"
microbenchmark(gni2014table[stri_detect_fixed(gni2014table$country,
    searchText,case_insensitive=TRUE),])
You'll have to run the code yourself to reproduce the benchmarks, because the output of microbenchmark() doesn't display easily on SO.
That said, a summarized version of the timings is:
searchText    Function            Mean (in microseconds)
------------- ------------------- ----------------------
berm          grepl               526.2545
United Arab   grepl               583.1789
berm          stri_detect_fixed   545.8772
United Arab   stri_detect_fixed   524.1132
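A compact summary like the one above can be pulled straight from the benchmark object. This is a minimal sketch, assuming a small toy table instead of GNI2014 and the illustrative labels `grepl` and `stri` for the two expressions; absolute timings will differ by machine and data size, so no numbers are shown.

```r
library(data.table)
library(stringi)
library(microbenchmark)

# Toy stand-in for the GNI2014 table (assumption for a self-contained run).
countries <- data.table(
  country = c("Bermuda", "United Arab Emirates",
              "United Kingdom", "United States")
)
setkey(countries, "country")
searchText <- "united arab"

# Benchmark both approaches in one call so they share the same settings.
mb <- microbenchmark(
  grepl = countries[grepl(searchText, country, ignore.case = TRUE), country],
  stri  = countries[stri_detect_fixed(country, searchText,
                                      case_insensitive = TRUE), country],
  times = 100
)

# summary() returns a data frame; unit = "us" reports microseconds.
timings <- summary(mb, unit = "us")
timings[, c("expr", "mean", "median")]
```

Looking at the median alongside the mean helps here, since a few slow iterations can move the mean by more than the differences being compared.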