Code from the widyr section of Text Mining with R, Chapter 4, generates deprecation warnings for the distinct_() and tbl_df() functions. Since Chapter 4 of the book contains over 100 lines of code, we whittle it down to the relevant section and the minimum set of packages needed to reproduce the warnings.
library(dplyr)
library(janeaustenr)
library(tidytext)
austen_section_words <- austen_books() %>%
  filter(book == "Pride & Prejudice") %>%
  mutate(section = row_number() %/% 10) %>%
  filter(section > 0) %>%
  unnest_tokens(word, text) %>%
  filter(!word %in% stop_words$word)
austen_section_words
library(widyr)
# count words co-occurring within sections
word_pairs <- austen_section_words %>%
  pairwise_count(word, section, sort = TRUE)
word_pairs
...generates the following:
> # count words co-occurring within sections
> word_pairs <- austen_section_words %>%
+ pairwise_count(word, section, sort = TRUE)
Warning messages:
1: `distinct_()` is deprecated as of dplyr 0.7.0.
Please use `distinct()` instead.
See vignette('programming') for more help
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated.
2: `tbl_df()` is deprecated as of dplyr 1.0.0.
Please use `tibble::as_tibble()` instead.
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated.
>
> word_pairs
# A tibble: 796,008 x 3
item1 item2 n
<chr> <chr> <dbl>
1 darcy elizabeth 144
2 elizabeth darcy 144
3 miss elizabeth 110
4 elizabeth miss 110
5 elizabeth jane 106
6 jane elizabeth 106
7 miss darcy 92
8 darcy miss 92
9 elizabeth bingley 91
10 bingley elizabeth 91
# … with 795,998 more rows
These messages are generated because widyr::pairwise_count() calls widyr::pairwise_count_(), which uses dplyr::distinct_(). Later in the same pipeline, widyr:::custom_melt() calls dplyr::tbl_df().
#' @rdname pairwise_count
#' @export
pairwise_count_ <- function(tbl, item, feature, wt = NULL, ...) {
  if (is.null(wt)) {
    func <- squarely_(function(m) m %*% t(m), sparse = TRUE, ...)
    wt <- "..value"
  } else {
    func <- squarely_(function(m) m %*% t(m > 0), sparse = TRUE, ...)
  }
  tbl %>%
    distinct_(.dots = c(item, feature), .keep_all = TRUE) %>%
    mutate(..value = 1) %>%
    func(item, feature, wt) %>%
    rename(n = value)
}
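To make the matrix step in this function concrete, here is a minimal base R sketch of what function(m) m %*% t(m) computes. It assumes that squarely_() hands the function an item-by-feature matrix in which each cell is 1 when the item appears in that feature. The toy matrix and its names are hypothetical, not taken from the book's data.

```r
# Hypothetical item-by-feature matrix: rows are words, columns are sections.
# A 1 means the word appears in that section (each pair counted once,
# which is what the distinct step followed by ..value = 1 arranges).
m <- matrix(
  c(1, 1, 1,
    1, 1, 0,
    0, 1, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("darcy", "elizabeth", "jane"), c("s1", "s2", "s3"))
)

# Multiplying by the transpose gives, for every pair of words,
# the number of sections in which both appear.
co <- m %*% t(m)
co
```

The off-diagonal cells are the pairwise counts; the diagonal holds each word's own section count.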
We can see the sources of the warnings when we print the warning messages with lifecycle::last_warnings().
<deprecated>
message: `tbl_df()` is deprecated as of dplyr 1.0.0.
Please use `tibble::as_tibble()` instead.
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated.
backtrace:
9. widyr::pairwise_count(., word, section, sort = TRUE)
10. widyr::pairwise_count_(...)
3. dplyr::distinct_(., .dots = c(item, feature), .keep_all = TRUE)
3. dplyr::mutate(., ..value = 1)
10. widyr:::func(., item, feature, wt)
19. widyr:::new_f(tbl, item, feature, value, ...)
7. widyr:::custom_melt(.)
15. dplyr::tbl_df(.)
>
Version 0.1.3 of widyr is the current version of the package. To resolve these warnings, the calls to dplyr::distinct_() in widyr::pairwise_count_() and to dplyr::tbl_df() in widyr:::custom_melt() must be replaced in the package itself. Since widyr is a currently supported R package, the way to initiate this is to report an issue on the widyr GitHub Issues page.
As noted in the text of the warning messages, distinct_() has been replaced with dplyr::distinct(), and tbl_df() has been replaced with tibble::as_tibble().
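As an illustration of what those replacements look like in user code, the sketch below rewrites a distinct_() call that takes column names as strings. The data frame df and the vector cols are hypothetical, and splicing symbols with !!!rlang::syms() is only one of several ways to do this; it is not necessarily the change the widyr maintainers would make.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data standing in for an item/feature table.
df <- data.frame(
  item    = c("a", "a", "b"),
  feature = c(1, 1, 2),
  other   = 1:3
)
cols <- c("item", "feature")  # column names held as strings

# Deprecated form: distinct_(df, .dots = cols, .keep_all = TRUE)
# Tidy-eval form: turn the strings into symbols and splice them in.
deduped <- df %>%
  distinct(!!!rlang::syms(cols), .keep_all = TRUE)

# Deprecated form: tbl_df(deduped)
deduped <- as_tibble(deduped)
deduped
```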
Suppressing the warnings
One can suppress the warnings produced by pairwise_count() by wrapping the call in suppressWarnings().
library(widyr)
suppressWarnings(
  # count words co-occurring within sections
  word_pairs <- austen_section_words %>%
    pairwise_count(word, section, sort = TRUE)
)
...and the output:
> suppressWarnings(
+ # count words co-occurring within sections
+ word_pairs <- austen_section_words %>%
+ pairwise_count(word, section, sort = TRUE))
>
> word_pairs
# A tibble: 796,008 x 3
item1 item2 n
<chr> <chr> <dbl>
1 darcy elizabeth 144
2 elizabeth darcy 144
3 miss elizabeth 110
4 elizabeth miss 110
5 elizabeth jane 106
6 jane elizabeth 106
7 miss darcy 92
8 darcy miss 92
9 elizabeth bingley 91
10 bingley elizabeth 91
# … with 795,998 more rows
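Note that suppressWarnings() silences every warning raised inside the call, not only the deprecation messages. A narrower alternative is sketched below in base R: it muffles only warnings whose message contains "is deprecated" and lets everything else through. The functions noisy() and muffle_deprecated() are hypothetical helpers written for this illustration, and matching on the message text is an assumption about how the warnings are worded.

```r
# Hypothetical stand-in for a call that raises a deprecation warning
# and an unrelated warning before returning a value.
noisy <- function() {
  warning("`old_fun()` is deprecated as of pkg 1.0.0.")
  warning("something else went wrong")
  42
}

# Muffle only warnings whose message mentions deprecation;
# all other warnings propagate as usual.
muffle_deprecated <- function(expr) {
  withCallingHandlers(
    expr,
    warning = function(w) {
      if (grepl("is deprecated", conditionMessage(w), fixed = TRUE)) {
        invokeRestart("muffleWarning")
      }
    }
  )
}

result <- muffle_deprecated(noisy())
```

With the data from this post, the same wrapper could be placed around the pairwise_count() pipeline in place of suppressWarnings().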
Appendix
This code was run on version 4.0.2 of R, with the following packages, as reported by sessionInfo():
R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.6
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] tidytext_0.2.5 janeaustenr_0.1.5 widyr_0.1.3 tidyr_1.1.1
[5] dplyr_1.0.2
loaded via a namespace (and not attached):
[1] Rcpp_1.0.5 rstudioapi_0.11 magrittr_1.5 tidyselect_1.1.0
[5] lattice_0.20-41 R6_2.4.1 rlang_0.4.7 fansi_0.4.1
[9] stringr_1.4.0 tools_4.0.2 grid_4.0.2 packrat_0.5.0
[13] broom_0.7.0 utf8_1.1.4 cli_2.0.2 ellipsis_0.3.1
[17] assertthat_0.2.1 tibble_3.0.3 lifecycle_0.2.0 crayon_1.3.4
[21] Matrix_1.2-18 purrr_0.3.4 vctrs_0.3.2 tokenizers_0.2.1
[25] SnowballC_0.7.0 glue_1.4.1 stringi_1.4.6 compiler_4.0.2
[29] pillar_1.4.6 generics_0.0.2 backports_1.1.8 pkgconfig_2.0.3