
I have written a manuscript using bookdown in RStudio for a specific project that cites references from a BibTeX file. This is a single .bib file that I use for many documents, so it lives outside my project folder and contains many references that aren't cited in the present manuscript. To make the project easier to share, I would like to create a smaller .bib file containing only the references I actually cite in the manuscript.

Other questions have addressed how to do this for:

  1. pure TeX, using the citations listed in the .aux file. I can generate an .aux file by setting options(tinytex.clean = FALSE), but it doesn't contain any citations.
  2. pandoc/markdown, but I have no idea how to apply this to R Markdown.

Does anyone know of a way to do this for an R Markdown document? Thanks!

I am using this YAML header and knitting within RStudio:

output:
  bookdown::pdf_book:
    keep_tex: yes

Full sessionInfo:

> sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8       
 [4] LC_COLLATE=en_GB.UTF-8     LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] compiler_3.6.3  bookdown_0.20   htmltools_0.4.0 tools_3.6.3     yaml_2.2.0     
 [6] Rcpp_1.0.3      rmarkdown_2.3   knitr_1.29      xfun_0.15       digest_0.6.25  
[11] packrat_0.5.0   rlang_0.4.7     evaluate_0.14  
Yihui Xie
tellis

1 Answer


Since you are writing in .Rmd, you can use the following R function to trim your .bib file:

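library(stringr)

# Write to output_bib only the entries of input_bib that are cited in input_file
clean_bib <- function(input_file, input_bib, output_bib) {
  # Join with newlines so a key at the end of a line stays separate from the next line
  lines <- paste(readLines(input_file), collapse = "\n")
  # A citation is "@key" followed by whitespace, one of , . ? ! ] ; or the end of the file
  entries <- unique(str_match_all(lines, "@([a-zA-Z0-9]+)(?=[\\s,.?!\\];]|$)")[[1]][, 2])

  # Split the bib file into entries; strip the first "@" so every entry looks alike
  bib <- paste(readLines(input_bib), collapse = "\n")
  bib <- unlist(strsplit(sub("^\\s*@", "", bib), "\n@"))

  # Compare keys exactly, so that "Greenwood2015" does not also select "Greenwood2015f"
  keys <- str_match(bib, "^[^\\{]+\\{\\s*([^,\\s]+)")[, 2]
  output <- bib[keys %in% entries]

  writeLines(sprintf("@%s", output), output_bib)
}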
# now call the function
clean_bib(...)

Just call it in the setup chunk.

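What does the function do? It first finds all citations in the input file, meaning every string that starts with @, contains only letters and digits, and is followed by a delimiter such as a comma, period, question mark, exclamation mark, semicolon, space, or ]. Keys that contain other characters, such as _, - or :, are not matched, so adjust the pattern to your needs.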
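
To see what the citation pattern picks up, here is a minimal sketch in base R run on a made-up sentence. The helper name `cited_keys` and the sample text are assumptions for illustration and are not part of the function above; the pattern is the same character class described in this answer.

```r
# Hypothetical helper (illustration only): return the unique citation keys in a text.
cited_keys <- function(text) {
  # "@" followed by letters/digits, matched only when one of the listed delimiters follows
  pattern <- "@([a-zA-Z0-9]+)[,. ?!;\\]]"
  hits <- regmatches(text, gregexpr(pattern, text, perl = TRUE))[[1]]
  # Drop the leading "@" and the single trailing delimiter from each match
  unique(substr(hits, 2, nchar(hits) - 1))
}

# Made-up sample: one key cited twice, one grouped citation, one bookdown cross-reference
sample_text <- "As shown by @smith2020, and later [@jones1999; @smith2020]. See \\@ref(fig:one)."
keys <- cited_keys(sample_text)
print(keys)
```

The bookdown cross-reference `\@ref(fig:one)` is skipped because `@ref` is followed by `(`, which is not one of the delimiters.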

Then it constructs a new bib file only containing these entries.
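
The filtering step can be sketched on an in-memory string as well. This is an illustration under assumptions, not the code of the function above: the helper name `filter_bib` and the sample entries are made up, and it compares each entry's key exactly instead of searching for the key anywhere in the entry, so a key that is a prefix of another key selects only its own entry.

```r
# Hypothetical helper (illustration only): keep the entries whose key is in `keys`.
filter_bib <- function(bib_text, keys) {
  # Split at every newline that is directly followed by "@"; the "@" stays with its entry
  entries <- unlist(strsplit(bib_text, "\n(?=@)", perl = TRUE))
  # The key is the text between the first "{" and the first ","; (?s) lets "." span lines
  entry_keys <- sub("(?s)^@[^{]+\\{\\s*([^,\\s]+).*$", "\\1", entries, perl = TRUE)
  # Exact comparison of keys, not a substring search
  paste(entries[entry_keys %in% keys], collapse = "\n")
}

# Made-up bib content with two keys that share a prefix
bib_text <- paste(
  "@article{Greenwood2015,",
  "  title = {First paper},",
  "  year = {2015}",
  "}",
  "@article{Greenwood2015f,",
  "  title = {Second paper},",
  "  year = {2015}",
  "}",
  "@book{Smith2020,",
  "  title = {A book}",
  "}",
  sep = "\n"
)

small <- filter_bib(bib_text, c("Greenwood2015", "Smith2020"))
cat(small, "\n")
```

Only the `Greenwood2015` and `Smith2020` entries are kept; `Greenwood2015f` is dropped even though its key starts with a cited key.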

Johannes Titz
  • Great answer, I made a minor alteration to the regex for `entries` to allow for citations to end with a semicolon where you have a few citations at once: "@([a-zA-Z0-9]+)[,\\. \\?\\!\\]\\;]" – tellis Jul 12 '21 at 12:15
  • Yes, this is helpful. I use commas in my own work, so I did not think of semicolon. I will change the regex in the answer. Thanks! – Johannes Titz Jul 12 '21 at 15:46
  • @JohannesTitz this is great. In my input bib I have two papers with the same author in the same year (Greenwood 2015) though they have different citation keys. It has produced an error and it appears that the two citations have been made into a vector: @c("article{Greenwood2015,\nabstract = {...", "article{Greenwood2015f,\nabstract = {...\nyear = {2015}\n}" ). How can this be avoided? Only one of the articles was used in my document – Mark Davies Mar 06 '22 at 21:27
  • It would be great if this could be generalized to take a vector of `.bib` files, which is my common use case. Under LaTeX I have used a Perl `aux2bib` script for this purpose, but it would be nicer to have an R solution. – user101089 May 27 '22 at 19:52