
I'm working on a book project in RStudio (2022.07.2 Build 576) using quarto (v. 1.1.189) and pandoc (v. 2.19.2). I am compiling only to HTML.

Yesterday, rendering the book failed with the message below, with no indication of which file or line caused it.

pandoc.exe: Cannot decode byte '\x93': Data.Text.Internal.Encoding.decodeUtf8: Invalid UTF-8 stream

(Byte \x93 appears to be an opening 'smart' double quote in the Windows-1252 encoding, the character LaTeX writes as "``".)

I've searched for non-ASCII characters in all my source files, but found none.

# pattern is a regular expression, not a glob, so anchor on the extension
files <- list.files(pattern = "\\.qmd$")
for (f in files) tools::showNonASCIIfile(f)
# full.names = TRUE so each path includes its directory and can be opened
bibs <- list.files("bib", pattern = "\\.bib$", full.names = TRUE)
for (f in bibs) tools::showNonASCIIfile(f)
R <- list.files("R", pattern = "\\.R$", full.names = TRUE)
for (f in R) tools::showNonASCIIfile(f)

What else can I do to track down this error? I tried commenting out all chapters except index.qmd and got the same result. I also tried removing nearly all the text from index.qmd, with the same result. Is it possible that quarto has written something bad into a file in .quarto?

The question "pandoc returning 'Cannot decode byte \xf9'" mentions this error, but in the context of downloading files from a website. Another SO question, "Pandoc in papaja won't decode byte \xc6", suggests using iconv, but I have no idea how to use it in RStudio.

Edit: It was suggested to me that there might be something weird in the .quarto/_freeze folder, so I deleted it, and then the entire .quarto/ folder. The error persists.

I still have no idea how to find the source of the problem.

Update: I filed pandoc issue #8884. It was suggested that I upgrade pandoc to the latest version, 3.1.2, which I did. The same error occurs, but my console now shows a bit more detail:

processing file: index.qmd
1/6 [unnamed-chunk-1]   
2/6                     
3/6 [unnamed-chunk-2]   


processing file: ./flatland.qmd
1/5                       
2/5 [fig-flatland-spheres]
3/5                       
4/5 [fig-1D-4D]           
5/5                       
4/6                     
5/6 [pollen-eureka-code]
6/6                     
output file: index.knit.md

pandoc.exe: Cannot decode byte '\x93': Data.Text.Internal.Encoding.decodeUtf8: Invalid UTF-8 stream

So, the problem seems to stem from index.qmd. Yet, if I reduce index.qmd to just one line,

# Preface {.unnumbered}

I still get the same error:

Rendering:
[1/1] index.qmd
pandoc.exe: Cannot decode byte '\x93': Data.Text.Internal.Encoding.decodeUtf8: Invalid UTF-8 stream
user101089

1 Answer


Question to self: Did you check for non-ASCII characters in files that live outside your project folder but are still imported?

Ugh: I had referenced one .bib file by its absolute path, outside the project:

bibliography:
  - bib/references.bib
  - bib/R-refs.bib
  - bib/packages.bib
  - "C:/Dropbox/localtexmf/bibtex/bib/timeref.bib"

That one did contain a string using smart-quotes.
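
A check like the following would have caught it. This is only a minimal sketch: `find_invalid_utf8()` is a hypothetical helper of my own, not part of quarto or pandoc, and the list of paths is typed in by hand from the `bibliography:` entry above rather than read from `_quarto.yml`. It relies on base R's `validUTF8()`, which flags any string whose bytes do not form valid UTF-8.

```r
# Hypothetical helper (not part of quarto or pandoc): report the lines in
# each file whose bytes are not valid UTF-8.
find_invalid_utf8 <- function(paths) {
  hits <- lapply(paths, function(p) {
    lines <- readLines(p, warn = FALSE)
    bad <- which(!validUTF8(lines))
    if (length(bad) == 0) return(NULL)
    data.frame(file = p, line = bad, stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, hits)
  # Always return a data frame, even when nothing was found
  if (is.null(out)) {
    out <- data.frame(file = character(0), line = integer(0))
  }
  out
}

# Paths copied by hand from the bibliography: entry, including the one
# that lives outside the project folder.
bib_files <- c(
  "bib/references.bib",
  "bib/R-refs.bib",
  "bib/packages.bib",
  "C:/Dropbox/localtexmf/bibtex/bib/timeref.bib"
)
bib_files <- bib_files[file.exists(bib_files)]  # skip files not on this machine
print(find_invalid_utf8(bib_files))
```

The point is to check every file the YAML header names, not just what `list.files()` finds inside the project directory.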

Correcting that solved the problem. Still, I think pandoc could be more helpful by reporting the filename and location where such an error occurs. I filed this issue: https://github.com/jgm/pandoc/issues/8884
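
For anyone wondering how `iconv` fits in here: instead of editing the offending string by hand, the file could be re-encoded from R. The sketch below assumes the stray bytes are Windows-1252 (CP1252), which is a guess based on \x93 being the opening smart quote in that encoding. `convert_to_utf8()` is a hypothetical helper name. It converts only the lines that are not already valid UTF-8, so text that is already correct is left untouched.

```r
# Hypothetical helper: rewrite a file so that every line is valid UTF-8.
# Assumption: any line that is not valid UTF-8 is encoded as CP1252.
convert_to_utf8 <- function(path, from = "CP1252", out = path) {
  lines <- readLines(path, warn = FALSE)
  bad <- !validUTF8(lines)
  # Re-encode only the offending lines; valid lines pass through unchanged
  lines[bad] <- iconv(lines[bad], from = from, to = "UTF-8")
  if (anyNA(lines)) {
    stop("Some lines could not be converted from ", from)
  }
  # useBytes = TRUE writes the bytes as they are, with no further re-encoding
  writeLines(lines, out, useBytes = TRUE)
  invisible(out)
}

# Small demonstration on a temporary file containing CP1252 smart quotes
demo <- tempfile(fileext = ".bib")
writeBin(c(as.raw(0x93), charToRaw("quoted"), as.raw(0x94), charToRaw("\n")), demo)
convert_to_utf8(demo)
print(all(validUTF8(readLines(demo, warn = FALSE))))
```

By default this overwrites the file in place, so it is worth keeping a backup copy or passing a different `out` path the first time.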

user101089