
I'm having a hard time getting my table with diacritics printed via the knitr package and pandoc. I believe the Name.md file is produced correctly, but I get an error at the pandoc step. What am I doing wrong? Without diacritics it works perfectly.

Here is an example and the steps I follow:

Replicate table in R

SampleTable <- data.frame(Nazov=c("Kratkodobé záväzky (TA)","Dlhodobé záväzky (LA)",
                                  "Záväzky celkovo (TA)"))

I knit the .Rmd file, which contains this chunk, to create the Name.md file

```{r, echo=FALSE, dpi=600, fig.width=12, fig.height=15, fig.cap="Finančná štruktúra"}
   print(xtable(SampleTable))
```

Knit, then convert the .md into a .pdf

knit("Name.rmd")


system(paste("pandoc -V geometry:margin=1in -o", "Report", ".pdf ", "Name", ".md", 
              sep=""))

EDIT: The error:

pandoc.exe: Cannot decode byte '\x20': Data.Text.Encoding.decodeUtf8: Invalid UTF-8 stream

Warning message:
running command 'pandoc -V geometry:margin=1in -oReport7.pdf ReportNew.md' had status 1
Maximilian
  • @AnandaMahto: I've posted the error above in EDIT. – Maximilian Sep 16 '13 at 15:58
  • I'm not sure what you mean by "specifying". I had a problem with referencing, but that was sorted out by saving the Rmd file with "Save with Encoding" and choosing UTF-8. That only fixed the referencing below the table (see the chunk with fig.cap="" above). – Maximilian Sep 16 '13 at 16:18
  • From the Pandoc man page: *Pandoc uses the UTF-8 character encoding for both input and output. If your local character encoding is not UTF-8, you should pipe input and output through iconv: `iconv -t utf-8 input.txt | pandoc | iconv -f utf-8`*. See also [here](https://github.com/jgm/pandoc/issues/709) and [here](http://tex.stackexchange.com/questions/97843/how-to-interpret-message-invalid-utf-8-stream-when-trying-to-convert-a-tex-fil) for some more ideas. – A5C1D2H2I1M1N2O1R2T1 Sep 16 '13 at 16:23
  • The above example is an exact replica, and in fact you can produce such a .Rmd/.md file yourself. If you manage to produce a PDF from the above example, you have found the solution. Could you be more precise about piping through iconv? Where would it go in my post above? – Maximilian Sep 16 '13 at 16:44
  • It's not *really* an exact replica, is it? For starters, you seem to be using the `xtable` package, so I assume you have `library(xtable)` somewhere in your Rmd file. Regarding the `iconv` comment, you would have to do that at the command line or in a system call on your .md file before using Pandoc. – A5C1D2H2I1M1N2O1R2T1 Sep 16 '13 at 16:50
  • Ok, I'm going to post it somewhere and will provide the link to it. Thank you. – Maximilian Sep 16 '13 at 16:57
  • Link here: http://www.filetolink.com/e764b77c16 – Maximilian Sep 16 '13 at 17:05
  • I'm sorry, but this requires a login. Not so good. – Maximilian Sep 16 '13 at 17:06

1 Answer


If you open your file in a text editor such as Geany, which lets you see the file encoding easily (File > Properties), you'll see that the file encoding is ISO-8859-1.

However, as mentioned on the Pandoc man page:

Pandoc uses the UTF-8 character encoding for both input and output. If your local character encoding is not UTF-8, you should pipe input and output through iconv:

iconv -t utf-8 input.txt | pandoc | iconv -f utf-8

As such, what I did at my terminal was (assuming you've changed to the directory your .md file is stored in):

iconv -f ISO-8859-1 -t UTF-8 md_file.md > new.md
pandoc new.md -o test.pdf

If you wish to do this from R, paste together the commands as you have done in your existing question.
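If an external `iconv` binary is not available (for example on Windows), the same re-encoding can probably be done with base R's own `iconv()` function. The following is only a minimal sketch: `convert_to_utf8` is a hypothetical helper name, and the default `from = "ISO-8859-1"` is an assumption taken from the encoding I saw in your file. If your file is really in another encoding (such as Windows-1250), pass that instead.

```r
# Hypothetical helper (not part of any package): re-encode a text file
# to UTF-8 using only base R.
# `from` is an assumption about the source encoding; adjust it to match
# the real encoding of your file.
convert_to_utf8 <- function(infile, outfile, from = "ISO-8859-1") {
  # Read the lines as-is; no encoding is declared, so the bytes are kept
  lines <- readLines(infile, warn = FALSE)
  # Re-encode every line from the source encoding to UTF-8
  converted <- iconv(lines, from = from, to = "UTF-8")
  # useBytes = TRUE writes the UTF-8 bytes without translating them
  # back to the native encoding (this matters on Windows)
  writeLines(converted, outfile, useBytes = TRUE)
  invisible(outfile)
}

# Usage (file names taken from the terminal example above):
# convert_to_utf8("md_file.md", "new.md")
# system("pandoc new.md -o test.pdf")
```

This avoids shell redirection with `>` altogether. That may matter on Windows, where `system()` does not run the command through a shell.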

Here's the output I got:

[screenshot of the resulting PDF]

Note: I should mention that I am on Ubuntu and iconv is fairly standard in Unix systems.

A5C1D2H2I1M1N2O1R2T1
  • Great. But how do I create new.md? I tried system(paste("iconv -f ISO-8859-1 -t UTF-8 oldFile.md","newFile",".md", sep="")), which doesn't work. I don't really understand the ">" sign in the command: md_file.md > new.md. I now see that the iconv command might not be available on my Windows 7 machine. – Maximilian Sep 16 '13 at 18:01
  • @Max, The `>` sign means to take the output from the left side and write it to a new file called whatever we put on the right side. And, yes, I would just run these on separate lines. But why do you want to do this from within R? Why not just switch to the console for a few seconds? I haven't tested your `system` commands, but basically, the `-f` switch = "from", the `-t` switch = "to", then you specify your input file (here, "md_file.md") and output file (here, "new.md") (in case you wanted to paste them together in a similar manner). – A5C1D2H2I1M1N2O1R2T1 Sep 16 '13 at 18:07
  • Here is a source for further info: http://www.gnu.org/savannah-checkouts/gnu/libiconv/documentation/libiconv-1.13/iconv.1.html – Maximilian Sep 16 '13 at 18:11
  • Still, do you know how to execute the iconv command within R? I have tried system(paste("iconv -f ISO-8859-1 -t UTF-8 Report1.md > ReportRRR.md", sep="")), but I get an error. I will be using this reporting very often, so it would be good to have this within R without switching to the console. – Maximilian Sep 16 '13 at 20:19
  • R does have an `iconv()` function. – Yihui Xie Sep 16 '13 at 23:17
  • @Yihui, I know that, but I don't know where the problem is occurring. It seemed most straightforward based on the two files they uploaded to just use `iconv` directly. – A5C1D2H2I1M1N2O1R2T1 Sep 17 '13 at 00:19
  • @Yihui: Could you please post the command to convert within R base? Thanks. – Maximilian Sep 19 '13 at 09:05
  • @Max Sure. Assuming your file encoding is ISO-8859-1, you can `con=file('old_file.md',encoding='ISO-8859-1'); x=readLines(con,encoding='UTF-8'); writeLines(x,'new_file.md',useBytes=TRUE)`. However, I think it will be easier to resave your Rmd file with UTF-8 encoding, and knit it in RStudio, then you get a UTF-8 encoded output md file automatically without all the fuss. – Yihui Xie Sep 19 '13 at 13:54
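
To check whether an .md file is already valid UTF-8 before handing it to pandoc, something like the sketch below may help. `find_invalid_utf8` is a hypothetical helper name, and the sketch assumes R 3.3.0 or later, where base R provides `validUTF8()`.

```r
# Hypothetical helper: return the line numbers of a file whose bytes
# are not valid UTF-8. Requires base R's validUTF8() (R >= 3.3.0).
find_invalid_utf8 <- function(path) {
  # Read the lines as-is, without declaring or converting any encoding
  lines <- readLines(path, warn = FALSE)
  # validUTF8() is TRUE for each element that is well-formed UTF-8
  which(!validUTF8(lines))
}

# Usage (file name is an assumption taken from the question):
# find_invalid_utf8("Name.md")
```

An empty result suggests the file is well-formed UTF-8. Any line numbers returned point to lines that still need re-encoding.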