I'm using OpenCPU and knitr to generate custom feedback after surveys. To do this, I let survey developers supply their own Rmd files. In this use case the survey developers are trusted, but the survey takers may not be.
I'm now thinking about XSS. It's not a big worry, because feedback will usually be displayed only to the user who entered the data being shown. Still, characters like `<` will be used for non-malicious reasons, and I'd like to think ahead and explore some of the trials and tribulations of freely mixing R with web apps.
Knitr, and R generally, were not designed with untrusted users and XSS in mind. OpenCPU addresses many of the security issues of exposing R as an API by running it under AppArmor, but I wonder whether a maximum-flexibility approach like mine can also be made safe.
Possible points at which one might separate trusted and untrusted markup:
- Before knitting, i.e. I pass escaped user data to the Rmd file. Drawback: an oblivious survey dev might unescape it, either accidentally or because the escaping is annoying in some context.
- During knitting. This would be ideal, I guess, but I don't know whether it's possible, especially if a survey dev can alter chunk options.
- After knitting. I think it's impossible to separate trusted and untrusted markup post hoc.
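The second option could perhaps be approximated with knitr's `knit_print()` S3 generic, which knitr calls on values auto-printed in chunks and, in recent versions, on the results of inline `r` expressions. The sketch below wraps survey takers' answers in a marker class before they reach the Rmd and registers a print method that escapes them at output time. The class name `untrusted` and both helper functions are hypothetical, and the escaping assumes a CommonMark- or pandoc-compatible renderer, in which any backslash-escaped ASCII punctuation character is shown literally.

```r
# Hypothetical marker class for survey-taker input (not part of knitr).
untrusted <- function(x) structure(as.character(x), class = "untrusted")

# Hypothetical helper: backslash-escape every ASCII punctuation character
# (ranges 0x21-0x2F, 0x3A-0x40, 0x5B-0x60, 0x7B-0x7E) so that a
# CommonMark/pandoc renderer shows it literally instead of as markup.
escape_untrusted <- function(x) {
  gsub("([!-/:-@\\[-`{-~])", "\\\\\\1", x, perl = TRUE)
}

# knitr dispatches knit_print() on the class of the printed value, so this
# method (defined e.g. in a setup chunk) routes `untrusted` objects through
# the escaping step; asis_output() marks the result as ready-made Markdown.
knit_print.untrusted <- function(x, ...) {
  knitr::asis_output(escape_untrusted(unclass(x)))
}
```

Because the escaping happens when the object is printed, stripping backslashes from the stored answer no longer helps (as far as I know, `gsub()` keeps attributes such as the class). It is still not airtight: `cat()` in a chunk with `results = 'asis'`, as well as `paste()`, `unclass()` or subsetting with `[`, yields a plain character vector that bypasses the method. If the Rmd is knitted in an environment other than the global one, the method may also need to be registered explicitly, e.g. with `registerS3method()`.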
Some code to paste into OpenCPU's knitr app:
````markdown
```{r}
good_userdata <- 'I like brackets [].'
bad_userdata <- 'some text should not be
[linked](javascript:location.href=\'http://example.com?secrets\';), <s>struck</s> or __bold__'
# unexported helper from highr (accessed with :::), so it may change between versions
escape_html <- highr:::escape_html
# backslash-escape the Markdown characters I thought of: [ and _
escape_md <- function(x) {
  x <- gsub('\\[', '\\\\[', x)
  x <- gsub('_', '\\\\_', x)
  x
}
good_userdata_escaped <- escape_md(escape_html(good_userdata))
bad_userdata_escaped <- escape_md(escape_html(bad_userdata))
```
## let's say a survey dev wants to print text like this
```{r}
cat(good_userdata_escaped)
cat(bad_userdata_escaped) # doesn't know about text like this
```
## gets annoyed by the visible backslashes, does
```{r}
# strips every backslash, undoing escape_md()
good_userdata_escaped <- gsub('\\\\', '', good_userdata_escaped)
bad_userdata_escaped <- gsub('\\\\', '', bad_userdata_escaped)
```
## so that this looks nice
```{r}
cat(good_userdata_escaped)
```
## later renders the same text inline, so that it is evaluated as Markdown
`r good_userdata_escaped # doesn't look dangerous`
`r bad_userdata_escaped`
````
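On the escaping itself: `escape_md()` in the code above only covers `[` and `_`, so input such as `*emphasis*`, backtick code spans or a `#` at the start of a line still reaches the Markdown renderer as markup. A more complete sketch, assuming a CommonMark- or pandoc-compatible renderer (both treat any backslash-escaped ASCII punctuation character as a literal), escapes all ASCII punctuation, including `<` and `&`, so raw HTML and entities are neutralised without a separate HTML-escaping pass. The name `escape_md_all()` is hypothetical.

```r
# Hypothetical replacement for escape_md(escape_html(x)): prefix every ASCII
# punctuation character with a backslash. Under CommonMark/pandoc rules "\<"
# renders as a literal "<" (emitted as &lt;), "\*" as "*", and so on.
escape_md_all <- function(x) {
  gsub("([!-/:-@\\[-`{-~])", "\\\\\\1", x, perl = TRUE)
}

bad_userdata <- 'some text should not be
[linked](javascript:location.href=\'http://example.com?secrets\';), <s>struck</s> or __bold__'
cat(escape_md_all(bad_userdata))
```

Applying it after `escape_html()` would be counterproductive: the `&` and `;` of `&lt;` would themselves be escaped, and readers would see the literal text `&lt;`. Older renderers may not recognise every escape and could leave some backslashes visible, so this is worth checking against whatever renderer the OpenCPU knitr app uses. It also does not solve the underlying problem: a survey dev who strips the backslashes still undoes it.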
Edit 2
Sorry, I originally included only some HTML tags, assuming the possible XSS attacks were obvious. Michel Fortin has some examples on his page.