2

I've read the threads and package updates on encoding issues with Shiny, but I have a database-driven Shiny app (so the example is difficult to reproduce) that is mangling some special characters.

In my PostgreSQL database my Swedish river appears correctly as "Upper Umeälven River", but when I pull it back to the Shiny interface with dplyr, `names.rivers <- filter(tbl.rivers, Country == "Sweden")`, it becomes "Upper UmeÃ¤lven River" in R.

I'm using UTF-8 encoding locally; I guess I'm losing something on the exchange with the database.

Sys.getlocale()
[1] "LC_COLLATE=French_France.1252;LC_CTYPE=French_France.1252;LC_MONETARY=French_France.1252;LC_NUMERIC=C;LC_TIME=French_France.1252"

Apologies again for the lack of an example; it's ONLY an issue when pulling from the database. I suspect I'm missing a flag on some sanitizing function somewhere, but I need some help getting pointed in the right direction.

Carl
Jeff
  • You are connecting to the DB with `dplyr`? – Carl Aug 10 '16 at 14:50
  • Hi @Carl, yes, connecting and filtering with dplyr per https://cran.r-project.org/web/packages/dplyr/vignettes/databases.html. – Jeff Aug 10 '16 at 15:47
  • Does the issue only appear with `shiny` or do you see the problem anytime you query the DB with `dplyr` – Carl Aug 10 '16 at 15:54
  • Just tested with both ```library(RPostgreSQL)``` and ```library(DBI)``` and I get the same result -- not just in ```shiny```, but R @Carl. So it's not a ```dplyr``` issue in fact. I still can't seem to find the encoding declaration. – Jeff Aug 10 '16 at 16:09

2 Answers

1

In your code page, Windows-1252 (Windows Latin 1), the 'ä' in Upper Umeälven River is the single byte 0xE4 (binary 11100100).

The 'Ã¤' in Upper UmeÃ¤lven River, read in the same code page, is the two octets 0xC3 0xA4 (binary XXX00011 XX100100, where X marks the UTF-8 prefix bits).

However, if you apply the UTF-8 encoding rules to those two octets, the significant bits (00011 100100) are exactly the code point 0xE4.

Somewhere an inadvertent or erroneous character conversion is taking place: the character is encoded as UTF-8, but the string is still treated as if it were in the Windows Latin 1 code page.

Perhaps the data is already being received in UTF-8, and you can change the receiving code page to reflect that. There may also be a silent transformation happening somewhere further back, with no indication of it.
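The byte arithmetic above can be checked directly in R. This is only a sketch: it assumes the platform's iconv recognises the name "CP1252" for Windows Latin 1, and the variable names are made up for illustration.

```r
# The two bytes that UTF-8 uses for 'ä' (U+00E4).
utf8_bytes <- as.raw(c(0xC3, 0xA4))

# Strip the UTF-8 marker bits (110xxxxx 10xxxxxx) and join the payload bits.
high <- bitwAnd(0xC3L, 0x1FL)   # low 5 bits of the lead byte
low  <- bitwAnd(0xA4L, 0x3FL)   # low 6 bits of the continuation byte
code_point <- bitwOr(bitwShiftL(high, 6L), low)   # should equal 0xE4

# Read as UTF-8, the two bytes decode to a single character.
as_utf8 <- iconv(list(utf8_bytes), from = "UTF-8", to = "UTF-8")

# Misread as Windows-1252 (assumed encoding name), each byte becomes
# its own character, which is the garbled pair seen in the question.
as_cp1252 <- iconv(list(utf8_bytes), from = "CP1252", to = "UTF-8")

utf8ToInt(as_utf8)     # one code point
utf8ToInt(as_cp1252)   # two code points
```

If the second call returns two code points equal to the raw byte values, the string was valid UTF-8 all along and only the declared encoding was wrong.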

Pekka
  • Thanks @Pekka for the links -- after a brief panic (I read postgresql installations used to default to latin-1 encoding) I found that, at least in 9.4/9.5, I lucked out: ```ENCODING = 'UTF8'``` – Jeff Aug 10 '16 at 23:39
  • (For anyone following along, in pgAdmin, the SQL panel shows the data definition language to DROP or CREATE, which includes the encoding spec.). So, I won't spend the day rebuilding a DB! However I'm not much closer to cleaning up my interface. Beginning to feel like I'm missing something simple. – Jeff Aug 10 '16 at 23:52
1

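As suspected, the answer was simple: `iconv(vector.to.convert, "UTF-8")`. The second argument of `iconv()` is `from`, so this converts the vector from UTF-8 to the session's native encoding.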

My "learnings":

  1. The encodings of the source file, the database, and the data stream between them are not the same thing;
  2. I spent time making sure the data sources had been created in the correct encoding, while ignoring the (implicit?) conversion of the data stream;
  3. This page helped: http://shiny.rstudio.com/articles/unicode.html

My understanding is a bit shallow, but - frankly - I'm not digging deeper into the world of character encoding for the moment. I hope it helps someone else avoid the error!
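For anyone who wants to see what the fix does at the byte level, here is a minimal sketch. It rebuilds "Umeälven" from raw UTF-8 bytes to stand in for what the driver returns (an assumption, since I never inspected the actual stream), and it uses "CP1252" as the explicit target where my machine would use the native encoding.

```r
# Stand-in for a value returned by the database driver: the UTF-8 bytes of
# "Umeälven", with no encoding declared on the R string.
from_db <- rawToChar(as.raw(c(0x55, 0x6D, 0x65, 0xC3, 0xA4, 0x6C, 0x76, 0x65, 0x6E)))

# Option A: keep the bytes and only declare what they are.
declared <- from_db
Encoding(declared) <- "UTF-8"
nchar(declared, type = "chars")   # counts 'ä' as one character once declared

# Option B: re-encode the bytes, as iconv(x, "UTF-8") does on a CP1252 session.
# The target is spelled out here so the result does not depend on the locale.
converted <- iconv(from_db, from = "UTF-8", to = "CP1252", toRaw = TRUE)[[1]]
converted                         # 'ä' is now the single byte e4
```

Option A leaves the data untouched and only fixes the label; option B changes the bytes to match the session. Which one is appropriate likely depends on where the strings end up (Shiny generally expects UTF-8, per the article linked above).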

Jeff
  • Hey @Jeff, I'm hoping you've dug your way deeper into the world of character encoding. I'm in exactly the same situation, and `iconv()` solves the issues as well. But I'm not exactly sure what the issue is, your answer is not completely self explanatory. Running `iconv()` on every column does not seem like the most elegant solution, so I'm looking for a way to solve the general issue. – Ratnanil Nov 25 '17 at 18:52
  • Hi @Ratnanil - in our case the iconv() populates just one control, on a (relatively) small result set (generally fewer than 100 values); it's been a while, but I don't remember digging deeper. You're pushing PostgreSQL queries to R? There have been so many changes in R with respect to databases that, to be honest, I assumed this was an internationalization bug, and I'm surprised it's still cropping up. Have you verified it shows up with other libraries? – Jeff Nov 27 '17 at 11:35
  • No, I haven't tested it with other libraries yet; I'll try that when I get the time. I wrote a quick function which runs `iconv` on all columns in a `data.frame`. This makes the workflow somewhat acceptable. – Ratnanil Nov 29 '17 at 09:57
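The function mentioned in the last comment was not posted, so the following is only a guess at its shape. `convert_columns` is a hypothetical name, the defaults mirror the `iconv(x, "UTF-8")` call from the answer, and the sample data frame is built from raw bytes purely for illustration.

```r
# Hypothetical helper: run iconv() over every character column of a data.frame.
# Defaults follow the answer above (from UTF-8 to the native encoding).
convert_columns <- function(df, from = "UTF-8", to = "") {
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(df[is_chr], iconv, from = from, to = to)
  df
}

# Illustrative input: "Umeälven" as undeclared UTF-8 bytes, plus an id column.
river <- rawToChar(as.raw(c(0x55, 0x6D, 0x65, 0xC3, 0xA4, 0x6C, 0x76, 0x65, 0x6E)))
rivers <- data.frame(id = 1L, name = river, stringsAsFactors = FALSE)

# The target encoding is explicit here so the result is the same on any locale;
# "CP1252" is assumed to be a name the platform's iconv accepts.
fixed <- convert_columns(rivers, from = "UTF-8", to = "CP1252")
charToRaw(fixed$name)   # 'ä' is a single byte after conversion
```

Non-character columns pass through unchanged, and factor columns are not touched either, so they would need converting to character first.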