I have a character vector and I want to make sure all of its elements have the same length. Hence I pad short elements with spaces, like this:

vec <- c("fjdlksa01dada","rau","sjklf")
x <- sprintf("%-15s", vec)
nchar(x)
# returns
[1] 15 15 15

as the answers to my previous question suggested. This is fine, but it seems to have trouble with umlauts. For example, if my vector looks like this:

vec2 <- c("fjdlksa01dada","rauü","sjklf")
y <- sprintf("%-15s", vec2)
nchar(y)
# returns
[1] 15 14 15

I am running R on Mac OS X (10.6). How can I fix this?

EDIT: Note, I am not looking to fix the output of nchar because it is correct. The problem is that sprintf loses the umlaut.

EDIT: Updated R and changed to DWin's locale: no change at all. But:

vec2 <- c("fjdlksa01dada","rauü","sjklf")
Encoding(vec2)
# returns
[1] "unknown" "UTF-8"   "unknown"

Strange.
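The 15 14 15 result above is what one would get if the field width were filled by counting bytes rather than characters, since "ü" occupies two bytes in UTF-8. Whether sprintf really counts bytes on a given platform and R version is an assumption here; the sketch below only checks the arithmetic, and its variable names are illustrative:

```r
# "\u00fc" is ü, written as an escape so the example does not depend on
# the encoding of the script file
s <- enc2utf8("rau\u00fc")

n_chars <- nchar(s, type = "chars")  # number of characters: 4
n_bytes <- nchar(s, type = "bytes")  # number of bytes in UTF-8: 5

# If a width of 15 is filled by counting bytes, only 15 - n_bytes spaces
# are appended, so the padded string has this many characters:
n_chars + (15 - n_bytes)  # 14
```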

Matt Bannert
  • Unable to reproduce on a Mac running 10.5.8/Rv2.14.1 with > Sys.getlocale() = "en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8". – IRTFM Feb 15 '12 at 13:11
  • That's very interesting. Do you by chance have a manual or link on how to install other locales? Plus, I should update R; I'm still running 2.13.2. – Matt Bannert Feb 15 '12 at 13:49
  • Update to 2.14.1 did not help :( – Matt Bannert Feb 15 '12 at 14:03
  • There is a question that was just addressed on rhelp this morning where the poster said she had the same locale settings as you report. They are non-standard since 'UTF-8' is not valid and Brian Ripley was wondering how they got that way. Sys.setlocale() is the function to use to change them. – IRTFM Feb 15 '12 at 14:44
  • Is forcing the encoding to something other than UTF-8 acceptable to you? As in `Encoding(vec2) <- "latin1"`. – Richie Cotton Feb 15 '12 at 15:16
  • @Richie UTF-8 is still the way to go but, yes it would be "acceptable" :) . Tried that too. But I get strange characters when I do that and once I gsub them the vector is back to UTF-8. – Matt Bannert Feb 15 '12 at 15:55

2 Answers

There is probably a cleaner way... but this works:

sapply(vec, function(x){
      paste(x, paste(rep(" ", 13-nchar(x)), collapse=""), "")
      })

(see the comments below for the [non-]explanation of the 13)
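The 13 is explained by paste() joining its arguments with sep = " " by default: the call above puts one space after x and another before the trailing "", so two characters are added on top of the padding. A minimal sketch that pads explicitly by character count instead, assuming R 3.3.0 or later for strrep(); pad_right is a hypothetical helper name, not part of base R:

```r
# Pad each element on the right with spaces up to `width` characters.
# nchar(type = "chars") counts characters, not bytes, so multi-byte
# characters such as ü count once.
pad_right <- function(x, width = 15) {
  n <- nchar(x, type = "chars")
  # pmax() keeps the repeat count non-negative for elements that are
  # already longer than `width`; those are returned unchanged
  paste0(x, strrep(" ", pmax(width - n, 0)))
}

vec2 <- c("fjdlksa01dada", "rau\u00fc", "sjklf")  # "\u00fc" is ü
padded <- pad_right(vec2)
nchar(padded, type = "chars")
```

Using paste0() rather than paste() avoids the implicit separators, so the target width can be passed as 15 directly.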

nico
  • Hmm, if I run this, all my elements are 17 characters long, but I only want to append spaces up to a total length of 15 characters. Note also that I am not interested in the length in the end (I just posted nchar so that y'all don't have to count); I want the vector elements themselves. – Matt Bannert Feb 15 '12 at 11:58
  • @ran2: true... bizarre... it obviously works by changing 15 for 13... but I am not sure why. The result of sapply is a vector of elements, not the length anyways – nico Feb 15 '12 at 12:48
  • +1 for the hack so far, because it helps. Still, I'd like to find out how to really fix this the sprintf way. – Matt Bannert Feb 15 '12 at 14:20

I found this on the ?sprintf page:

If any element of fmt or any character argument is declared as UTF-8, the element of the result will be in UTF-8 and have the encoding declared as UTF-8. Otherwise it will be in the current locale's encoding.

The input takes its locale from Rgui's locale (I think); see below.

On Windows it fortunately already prints:

> vec2 <- c("fjdlksa01dada","rauü","sjklf")
> y <- sprintf("%-15s", vec2)
> nchar(y)
[1] 15 15 15

I think on Mac OS you can achieve this by starting R as follows, but I don't have a Mac here to actually test this:

Rgui --encoding=utf-8
Bernd Elkemann
  • I guess `options("encoding")` would be of help, too. – Roman Luštrik Feb 15 '12 at 12:54
  • Good thought. Unfortunately I already use UTF-8; within R Studio at least, all my scripts are saved as UTF-8 and my locale is set to "C/UTF-8/C/C/C/C". But it's nice to know that sprintf works correctly on Windows. – Matt Bannert Feb 15 '12 at 12:56
  • @ran2 R Studio? Hmm. Have you tried running the code in `Rgui --encoding=utf-8`? If it works in Rgui then you know it's R Studio's fault and know where to look for more options – Bernd Elkemann Feb 15 '12 at 13:04
  • Hmm just checked that, same result on the terminal… so not R Studio's bad here. – Matt Bannert Feb 15 '12 at 13:16
  • In the terminal? that's not the same as in Rgui because the strings first get piped through the terminal then. – Bernd Elkemann Feb 15 '12 at 13:18
  • Realized that in the meantime and checked it in R gui, sorry to say it failed too. – Matt Bannert Feb 15 '12 at 14:31
  • If it matters, it does fail in Linux too (RStudio and terminal, can't test Rgui) – nico Feb 15 '12 at 15:02
  • Hmm, let me see if I can find out something. I have a Linux machine here, so at least I can test workarounds myself. Thanks, nico, for mentioning that. – Bernd Elkemann Feb 15 '12 at 15:30