6

I would like to shorten the values in one column of my data.frame. Right now, each value consists of several letters, such as:

df$col1
[1] AHG    ALK    OPH   BCZ   LKH    QRQ    AAA   VYY

What I need is only the first letter:

df$col1
[1] A    A    O   B   L    Q    A   V

I have read other entries that suggested using gsub, stri_replace_all_charclass, or strsplit, but I need help implementing this.

PikkuKatja

5 Answers

11

You can use strtrim:

df$col1 <- strtrim(df$col1, 1)
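
A minimal sketch of this on the values from the question. The sample data frame is constructed here for illustration and assumes the column holds plain character strings. Note that strtrim trims to a display width rather than a character count, so it gives the "first letter" for single-width characters like these.

```r
# Sample data built from the values in the question (an assumption about what df looks like)
df <- data.frame(col1 = c("AHG", "ALK", "OPH", "BCZ", "LKH", "QRQ", "AAA", "VYY"),
                 stringsAsFactors = FALSE)

# Trim every value to a display width of 1, i.e. its first letter
df$col1 <- strtrim(df$col1, 1)
df$col1
```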
rmuc8
3

The stringr package is great:

library(stringr)

df <- data.frame(col1 = c("AHG", "ALK", "OPH", "BCZ", "LKH", "QRQ", "AAA", "VYY"))

str_sub(df$col1, 1, 1)

[1] "A" "A" "O" "B" "L" "Q" "A" "V"
r.bot
  • 2
    Why do you need a package here? The syntax is exactly the same as for the base function `substr`. – Roland Mar 11 '15 at 10:51
1

What you need is the substr function:

df$col1 <- substr(df$col1, 1, 1)
justQuest
1

I agree with Robin. Using the substr or substring function does the trick directly, without having to install any package.

df$col1 <- substr(df$col1, 1, 1)

or

df$col1 <- substring(df$col1, 1, 1)

The syntax is substr(x, start, stop): the target vector, the start position, and the stop position.
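
As a rough illustration of the two calls, here is a sketch that assumes the column might be stored as a factor (which the unquoted output in the question suggests, though that is a guess). The vector x is made up for the example, and the explicit as.character call is only there to make the conversion visible.

```r
# Hypothetical factor column, as older R versions created by default in data.frame()
x <- factor(c("AHG", "ALK", "OPH"))

# Convert to character explicitly, then take positions 1 through 1
chars <- as.character(x)
first_substr    <- substr(chars, 1, 1)
first_substring <- substring(chars, 1, 1)

# Both calls give the same first letters
identical(first_substr, first_substring)
```

The main practical difference is that substring has a default for its last argument, so substring(chars, 2) returns everything from the second character onward.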

Aayush Agrawal
0

In case you want to shorten the (string) values of an entire dataframe df, you can use:

apply(df, 2, strtrim, 4)

This shortens all strings to a width of 4 characters, which also comes in very handy for pretty-printing data frames.
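
One thing to be aware of: apply converts the data frame to a matrix first, so the result is a character matrix rather than a data frame. A possible variant that keeps the data frame structure is sketched below. The column names and values are invented for the example, and it assumes every column is a character column.

```r
# Made-up data frame with two character columns
df <- data.frame(a = c("alphabet", "beta"),
                 b = c("gamma", "delta-epsilon"),
                 stringsAsFactors = FALSE)

# apply() works on as.matrix(df), so this yields a character matrix
m <- apply(df, 2, strtrim, 4)

# Assigning into df[] replaces the columns but keeps the data.frame class
df[] <- lapply(df, strtrim, 4)
df
```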

untill