Extract the Chr number from the column

Question

I have a data frame that has a column containing the chromosome details (1 to 22). I would like to create another column with only Chr numbers

Try `stringr::str_extract(c("chr6_GL", "chr18"), "(?<=^chr)\\d+")`. — Martin Gal, Oct 12 '21 at 13:59

Bast_B · Accepted Answer · 2021-10-12T14:41:24.800

2

Without a reproducible example it will be hard to answer. Using stringr package and regex you may achieve what you are searching for but you need to know all possibilities. Maybe if there is only underscore between what you want and annoying information, you can solve your problem using str_split and "_" as pattern parameter. Please refer to https://stackoverflow.com/help/how-to-ask

library(stringr)
df <- data.frame(chromosome = c("chr6_GL000253v2_alt", "chr6_GL000254v2_alt",
                                "chr6_GL000255v2_alt", "chr6_GL000256v2_alt", "chr4", "chr11",
                                "chr8", "chr12", "chr2", "chr12", "chr4", "chr6", "chr15", "chr4",
                                "chr2"))
df$chromosome_fixed=str_split(df$chromosome,"_",simplify = T)[,1]

edited Oct 12 '21 at 14:41

answered Oct 12 '21 at 13:57

Bast_B

143
6

While I agree, I think this should be a comment instead of an answer. You could actually create a small example and show your (working) solution for this problem. – Martin Gal Oct 12 '21 at 14:37
1

Thanks Martin, you are right, I edited my answer accordingly – Bast_B Oct 12 '21 at 14:41
Bast & Martin, thank you very much. This helped me for sure. The new column has Chr in that like Chr1, Chr14 etc --> I just want to get new column with numbers 1,14,2,16 etc – Mahan Oct 13 '21 at 09:54
You can remove "chr" by doing a `substring` substring(df$chromosome_fixed ,3) – Bast_B Oct 13 '21 at 14:34

lovalery · Answer 2 · 2021-10-12T14:53:13.467

Please find below a solution with the package data.table:

REPREX

Code

library(data.table)
library(stringr)

DT[, Chr_ID := lapply(.SD, str_extract,"(?<=^chr)\\d+"), .SDcols = "chromosome"]

Output

DT
#>              chromosome Chr_ID
#>  1: chr6_GL000253v2_alt      6
#>  2: chr6_GL000254v2_alt      6
#>  3: chr6_GL000255v2_alt      6
#>  4: chr6_GL000256v2_alt      6
#>  5:                chr4      4
#>  6:               chr11     11
#>  7:                chr8      8
#>  8:               chr12     12
#>  9:                chr2      2
#> 10:               chr12     12
#> 11:                chr4      4
#> 12:                chr6      6
#> 13:               chr15     15
#> 14:                chr4      4
#> 15:                chr2      2

Your data

DT <- data.table(chromosome = c("chr6_GL000253v2_alt", "chr6_GL000254v2_alt",
                 "chr6_GL000255v2_alt", "chr6_GL000256v2_alt", "chr4", "chr11",
                 "chr8", "chr12", "chr2", "chr12", "chr4", "chr6", "chr15", "chr4",
                 "chr2"))
DT
#>              chromosome
#>  1: chr6_GL000253v2_alt
#>  2: chr6_GL000254v2_alt
#>  3: chr6_GL000255v2_alt
#>  4: chr6_GL000256v2_alt
#>  5:                chr4
#>  6:               chr11
#>  7:                chr8
#>  8:               chr12
#>  9:                chr2
#> 10:               chr12
#> 11:                chr4
#> 12:                chr6
#> 13:               chr15
#> 14:                chr4
#> 15:                chr2

^{Created on 2021-10-12 by the reprex package (v2.0.1)}

You are just extracting the 4th character, so `chr11` is transformed into `1`. I doubt this is a correct solution. — Martin Gal, Oct 12 '21 at 14:33
Oops! You are of course right @Martin Gal. Sorry about that. I just edited my answer based on your previous comment. Thanks again for your feedback — lovalery, Oct 12 '21 at 14:55

score -1 · Answer 3 · answered Oct 12 '21 at 14:03

-1

Since you didn't share the data. I've created similar column and extracted the numbers to a new column called Number:

#Populate a dummy table
df = pd.DataFrame(data=['chr6_GL','chr6_GL00','chr4','chr11','chr8','chr12'], columns=['Data'])
#Extract the numbers using regex and assign it to a new column called 'Number'
df['Numbers']=df['Data'].str.extract(r'chr([0-9]*)')

Data Numbers

answered Oct 12 '21 at 14:03

pyzer

122
7

This looks like *Python* code but this is a R related question. – Martin Gal Oct 12 '21 at 14:31

Extract the Chr number from the column

3 Answers3