String split based on index

Question

I am having a DF which consist of a column having alpha numeric values.I want to split those values and store it in separate columns.

I am having a data frame which is having a column with alpha numeric values. I want to split that value and store it to new column as shown below in the sample.

str<-c("1001AA00100BC300AA01111000AA0299F40400F4053DF40C0000F4030000F40680F4077", "1001AA00100BC300AA01111000AA0299F40400F4053DF40C0000F4030000F40680F4077", "1001AA00100BC300AA01111000AA0299F40400F4053DF40C0000F4030000F40680F4077", "1001AA00100BC300AA01111000AA0299F40400F4053DF40C0000F4030000F40680F4077", "1001AA00100BC300AA01111000AA0299F40400F4053DF40C0000F4030000F40680F4077", "1001AA00100BC300AA01111000AA0299F40400F4053DF40C0000F4030000F40680F4077")

Output:


AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077

G. Grothendieck · Answer 1 · 2019-05-08T13:47:39.770

Using one line of sample output find the field widths. This is prefaced with 4 since the first 4 characters of the input seem to be missing from the sample output. Then use that in read.fwf. If you really did not want the first 4 characters of the input to appear in the output then replace the read.fwf line with read.fwf(textConnection(str), widths)[-1]. No packages are used.

sample.out <- "AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077"
widths <- c(4, sapply(read.table(text = sample.out, as.is = TRUE), nchar))

read.fwf(textConnection(str), widths)

giving:

    V1   V2       V3   V4     V5   V6 V7                                      V8
1 1001 AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
2 1001 AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
3 1001 AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
4 1001 AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
5 1001 AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
6 1001 AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077

akrun · Accepted Answer · 2019-05-08T13:43:36.500

One option would be to use separate from tidyverse

library(tidyverse)
tibble(col1 = str) %>%
   separate(col1, into = paste0("col", 0:7), c(4, 8, 16, 20, 26, 30, 32)) %>% 
   select(-1)
# A tibble: 6 x 7
#  col1  col2     col3  col4   col5  col6  col7                                   
#  <chr> <chr>    <chr> <chr>  <chr> <chr> <chr>                                  
#1 AA00  100BC300 AA01  111000 AA02  99    F40400F4053DF40C0000F4030000F40680F4077
#2 AA00  100BC300 AA01  111000 AA02  99    F40400F4053DF40C0000F4030000F40680F4077
#3 AA00  100BC300 AA01  111000 AA02  99    F40400F4053DF40C0000F4030000F40680F4077
#4 AA00  100BC300 AA01  111000 AA02  99    F40400F4053DF40C0000F4030000F40680F4077
#5 AA00  100BC300 AA01  111000 AA02  99    F40400F4053DF40C0000F4030000F40680F4077
#6 AA00  100BC300 AA01  111000 AA02  99    F40400F4053DF40C0000F4030000F40680F4077

Or another option is without any packages with base R by creating a delimiter based on position and then read with read.csv

read.csv(text = sub("^.{4}(.{4})(.{8})(.{4})(.{6})(.{4})(.{2})(.*)", 
    "\\1,\\2,\\3,\\4,\\5,\\6,\\7", str), header = FALSE, 
        stringsAsFactors = FALSE)
#   V1       V2   V3     V4   V5 V6                                      V7
#1 AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
#2 AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
#3 AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
#4 AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
#5 AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
#6 AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077

First solution that was provided by you is working for me.... thanks a lot.... — NiMbuS, May 08 '19 at 14:21

String split based on index

2 Answers2