0

I am having a DF which consist of a column having alpha numeric values.I want to split those values and store it in separate columns.

I am having a data frame which is having a column with alpha numeric values. I want to split that value and store it to new column as shown below in the sample.

str<-c("1001AA00100BC300AA01111000AA0299F40400F4053DF40C0000F4030000F40680F4077", "1001AA00100BC300AA01111000AA0299F40400F4053DF40C0000F4030000F40680F4077", "1001AA00100BC300AA01111000AA0299F40400F4053DF40C0000F4030000F40680F4077", "1001AA00100BC300AA01111000AA0299F40400F4053DF40C0000F4030000F40680F4077", "1001AA00100BC300AA01111000AA0299F40400F4053DF40C0000F4030000F40680F4077", "1001AA00100BC300AA01111000AA0299F40400F4053DF40C0000F4030000F40680F4077")

Output:


AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077

NiMbuS
  • 87
  • 2
  • 9

2 Answers2

2

Using one line of sample output find the field widths. This is prefaced with 4 since the first 4 characters of the input seem to be missing from the sample output. Then use that in read.fwf. If you really did not want the first 4 characters of the input to appear in the output then replace the read.fwf line with read.fwf(textConnection(str), widths)[-1]. No packages are used.

sample.out <- "AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077"
widths <- c(4, sapply(read.table(text = sample.out, as.is = TRUE), nchar))

read.fwf(textConnection(str), widths)

giving:

    V1   V2       V3   V4     V5   V6 V7                                      V8
1 1001 AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
2 1001 AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
3 1001 AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
4 1001 AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
5 1001 AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
6 1001 AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
0

One option would be to use separate from tidyverse

library(tidyverse)
tibble(col1 = str) %>%
   separate(col1, into = paste0("col", 0:7), c(4, 8, 16, 20, 26, 30, 32)) %>% 
   select(-1)
# A tibble: 6 x 7
#  col1  col2     col3  col4   col5  col6  col7                                   
#  <chr> <chr>    <chr> <chr>  <chr> <chr> <chr>                                  
#1 AA00  100BC300 AA01  111000 AA02  99    F40400F4053DF40C0000F4030000F40680F4077
#2 AA00  100BC300 AA01  111000 AA02  99    F40400F4053DF40C0000F4030000F40680F4077
#3 AA00  100BC300 AA01  111000 AA02  99    F40400F4053DF40C0000F4030000F40680F4077
#4 AA00  100BC300 AA01  111000 AA02  99    F40400F4053DF40C0000F4030000F40680F4077
#5 AA00  100BC300 AA01  111000 AA02  99    F40400F4053DF40C0000F4030000F40680F4077
#6 AA00  100BC300 AA01  111000 AA02  99    F40400F4053DF40C0000F4030000F40680F4077

Or another option is without any packages with base R by creating a delimiter based on position and then read with read.csv

read.csv(text = sub("^.{4}(.{4})(.{8})(.{4})(.{6})(.{4})(.{2})(.*)", 
    "\\1,\\2,\\3,\\4,\\5,\\6,\\7", str), header = FALSE, 
        stringsAsFactors = FALSE)
#   V1       V2   V3     V4   V5 V6                                      V7
#1 AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
#2 AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
#3 AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
#4 AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
#5 AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
#6 AA00 100BC300 AA01 111000 AA02 99 F40400F4053DF40C0000F4030000F40680F4077
akrun
  • 874,273
  • 37
  • 540
  • 662