I have a large named numeric vector (46201 elements, 3.3 Mb) in R.

 tdm_pairs.matrix <- as.matrix(tdm_pairs)
 top_pairs <- colSums(tdm_pairs.matrix)
 head(sort(top_pairs, decreasing = TRUE), 6)

  i know  i dont i think   i can  i just  i want 
      46      42      41      31      30      28 

I tried this to split each entry, but it returns only the counts, because as.character() drops the names:

 unlist(strsplit(as.character(top_pairs)," ")) 
 "46" "42" "41" "31" "30" "28"
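
As a minimal sketch of what is going on here, assuming a two-element stand-in for top_pairs (the real vector has 46201 elements): the word pairs live in the names of the vector, not in its values, so it is names(top_pairs) that would need splitting.

```r
# Hypothetical two-element stand-in for the real top_pairs vector
top_pairs <- c("i know" = 46, "i dont" = 42)

# as.character() converts the values only; the names are dropped
values_only <- as.character(top_pairs)

# The word pairs are stored as names, so split those instead
split_names <- strsplit(names(top_pairs), " ", fixed = TRUE)

print(values_only)   # "46" "42"
print(split_names)   # list of c("i", "know") and c("i", "dont")
```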

I'd like to split each name into its words and keep the count alongside, so the output would look similar to this:

 "i" "know" "46"
 "i" "dont" "42"
smci
jKraut

2 Answers

2

Since your vector is large, you might want to use stringi, which splits strings quickly:

library(stringi)
# Split each name on the space; simplify=TRUE returns a character matrix
data.frame(stri_split_fixed(names(top_pairs), " ", simplify=TRUE),
    count=top_pairs, row.names=seq_along(top_pairs))

#   X1   X2 count
# 1  i know    46
# 2  i dont    42
Rorschach
1

Something like this?

> top_pairs <- structure(c(46, 42), .Names = c("i know", "i dont"))
> do.call(rbind, strsplit(paste(names(top_pairs), top_pairs), " "))
     [,1] [,2]   [,3]
[1,] "i"  "know" "46"
[2,] "i"  "dont" "42"

Or, if you want to keep the counts numeric, you can convert to a data frame and split the names with tidyr:

> library(magrittr)
> library(tidyr)
> data.frame(names=names(top_pairs), count=top_pairs) %>%
    separate(names, into=c("name1", "name2"), sep=" ") %>%
    set_rownames(NULL)

  name1 name2 count
1     i  know    46
2     i  dont    42
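
If you would rather avoid extra packages, roughly the same data frame can be built in base R. This is only a sketch: the column names word1 and word2 are made up for illustration, and it assumes every name contains exactly two space-separated words (with a different number of words, rbind would recycle values and warn).

```r
# Same two-element example vector as above
top_pairs <- structure(c(46, 42), .Names = c("i know", "i dont"))

# Split the names on a literal space and stack the pieces into a matrix
words <- do.call(rbind, strsplit(names(top_pairs), " ", fixed = TRUE))

# Build the data frame; unname() drops the old names so the rows are
# numbered 1, 2, ... and count stays numeric
pairs_df <- data.frame(word1 = words[, 1],
                       word2 = words[, 2],
                       count = unname(top_pairs),
                       stringsAsFactors = FALSE)

print(pairs_df)
```

Because count never passes through paste() or strsplit(), it stays numeric and can be sorted or summed directly.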
zero323