1

I'm a relatively new R user and still learning the basics.

I have a named list xx whose entries look like this:

> xx[100:105]
$`15LOX-1`
[1] "207328_at"

$`16.1`
[1] "215946_x_at"

$`16.2`
[1] NA

$`16.3A5`
[1] "200983_x_at" "200984_s_at" "200985_s_at" "212463_at"   "228748_at"  

$`160-KD`
[1] "201224_s_at" "201225_s_at"

$`1600019D15Rik`
[1] "218465_at"   "222642_s_at" "225492_at"   "235907_at"   "238831_at"  

I would like to save it to a text file with two columns - Key and Value. If several strings correspond to the same key they should be in different rows. Double-quote symbols are not required.

In addition, how can I avoid saving the NA values?

Please help.

yuk

4 Answers

4

Recreate test data:

xx <- structure(list(
    `15LOX-1` = "207328_at", 
    `16.1` = "215946_x_at", 
    `16.2` = NA, 
    `16.3A5` = c("200983_x_at", "200984_s_at", "200985_s_at", "212463_at", "228748_at"), 
    `160-KD` = c("201224_s_at", "201225_s_at" ), 
    `1600019D15Rik` = c("218465_at", "222642_s_at", "225492_at", "235907_at", "238831_at")),
   .Names = c("15LOX-1", "16.1", "16.2", "16.3A5", "160-KD", "1600019D15Rik"))

First, remove all the NA values:

xx[is.na(xx)] <- NULL
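A small sketch of why this line works, using a toy list (the name `demo` and its contents are made up for illustration, not taken from the question). On a list, `is.na()` flags only elements that are a single atomic NA, so NAs buried inside longer vectors are not removed and would need a per-element filter:

```r
# Toy list (hypothetical data): one plain NA entry and one NA nested in a vector
demo <- list(a = "x1", b = NA, c = c("y1", NA, "y2"))

# TRUE only for length-one atomic elements that are NA (here: b)
is.na(demo)

# Assigning NULL drops the flagged elements from the list
demo[is.na(demo)] <- NULL
names(demo)

# NAs inside longer vectors survive the step above; filter them per element
demo <- lapply(demo, function(v) v[!is.na(v)])
demo$c
```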

Now, create a temporary variable that stores the length of each element in xx:

tmp <- sapply(xx, length)

Now use rep to create the key (i.e. repeat each name of xx as many times as the length of the associated element), and use a combination of unlist and unname to create the values:

data.frame(
    key = rep(names(tmp), times=unname(tmp)),
    value = unlist(unname(xx))
)

This produces:

             key       value
1        15LOX-1   207328_at
2           16.1 215946_x_at
3         16.3A5 200983_x_at
4         16.3A5 200984_s_at
5         16.3A5 200985_s_at
6         16.3A5   212463_at
7         16.3A5   228748_at
8         160-KD 201224_s_at
9         160-KD 201225_s_at
10 1600019D15Rik   218465_at
11 1600019D15Rik 222642_s_at
12 1600019D15Rik   225492_at
13 1600019D15Rik   235907_at
14 1600019D15Rik   238831_at

Finally, assign the data frame to a variable and save it with write.csv() or your favourite write function.
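Putting the steps together, a minimal sketch on a shortened version of the test data could look like this. The variable names `out` and `outfile` and the temporary file path are assumptions for illustration; `quote = FALSE` is used because the question asks for no double quotes, which is only safe as long as no key or value contains a comma:

```r
# Shortened test data (three of the entries from the question)
xx <- list(`15LOX-1` = "207328_at",
           `16.2`    = NA,
           `160-KD`  = c("201224_s_at", "201225_s_at"))

# Drop the entries that are a single NA
xx[is.na(xx)] <- NULL

# One row per value, with the key repeated for each value
out <- data.frame(
    key   = rep(names(xx), times = lengths(xx)),
    value = unlist(xx, use.names = FALSE),
    stringsAsFactors = FALSE
)

# Hypothetical output location; replace with a real file name
outfile <- tempfile(fileext = ".csv")
write.csv(out, file = outfile, quote = FALSE, row.names = FALSE)

# Inspect what was written
readLines(outfile)
```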

Andrie
  • Works for me. Here's dput on an object that matches original data: > dput(xx[100:105]) structure(list(`15LOX-1` = "207328_at", `16.1` = "215946_x_at", `16.2` = NA, `16.3A5` = c("200983_x_at", "200984_s_at", "200985_s_at", "212463_at", "228748_at"), `160-KD` = c("201224_s_at", "201225_s_at" ), `1600019D15Rik` = c("218465_at", "222642_s_at", "225492_at", "235907_at", "238831_at")), .Names = c("15LOX-1", "16.1", "16.2", "16.3A5", "160-KD", "1600019D15Rik")) – IRTFM Jun 28 '11 at 21:57
  • @DWin, Fab. Can you do me a favour and edit my question and paste directly? The reason is that SO is too clever for its own good - all of the backticks are interpreted as special characters and disappear in translation when I paste your into my editor. – Andrie Jun 28 '11 at 22:03
  • @Andrie: OK, I did but it looks pretty much the same. Maybe you can fix it from the editor window? – IRTFM Jun 28 '11 at 22:07
  • @Andrie: Sorry for the delay. I couldn't access the site for the whole day for some reason. Thanks a lot for your help. I accepted this answer since it uses basic R and also creates the dataframe that I will need. Thanks. – yuk Jun 30 '11 at 04:45
4

The reshape2 package can do this with the melt function. Using the data from Andrie:

require(reshape2)
> melt(x)
  value L1
1    a1  A
2    b1  B
3  <NA>  C
4    d1  D
5    d2  D
6    d3  D
7    d4  D
8    d5  D

A few things are not exactly as you want here. First, the columns are in reverse order, which may or may not be an issue. Second, the column names are not the ones you asked for, which again may not be an issue. Third, the NA value is still present, which is an issue based on your question. I'd use complete.cases() to address that and then give the columns appropriate names. Maybe something like this:

out <- melt(x)[, 2:1] #Reverse the key - value columns
out <- out[complete.cases(out) ,] #Subset only complete cases
names(out) <- c("Key", "Value")  #New column names

> out
  Key Value
1   A    a1
2   B    b1
4   D    d1
.....
Chase
  • +1 This is cool. Although I frequently use reshape2, I didn't know it would do this with a list. – Andrie Jun 28 '11 at 22:17
3

Using Andrie's test data, here's a kind of slick way to do this using the reshape package (or reshape2):

x <- list(
    A = "a1",
    B = "b1",
    C = NA,
    D = paste("d", 1:5, sep=""))

Next, melt has a list method!

> melt(x)
  value L1
1    a1  A
2    b1  B
3  <NA>  C
4    d1  D
5    d2  D
6    d3  D
7    d4  D
8    d5  D

Then we can pull out the NAs using complete.cases or something equivalent:

rs <- melt(x)
rs <- rs[complete.cases(rs),]
colnames(rs) <- c('value','key')
joran
  • `melt()` is in the `reshape` and `reshape2` package(s), not `plyr`. +1 for beating me to the punch though! – Chase Jun 28 '11 at 21:44
2

I would do something like this.

#Create a matrix
z <- cbind(key=rep(names(xx), sapply(xx, length)), value = unlist(xx))
#Remove NA
z <- z[!is.na(z[,2]),]
#Write to textfile
write.table(z, "filename.txt", row.names = FALSE)

You can look at the help of write.table to see the other available options.
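As one illustration of those options, here is a hedged sketch that writes a small key/value matrix as tab-separated text without quotes and reads it back. The matrix contents are a subset of the question's data, while the temporary file path and the name `back` are assumptions for the example:

```r
# Small key/value matrix in the same shape as z above
z <- cbind(key   = c("16.1", "160-KD", "160-KD"),
           value = c("215946_x_at", "201224_s_at", "201225_s_at"))

# Hypothetical output location; replace with a real file name
outfile <- tempfile(fileext = ".txt")

# sep = "\t" gives tab-separated columns, quote = FALSE drops the double quotes
write.table(z, outfile, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back as character so keys such as "16.1" are not turned into numbers
back <- read.delim(outfile, colClasses = "character")
back
```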

Here's the result, as asked by Andrie:

> z
                key             value        
15LOX-1        "15LOX-1"       "207328_at"  
16.1           "16.1"          "215946_x_at"
16.3A51        "16.3A5"        "200983_x_at"
16.3A52        "16.3A5"        "200984_s_at"
16.3A53        "16.3A5"        "200985_s_at"
16.3A54        "16.3A5"        "212463_at"  
16.3A55        "16.3A5"        "228748_at"  
160-KD1        "160-KD"        "201224_s_at"
160-KD2        "160-KD"        "201225_s_at"
1600019D15Rik1 "1600019D15Rik" "218465_at"  
1600019D15Rik2 "1600019D15Rik" "222642_s_at"
1600019D15Rik3 "1600019D15Rik" "225492_at"  
1600019D15Rik4 "1600019D15Rik" "235907_at"  
1600019D15Rik5 "1600019D15Rik" "238831_at"  

HTH

Luciano Selzer
  • This probably works, but you can save me some effort to test it by pasting the results of your code in your answer. – Andrie Jun 28 '11 at 21:14