4

I have a list of elemental compositions, one per row, with each element in its own column. Some of the element counts are zero.

   C H N O S
1  5 5 0 0 0
2  6 4 1 0 1
3  4 6 2 1 0

I need to combine them so that they read, e.g. C5H5, C6H4NS, C4H6N2O. This means that for any element of value "1" I should only take the column name, and for anything with value 0, the column should be skipped altogether.

I'm not really sure where to start here. I could add a new column to make it easier to read across the columns, e.g.

   c C h H n N o O s S
1  C 5 H 5 N 0 O 0 S 0
2  C 6 H 4 N 1 O 0 S 1
3  C 4 H 6 N 2 O 1 S 0

This way, I just need to paste each row into a single string, but I still need to skip any zero values and drop the 1 after the element name.

Sotos
HarD

5 Answers

5

And here is a base R solution:

df = read.table(text = "
C H N O S
5 5 0 0 0
6 4 1 0 1
4 6 2 1 0
", header=T)

apply(df, 1, function(x) gsub('(?<![0-9])1(?![0-9])', '', paste0(colnames(df)[x > 0], x[x > 0], collapse=''), perl=TRUE))
[1] "C5H5"    "C6H4NS"  "C4H6N2O"

paste0(colnames(df)[x > 0], x[x > 0], collapse='') pastes together the column names and values where the row values are greater than zero. gsub then removes every count that is exactly 1; the lookarounds (which need perl=TRUE) stop it from stripping the digit 1 out of larger counts such as 10 or 12. apply does this for each row of the data frame.
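
A variant of the same idea, as a minimal sketch: blanking the 1s before pasting, rather than with gsub afterwards, leaves multi-digit counts such as 10 or 12 untouched without any regular expression. The function name make_formula and the third sample row are illustrative assumptions, not part of the original question.

```r
# Hypothetical helper: build one formula string per row of a count table.
make_formula <- function(counts) {
  m <- as.matrix(counts)
  apply(m, 1, function(x) {
    keep <- x > 0                            # skip elements with a zero count
    n <- ifelse(x[keep] == 1, "", x[keep])   # a count of 1 is written as just the symbol
    paste0(colnames(m)[keep], n, collapse = "")
  })
}

# Sample data; the last row is made up to include counts of 10 and 12.
df <- read.table(text = "
C H N O S
5 5 0 0 0
6 4 1 0 1
10 12 1 0 0
", header = TRUE)

make_formula(df)
# [1] "C5H5"    "C6H4NS"  "C10H12N"
```

A row consisting only of zeros comes back as an empty string.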

bobbel
2

Here's a tidyverse solution that uses some reshaping:

df = read.table(text = "
C H N O S
5 5 0 0 0
6 4 1 0 1
4 6 2 1 0
", header=T)

library(tidyverse)

df %>%
  mutate(id = row_number()) %>%                      # add row id
  gather(key, value, -id) %>%                        # reshape data
  filter(value != 0) %>%                             # remove any zero rows
  mutate(value = ifelse(value == 1, "", value)) %>%  # replace 1 with ""
  group_by(id) %>%                                   # for each row
  summarise(v = paste0(key, value, collapse = ""))   # create the string value

# # A tibble: 3 x 2
#      id v      
#   <int> <chr>  
# 1     1 C5H5   
# 2     2 C6H4NS 
# 3     3 C4H6N2O
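
In current tidyr, gather() is superseded by pivot_longer(). A minimal sketch of the same pipeline with pivot_longer(), assuming dplyr and tidyr (1.0.0 or later) are installed; the result name res is only illustrative:

```r
library(dplyr)
library(tidyr)

df <- read.table(text = "
C H N O S
5 5 0 0 0
6 4 1 0 1
4 6 2 1 0
", header = TRUE)

res <- df %>%
  mutate(id = row_number()) %>%                                  # add row id
  pivot_longer(-id, names_to = "key", values_to = "value") %>%   # reshape to long format
  filter(value != 0) %>%                                         # remove any zero rows
  mutate(value = ifelse(value == 1, "", value)) %>%              # replace 1 with ""
  group_by(id) %>%                                               # for each original row
  summarise(v = paste0(key, value, collapse = ""))               # create the string value

res$v
# [1] "C5H5"    "C6H4NS"  "C4H6N2O"
```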
AntoniosK
2

Assume the input is the matrix m given in reproducible form in the Note at the end. If you have a data frame, convert it with as.matrix first.

Now create a matrix with the same shape as m that holds just the letters, so that lets contains the letters and m contains the numbers. Paste the letters and numbers together, then replace the cells where the number is zero with the empty string and the cells where the number is 1 with just the letter. Finally, paste each row together. No packages, loops, or *apply functions are used.

lets <-  t(replace(t(m), TRUE, colnames(m)))
mm <- paste0(lets, m)
mm <- replace(mm, m == 0, "")
mm <- ifelse(m == 1, lets, mm)
do.call("paste0", as.data.frame(mm))
## [1] "C5H5"    "C6H4NS"  "C4H6N2O"

Note

The input matrix m in reproducible form is assumed to be:

m <- matrix(c(5, 6, 4, 5, 4, 6, 0, 1, 2, 0, 0, 1, 0, 1, 0), 3, 5,
  dimnames = list(NULL, c("C", "H", "N", "O", "S")))
G. Grothendieck
1

Another idea that avoids apply with MARGIN = 1:

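# the lookarounds remove only a count that is exactly 1, so 10 or 12 stay intact
gsub('(?<![0-9])1(?![0-9])', '', sapply(split(df, seq_len(nrow(df))), function(i)
                                 paste(paste0(names(i)[i != 0], i[i != 0]), collapse = '')), perl = TRUE)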

#        1         2         3 
#   "C5H5"  "C6H4NS" "C4H6N2O"
Sotos
0

Another option

library(dplyr)
#Get indices of all non-zero numbers in the dataframe
inds <- which(df!=0, arr.ind = TRUE)

#Create a dataframe with row index, column index and value at that position
vals <- data.frame(inds, val = df[inds])

#For each row paste the name of the column and value together and then replace 1
vals %>%
  group_by(row) %>%
  summarise(chemical = paste0(names(df)[col], val,collapse = "")) %>%
  mutate(chemical = gsub("(?<![0-9])1(?![0-9])", "", chemical, perl = TRUE))  # drop only counts that are exactly 1

#   row chemical
#  <int> <chr>   
#1     1 C5H5    
#2     2 C6H4NS  
#3     3 C4H6N2O 
Ronak Shah