Going from a list of elements to chemical formula

Question

I have a list of elemental compositions, with each compound in its own row and each element in its own column. Some of the counts are zero.

   C H N O S
1  5 5 0 0 0
2  6 4 1 0 1
3  4 6 2 1 0

I need to combine each row into a formula, e.g. C5H5, C6H4NS, C4H6N2O. This means that an element with a count of 1 should appear as just the column name, and an element with a count of 0 should be skipped altogether.

I'm not really sure where to start. One idea is to add a column holding each element's symbol next to its count, to make it easier to read across the columns, e.g.

   c C h H n N o O s S
1  C 5 H 5 N 0 O 0 S 0
2  C 6 H 4 N 1 O 0 S 1
3  C 4 H 6 N 2 O 1 S 0

That way I just need to paste each row into a single string, ignoring any zero values and dropping a count of 1 after the element name.

bobbel · Accepted Answer · score 5 · 2018-10-18T12:40:07.883

And here is a base R solution:

df = read.table(text = "
C H N O S
5 5 0 0 0
6 4 1 0 1
4 6 2 1 0
", header=T)

apply(df, 1, function(x){return(gsub('1', '', paste0(colnames(df)[x > 0], x[x > 0], collapse='')))})
[1] "C5H5"    "C6H4NS"  "C4H6N2O"

paste0(colnames(df)[x > 0], x[x > 0], collapse='') pastes each column name whose value in the row is greater than zero together with that value, and collapses the pieces into a single string. gsub then removes the 1s, and apply does this for each row of the data frame.
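One caveat: gsub('1', '', ...) deletes every "1" character, not just counts of exactly one, so a compound such as sucrose would come out as C2H22O instead of C12H22O11. A minimal sketch that avoids this by writing the count conditionally instead of stripping 1s afterwards (formula_row is a hypothetical helper name; the example assumes df is the data frame from the question):

```r
df <- read.table(text = "
C H N O S
5 5 0 0 0
6 4 1 0 1
4 6 2 1 0
", header = TRUE)

# One row of counts -> formula string.
# apply() hands each row over as a named vector (names = element symbols).
formula_row <- function(x) {
  keep <- x > 0                                # skip elements with a zero count
  counts <- ifelse(x[keep] == 1, "", x[keep])  # a count of 1 is written as the bare symbol
  paste0(names(x)[keep], counts, collapse = "")
}

formulas <- unname(apply(df, 1, formula_row))
formulas
## [1] "C5H5"    "C6H4NS"  "C4H6N2O"

# multi-digit counts containing a 1 are left intact
formula_row(c(C = 12, H = 22, N = 0, O = 11, S = 0))
## [1] "C12H22O11"
```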

answered Oct 18 '18 at 12:27 by bobbel, edited Oct 18 '18 at 12:40

Good one, was going the same route. btw, no need for `return`. – zx8754 Oct 18 '18 at 12:39
@zx8754 True, but I find the whole idea of functions returning stuff without return statements a bit scary. – bobbel Oct 18 '18 at 12:42

score 2 · Answer 2 · answered Oct 18 '18 at 12:30

Here's a tidyverse solution that uses some reshaping:

df = read.table(text = "
C H N O S
5 5 0 0 0
6 4 1 0 1
4 6 2 1 0
", header=T)

library(tidyverse)

df %>%
  mutate(id = row_number()) %>%                      # add row id
  gather(key, value, -id) %>%                        # reshape data
  filter(value != 0) %>%                             # remove any zero rows
  mutate(value = ifelse(value == 1, "", value)) %>%  # replace 1 with ""
  group_by(id) %>%                                   # for each row
  summarise(v = paste0(key, value, collapse = ""))   # create the string value

# # A tibble: 3 x 2
#      id v      
#   <int> <chr>  
# 1     1 C5H5   
# 2     2 C6H4NS 
# 3     3 C4H6N2O
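gather() still works, but current tidyr marks it as superseded in favour of pivot_longer(). A sketch of the same pipeline with pivot_longer(), assuming tidyr 1.0.0 or later and loading only dplyr and tidyr rather than the whole tidyverse:

```r
library(dplyr)
library(tidyr)

df <- read.table(text = "
C H N O S
5 5 0 0 0
6 4 1 0 1
4 6 2 1 0
", header = TRUE)

result <- df %>%
  mutate(id = row_number()) %>%                                     # add row id
  pivot_longer(-id, names_to = "key", values_to = "value") %>%      # one row per (id, element)
  filter(value != 0) %>%                                            # drop zero counts
  mutate(value = if_else(value == 1, "", as.character(value))) %>%  # count of 1 -> ""
  group_by(id) %>%                                                  # for each compound
  summarise(v = paste0(key, value, collapse = ""))                  # build the formula string

result
```

Using as.character(value) keeps both branches of if_else() the same type, which dplyr's if_else() requires (unlike base ifelse()). Because the count is written conditionally, a count such as 11 is not affected.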

G. Grothendieck · Answer 3 · 2018-10-18T13:36:20.393

Assume that the input matrix m is as given reproducibly in the Note at the end; if the input is a data frame, convert it to a matrix with as.matrix first.

Now create a matrix lets, the same shape as m, holding just the element letters, so that lets contains the letters and m contains the numbers. Then paste the letters and numbers together, replace the cells whose number is zero with the empty string, and replace the cells whose number is 1 with just the letter. Finally, paste each row together. No packages are used, and there are no loops or *apply calls.

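# same shape as m, with every cell holding its column's element symbol
lets <- t(replace(t(m), TRUE, colnames(m)))
mm <- paste0(lets, m)                 # "C5", "C6", "C4", "H5", ... (dims are dropped)
mm <- replace(mm, m == 0, "")         # zero counts become ""
mm <- ifelse(m == 1, lets, mm)        # counts of 1 become the bare letter; result has m's shape
do.call("paste0", as.data.frame(mm))  # paste the five columns together row by row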
## [1] "C5H5"    "C6H4NS"  "C4H6N2O"

Note

The input matrix m, in reproducible form, is assumed to be:

m <- matrix(c(5, 6, 4, 5, 4, 6, 0, 1, 2, 0, 0, 1, 0, 1, 0), 3, 5,
  dimnames = list(NULL, c("C", "H", "N", "O", "S")))
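Applied to the df from the question, the same idea might look like the sketch below. Two things are substitutions rather than part of the answer: lets is built directly with matrix(..., byrow = TRUE) instead of the t(replace(t(m), ...)) trick, and the replace() and ifelse() steps are folded into one nested ifelse(), so the result already has m's shape.

```r
df <- read.table(text = "
C H N O S
5 5 0 0 0
6 4 1 0 1
4 6 2 1 0
", header = TRUE)

m <- as.matrix(df)  # integer matrix with column names C, H, N, O, S

# letters matrix built directly: every row repeats the column names
lets <- matrix(colnames(m), nrow = nrow(m), ncol = ncol(m), byrow = TRUE)

# zero -> "", one -> letter only, otherwise letter followed by the count
mm <- ifelse(m == 0, "", ifelse(m == 1, lets, paste0(lets, m)))

# each column of the data frame is one argument to paste0, so rows are pasted element-wise
formulas <- do.call(paste0, as.data.frame(mm))
formulas
## [1] "C5H5"    "C6H4NS"  "C4H6N2O"
```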

score 1 · Answer 4 · answered Oct 18 '18 at 12:42

Another idea, which avoids apply with MARGIN = 1:

gsub('1', '', sapply(split(df, 1:nrow(df)), function(i) 
                                 paste(paste0(names(i)[i != 0], i[i != 0]), collapse = '')))

#        1         2         3 
#   "C5H5"  "C6H4NS" "C4H6N2O"

score 0 · Answer 5 · answered Oct 18 '18 at 12:47

Another option:

library(dplyr)
#Get indices of all non-zero numbers in the dataframe
inds <- which(df!=0, arr.ind = TRUE)

#Create a dataframe with row index, column index and value at that position
vals <- data.frame(inds, val = df[inds])

#For each row paste the name of the column and value together and then replace 1
vals %>%
  group_by(row) %>%
  summarise(chemical = paste0(names(df)[col], val,collapse = "")) %>%
  mutate(chemical = gsub("[1]", "", chemical))

#   row chemical
#  <int> <chr>   
#1     1 C5H5    
#2     2 C6H4NS  
#3     3 C4H6N2O
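For comparison, the same which(arr.ind = TRUE) idea can be written in base R without dplyr. This is a sketch, not the answer's code: it converts df to a matrix first (so no row names are carried into the index matrix), builds each element's piece with ifelse() instead of stripping 1s with gsub() afterwards, and uses tapply() in place of group_by()/summarise().

```r
df <- read.table(text = "
C H N O S
5 5 0 0 0
6 4 1 0 1
4 6 2 1 0
", header = TRUE)

m <- as.matrix(df)                     # integer matrix, no row names
inds <- which(m != 0, arr.ind = TRUE)  # columns "row" and "col", in column-major order
vals <- m[inds]                        # the non-zero counts at those positions

# element symbol, followed by the count unless the count is 1
pieces <- paste0(colnames(m)[inds[, "col"]], ifelse(vals == 1, "", vals))

# tapply keeps the original (column-major) order within each row,
# so symbols stay in C, H, N, O, S order; as.vector drops the array attributes
formulas <- as.vector(tapply(pieces, inds[, "row"], paste0, collapse = ""))
formulas
## [1] "C5H5"    "C6H4NS"  "C4H6N2O"
```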
