0

I have the following list df in R:

structure(list(disease = structure(c(1L, 1L), .Label = "Barcelona", class = "factor"), 
    `<18` = structure(list(0.193103448275862, 
        0.0445344129554656), .Names = c(NA_character_, NA_character_
    )), `19-25` = structure(list(0.0413793103448276, 
        0.345748987854251), .Names = c(NA_character_, NA_character_
    )), `26-64` = structure(list(0.448275862068966, 0.167611336032389), .Names = c(NA_character_, 
    NA_character_)), `46-64` = structure(list(0.0344827586206897, 
        0.00647773279352227), .Names = c(NA_character_, NA_character_
    )), `>65` = structure(list(0.282758620689655, 
        0.435627530364373), .Names = c(NA_character_, NA_character_
    )), type = structure(1:2, .Label = c("Clinical Trial", "Real-World"
    ), class = "factor")), class = "data.frame", row.names = c(NA, 
-2L))

I want to rearrange the dataframe, so that I can get each value by city, flat and agegroup using melt. However, I get an extra column as an output:

melt(df)
           city           type           variable      value          NA
1  Barcelona       flat                  <18           0.19310345 0.044534413
2  Barcelona       house                 <18           0.19310345 0.044534413
3  Barcelona       flat                  19 - 25       0.04137931 0.345748988
4  Barcelona       house                 19 - 25       0.04137931 0.345748988
5  Barcelona       flat                  26 - 45       0.44827586 0.167611336
6  Barcelona       house                 26 - 45       0.44827586 0.167611336
7  Barcelona       flat                  46 - 64       0.03448276 0.006477733
8  Barcelona       house                 46 - 64       0.03448276 0.006477733
9  Barcelona       flat                  > 65          0.28275862 0.435627530
10 Barcelona       house                 > 65          0.28275862 0.435627530

Is there any way of not having the NA column and getting unique values in the value column?

Daniel
  • 471
  • 3
  • 8
  • 18
  • 1
    Can you please make your problem reproducible? When I read in the data you give in your question and run `reshape2::melt()` on it, I get the expected output with no `NA` column. (And please use `dput()` to share data so it is copy/pasteable, it's annoying to have to clean up line breaks to read in your data.) – Gregor Thomas May 11 '21 at 15:34
  • Ah, I'm glad you used `dput()`. The issue is that your columns are `list`s, not `numeric`. – Gregor Thomas May 11 '21 at 16:18

1 Answers1

3

The issue is that your measure columns are list class, not numeric class. If we convert them to numeric, melt will work fine. (I show one way to do it, but it might be better to go earlier in your workstream and prevent the columns from being created as lists in the first place... this is definitely what you should do if my code that works on your sample data runs into issues on the larger data. tidyr::unnest may be able to help in that case.)

sapply(df, class)
#  disease      <18    19-25    26-64    46-64      >65     type 
# "factor"   "list"   "list"   "list"   "list"   "list" "factor" 

list_cols = sapply(df, is.list)

df[list_cols] = lapply(df[list_cols], unlist)

reshape2::melt(df, id.vars = c("disease", "type"))
#      disease           type variable       value
# 1  Barcelona Clinical Trial      <18 0.193103448
# 2  Barcelona     Real-World      <18 0.044534413
# 3  Barcelona Clinical Trial    19-25 0.041379310
# 4  Barcelona     Real-World    19-25 0.345748988
# 5  Barcelona Clinical Trial    26-64 0.448275862
# 6  Barcelona     Real-World    26-64 0.167611336
# 7  Barcelona Clinical Trial    46-64 0.034482759
# 8  Barcelona     Real-World    46-64 0.006477733
# 9  Barcelona Clinical Trial      >65 0.282758621
# 10 Barcelona     Real-World      >65 0.435627530
Gregor Thomas
  • 136,190
  • 20
  • 167
  • 294