1

I have a very similar question to this: How to identify which columns are not “NA” per row in a dataframe?

I only want to have the results in multiple columns and I need to have the actual values aswell. Its important that it also works when the values are not numeric, but characters! I have another df where there are stings instead of numbers!

     AFA  AFI  AII  AMA  AMI  AMU  BFA  BFI  BFU  BII
 1: 0.79   NA   NA 0.58   NA   NA   NA   NA 0.75   NA
 2:   NA   NA   NA   NA   NA 0.78   NA -0.5   NA   NA
 3:   NA   NA   NA   NA   NA   NA 0.79 -0.5   NA   NA
 4:   NA   NA   NA   NA   NA   NA   NA -0.5   NA   NA
 5:   NA   NA   NA   NA 0.63   NA   NA   NA   NA   NA
 6:   NA   NA   NA   NA   NA   NA 0.83   NA   NA   NA
 7: 0.63   NA   NA   NA   NA   NA   NA   NA   NA 0.82
 8:   NA   NA   NA   NA 0.63   NA   NA   NA   NA   NA
 9:   NA   NA 0.54 0.59   NA   NA   NA   NA   NA   NA
10:   NA 0.51   NA   NA   NA   NA   NA   NA   NA   NA

Output:

   V1_Code V1_Val V2_Code V2_Val V3_Code V3_Val
1:     AFA   0.79     AMA   0.58     BFU   0.75
2:     AMU   0.78     BFI   -0.5      NA     NA
3:     BFA   0.79     BFI   -0.5      NA     NA
       and so on....

This is my df:

    structure(list(AFA = c(0.79, NA, NA, NA, NA, NA, 0.63, NA, NA, 
    NA), AFI = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.51), AII = c(NA, 
    NA, NA, NA, NA, NA, NA, NA, 0.54, NA), AMA = c(0.58, NA, NA, 
    NA, NA, NA, NA, NA, 0.59, NA), AMI = c(NA, NA, NA, NA, 0.63, 
    NA, NA, 0.63, NA, NA), AMU = c(NA, 0.78, NA, NA, NA, NA, NA, 
    NA, NA, NA), BFA = c(NA, NA, 0.79, NA, NA, 0.83, NA, NA, NA, 
    NA), BFI = c(NA, -0.5, -0.5, -0.5, NA, NA, NA, NA, NA, NA), BFU = c(0.75, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA), BII = c(NA, NA, NA, NA, 
NA, NA, 0.82, NA, NA, NA)), row.names = c(NA, -10L), class = c("data.table", 
"data.frame"), .internal.selfref = <pointer: 0x000001fc6b791ef0>)
Kimster
  • 23
  • 3

3 Answers3

3
library(dplyr)
library(tidyr)

mydf |>
  mutate(id = row_number()) |> 
  pivot_longer(-id, names_to = "Code", values_to = "Val") |> 
  drop_na() |> 
  group_by(id) |> 
  mutate(col_num = row_number()) |> 
  ungroup() |> 
  pivot_wider(id_cols = id, values_from = c(Code, Val), names_from = col_num,
              names_glue = "V{col_num}_{.value}") |> 
  select(-id) |> 
  relocate(sort(tidyselect::peek_vars()))
    
# A tibble: 10 x 6
   V1_Code V1_Val V2_Code V2_Val V3_Code V3_Val
   <chr>    <dbl> <chr>    <dbl> <chr>    <dbl>
 1 AFA       0.79 AMA       0.58 BFU       0.75
 2 AMU       0.78 BFI      -0.5  NA       NA   
 3 BFA       0.79 BFI      -0.5  NA       NA   
 4 BFI      -0.5  NA       NA    NA       NA   
 5 AMI       0.63 NA       NA    NA       NA   
 6 BFA       0.83 NA       NA    NA       NA   
 7 AFA       0.63 BII       0.82 NA       NA   
 8 AMI       0.63 NA       NA    NA       NA   
 9 AII       0.54 AMA       0.59 NA       NA   
10 AFI       0.51 NA       NA    NA       NA   
Phil
  • 7,287
  • 3
  • 36
  • 66
  • This solution, although closer to what the author wrote, is not very good. It seems that the intention of the author is to get rid of the `NA` value and to further analyze and process the data. Converting the data into a simple 16x2 data frame will allow further simple transformations, e.g. using `group_by` and `summarise`. – Marek Fiołka Aug 28 '21 at 19:08
  • Both of you did a good job. Phil's answer is excellent as he also addresses the second part as shown in the output of OP. To get the same output as Marek Fiotka , Phils 4 rows of code will provide the same result. All in all, both of you very good done with +1 for both. Personally Phil's answer is best! – TarJae Aug 28 '21 at 19:19
3

Here is an other approach:

  1. Basic concept is to concatenate the column names into a new column Code and the same for the values to Val .
  2. Then split the concatenated values with strsplit and unnest to bring them into own rows.
  3. So far this could be done shorter with pivot_longer as implemented already by Phil and Marek Fiotka.
  4. Assign new id_group value with mutate(group_id = row_number()) after grouping by id
  5. Then pivot_wider with glueing the names.
library(tidyverse)

    df %>% 
        mutate(id = row_number(),
               across(-id, ~case_when(!is.na(.) ~ cur_column()), .names = 'name_{col}'),
               across(1:10, as.character)) %>%
        unite(Code, starts_with('name'), na.rm = TRUE, sep = ' ') %>% 
        unite(Val, AFA:BII, na.rm = TRUE, sep = " ") %>% 
        mutate(across(-id, ~strsplit(as.character(.), " "))) %>% 
        unnest(cols = c(Code, Val)) %>% 
        group_by(id) %>% 
        mutate(group_id = row_number()) %>% 
        pivot_wider(id_cols = id, values_from = c(Code, Val), names_from = group_id,
                    names_glue = "V{group_id}_{.value}") %>% 
        ungroup() %>% 
        select(V1_Code, V1_Val, V2_Code, V2_Val, V3_Code, V3_Val, -id)

Output:

   V1_Code V1_Val V2_Code V2_Val V3_Code V3_Val
   <chr>   <chr>  <chr>   <chr>  <chr>   <chr> 
 1 AFA     0.79   AMA     0.58   BFU     0.75  
 2 AMU     0.78   BFI     -0.5   NA      NA    
 3 BFA     0.79   BFI     -0.5   NA      NA    
 4 BFI     -0.5   NA      NA     NA      NA    
 5 AMI     0.63   NA      NA     NA      NA    
 6 BFA     0.83   NA      NA     NA      NA    
 7 AFA     0.63   BII     0.82   NA      NA    
 8 AMI     0.63   NA      NA     NA      NA    
 9 AII     0.54   AMA     0.59   NA      NA    
10 AFI     0.51   NA      NA     NA      NA   
TarJae
  • 72,363
  • 6
  • 19
  • 66
1

Do it this way:

  1. turn it into tibble
  2. Convert to long form
  3. Filter using the is.na function
  4. Continue as you like

Code below

library(tidyverse)
df = structure(list(AFA = c(0.79, NA, NA, NA, NA, NA, 0.63, NA, NA, NA), 
                    AFI = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.51), 
                    AII = c(NA,NA, NA, NA, NA, NA, NA, NA, 0.54, NA), 
                    AMA = c(0.58, NA, NA, NA, NA, NA, NA, NA, 0.59, NA), 
                    AMI = c(NA, NA, NA, NA, 0.63, NA, NA, 0.63, NA, NA), 
                    AMU = c(NA, 0.78, NA, NA, NA, NA, NA, NA, NA, NA), 
                    BFA = c(NA, NA, 0.79, NA, NA, 0.83, NA, NA, NA, NA), 
                    BFI = c(NA, -0.5, -0.5, -0.5, NA, NA, NA, NA, NA, NA), 
                    BFU = c(0.75, NA, NA, NA, NA, NA, NA, NA, NA, NA), 
                    BII = c(NA, NA, NA, NA, NA, NA, 0.82, NA, NA, NA)), 
               row.names = c(NA, -10L), class = c("data.table","data.frame"))

df %>% tibble() %>% 
  pivot_longer(everything(), names_to = "key", values_to = "val") %>% 
  filter(!is.na(val))

output

# A tibble: 16 x 2
   key     val
   <chr> <dbl>
 1 AFA    0.79
 2 AMA    0.58
 3 BFU    0.75
 4 AMU    0.78
 5 BFI   -0.5 
 6 BFA    0.79
 7 BFI   -0.5 
 8 BFI   -0.5 
 9 AMI    0.63
10 BFA    0.83
11 AFA    0.63
12 BII    0.82
13 AMI    0.63
14 AII    0.54
15 AMA    0.59
16 AFI    0.51
Marek Fiołka
  • 4,825
  • 1
  • 5
  • 20
  • Hi, thank you for your effort. But I really need to keep the same number of rows and add the results in the same row where they have been before. It might not make sense here in this example, but this is not the whole dataframe. – Kimster Aug 28 '21 at 20:01