Validation check on final dataframe to flag irregular calls

Question

I have code (see below) that takes allelic discrimination files exported from BioRad software and generates a final genotype variable.

All wells should have a final genotype except for A01, A02, B01, B02, C01, C02, D01 and D02, which are listed as "NA".

The attached dataset has a sample in A10 that reads as blank: its final genotype is the empty string ''. I need to add code that generates a new variable ('flag') marking any well with no value.

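###Partial Code for converting call to allele variants###

library(plyr)   # join(); assumed to be plyr::join, load before dplyr
library(dplyr)  # mutate(), case_when(), %>%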
###G1-1###

##Load in G1-1 file from PCR###

APOL_1_Allelic_Discrimination_G1_1 <- read.csv("admin_2019-03-17 08-04-00_BR007717_PLATE6_G1-1_SAMPLES41-80_3-17-2019 -  Allelic Discrimination Results_ADSheet.csv")
# Drop the columns that are not needed downstream
drops <- c("X","Sample","Type","RFU1","RFU2")
G1_1 <- APOL_1_Allelic_Discrimination_G1_1[ , !(names(APOL_1_Allelic_Discrimination_G1_1) %in% drops)]


G1_1 <- G1_1 %>% mutate(G1_1_1 = case_when(Call == "Allele 1" ~ "G1^{S342G}", Call == "Allele 2" ~ "+", Call == "Heterozygote" ~ "G1^{S342G}", Call == "No Call" ~ "Blank"),
                        G1_1_2 = case_when(Call == "Allele 1" ~ "G1^{S342G}", Call == "Allele 2" ~ "+", Call == "Heterozygote" ~ "+", Call == "No Call" ~ "Blank"))

G1_1$Call <- NULL

###G1-2###

##Load in G1-2 file from PCR###

APOL_1_Allelic_Discrimination_G1_2 <- read.csv("admin_2019-03-17 04-59-11_BR007717_PLATE5_G1-2_SAMPLES41-80_3-17-2019 -  Allelic Discrimination Results_ADSheet.csv")
# Drop the columns that are not needed downstream
drops <- c("X","Sample","Type","RFU1","RFU2")
G1_2 <- APOL_1_Allelic_Discrimination_G1_2[ , !(names(APOL_1_Allelic_Discrimination_G1_2) %in% drops)]

G1_2 <- G1_2 %>% mutate(G1_2_1 = case_when(Call == "Allele 1" ~ "+", Call == "Allele 2" ~ "G1^{I384M}", Call == "Heterozygote" ~ "G1^{I384M}", Call == "No Call" ~ "Blank"),
                        G1_2_2 = case_when(Call == "Allele 1" ~ "+", Call == "Allele 2" ~ "G1^{I384M}", Call == "Heterozygote" ~ "+", Call == "No Call" ~ "Blank"))

G1_2$Call <- NULL

###G2###

##Load in G2 file from PCR###

APOL_1_Allelic_Discrimination_G2 <- read.csv("admin_2019-03-17 01-41-46_BR007717_PLATE4_G2_SAMPLES41-80_3-17-2019 -  Allelic Discrimination Results_ADSheet.csv")
# Drop the columns that are not needed downstream
drops <- c("X","Sample","Type","RFU1","RFU2")
G2 <- APOL_1_Allelic_Discrimination_G2[ , !(names(APOL_1_Allelic_Discrimination_G2) %in% drops)]

G2 <- G2 %>% mutate(G2_1 = case_when(Call == "Allele 1" ~ "G2", Call == "Allele 2" ~ "+", Call == "Heterozygote" ~ "G2", Call == "No Call" ~ "Blank"),
                        G2_2 = case_when(Call == "Allele 1" ~ "G2", Call == "Allele 2" ~ "+", Call == "Heterozygote" ~ "+", Call == "No Call" ~ "Blank"))

G2$Call <- NULL

###Merge G1-1, G1-2 and G2 together###

G1 <- join(G1_1,G1_2,by="Well")

G1_G2 <- join(G1,G2,by="Well")

Dataset (dput)


structure(list(Well = structure(1:10, .Label = c("A01", "A02", 
"A03", "A04", "A05", "A06", "A07", "A08", "A09", "A10"), class = "factor"), 
    G1_1_1 = structure(c(2L, 2L, 1L, 1L, 1L, 1L, 3L, 3L, 1L, 
    1L), .Label = c("+", "Blank", "G1^{S342G}"), class = "factor"), 
    G1_1_2 = structure(c(2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L), .Label = c("+", "Blank"), class = "factor"), G1_2_1 = structure(c(2L, 
    2L, 1L, 1L, 1L, 1L, 3L, 3L, 1L, 2L), .Label = c("+", "Blank", 
    "G1^{I384M}"), class = "factor"), G1_2_2 = structure(c(2L, 
    2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L), .Label = c("+", "Blank"
    ), class = "factor"), G2_1 = structure(c(2L, 2L, 1L, 1L, 
    1L, 1L, 3L, 3L, 1L, 1L), .Label = c("+", "Blank", "G2"), class = "factor"), 
    G2_2 = structure(c(2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
    ), .Label = c("+", "Blank"), class = "factor"), Final.genotype.of.APOL1 = structure(c(NA, 
    NA, 2L, 2L, 2L, 2L, 3L, 3L, 2L, 1L), .Label = c("", "G0/G0", 
    "G1^{GM}/G2"), class = "factor"), no.APOL1.Risk.Alleles = c(NA, 
    NA, 1L, 1L, 1L, 1L, NA, NA, 1L, NA), X1.APOL1.Risk.Alleles = c(NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA), X2.APOL1.Risk.Alleles = c(NA, 
    NA, NA, NA, NA, NA, 1L, 1L, NA, NA)), row.names = c(NA, 10L
), class = "data.frame")

(1) Please try to reduce your problem, that's a lot of code that I suspect is not all relevant to this question. Do we really need to know all of the formation code just to answer the question of "how to deal with missingness"? (2) It's typically better (translation: easier for us) if your data is in a consumable format, not one that requires us to recreate it manually. Better alternatives include literal `data.frame(...)` or dump a relevant portion out with `dput(head(x,10))`. — r2evans, Jan 27 '20 at 22:06
@r2evans I have cut the code down and also used `dput` to show data in a consumable format — , Jan 27 '20 at 22:14
That's a good start, Jordan, can you update your sample data so that it includes an instance of `F05`? Yes, providing a small yet representative (of the question) dataset can be difficult, it's really helpful. Thanks! — r2evans, Jan 27 '20 at 22:21
@r2evans I have updated sample data to show an instance similar to F05 (now showing at A10) — , Jan 28 '20 at 15:54
Okay (do you need to update the text of the question to change from "F05" to "A10"?). Do you mean the empty string in `Final.genotype.of.APOL1`? Do you need something like `with(APOL_1_Allelic_Discrimination_G1_1, is.na(Final.genotype.of.APOL1) | !nzchar(Final.genotype.of.APOL1))`? — r2evans, Jan 28 '20 at 15:56
@r2evans I will change F05 to A10 and yes I need to flag when there is an empty string within Final.genotype.of.APOL1 — , Jan 28 '20 at 16:35
@r2evans could you take a look at this question? No one has responded to it: https://stackoverflow.com/questions/59989724/generating-summary-table-at-bottom-of-dataframe — , Jan 31 '20 at 23:56
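
Building on the `is.na(...) | !nzchar(...)` suggestion in the comments, here is a minimal sketch of how the flag could be built. It is not a tested solution for the full pipeline. The object name `final` and the trimmed-down rows are assumptions for illustration (values copied from the dput above); the list of control wells comes from the question text. The sketch assumes control wells should not be flagged, since they are expected to be NA.

```r
# Hypothetical stand-in for the final data frame: only the two columns
# needed for the check, with values taken from the dput in the question.
final <- data.frame(
  Well = c("A01", "A02", "A03", "A07", "A10"),
  Final.genotype.of.APOL1 = c(NA, NA, "G0/G0", "G1^{GM}/G2", ""),
  stringsAsFactors = FALSE
)

# Control wells that are expected to have no genotype (from the question).
control_wells <- c("A01", "A02", "B01", "B02", "C01", "C02", "D01", "D02")

# as.character() lets the check work for factor and character columns alike.
genotype <- as.character(final$Final.genotype.of.APOL1)

# A well has "no value" if the genotype is NA or an empty string.
no_value <- is.na(genotype) | !nzchar(genotype)

# Assumption: only sample wells are flagged; control wells may be NA.
final$flag <- no_value & !(final$Well %in% control_wells)

# Wells that need attention
final$Well[final$flag]
```

With the real data, the same three lines (`genotype`, `no_value`, `flag`) would be applied to whatever object holds `Final.genotype.of.APOL1`. If control wells should also be flagged, drop the `%in% control_wells` term.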
