I have two different data frames in the following format:
DF1 -
v1 v2 v3 v4 v5
a 1 2 +
b 5 2 + +
c 5 2 + +
d 4 3 + +
e 1 5 + +
f 3 5
g 4 2
h 3 1
i 5 5 + +
DF2 -
v1 v2 v3 v4
a 1 2 +
b 5 2 + +
c 5 2 +
d 4 3 +
e 1 5 +
f 3 5
g 4 2
h 3 1
i 5 5 +
My script draws a scatter plot of v1 against v2, but first I remove every row that has at least one "+" in v3-v4 (for data frames like DF2) or v3-v5 (for data frames like DF1).
My data frames can be bigger, with more v1-v2 pairs, but they always have either the v3-v4 or the v3-v5 columns containing "+". At the moment I edit the code by hand to specify which columns to plot and which rows to remove, depending on the format of the DF I am working on.
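To make the goal concrete, here is a minimal sketch of the hard-coded version for a DF2-style data frame. The toy data below is hypothetical (the "+" positions are made up for illustration, and I am assuming blank cells are empty strings), so treat it as an illustration of the idea rather than my actual script:

```r
# Hypothetical toy data in the DF2 layout; "+" positions are invented.
DF <- data.frame(
  v1 = c(1, 5, 5, 4, 3, 4),
  v2 = c(2, 2, 2, 3, 5, 2),
  v3 = c("+", "+", "", "+", "", ""),
  v4 = c("", "+", "+", "", "", ""),
  row.names = c("a", "b", "c", "d", "f", "g"),
  stringsAsFactors = FALSE
)

# Hard-coded column names: flag rows with "+" in v3 or v4.
has.plus <- DF$v3 == "+" | DF$v4 == "+"

# Keep only the rows without any "+".
kept <- DF[!has.plus, ]
print(kept)
```

The `kept` data frame is what would then be passed on to the scatter plot of v1 against v2.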
It works well, but I wanted to make the script more interactive, as follows:
# Select v3-v4 or v3-v5 via an interactive GUI; returns a character vector of column headers.
remove.vars.vector <- select.list(names(DF),
    multiple = TRUE, # Allow several columns to be chosen.
    title = "Choose variables to remove from data set", # Title shown on the GUI.
    graphics = TRUE) # Allow the GUI to launch.
# Return the columns of DF named in this vector of column headers.
remove.vars.subset <- DF[remove.vars.vector]
# Return rows that have at least one "+" in v3-v4 or v3-v5.
remove.vars.subset.plus <- subset(DF, remove.vars.subset == "+")
# Remove all rows that contain at least one NA.
complete.data.plus <- remove.vars.subset.plus[complete.cases(remove.vars.subset.plus), ]
# Row-bind "complete.data.plus" onto DF.
combo.list <- rbind(DF, complete.data.plus)
# Drop every row that occurs more than once in the combined data frame.
complete.data <- combo.list[!duplicated(combo.list, fromLast = FALSE) & !duplicated(combo.list, fromLast = TRUE), ]
Problem: The above code doesn't completely strip the data frame of rows that contain at least one "+" in v3-v4 or v3-v5. The problem appears to be these lines:
# Return rows that have at least one "+" in v3-v4 or v3-v5.
remove.vars.subset.plus <- subset(DF, remove.vars.subset == "+")
I also get a number of rows at the end with NA in every cell, hence the complete.cases call in the next line of code.
As a result, the final data frame still contains some rows with "+" in v3-v4 or v3-v5.
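One possible explanation, as far as I can tell: comparing a data frame with "+" returns a logical matrix (one value per cell), whereas subset() expects one logical value per row. The matrix seems to get used as a long row index, which would account for both the missed rows and the all-NA rows at the end. A minimal sketch of a row-wise alternative, using hypothetical toy data and a hard-coded stand-in for the select.list() result so it runs non-interactively:

```r
# Hypothetical toy data; blank cells are assumed to be empty strings.
DF <- data.frame(
  v1 = c(1, 5, 5, 4, 3, 4),
  v2 = c(2, 2, 2, 3, 5, 2),
  v3 = c("+", "+", "", "+", "", ""),
  v4 = c("", "+", "+", "", "", ""),
  row.names = c("a", "b", "c", "d", "f", "g"),
  stringsAsFactors = FALSE
)

# Stand-in for the vector returned by select.list().
remove.vars.vector <- c("v3", "v4")

# Logical matrix: TRUE wherever a selected column holds "+".
plus.matrix <- DF[remove.vars.vector] == "+"

# Collapse to one value per row: TRUE if any selected column has "+".
# na.rm = TRUE means NA cells are counted as "no +".
has.plus <- rowSums(plus.matrix, na.rm = TRUE) > 0

# Keep only the rows without "+" in the selected columns.
complete.data <- DF[!has.plus, ]
print(complete.data)
```

Because the filter yields exactly one TRUE/FALSE per row, the rbind and duplicated steps should no longer be needed in this version.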
Question:
Is there a better way to remove rows from a data frame that contain "+" in any of the columns named in a vector of column headers?
Thank you in advance.
EDIT - 09/08/2016 - 18:54 I just noticed something that I didn't clarify about my data frames. Some of the rows don't have "+" in v3-v4 or v3-v5. These are the rows that I eventually want to keep so I can plot the scatter. I've edited the data frames accordingly. I'm working through the answers to try to understand them; I'm still quite new to R.