
In a complex data frame I have a column with the recalled net salary (q32), which includes NAs that I want to exclude, plus a column with the year in which the study was conducted (pgssyear), ranging from 1992 to 2010, more or less like this:

q32 pgssyear
2000 1992
1000 1992
NA   1992
3000 1994
etc.

If I try to draw a boxplot like:

boxplot(dataset$q32~pgssyear,data=dataset, main="Recalled Net Salary per Month (PLN)",
    xlab="Year", ylab="Net Salary") 

This seems to work; however, the NAs might distort the calculations, so I wanted to get rid of them:

boxplot(na.omit(dataset$q32)~pgssyear,data=dataset, main="Recalled Net Salary per Month (PLN)",
    xlab="Year", ylab="Net Salary") 

Then I get a warning message that the lengths of pgssyear and q32 do not match, most likely because I removed the NAs from q32 only. So I tried to shorten pgssyear so that it does not include the rows that correspond to NAs in the q32 column:

   pgssyearprim <- subset(dataset$pgssyear, dataset$q32!= NA )

However, pgssyearprim is then treated as a factor variable:

pgssyearprim
factor(0)
Levels: 1992 1993 1994 1995 1997 1999 2002 2005 2008 2010

and I get the same warning message if I introduce it into the boxplot formula...
DatamineR
Asiack

1 Answer


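Of course the lengths wouldn't match: with na.omit(dataset$q32) ~ pgssyear you removed some of the data from the left-hand side of the formula only, while pgssyear kept its full length. Instead, use !is.na(q32) as the subset argument of boxplot(), so that both variables are filtered together.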
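A minimal sketch of the subset approach, using a small made-up data frame in place of the real survey data (the values below are illustrative assumptions, not taken from the question's dataset). Note that boxplot() should already drop missing values within each group when it computes the statistics, so the explicit subset mainly makes the intent visible rather than changing the result.

```r
# Toy data standing in for the real survey; values are invented for illustration
dataset <- data.frame(
  q32      = c(2000, 1000, NA, 3000, 1500, NA),
  pgssyear = factor(c(1992, 1992, 1992, 1994, 1994, 1994))
)

# subset is evaluated inside `data`, so rows with a missing q32 are dropped
# from both sides of the formula at once and the lengths always match
res <- boxplot(q32 ~ pgssyear, data = dataset, subset = !is.na(q32),
               main = "Recalled Net Salary per Month (PLN)",
               xlab = "Year", ylab = "Net Salary")

# Number of non-missing observations that went into each box
res$n
```

Referring to the columns by bare name (q32 rather than dataset$q32) inside the formula keeps everything tied to the data argument, which is what lets subset filter both variables consistently.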

IRTFM
  • Somehow creating a separate dataset without NAs in q32 (dataset_n) and introducing it into the boxplot command worked best. – Asiack Nov 16 '14 at 14:35
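The approach from the comment could look roughly like the sketch below. It also shows why the earlier attempt with != NA returned factor(0): any comparison with NA yields NA, and subset() treats NA conditions as FALSE, so no rows are kept. The toy data are an assumption for illustration; dataset_n is the name used in the comment, but its exact construction there is not shown, so this is only one plausible way to build it.

```r
# Toy data standing in for the real survey; values are invented for illustration
dataset <- data.frame(
  q32      = c(2000, 1000, NA, 3000),
  pgssyear = factor(c(1992, 1992, 1992, 1994))
)

# Comparing anything with NA gives NA, never TRUE or FALSE
cmp <- dataset$q32 != NA

# subset() drops elements whose condition is NA, so the result is empty
empty <- subset(dataset$pgssyear, dataset$q32 != NA)
length(empty)

# is.na() is the proper test for missingness: keep complete rows only
dataset_n <- dataset[!is.na(dataset$q32), ]

boxplot(q32 ~ pgssyear, data = dataset_n,
        main = "Recalled Net Salary per Month (PLN)",
        xlab = "Year", ylab = "Net Salary")
```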