12

I have a CSV file, shown below, that I read into R using read.csv. Column C has 12 missing values out of 30. I want to find the maximum of each column, but the R function "max" returns "NA" when used on column C. How do I get R to ignore the empty/NA values? I cannot see an "na.rm" argument in read.csv.

data <- read.csv("test.csv")

data

A   B   C   
1   5   6
15  2   3
8   3   3
7   5   4
5   3   8
4   1   4
5   3   4
2   2   10
4   3   8
6   5   2
1   4   4
10  8   4
0   6   0
7   3   8
5   3   3
13  12  13
6   0   0
0   0   2
5   2   NA
7   3   NA
1   8   NA
11  1   NA
1   4   NA
0   7   NA
4   5   NA
3   10  NA
2   0   NA
6   4   NA
0   19  NA
1   5   NA

> max(data$C)
[1] NA
smci
moadeep

4 Answers

16
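    data <- na.omit(data)  # drops every row that contains an NA

then

    max(data$C)

If you do not wish to change the data frame then

    max(na.omit(data$C))

Note that na.omit(data) removes whole rows, so the maxima of columns A and B may change as well (in the sample data, the 19 in column B sits in a row that would be dropped).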
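
To make the row-dropping behaviour of na.omit concrete, here is a minimal sketch. The data frame `df` is a small made-up stand-in, not the asker's file, and the variable names are only illustrative.

```r
# Hypothetical four-row data frame with NAs only in column C.
df <- data.frame(A = c(1, 15, 8, 0),
                 B = c(5, 2, 3, 19),
                 C = c(6, 3, NA, NA))

# na.omit() on a data frame drops every row that contains an NA.
complete <- na.omit(df)
nrow(complete)        # 2 rows remain

max(complete$B)       # 5: the 19 sat in a dropped row
max(df$B)             # 19: maximum over all rows

# Applying na.omit() to the single column leaves the other columns alone.
max(na.omit(df$C))    # 6
```

If only column C has missing values, restricting na.omit to that column avoids silently changing the results for A and B.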
Anurag Priyadarshi
9

You have two options that I can think of:

    apply(data, 2, max, na.rm = TRUE)  # ignores the NAs within each column that contains them

OR

    apply(na.omit(data), 2, max)  # removes every row containing an NA from the data frame, then calculates the max values
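
As a possible variation on the first option, sapply can compute the per-column maxima without converting the data frame to a matrix first. This is only a sketch on a made-up data frame `df`, not the asker's data.

```r
# Hypothetical data frame with NAs only in column C.
df <- data.frame(A = c(1, 15, 8, 0),
                 B = c(5, 2, 3, 19),
                 C = c(6, 3, NA, NA))

# sapply() calls max() on each column; na.rm = TRUE is passed through to max().
col_max <- sapply(df, max, na.rm = TRUE)
col_max               # named vector: A = 15, B = 19, C = 6
```

apply() coerces its input with as.matrix(), so a data frame that also holds character columns would be turned into a character matrix; sapply() works column by column and sidesteps that.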
Aditya Sihag
1

I'd suggest removing the NAs after reading, as others have suggested. If, however, you insist on reading only the non-NA lines, you can use the command-line tool grep to remove them and create a new file:

grep -v NA file_with_NA.csv > file_without_NA.csv

This drops every line that contains the text NA anywhere, so it assumes NA never appears inside other values. If you run Linux or a Mac, you already have this tool. On Windows, you have to install MinGW or Cygwin to get it.
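
For comparison, the same filtering can be sketched inside R after reading. The inline CSV text below is made up for illustration; `na.strings` is an existing read.csv argument that controls which strings are read as NA, and complete.cases() flags the rows without any NA.

```r
# Hypothetical CSV content, passed via text = instead of a file on disk.
csv_text <- "A,B,C
1,5,6
15,2,3
8,3,
0,19,NA"

# Treat both empty fields and the string "NA" as missing.
df <- read.csv(text = csv_text, na.strings = c("", "NA"))
sum(is.na(df$C))      # 2 missing values in column C

# Keep only the rows with no NA in any column.
kept <- df[complete.cases(df), ]
nrow(kept)            # 2
```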

Paul Hiemstra
1

You should be able to use

max(x,na.rm=TRUE)
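
A short sketch of what na.rm changes, using a made-up vector `x` in place of the asker's column:

```r
x <- c(6, 3, NA, 10, NA)

max(x)                 # NA: any missing value makes the result NA
max(x, na.rm = TRUE)   # 10: NAs are dropped before taking the maximum

# Edge case: if every value is NA, nothing is left after removal,
# and max() returns -Inf with a warning.
all_missing <- c(NA_real_, NA_real_)
suppressWarnings(max(all_missing, na.rm = TRUE))   # -Inf
```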
NathanOliver
Jess2332