
I have a data frame from which I would like to remove specific rows: the row containing the word "Référence" and the 3 rows below it. See my example here.

I think I have to use the grepl function.

Thank you for your help.

Max.

zx8754
MaxB

3 Answers


From the example it seems like you want to remove rows with NAs. That's easily done by using na.omit:

df <- data.frame(
  x = c(NA, 1, 2, 3, 4),
  y = c(10, NA, 18, 22, NA)
)
df
   x  y
1 NA 10
2  1 NA
3  2 18
4  3 22
5  4 NA

Now omit all rows with missing values:

df2 <- na.omit(df)
df2
  x  y
3 2 18
4 3 22

Note, however, that na.omit does not just blank out the NA cells: it removes the entire row whenever any of its columns is NA.
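na.omit checks every column. If only one column matters, a minimal sketch (reusing the df above; checking y is just an illustrative choice) is to filter on is.na() for that column, while complete.cases() gives the all-columns behaviour as a logical vector:

```r
# Sample data from the answer above
df <- data.frame(
  x = c(NA, 1, 2, 3, 4),
  y = c(10, NA, 18, 22, NA)
)

# Drop rows only where y is NA; an NA in x is kept
df_y <- df[!is.na(df$y), ]

# complete.cases() is TRUE for rows with no NA in any column,
# so this keeps the same rows as na.omit(df)
df_all <- df[complete.cases(df), ]
```

Because both approaches produce a logical row filter, they combine easily with other conditions using &.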

Chris Ruehlemann

You could subset the data like this to remove all rows where column1 is exactly "Référence":

# %in% (unlike ==) gives FALSE instead of NA when column1 is missing
data <- data[!(data$column1 %in% "Référence"), ]

Substitute data and column1 with the names of your own data frame and column.
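The exact-match comparison above only catches cells equal to "Référence", so the lower-case "référence" mentioned in the question would be missed. A minimal sketch using grepl() with ignore.case = TRUE (the data frame and its column1/value columns are hypothetical) also catches cells that merely contain the word:

```r
# Hypothetical data: column1 stands in for your own column name
data <- data.frame(
  column1 = c("abc", "Référence", "def", "référence", "ghi"),
  value = 1:5,
  stringsAsFactors = FALSE
)

# TRUE for every row whose column1 contains "référence", ignoring case,
# so both "Référence" and "référence" are matched
isRef <- grepl("référence", data$column1, ignore.case = TRUE)
data <- data[!isRef, ]
```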

Pryore
  • Thank you. Yes, I would like to remove the row with the word "reference" but also the 3 rows under it. Is that possible? Thank you. – MaxB Dec 13 '18 at 11:39

Use grep rather than grepl here: grep returns the indexes of the rows that match the pattern, while grepl returns a logical vector with one TRUE/FALSE per row. You could do:

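rowIndexes = grep(x = df$col1, pattern = "refer")  # rows whose col1 contains "refer"

# Drop each matching row and the 2 rows after it. The guard matters:
# with no matches, df[-integer(0), ] would drop every row.
if (length(rowIndexes) > 0) df = df[-c(rowIndexes, rowIndexes+1, rowIndexes+2),]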
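To see the difference between the two functions, and why the no-match case is worth guarding against, here is a small sketch on a made-up character vector and data frame:

```r
x <- c("00100", "refer", "thisalso", "refer")

idx <- grep("refer", x)     # positions of the matching elements
flags <- grepl("refer", x)  # one TRUE/FALSE per element

# which() converts grepl()'s logical vector into grep()'s positions
stopifnot(identical(idx, which(flags)))

# Pitfall: when nothing matches, grep() returns integer(0), and
# negating an empty index selects no rows, so every row is dropped
df <- data.frame(a = c("00100", "thisalso"), b = c(44, 46))
none <- grep("refer", df$a)
emptied <- df[-none, ]
```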

Example:

> df
          a   b  c   d  e
1     00100  44  5  69 fr
2     refer  34 35   7 df
3  thisalso  46 15 167 as
4   thistoo  46 15 167 as
5     00100  11  5  67 uu
6     00100 563 25  23 tt
7     00100  44  5  69 fr
8     refer  34 35   7 df
9  thisalso  46 15 167 as
10  thistoo  11  5  67 uu
11    00100 563 25  23 tt
12    00100  44  5  69 fr
13    refer  34 35   7 df
14 thisalso  46 15 167 as
15  thistoo  11  5  67 uu
16    00100 563 25  23 tt
17    00100 563 25  23 tt
18    00100 563 25  23 tt

> rowIndexes = grep(x = df$a, pattern = "refer")
> df = df[-c(rowIndexes, rowIndexes+1, rowIndexes+2),]

> df

       a   b  c  d  e
1  00100  44  5 69 fr
5  00100  11  5 67 uu
6  00100 563 25 23 tt
7  00100  44  5 69 fr
11 00100 563 25 23 tt
12 00100  44  5 69 fr
16 00100 563 25 23 tt
17 00100 563 25 23 tt
18 00100 563 25 23 tt

Generalization

To remove the N rows after (or before) each row that matches a pattern, do:

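rowIndexes = grep(x = df$col1, pattern = "refer")  # rows containing "refer"
N = 2  # rows to drop after each match (negative N: rows before it)
# Each match plus its N neighbours; unlist() also copes with zero matches
indexesToRemove = unlist(lapply(rowIndexes, function(x){ x + (0:N) }))
# Keep only valid row numbers so the negative index below is safe
indexesToRemove = indexesToRemove[indexesToRemove >= 1 & indexesToRemove <= nrow(df)]
if (length(indexesToRemove) > 0) df = df[-indexesToRemove, ]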

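where N is an integer. If N is positive, this removes the N rows after each line containing "refer"; if N is negative, it removes the |N| rows before it. For the question's case (the "Référence" row plus the 3 rows under it), set N = 3 and use "Référence" as the pattern.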
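An alternative sketch for the question's exact case builds a logical keep-mask instead of a negative index. The data frame and its col1 column are hypothetical, and fixed = TRUE assumes a plain, case-sensitive substring match:

```r
# Hypothetical data standing in for the question's data frame
df <- data.frame(
  col1 = c("a", "Référence", "b", "c", "d", "e", "Référence", "f"),
  stringsAsFactors = FALSE
)
N <- 3  # rows to drop after each match, as asked in the question

hits <- grep("Référence", df$col1, fixed = TRUE)  # rows containing the word
# Each match plus the N rows after it: outer() adds every offset to every hit
toDrop <- as.vector(outer(0:N, hits, "+"))
# Row numbers past the end in toDrop never match, so they are harmless
keep <- !(seq_len(nrow(df)) %in% toDrop)
df <- df[keep, , drop = FALSE]
```

The design choice here is the positive mask: out-of-range numbers are simply ignored, and with no matches keep is all TRUE, so nothing is removed by accident.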

R. Schifini
  • Thank you very much, this is exactly what I wanted! In the same way, how could I use this function to say "remove all rows UNTIL this word, or FROM this word"? Thank you. – MaxB Dec 14 '18 at 15:22
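For the follow-up in the comment above, a minimal sketch, assuming "from this word" means dropping the first matching row and everything after it, and "until this word" means dropping everything before the first match while keeping the match itself (the example data is made up):

```r
df <- data.frame(
  a = c("00100", "thisalso", "refer", "thistoo", "00100"),
  stringsAsFactors = FALSE
)

first <- grep("refer", df$a)[1]  # first matching row, NA if none

if (!is.na(first)) {
  # Remove FROM the word: keep only the rows before the first match
  fromWord <- df[seq_len(first - 1), , drop = FALSE]
  # Remove UNTIL the word: keep the first match and everything after it
  untilWord <- df[first:nrow(df), , drop = FALSE]
}
```

seq_len(first - 1) is empty when the match is on row 1, so fromWord is then a zero-row data frame rather than an error.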