I have a data frame with both character and numeric columns. In some of the numeric columns, I would like to test whether the value is greater than 1 and, if so, change it to 1.

I have managed to turn every non-zero value into 1, but that also affects the character columns and a column I want to leave untouched...

Example data frame:

species <- c("Pinus halepensis", "Majorana syriaca", "Iris palaestina", "Velezia fasciculata")
rarness <- c("F", "CC", "F", "O")
endangered <- c(0, 0, 0, 6.8)
plot1 <- c(1, 2, 1, 1)
plot2 <- c(0, 1, 0, 0)
df <- as.data.frame(cbind(species, rarness, endangered, plot1, plot2))

This does not work for some reason:

Test<-df %>%
  mutate(plot1 = ifelse(plot1 > 1, 1, plot1))

This works, but it changes the character columns as well:

df[df>1]<-1

I would like the columns "plot1" and "plot2" to contain only the values 0 and 1 while the others remain the same.

Thanks!

Idan

  • Welcome to SO! Why are you saying the mutate option doesn't work? Do you get an error message or is the outcome not what you expected? Keep in mind you need `library(dplyr)` first. – Sven May 26 '19 at 08:26
  • Hi, I used library(tidyverse) but did get a warning message: "In Ops.factor(plot1, 1) : ‘>’ not meaningful for factors". Besides, even if this did work, it would apply only to plot1, whereas I need to apply this to many columns in my real dataset... Thanks, Idan. – Idan May 26 '19 at 10:41
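
The warning quoted in the comment above hints at the cause: `cbind()` on vectors of mixed type returns a character matrix, so `as.data.frame()` yields factor columns (R before 4.0) or character columns (R 4.0 and later), never numeric ones. A minimal sketch, assuming the data can be rebuilt from the original vectors, is to call `data.frame()` directly so each column keeps its own type. The names `df_bad`, `df_ok` and `plot_cols` are illustrative only and do not come from the question.

```r
species    <- c("Pinus halepensis", "Majorana syriaca", "Iris palaestina", "Velezia fasciculata")
rarness    <- c("F", "CC", "F", "O")
endangered <- c(0, 0, 0, 6.8)
plot1      <- c(1, 2, 1, 1)
plot2      <- c(0, 1, 0, 0)

# cbind() coerces everything to character first, so no column stays numeric
df_bad <- as.data.frame(cbind(species, rarness, endangered, plot1, plot2))
is.numeric(df_bad$plot1)   # FALSE

# data.frame() keeps each vector's own type
df_ok <- data.frame(species, rarness, endangered, plot1, plot2)
is.numeric(df_ok$plot1)    # TRUE

# With numeric columns, capping at 1 is a plain element-wise minimum
plot_cols <- c("plot1", "plot2")   # illustrative name
df_ok[plot_cols] <- lapply(df_ok[plot_cols], pmin, 1)
df_ok$plot1   # 1 1 1 1
df_ok$plot2   # 0 1 0 0
```

When the data comes from a file rather than from vectors, the same idea applies: the columns only need an explicit conversion if they were read in as text or factors.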

2 Answers

Most likely, that is because the columns in your data frame are factors. You need to convert them to numeric before you can compare them with 1.

library(dplyr)

# Convert factor -> character -> numeric, then cap values greater than 1 at 1
df %>%
  mutate_at(vars(plot1, plot2), ~pmin(as.numeric(as.character(.)), 1))

#              species rarness endangered plot1 plot2
#1    Pinus halepensis       F        0.0     1     0
#2    Majorana syriaca      CC        0.0     1     1
#3     Iris palaestina       F        0.0     1     0
#4 Velezia fasciculata       O        6.8     1     0
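
In dplyr 1.0.0 and later, `mutate_at()` is superseded by `across()`. A hedged sketch of that style, capping values greater than 1 at 1 as the question asks and covering every column from `plot1` to the last one. It assumes the plot columns hold numbers stored as factors or strings; `capped` is an illustrative name.

```r
library(dplyr)

# Toy data from the question; cbind() leaves every column non-numeric
df <- as.data.frame(cbind(
  species    = c("Pinus halepensis", "Majorana syriaca", "Iris palaestina", "Velezia fasciculata"),
  rarness    = c("F", "CC", "F", "O"),
  endangered = c(0, 0, 0, 6.8),
  plot1      = c(1, 2, 1, 1),
  plot2      = c(0, 1, 0, 0)
))

# plot1:last_col() selects plot1 and everything to its right
capped <- df %>%
  mutate(across(plot1:last_col(),
                ~ pmin(as.numeric(as.character(.x)), 1)))

capped$plot1  # 1 1 1 1
capped$plot2  # 0 1 0 0
```

The range selection means newly added plot columns are picked up without editing the call, as long as they sit to the right of `plot1`.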

Or the equivalent in base R:

df[4:5] <- lapply(df[4:5], function(x) pmin(as.numeric(as.character(x)), 1))
Ronak Shah

You can also make a copy before applying your condition. You need to name the columns explicitly so that only those columns are processed. If you have just two columns, you can do it manually like this:

# Create copy
test <- df

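# Update specific columns (as.character() first, so the values are compared rather than factor level codes)
test$plot1[as.numeric(as.character(test$plot1)) > 1] <- 1
test$plot2[as.numeric(as.character(test$plot2)) > 1] <- 1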
test
#               species rarness endangered plot1 plot2
# 1    Pinus halepensis       F          0     1     0
# 2    Majorana syriaca      CC          0     1     1
# 3     Iris palaestina       F          0     1     0
# 4 Velezia fasciculata       O        6.8     1     0
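
One caveat when the columns are factors, as the warning reported in the question's comments suggests: `as.numeric()` on a factor returns the internal level codes rather than the printed values, so going through `as.character()` first is safer. A small illustration with the `plot2` values (`f` is just a throwaway name):

```r
# plot2 from the question, stored as a factor with levels "0" and "1"
f <- factor(c("0", "1", "0", "0"))

as.numeric(f)                 # 1 2 1 1  (level codes, not the values)
as.numeric(as.character(f))   # 0 1 0 0  (the actual values)

# Capping the level codes at 1 would wrongly turn every entry into 1
pmin(as.numeric(f), 1)                # 1 1 1 1
pmin(as.numeric(as.character(f)), 1)  # 0 1 0 0
```

For `plot1` the two conversions happen to agree, because its levels are "1" and "2", which is why the problem is easy to miss.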

Generalisation:

Now, suppose you want to process a whole set of columns. You can reuse the previous approach in a function that you apply to each column. I suggest having a look at the apply family of functions; for this task, lapply seems appropriate.

# Your dataframe
species<- c("Pinus halepensis", "Majorana syriaca", "Iris palaestina","Velezia fasciculata") 
rarness<- c("F", "CC", "F", "O")
endangered<-c(0,0,0,6.8)
plot1<- c(1,2,1,1)
plot2<- c(0,1,0,0)
df<- as.data.frame(cbind(species, rarness, endangered, plot1, plot2))

# Extend the dataframe with new random columns for the example
# (no seed is set, so the values differ from run to run)
df2 <- data.frame(replicate(10, sample(-5:5, 4, replace = TRUE)))
df[names(df2)] <- df2
df 
#               species rarness endangered plot1 plot2 X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
# 1    Pinus halepensis       F          0     1     0 -4 -2  4  4  0  5 -1 -5  3   2
# 2    Majorana syriaca      CC          0     2     1  5 -3 -2  3  3 -1  0  5  2   4
# 3     Iris palaestina       F          0     1     0 -1  2 -2  5  3  2  3  3 -1  -3
# 4 Velezia fasciculata       O        6.8     1     0 -5 -3  4  5  5 -4  4 -5 -4  -3


# Create copy
test <- df

# Function to apply to each column
set_threshold <- function(col) {
  # Go through as.character() so factor columns yield their values, not their level codes
  col <- as.numeric(as.character(col))
  col[col > 1] <- 1
  col
}

# Select all column names from index 4 onward (drop the first 3)
col_names <- tail(names(test), -3)
col_names
# [1] "plot1" "plot2" "X1"    "X2"    "X3"    "X4"    "X5"    "X6"    "X7"    "X8"    "X9"    "X10"  

# Process each column
test[col_names] <- lapply(test[col_names], FUN = set_threshold)
test
#               species rarness endangered plot1 plot2 X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
# 1    Pinus halepensis       F          0     1     0 -4 -2  1  1  0  1 -1 -5  1   1
# 2    Majorana syriaca      CC          0     1     1  1 -3 -2  1  1 -1  0  1  1   1
# 3     Iris palaestina       F          0     1     0 -1  1 -2  1  1  1  1  1 -1  -3
# 4 Velezia fasciculata       O        6.8     1     0 -5 -3  1  1  1 -4  1 -5 -4  -3

I use tail(names(test), -3) to drop the first three names, which keeps every column name from index 4 onward, i.e. everything after endangered.
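
If counting positions by hand is error-prone (the comments below mention about 35 columns), a hedged alternative is to locate the last column to skip by name with `match()` and take everything after it. The names `cap_at_one`, `first` and `cols` are illustrative, `plot3` is a made-up extra column, and the toy data frame only stands in for the real one.

```r
# Small stand-in for the real data; plot3 is hypothetical
df <- data.frame(
  species    = c("Pinus halepensis", "Majorana syriaca"),
  endangered = c(0, 6.8),
  plot1      = c("1", "2"),   # numbers stored as text, as in the question
  plot2      = c("0", "1"),
  plot3      = c("3", "0"),
  stringsAsFactors = FALSE
)

# Convert to numeric (via character, so factors are handled too), then cap at 1
cap_at_one <- function(col) pmin(as.numeric(as.character(col)), 1)

# Every column positioned after "endangered"
# (assumes at least one column follows it)
first <- match("endangered", names(df)) + 1
cols  <- seq(first, ncol(df))

df[cols] <- lapply(df[cols], cap_at_one)
df$plot1  # 1 1
df$plot3  # 1 0
```

Anchoring on the column name rather than a fixed offset keeps the selection correct if columns are later added before `endangered`.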

Alexandre B.
  • Thanks for your answer! But in fact, I have many columns (~35) in my original data set. Is there a way to address all the relevant columns using a range or their number? – Idan May 26 '19 at 09:18
  • How do you want to select the columns to process? All the columns after `plot1 plot2`? – Alexandre B. May 26 '19 at 09:43
  • All the columns after endangered (so plot1 and plot2 are also included) – Idan May 26 '19 at 10:18
  • Look at the update. Here, I select all columns after a column index with the `tail` function. Feel free to design your own function to select the appropriate columns to process. – Alexandre B. May 26 '19 at 10:38
  • Everything worked except assigning the new values back into the df when using my data. I got this error: "Error in `[<-.data.frame`(`*tmp*`, col_names, value = list(GeneralObs = c(1, : duplicate subscripts for columns". Do you know what went wrong? – Idan May 26 '19 at 11:29
  • The code above works on my computer without any warning. When is this error raised? – Alexandre B. May 26 '19 at 11:35
  • When I try it with my original data. Is there a way to upload a csv file? – Idan May 26 '19 at 11:55
  • If you have a problem reading a `csv` file, have a look at the `read.csv` function as suggested in [this discussion](https://stackoverflow.com/questions/13265153/how-do-i-import-a-csv-file-in-r/13265177). Feel free to open a new question if you're stuck, and accept one of these answers if it solved your problem. – Alexandre B. May 26 '19 at 12:07
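
For what it's worth, `duplicate subscripts for columns` is the error `[<-.data.frame` raises when the column index contains repeats, which happens with name-based selection if two columns share a name. The thread never confirms that this was the cause here, so the following is only a sketch of how one might check for it and fall back to positions. The data frame and the names `res` and `idx` are made up for illustration.

```r
# check.names = FALSE lets data.frame() keep the duplicated name "a"
df <- data.frame(id = 1:3, a = c(0, 2, 5), a = c(1, 0, 3), check.names = FALSE)

# List any column names that occur more than once
names(df)[duplicated(names(df))]   # "a"

# Name-based assignment fails because both "a" entries match the same column
col_names <- tail(names(df), -1)
res <- try(df[col_names] <- lapply(df[col_names], pmin, 1), silent = TRUE)
inherits(res, "try-error")         # TRUE

# Selecting by position avoids the ambiguity
idx <- seq(2, ncol(df))
df[idx] <- lapply(df[idx], pmin, 1)
```

Renaming the columns with `make.unique(names(df))` before processing would be another way around the clash, at the cost of altering the original headers.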