I want to write a function that splits a data set into n random samples drawn without replacement, so that no row appears in more than one sample.
In this example I use the iris data set, which has 150 observations, and say I want 10 samples (15 rows each).
My attempt:
# load libraries
library(dplyr)
# load the data
data(iris)
head(iris)
# copy the data into df
df = iris
# set the number of samples
n = 10
# assumption: the number of observations in df is divisible by n
# set the number of observations in each sample
m = nrow(df)/n
# create a column called row to contain initial row index
df$row = rownames(df)
# define the for loop
# that creates n separate data sets
# with m number of rows in each data set
for(i in 1:n){
  # create the sample
  sample = sample_n(df, m, replace = FALSE)
  # name the sample 'dsi'
  x = assign(paste("ds",i,sep=""),sample)
  # remove 'dsi' from df
  df = df[!(df$row %in% x$row),]
}
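For comparison, the same kind of disjoint split can be done in one step by shuffling group labels instead of removing sampled rows inside a loop. This is a sketch in base R (sample() and split() rather than dplyr), and the seed value is arbitrary, only there for reproducibility:

```r
data(iris)
df <- iris
n <- 10
# same assumption as above: the rows divide evenly into n samples
stopifnot(nrow(df) %% n == 0)
m <- nrow(df) / n

set.seed(42)  # arbitrary seed, only for reproducibility
# give each row a group label (m copies of each of 1..n), shuffle the labels,
# then split the rows by label so every row lands in exactly one sample
groups <- sample(rep(seq_len(n), each = m))
samples <- split(df, groups)
names(samples) <- paste0("ds", seq_len(n))
```

Each element (for example samples$ds1) is a 15-row data frame, and the result is a single list rather than ten separate objects.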
When I run this code, I get what I want: ten random samples named ds1, ds2, ..., ds10.
But when I turn it into a function:
samplez <- function(df,n){
  df$row = rownames(df)
  m = nrow(df)/n
  for(i in 1:n){
    sample = sample_n(df, m, replace = FALSE)
    x = assign(paste("ds",i,sep=""),sample)
    df = df[!(df$row %in% x$row),]
  }
}
Nothing happens when I run 'samplez(iris, 10)'; none of the ds1, ..., ds10 objects appear. What am I missing?
Thanks
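One likely explanation: by default assign() writes into the environment it is called from, which inside samplez() is the function's own local environment and is discarded when the call ends; the for loop also evaluates to NULL invisibly, so the call returns nothing visible. A minimal sketch of a version that returns the samples as a named list instead. It swaps dplyr's sample_n() for base R's sample() so it has no package dependency:

```r
samplez <- function(df, n) {
  df$row <- rownames(df)
  m <- nrow(df) / n
  out <- vector("list", n)
  for (i in seq_len(n)) {
    # draw m of the remaining rows without replacement
    s <- df[sample(nrow(df), m), ]
    out[[i]] <- s
    # drop the sampled rows so later samples cannot reuse them
    df <- df[!(df$row %in% s$row), ]
  }
  names(out) <- paste0("ds", seq_len(n))
  out  # return the list explicitly
}

data(iris)
samples <- samplez(iris, 10)
```

Passing envir = .GlobalEnv to assign() would also make ds1, ..., ds10 appear in the workspace, but returning a list keeps the function free of side effects.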