How to sample a small set of images for training a model from a large image folder in R?

Question

I have a very large folder of images (train_dir), as well as a CSV file containing the class label for each of those images (train_df). Because the dataset is huge, I'd like to take only a sample of the images (say 25%) along with their labels from train_df. How can I do this in R?

My "train_dir" folder has around 150,000 images ('1.png', '2.png', ...), and my CSV file looks something like this: [image: CSV file - train_df]

What would be a good approach to writing an R script that does this?

Maybe `n<- nrow(train_df);i<- sample(n,n*0.25)` and then use the index `i` to subset `train_df` and select the corresponding image files. — Rui Barradas, May 02 '20 at 04:24
I can work around sampling my CSV file but I find it really challenging to split a folder of images. How do I go about doing that? — Horseman1901, May 02 '20 at 04:26
By to split a folder do you mean to move the files to another folder? Or to read just those files? — Rui Barradas, May 02 '20 at 04:28

score 0 · Answer 1 · answered May 02 '20 at 05:13


Something along the lines of the following code will:

1. Get a subset of the row numbers, to serve as an index into train_df.
2. Subset train_df to get a sample of PNG filenames. Since column "id" is a factor, convert it to character.
3. Apply a PNG-reading function to each filename. Here I have used png::readPNG, but others can be used in the same way.

The code then becomes the following.

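perc <- 0.25                      # fraction of images to sample
n <- nrow(train_df)
# Sample row indices without replacement; round() makes the size a whole number
i <- sample(n, round(n * perc))
# "id" may be a factor, so convert it to character
png_filenames <- as.character(train_df[i, "id"])

# Assumes train_dir is the folder path and "id" holds file names such as "1.png"
png_files <- lapply(file.path(train_dir, png_filenames), function(x) {
  png::readPNG(x, native = TRUE)
})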
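
If the goal is to split the sample off into its own folder rather than read it into memory (the alternative raised in the comments under the question), a minimal sketch using base R's file.copy might look like the following. The toy train_dir and train_df, the folder name train_sample, and the file name train_sample.csv are all hypothetical and exist only to make the example runnable; the "id" column is assumed to hold file names such as "1.png".

```r
# Toy setup so the sketch runs on its own (hypothetical data).
# In real use, train_dir and train_df already exist.
train_dir <- file.path(tempdir(), "train_dir")
dir.create(train_dir, showWarnings = FALSE)
ids <- paste0(1:20, ".png")
file.create(file.path(train_dir, ids))          # empty placeholder files
train_df <- data.frame(id = ids,
                       label = rep(c("a", "b"), 10),
                       stringsAsFactors = FALSE)

set.seed(42)                                    # reproducible sample
perc <- 0.25
n <- nrow(train_df)
i <- sample(n, round(n * perc))                 # sampled row indices

# Subset the data frame once so labels and files stay in sync
sample_df <- train_df[i, ]

# Copy the matching files into a separate (hypothetical) folder
sample_dir <- file.path(tempdir(), "train_sample")
dir.create(sample_dir, showWarnings = FALSE)
copied <- file.copy(file.path(train_dir, as.character(sample_df$id)),
                    sample_dir, overwrite = TRUE)

# Save the matching labels next to the copied images
write.csv(sample_df, file.path(sample_dir, "train_sample.csv"),
          row.names = FALSE)
```

Copying leaves the original folder untouched; file.rename could be used instead if the files should be moved, at the cost of changing train_dir.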

answered May 02 '20 at 05:13

Rui Barradas


Thank you for your time but that's not exactly what I need. I just need to sample 1000 images to work with from the total 150,000 images in my 'train_dir' folder. – Horseman1901 May 02 '20 at 17:05
@Horseman1901 That's even simpler, instead of `perc<-0.25` do `m <- 1000` and use it where the code has `n*perc`. – Rui Barradas May 02 '20 at 17:40
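
For the fixed-count case discussed in these comments, another option is to sample file names straight from the folder with list.files and then match them back to the labels. This is only a sketch under assumptions: the toy folder and data frame are hypothetical, m is set to 10 instead of 1000 so the demo stays small, and the "id" column is assumed to contain the file names exactly as they appear on disk.

```r
# Toy setup so the sketch runs on its own (hypothetical data).
train_dir <- file.path(tempdir(), "train_dir_demo")
dir.create(train_dir, showWarnings = FALSE)
ids <- paste0(1:50, ".png")
file.create(file.path(train_dir, ids))          # empty placeholder files
train_df <- data.frame(id = ids,
                       label = rep(c("a", "b"), 25),
                       stringsAsFactors = FALSE)

set.seed(1)                                     # reproducible sample
m <- 10                                         # would be 1000 for the real folder

# List the PNG files actually present in the folder
all_files <- list.files(train_dir, pattern = "\\.png$")

# Draw m file names without replacement
sampled_files <- sample(all_files, m)

# Keep only the label rows that belong to the sampled files
sampled_df <- train_df[as.character(train_df$id) %in% sampled_files, ]

# Full paths, ready to pass to a reader such as png::readPNG
sampled_paths <- file.path(train_dir, sampled_files)
```

Sampling from the folder listing rather than from train_df guarantees that every sampled file exists on disk, which helps if the CSV and the folder are not perfectly in sync.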
