I have the following file with URLs in it. The idea is to download the image from each URL, extract a 6-color palette, look up the color names and percentages, and bind them all together in a list alongside the product number. But I get a "Too many open files" error.

library(readxl)
library(jpeg)
library(scales)
library(plotrix)
library(gridExtra)
library(dplyr)
library(data.table)
dataset = read_excel("C:/Temp/Product.xlsx", sheet = "All")
datalist = list()
nRowsDf <- nrow(dataset)
avector <- as.vector(dataset$URL)
varenummer <- as.vector(dataset$Varenr)
for (i in 1:nRowsDf) {
  tryCatch({
    # Convert this from data.frame to vector
    Sku <- as.vector(varenummer[[i]])
    download.file(avector[[i]], paste(Sku, ".jpg", sep = ""), mode = "wb")
    painting <- readJPEG(paste(Sku, ".jpg", sep = ""))

    dimension <- dim(painting)
    painting_rgb <- data.frame(
      x = rep(1:dimension[2], each = dimension[1]),
      y = rep(dimension[1]:1, dimension[2]),
      R = as.vector(painting[,, 1]), # slicing array into RGB channels
      G = as.vector(painting[,, 2]),
      B = as.vector(painting[,, 3])
    )

    k_means = kmeans(painting_rgb[, c("R", "G", "B")], algorithm = "Lloyd", centers = 6, iter.max = 300)
    test = (sapply(rgb(k_means$centers), color.id))

    Color = lapply(test, `[[`, 1)
    Values = k_means$size
    Percentage = k_means$size / sum(k_means$size)
    Final = do.call(rbind, Map(data.frame, Color = lapply(test, `[[`, 1), Values = k_means$size, ProductNumber = Sku, Percentage = Percentage))
    Final$i <- i # iteration
    datalist[[i]] <- Final # add iteration to list
    big_data = rbindlist(datalist)
    #grid.table(big_data)
    write.table(big_data, file = "myDF.csv", sep = ",", col.names = TRUE, append = TRUE)

    #R = Final[with(Final, order(-Percentage)),]
  }, error = function(e) { closeAllConnections() })
  closeAllConnections()
}

The code stops after downloading around 266 unique JPEG images.

This code handles only JPG files; if another file type is returned, it is simply ignored.

Error:

Error in file(file, ifelse(append, "a", "w")) : 
cannot open the connection
In addition: Warning message:
In file(file, ifelse(append, "a", "w")) :
cannot open file 'myDF.csv': Too many open files

If I remove the `tryCatch`, I get this:

Error in download.file(avector[[i]], "image.jpg", mode = "wb") : 
cannot open destfile 'image.jpg', reason 'Too many open files'
  • It is too late at night to read this block of code, but my suggestion is to explicitly close each connection after reading the file in. See `?file` for a bunch of details on this. A second thought is that version 3.5 has encountered some issues that may be related to your problem. If I remember correctly, these potentially related issues have been solved in the development version. R 3.5.1 is scheduled for release in the first week of July. – lmo Jun 08 '18 at 00:43
  • If you don't need to save the files off, you could use `tempfile(fileext=".jpg")` instead of image.jpg to avoid possibly overwriting the new file with dummy names. I don't know if that would solve your specific problem or not. – Mark Jun 08 '18 at 13:26
  • It looks as though some part of the code is opening a connection but forgetting to close it. I don't see that in the code you posted, so it might be a bug in one of the R functions. A likely candidate is that `file()` call in the error message. What does `traceback()` show about where it's being called? – user2554330 Jun 08 '18 at 13:34
  • Actually, it probably happened earlier, but is suppressed by your `tryCatch`. Try running without that, or at least print the message and some diagnostic info when it catches an error. – user2554330 Jun 08 '18 at 13:42
  • > traceback() 2: file(file, ifelse(append, "a", "w")) 1: write.table(big_data, file = "myDF.csv", sep = ",", col.names = T, append = T) > – Patricio Lobos Jun 08 '18 at 13:44
  • All is pointing to a file connection error, does anyone know how to alter the part of the code so it closes all open files in the loop? – Patricio Lobos Jun 08 '18 at 20:22
  • @user3460688 Have you tried removing `tryCatch`? That's hiding the real problem. You should close connections when you're done with them, not en masse. – user2554330 Jun 09 '18 at 00:13
  • Yes, but still Error in download.file(avector[[i]], "image.jpg", mode = "wb") : cannot open destfile 'image.jpg', reason 'Too many open files' – Patricio Lobos Jun 09 '18 at 08:43
  • The commands that open files in your loop are `download.file` and `readJPEG`. If I do a loop like yours that calls those 500 times, I don't get the error, but my system might have a different open file limit than yours. Can you try it with just one or the other in the loop, and nothing else? – user2554330 Jun 11 '18 at 11:29
  • I can do batches of 200 max. You were lucky with 500. My problem is that the set size is 56000 URLs, and that means opening and closing R 280 times... that will take time! – Patricio Lobos Jun 11 '18 at 12:39
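
Two of the suggestions in the comments (a unique `tempfile()` per image, and printing the error instead of silently swallowing it) can be combined in a small wrapper. The following is only a sketch under assumptions: `process_one` and the `work` callback are hypothetical names, and the body of `work` is a stand-in for the real download, read, and cluster steps, with a simulated failure so that it runs without network access.

```r
# Hypothetical helper: run one iteration and report, rather than hide, any error.
process_one <- function(i, work) {
  tmp <- tempfile(fileext = ".jpg")   # unique scratch file for this iteration
  on.exit(unlink(tmp), add = TRUE)    # remove it even if `work` fails
  tryCatch(
    work(tmp),
    error = function(e) {
      # Print which iteration failed and why, then carry on with NULL
      message("iteration ", i, " failed: ", conditionMessage(e))
      NULL
    }
  )
}

# Stand-in for the download/read/cluster steps; it fails on purpose for i = 2.
results <- lapply(1:3, function(i) {
  process_one(i, function(path) {
    if (i == 2) stop("simulated download failure")
    writeLines("placeholder", path)
    file.exists(path)
  })
})
```

With this shape the loop keeps going after a bad URL, but the first real error message is visible, which is what the comments above were asking for.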

1 Answer

The code had an error, or better said an unnecessary step, that kept connections open until it reached the limit imposed by `file`.

After simply removing the iteration-tracking steps and the `rbindlist(datalist)` call, it ran flawlessly.

The modified version is below.

for (i in 1:nRowsDf) {
  tryCatch({
    # Convert this from data.frame to vector
    Sku <- as.vector(varenummer[[i]]) # for testing use 23406
    download.file(avector[[i]], paste(Sku, ".jpg", sep = ""), mode = "wb")
    # painting <- readJPEG(paste(Sku, ".jpg", sep = ""))

    # load.image() comes from the imager package, so library(imager) must be loaded
    painting = load.image(paste(Sku, ".jpg", sep = ""))
    dimension <- dim(painting)
    painting_rgb <- data.frame(
      x = rep(1:dimension[2], each = dimension[1]),
      y = rep(dimension[1]:1, dimension[2]),
      R = as.vector(painting[,, 1]), # slicing our array into three
      G = as.vector(painting[,, 2]),
      B = as.vector(painting[,, 3])
    )

    k_means = kmeans(painting_rgb[, c("R", "G", "B")], algorithm = "Lloyd", centers = 6, iter.max = 300)
    test = (sapply(rgb(k_means$centers), color.id))

    Color = lapply(test, `[[`, 1)
    Values = k_means$size
    Percentage = k_means$size / sum(k_means$size)
    Final = do.call(rbind, Map(data.frame, Color = lapply(test, `[[`, 1), Values = k_means$size, ProductNumber = Sku, Percentage = Percentage))
    #Final$i <- i # maybe you want to keep track of which iteration produced it?
    #datalist[[i]] <- Final # add it to your list
    #big_data = rbindlist(datalist)
    #grid.table(big_data)
    # Write only this iteration's rows instead of the whole accumulated table
    write.table(Final, file = "myDF.csv", sep = ",", col.names = TRUE, append = TRUE)

    #R = Final[with(Final, order(-Percentage)),]
  }, error = function(e) { closeAllConnections() })
  closeAllConnections()
}
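
One side effect of calling `write.table()` with `col.names = TRUE, append = TRUE` on every iteration, as the loop above does, is that the header row is written again each time (R also warns about appending column names). A minimal sketch of one way to write the header only once is shown here; `append_rows` is a hypothetical helper, and the two small data frames are made-up stand-ins for the per-image `Final` table.

```r
# Hypothetical helper: append rows to a CSV, writing the header only on first use.
append_rows <- function(df, path) {
  first_write <- !file.exists(path)
  write.table(df, file = path, sep = ",",
              col.names = first_write,  # header only when the file is new
              row.names = FALSE,        # omit R's row names from the output
              append = !first_write)
}

out <- tempfile(fileext = ".csv")       # scratch output file for the demo
for (sku in c("A1", "B2")) {
  # Made-up stand-in for the per-image `Final` data frame
  chunk <- data.frame(Color = c("gray", "white"),
                      Values = c(10L, 30L),
                      ProductNumber = sku,
                      Percentage = c(0.25, 0.75))
  append_rows(chunk, out)
}

combined <- read.csv(out)               # one header, four data rows
```

Because each call opens and closes the file itself, nothing has to be accumulated in memory across the 56000 iterations mentioned in the comments.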