4

I'm trying to read all the files (+10.000) in a directory, but when i've processed about 1400 files I get a 'too many open files' error. I've added an explicit call to the garbage collector, but that doesn't seem to do much for my problem. I checked the source for ioutil package and ReadFile uses defer file.Close() internally (as expected). So what is going wrong here?

    const TEMPLATE = `{ "source_db" : "CDARCHIEF", "doc_type" : "%s", "referentie" : "%s", "bestandsnaam" : "%s", "tekst" : "%s" }`
const MAPPING = `{ ... }`

var DIR, DOCTYPE, URL string

func init() {
    flag.StringVar(&DIR, "d", "./", "de directory met de ge-ocrde bestanden")
    flag.StringVar(&DOCTYPE, "t", "AG", "document type [ AG, CO, NN ]")
    flag.StringVar(&URL, "url", "...", "url voor de juiste index")
}

func main() {
    flag.Parse()
    fmt.Println("CD Archive Importer")
    importDocuments()
}

func importDocuments() {
    logfile, _ := os.Create("./importer.log")
    defer logfile.Close()

    files, _ := ioutil.ReadDir(DIR)
    error_counter := 0

    for i, file := range files {
        if math.Mod(float64(i), 400.0) == 0.0 {
            runtime.GC()
            fmt.Println("Running garbage collector")
        }

        fmt.Printf("Importing ( %d / %d ) [ errors: %d ]\r", i+1, len(files), error_counter)

        contents, err := ioutil.ReadFile(DIR + "/" + file.Name())
        if err != nil {
            error_counter = error_counter + 1
            logfile.WriteString(fmt.Sprintf("[ERROR/IO] : %s | %s\n", file.Name(), err))
            continue
        }

        contents_string := strings.Replace(string(contents), "\n", " ", -1)
        contents_string = strings.Replace(contents_string, "\"", " ", -1)
        contents_string = strings.Replace(contents_string, "\\", " ", -1)

        referentie := strings.Trim(file.Name(), ".txt")
        message := strings.NewReader(fmt.Sprintf(TEMPLATE, DOCTYPE, referentie, file.Name(), contents_string))

        resp, error := http.Post(URL, "application/json", message)
        if error != nil {
            error_counter = error_counter + 1
            logfile.WriteString(fmt.Sprintf("[ERROR/NET] : %s | %s | %s\n", file.Name(), resp.Status, error))
            continue
        }
            defer resp.Body.Close()

        if resp.StatusCode != 201 {
            body, _ := ioutil.ReadAll(resp.Body)
            error_counter = error_counter + 1
            logfile.WriteString(fmt.Sprintf("[ERROR/ES] : %s | %s | %s\n", file.Name(), resp.Status, string(body)))
        }

    }

    fmt.Println("\nDone!")
}

I know that there is a somewhat similar question from about 2 years ago, but that didn't have a useful answer for my issue.

commesan
  • 526
  • 5
  • 12
  • thanks for pointing that out @phihag. That was a slight oversight in preparing the code for the question. My original problem still exists though. – commesan Apr 29 '14 at 11:13
  • Mmm, I can't reproduce this problem. When you run [this example program](https://gist.github.com/phihag/11397245), do you get the same error? – phihag Apr 29 '14 at 11:18
  • 1
    What platform are you on? Are you sure your REST related code is not creating any file descriptors? – Dmitri Goldring Apr 29 '14 at 11:28
  • When i ran my own example, it also didn't give an error. So there's something else going wrong. I thought it would be in the example part, because i'm not opening any file else where. I've edited the example code with the whole program. Still can't figure out what's going wrong. – commesan Apr 29 '14 at 11:49
  • I've added the REST part to the question @DmitriGoldring but I don't think I'm creating massive amounts of file descriptors. Or am I overlooking something? – commesan Apr 29 '14 at 12:01
  • 1
    You're not closing your `resp.Body`. See the [http](http://golang.org/pkg/net/http/) docs, second example in the overview. – Dmitri Goldring Apr 29 '14 at 12:10
  • I had typed resp.Close() and gave an error so I thought resp didn't need to be closed. Didn't look at the docs careful enough to see I needed to use resp.Body.Close(). However, with that change I still get the same error @DmitriGoldring. – commesan Apr 29 '14 at 12:26
  • 4
    Ahh, the defer won't help as you aren't returning from the function until the loop is done. Try doing a direct `resp.Body.Close()` on the line after your `ioutil.ReadAll`. – Dmitri Goldring Apr 29 '14 at 12:41
  • Was just trying without the defer. Success! Thanks for help @DmitriGoldring. – commesan Apr 29 '14 at 12:48

1 Answers1

0

You may want to consider using filepath.Walk. I have successfully used it on 10k+ files without trouble. Alternatively you could dig into the source and see if they do anything different with regards to resource management.

Also the for loop seems a bit baroque, you can just use integers and the % operator.

for i := 0; i < 1000000; i += 1 {
    if i % 5000 == 0 {
        fmt.Println(i)
    }
}
Dmitri Goldring
  • 4,176
  • 2
  • 24
  • 28
  • Thanks for the `%` operator tip. I'm just getting my feet wet writing go. Too much deep diving in the packages made overlook the obvious method. – commesan Apr 29 '14 at 11:56
  • Though the error was in not closing the resp.Body correctly. Using `filepath.Walk` would have got me up and running quicker since the `defer` would have work. And since Dmitri helped me to fix this in the comments i'll mark this answer as accepted. Credit where credit's due. – commesan Apr 29 '14 at 12:58