3

I am instantiating a class in cython. I want to declare one of the instance which is an array of float values once by computing it with a given function and save it in a binary file using c functions. For the further call of my class if the input file does already exist, the instance would be declared by reading the array from the file otherwise it would compute it again.

import numpy as np
cimport numpy as np
cimport cython

from libc.stdio cimport FILE, fopen, fwrite, fscanf, fclose, fseek, SEEK_END, ftell, stdout, stderr
cdef extern from "math.h":
    double exp(double) nogil
    double log(double) nogil

cdef class Hit(object):
    cdef public double[::1] zs, Da
    cdef char* path

    def __cinit__(self, zs=None, path=None):
        if path is None:
           raise ValueError("Could not find a path to the file which contains the table of distances")
        else:
           self.path=path
        if zs is None:
           raise ValueError("You must give an array which contains the steps!")
        self.zs=zs
        cdef Py_ssize_t i, N
        N=len(self.zs)
        cdef FILE *ptr_fr
        cdef FILE *ptr_fw
        cdef double[::1] ptr_d = np.empty((N,))
        ptr_fr = fopen(self.path, "rb")
        ptr_fw = fopen(self.path, "wb")
        if (ptr_fr==NULL):
           print "I/O Error: cannot open file {}".format( self.path)  
           for i from N > i >= 0:
               ptr_d[i]=log(self.zs[i]+1.) /(1- self.zs[i])**0.5    
           if (ptr_fw == NULL):
               print "Unable to open file!\n"
           else:
               print "Opened file successfully for writing.\n"
               fwrite(<void*>&ptr_d[0], sizeof(double), N, ptr_fw)
               fclose(ptr_fw)
               self.Da = ptr_d

        else: 
           for i from N > i >= 0:
               fscanf(ptr_fr,"%f", &ptr_d[i])

           fclose(ptr_fr)
           self.Da = ptr_d

When I run my code for the second time the values that returns from reading a file to a pointer is correct, however I think the way I allocated the pointer to a memoryview has a problem since all the values in self.Da instance are zeros. Any suggestion ?!!

Saullo G. P. Castro
  • 56,802
  • 26
  • 179
  • 234
Dalek
  • 4,168
  • 11
  • 48
  • 100
  • 1
    As a side note: Do you really want to do that? Cython is a great tool when you would like to speed up time critical code or interface with external C code. For everything else normal python is probably a much better choice. as it is less error prone and much more readable. – cel Apr 29 '15 at 17:38
  • @cel It is a small part of my code which is written in cython. I need to access to this array as fast as possible if it has been computed once. I tried with python I/O functions. I think using c and binary file would be the fastest option. – Dalek Apr 29 '15 at 17:44
  • One small point: scanf needs "%lf" when reading doubles (this is different to printf!). – DavidW Apr 29 '15 at 20:19
  • Also, is it worth looking at http://docs.scipy.org/doc/numpy/reference/generated/numpy.fromfile.html (and related) numpy functions for loading and saving your data. They should be both quick and painless (and do the whole array in one go) – DavidW Apr 29 '15 at 20:25
  • @DavidW I updated my question and I used pointer, but there is problem where I assigned the value of the pointer to memortview. – Dalek Apr 29 '15 at 21:02
  • I didn't recommend using pointers to store your data! That seems a much more difficult way of doing things than using numpy arrays. Unfortunatly I don't have a computer with Cython on infront of me so I don't think I can help further. I'm a little surprised Christian's answer didn't work though. – DavidW Apr 29 '15 at 21:13
  • @DavidW the problems seems to be when the file in `path` doesn't exist previously. My answer works when the file is previously created. I'm looking why it is still not working in all cases. – Cristián Antuña Apr 29 '15 at 21:17
  • Actually - one further thought: fscanf makes no sense for binary files, right? – DavidW Apr 29 '15 at 21:22
  • fscanf reads from the stream and stores in the format you put. I wouldn't say that it doesn't make sens, but you have to be careful that you always know the format of the data you are reading. – Cristián Antuña Apr 29 '15 at 21:24
  • @DavidW I have fixed the bug of writing and reading an array to a file. It works perfectly now. On the other hand, the way I allocate the pointer to a memoryview still has a problem because the values in `self.Da` are almost zero. – Dalek Apr 30 '15 at 11:14

2 Answers2

6

The answer to my own question is:

import numpy as np
cimport numpy as np
cimport cython
from libc.stdio cimport FILE, fopen, fwrite, fscanf, fclose, fprintf, fseek, ftell, SEEK_END, rewind, fread
from numpy cimport float64_t
from libc.stdlib cimport malloc, free
cdef extern from "math.h":
    double exp(double) nogil
    double log(double) nogil

cdef class Hit(object):
    cdef public double[::1] zs,  Da
    cdef char* path

    @cython.boundscheck(False)
    @cython.cdivision(True)
    @cython.wraparound(False)
    @cython.nonecheck(False)
    def __cinit__(self, path=None, zs=None):
        if path is None:
           raise ValueError("Could not find a path to the file which contains the table of distances")
        else:
           self.path=path

        if zs is None:
           raise ValueError("You must give an array which contains the steps!")
        self.zs=zs

        cdef Py_ssize_t i, N, lSize
        N=len(np.ascontiguousarray(self.zs))
        print "Input file should have ",N
        cdef FILE *ptr_fr
        cdef FILE *ptr_fw
        cdef size_t result
        cdef double *ptr_d= <double *>malloc(N * sizeof(double))
        ptr_fr = fopen(self.path, "rb")        
        if (ptr_fr==NULL):
           print "I/O Error: cannot open file {}".format( self.path) 
           for i from N > i >= 0:
               ptr_d[i]=log(self.zs[i])
               print ptr_d[i]
           ptr_fw = fopen(self.path, "wb")
           if (ptr_fw == NULL):
               print "Unable to open file!\n"

           else:

               print "Opened file successfully for writing.\n"
               fwrite(ptr_d, sizeof(double), N, ptr_fw)
               fclose(ptr_fw)
               self.Da = np.asarray(<double[:N]>ptr_d)
        else: 
           fseek (ptr_fr , 0 , SEEK_END)
           lSize = ftell (ptr_fr)
           print lSize
           rewind (ptr_fr)
           result=fread(ptr_d,sizeof(double),lSize ,ptr_fr )
           for i from N > i >= 0:
               print ptr_d[i]
           fclose(ptr_fr)
           self.Da = np.asarray(<double[:N]>ptr_d)
        print np.ascontiguousarray(self.Da)

        free(ptr_d)
Dalek
  • 4,168
  • 11
  • 48
  • 100
  • Glad you found your answer; I tried mine and it didn't crash on my PC so I couldn't be any hel beyond that point. (it used the first version, with memoryviews insted of pointers). – Cristián Antuña Apr 30 '15 at 12:23
  • @ CristiánAntuña I don't know exactly why it crashed! Thanks for the help! – Dalek Apr 30 '15 at 12:30
4

You should change your line:

ptr_fw = fopen(self.path, "wb")

For

ptr_fw = fopen(self.path, "ab")

See http://www.cplusplus.com/reference/cstdio/fopen/

[w option] write: Create an empty file for output operations. If a file with the same name already exists, its contents are discarded and the file is treated as a new empty file.

While

[a option] append: Open file for output at the end of a file. Output operations always write data at the end of the file, expanding it. Repositioning operations (fseek, fsetpos, rewind) are ignored. The file is created if it does not exist.

The brackets are mine. You were deleting the content in path once you opened it. So the next time, you opened an empty file.

UPDATE: (I did both of these changes on your first implementation)

Plus, change your

fwrite(<void*>&ptr_d[0], sizeof(double), N, ptr_fw)

for:

       for i in range(N):
           fprintf(ptr_fw, "%lf\n", ptr_d[N-i-1])

You were reading formated but writting unformatted (remember to cimport fprintf from stdio, and the change above is also important).

  • Or move it to after "if ptr_fr==NULL:". This is obviously the right answer though. – DavidW Apr 29 '15 at 20:31
  • @CristiánAntuña there is something wrong with the way I use `fwrite` for the pointer. Because the file are basically empty even I used your answer. Moreover, I am not sure the way I allocated a pointer to a memoryview is correct or not?!! – Dalek Apr 29 '15 at 20:47
  • @CristiánAntuña The code crashed with the change you have suggested in the updated answer. The error is:`*** glibc detected *** python: free(): corrupted unsorted chunks: 0x000000000104ddc0 *** ======= Backtrace: ========= /lib/x86_64-linux-gnu/libc.so.6(+0x7db26)[0x7fb6501d7b26] /lib/x86_64-linux-gnu/libc.so.6(fclose+0x155)[0x7fb6501c77a5] ./Test.so(+0x182a1)[0x7fb64ef862a1] ./Test.so(+0x19e48)[0x7fb64ef87e48] ` – Dalek Apr 30 '15 at 10:00