
I am trying to use "from itertools import islice" to read a number of lines at a time from a *.las file with the liblas module (my goal is to read the file chunk by chunk).

Following the question "Python how to read N number of lines at a time":

islice() can be used to get the next n items of an iterator. Thus, list(islice(f, n)) will return a list of the next n lines of the file f. Using this inside a loop will give you the file in chunks of n lines. At the end of the file, the list might be shorter, and finally the call will return an empty list.
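For an ordinary Python file object, the quoted pattern works because iter(f) returns f itself, so every islice(f, n) call continues from where the previous one stopped. Here is a minimal sketch of that loop, using io.StringIO as a stand-in for a text file (the seven-line content is made up for illustration):

```python
import io
from itertools import islice

# io.StringIO stands in for a text file opened with open(); 7 lines in total.
f = io.StringIO("".join("line %d\n" % i for i in range(7)))

sizes = []
while True:
    chunk = list(islice(f, 3))  # next (up to) 3 lines from the same file object
    if not chunk:               # empty list: the file is exhausted
        break
    sizes.append(len(chunk))

print(sizes)  # [3, 3, 1]
```

This relies on iter(f) being f. A plain list would not behave the same way: iter(some_list) returns a fresh iterator on every call, so islice(some_list, 3) would keep returning the first three items.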

I used the following code:

from numpy import nonzero
from liblas import file as lasfile
from itertools import islice


chunkSize = 1000000

f = lasfile.File(inFile,None,'r') # open LAS
while True:
    chunk = list(islice(f,chunkSize))
    if not chunk:
        break
    # do other stuff

but I have this problem:

>>> len(f)
2866390

>>> chunk = list(islice(f, 1000000))
>>> len(chunk)
1000000
>>> chunk = list(islice(f, 1000000))
>>> len(chunk)
1000000
>>> chunk = list(islice(f, 1000000))
>>> len(chunk)
866390
>>> chunk = list(islice(f, 1000000))
>>> len(chunk)
1000000

When iteration reaches the end of f, islice starts reading the file from the beginning again instead of returning an empty list.

Thanks for any suggestions and help. It's very much appreciated.

codeforester
Gianni Spear

2 Answers


It seems like it would be easy enough to write a generator to yield n lines at a time:

def n_line_iterator(fobj, n):
    """Yield lists of up to n lines from fobj, reading it in a single pass."""
    if n < 1:
        raise ValueError("Must supply a positive number of lines to read")

    out = []
    num = 0
    for line in fobj:
        if num == n:
            yield out  # yield one full chunk
            num = 0
            out = []
        out.append(line)
        num += 1
    if out:
        yield out  # yield the remaining (possibly shorter) chunk
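A more compact alternative (not part of the original answer) uses the two-argument form of the built-in iter(), which keeps calling a function until it returns a sentinel value, here the empty list. As with n_line_iterator, iter() is applied to the input only once, so all chunks come from a single pass. The helper name chunks is hypothetical:

```python
from itertools import islice

def chunks(iterable, n):
    """Return an iterator over lists of up to n items taken from iterable."""
    if n < 1:
        raise ValueError("Must supply a positive number of items to read")
    it = iter(iterable)  # created once, shared by every chunk
    # iter(func, sentinel) calls func repeatedly and stops when it returns []
    return iter(lambda: list(islice(it, n)), [])

print([len(c) for c in chunks(range(10), 4)])  # [4, 4, 2]
```

Because chunks returns an iterator instead of being a generator function, the ValueError is raised immediately when it is called with a bad n, not on the first next().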
mgilson
  • Thanks, but there is a problem with "if num = n:": File "", line 16 if num = n: ^ SyntaxError: invalid syntax – Gianni Spear Oct 08 '12 at 14:23
  • @Gianni -- Sorry, I was out of town for a week and apparently forgot how to code. That was an error. I've already updated and fixed that one. Let me know if you find any more. – mgilson Oct 08 '12 at 14:24
  • No problem, and thanks! I'm having a really hard time with the liblas module. I cannot read in chunks: http://stackoverflow.com/questions/12769353/python-suggestions-to-improve-a-chunk-by-chunk-code-to-read-several-millions-of I've been trying for two days :( – Gianni Spear Oct 08 '12 at 14:29
  • Sorry, I'm confused -- Does the updated code I posted fail in some way? – mgilson Oct 08 '12 at 14:33

Change the source code of file.py in the liblas package. Currently __iter__ is defined as follows (source on GitHub):

def __iter__(self):
    """Iterator support (read mode only)

      >>> points = []
      >>> for i in f:
      ...   points.append(i)
      ...   print i # doctest: +ELLIPSIS
      <liblas.point.Point object at ...>
    """
    if self.mode == 0:
        self.at_end = False
        p = core.las.LASReader_GetNextPoint(self.handle)
        while p and not self.at_end:
            yield point.Point(handle=p, copy=True)
            p = core.las.LASReader_GetNextPoint(self.handle)
            if not p:
                self.at_end = True
        else:
            self.close()
            self.open()

You can see that when the file reaches its end, it is closed and reopened, so the next iteration starts again at the beginning of the file.
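This also explains why islice(f, n) wraps around instead of stopping: each islice(f, n) call invokes iter(f), and since __iter__ is a generator function, every call starts a new generator. Once the while/else has reopened the file, the next generator reads from the first point again. The sketch below reproduces this with a hypothetical RewindingReader class that only imitates the structure of the method above (it is not liblas), and shows that calling iter(f) once and reusing that single iterator stops cleanly, assuming the real File class behaves like this stand-in:

```python
from itertools import islice

class RewindingReader:
    """Hypothetical stand-in for liblas.file.File; its __iter__ mimics the
    while/else above by rewinding to the start after the last point."""

    def __init__(self, points):
        self.points = list(points)
        self.pos = 0  # read position, playing the role of the C reader handle

    def _next_point(self):
        # Stand-in for core.las.LASReader_GetNextPoint: None at end of file
        if self.pos < len(self.points):
            self.pos += 1
            return self.points[self.pos - 1]
        return None

    def __iter__(self):
        p = self._next_point()
        while p is not None:
            yield p
            p = self._next_point()
        else:
            self.pos = 0  # like close() + open(): later reads start over

def chunk_sizes(source, n, calls=4):
    """Length of each of `calls` successive list(islice(source, n)) results."""
    return [len(list(islice(source, n))) for _ in range(calls)]

# Each islice(f, 3) calls iter(f) anew, so the fourth chunk starts over.
print(chunk_sizes(RewindingReader(range(7)), 3))        # [3, 3, 1, 3]
# One generator, created once: after the end it stays exhausted.
print(chunk_sizes(iter(RewindingReader(range(7))), 3))  # [3, 3, 1, 0]
```

In the question's loop, that would mean writing it = iter(f) before the while loop and calling list(islice(it, chunkSize)) inside it. This is an untested suggestion for liblas itself, based only on the __iter__ source quoted here.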

Try removing the else block attached to the while loop, so that the method becomes:

def __iter__(self):
    """Iterator support (read mode only)

      >>> points = []
      >>> for i in f:
      ...   points.append(i)
      ...   print i # doctest: +ELLIPSIS
      <liblas.point.Point object at ...>
    """
    if self.mode == 0:
        self.at_end = False
        p = core.las.LASReader_GetNextPoint(self.handle)
        while p and not self.at_end:
            yield point.Point(handle=p, copy=True)
            p = core.las.LASReader_GetNextPoint(self.handle)
            if not p:
                self.at_end = True
halex
  • Thanks halex. I'm having a really hard time with liblas. Yesterday and today I have been trying to read in chunks, but the lasfile.File type breaks all the iterator conventions. I am following every example I can find on Google, but there is always a new problem. Please see: http://stackoverflow.com/questions/12769353/python-suggestions-to-improve-a-chunk-by-chunk-code-to-read-several-millions-of – Gianni Spear Oct 08 '12 at 14:35
  • This is a very interesting design. However, you should be able to exhaust the iterator. I still don't understand why it loops around rather than requiring a new loop to restart it... – mgilson Oct 08 '12 at 14:36
  • They created the liblas module to read *.las files in Python. A *.las file is a special format for storing laser data, called LiDAR http://en.wikipedia.org/wiki/LIDAR. LAS is the ASPRS LiDAR data exchange format, where ASPRS is the American Society for Photogrammetry and Remote Sensing. – Gianni Spear Oct 08 '12 at 14:41
  • For example, yesterday I tried for i in xrange(0, len(f), chunkSize): chunk = f[i:i+chunkSize], which is how I normally read a text file in chunks, but it didn't work: when it reaches the end of the file I get an error message. – Gianni Spear Oct 08 '12 at 14:44