There is a 4 GB .TIF image that needs to be processed. Because of a memory constraint, I can't load the whole image into a numpy array, so I need to load it lazily, in parts, from the hard disk. This has to be done in Python, as a project requirement. I also tried looking at the tifffile library on PyPI, but I found nothing useful there. Please help.
- Consider using `pyvips` - it is very frugal on memory. Maybe add the `vips` tag to attract the right people. Also try and be more explicit about your processing. – Mark Setchell Dec 04 '19 at 19:55
2 Answers
pyvips can do this. For example:
```python
import sys

import numpy as np
import pyvips

image = pyvips.Image.new_from_file(sys.argv[1], access="sequential")

for y in range(0, image.height, 100):
    area_height = min(image.height - y, 100)
    area = image.crop(0, y, image.width, area_height)
    array = np.ndarray(buffer=area.write_to_memory(),
                       dtype=np.uint8,
                       shape=[area.height, area.width, area.bands])
    # process `array` here
```
The `access` option to `new_from_file` turns on sequential mode: pyvips will only load pixels from the file on demand, with the restriction that you must read pixels out top to bottom.
The loop runs down the image in blocks of 100 scanlines. You can tune this, of course.
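Since the question doesn't say what the processing is, here is a minimal sketch of what per-strip work could look like, assuming an 8-bit RGB image as above; the per-channel mean here is just a placeholder for your real processing:

```python
import sys

import numpy as np
import pyvips

image = pyvips.Image.new_from_file(sys.argv[1], access="sequential")

# Accumulate per-channel sums strip by strip, so only one 100-scanline
# strip is ever held in memory at a time.
channel_sums = np.zeros(image.bands, dtype=np.float64)

for y in range(0, image.height, 100):
    area_height = min(image.height - y, 100)
    area = image.crop(0, y, image.width, area_height)
    tile = np.ndarray(buffer=area.write_to_memory(),
                      dtype=np.uint8,
                      shape=[area.height, area.width, area.bands])
    channel_sums += tile.sum(axis=(0, 1))

# Mean brightness per channel over the whole image.
print(channel_sums / (image.width * image.height))
```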
I can run it like this:
```
$ vipsheader eso1242a-pyr.tif
eso1242a-pyr.tif: 108199x81503 uchar, 3 bands, srgb, tiffload_stream
$ /usr/bin/time -f %M:%e ./sections.py ~/pics/eso1242a-pyr.tif
273388:479.50
```
So on this sad old laptop it took 8 minutes to scan a 108,000 x 82,000 pixel image and needed a peak of 270 MB of memory.
What processing are you doing? You might be able to do the whole thing in pyvips. It's quite a bit quicker than numpy.
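For example, a simple point operation can stay entirely inside pyvips, which then streams the image from input file to output file. This is only a sketch: the brighten-by-20% step stands in for your real processing, and `huge.tif` is a placeholder filename:

```python
import pyvips

# Sequential mode streams the image in strips, so a multi-gigabyte TIFF
# is never fully resident in memory.
image = pyvips.Image.new_from_file("huge.tif", access="sequential")

# linear(a, b) computes a * pixel + b; cast back to 8-bit for writing.
out = image.linear(1.2, 0).cast("uchar")
out.write_to_file("huge-brightened.tif")
```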

— jcupitt
```python
import pyvips

img = pyvips.Image.new_from_file("space.tif", access='sequential')
out = img.resize(0.01, kernel="linear")
out.write_to_file("resized_image.jpg")
```
If you want to convert the file to another format with a smaller size, this code will be enough, and it will do it without any memory spike and in very little time.
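If a small copy is all that's needed, pyvips also has a `thumbnail` operation that combines loading and shrinking; a sketch, with the 1000-pixel target width and the filenames as placeholders:

```python
import pyvips

# thumbnail() opens and shrinks in one step, so it can use shrink-on-load
# tricks (e.g. reading a lower pyramid level) where the format allows.
thumb = pyvips.Image.thumbnail("space.tif", 1000)
thumb.write_to_file("space_thumb.jpg")
```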

— Sudhanshu Shivam
- I downloaded a 1.6 GB file and it works without any spike in memory usage. – Sudhanshu Shivam Dec 09 '19 at 11:40
- Also, don't load it into numpy before the conversion; that will be better. – Sudhanshu Shivam Dec 09 '19 at 11:41