0

I have large file 9600x7000 pixel jpg file I am trying to see if I can do a edge detection. I tried loading the large (25Mb) file using:

from PIL import Image
image = Image.open("C:\\pathtofile\\test-tac.jpg")
image.show()

However python interpreter will crash. I am using Pycharm running Python 2.7.

So, I used a GDAL (used for large GEO refererencing files) to load the file. It will load the file into memory without any problem.

#reference http://www.gdal.org/gdal_tutorial.html
import gdal
from gdalconst import *

dataset = gdal.Open("C:\\pathtofile\\test-tac.jpg", GA_ReadOnly )
if dataset is None:
   print "error loading file in gdal"

This will load file. However, I am trying to run following edge detection on it:

from matplotlib import pyplot as plt

from skimage import data
from skimage.feature import corner_harris, corner_subpix, corner_peaks
from skimage.transform import warp, AffineTransform
from skimage.draw import ellipse

# running corner Harris on the image object to detect image corners. 
#(reference http://scikit-image.org/docs/dev/auto_examples/plot_corner.html)
coords = corner_peaks(corner_harris(image), min_distance=3) #5
coords_subpix = corner_subpix(image, coords, window_size=13)

plt.gray()
plt.imshow(image, interpolation='nearest')
plt.plot(coords[:, 1], coords[:, 0], '.b', markersize=9)  # dots
plt.plot(coords_subpix[:, 1], coords_subpix[:, 0], '+r', markersize=15) # +
plt.plot(coords_subpix[:, 1][1], coords_subpix[:, 0][1], '*r', markersize=20)  #X_Point1=Subpix[:,1][1], Y_Point1=Subpix[:,0][1]

N=len(coords_subpix[:,0])
labels = ['point{0}'.format(i) for i in range(N)]

#Label corners in image
for label, x, y in zip(labels, coords_subpix[:,1], coords_subpix[:,0]):
   plt.annotate(label,
    xy=(x,y), xytext = (-10,10),
    textcoords = 'offset points', ha = 'right', va = 'bottom',
    bbox = dict(boxstyle = 'round,pad=0.5', fc = 'yellow', alpha = 0.5),
    arrowprops = dict(arrowstyle = '->', connectionstyle = 'arc3,rad=0'))

   plt.axis((0, 9672, 7272, 0))            # (Y_start, Y_Stop, X_Stop, X_Start) ((0, 9672, 7272, 0))
   plt.show()

This would work if I generate image using following code:

 tform = AffineTransform(scale=(1.3, 1.1), rotation=1, shear=0.8,
                    translation=(210, 50))
 image = warp(data.checkerboard(), tform.inverse, output_shape=(350, 350))
 rr, cc = ellipse(310, 175, 10, 100)
 image[rr, cc] = 1
 image[180:230, 10:60] = 1
 image[230:280, 60:110] = 1

My problem is I am not understanding Python much about the data format from the 'image' variable versus dataset variable generated by GDAL. My end goal is to be able to run edge detection on large (10000x7000) pixel jpg image using Python scikit-image library. If there is better way that GDAL to read large jpg images I am open to it.

If I set:

image=dataset

and run it, I get following error:

coords = corner_peaks(corner_harris(image), min_distance=3) #5
 File "C:\Python27\lib\site-packages\skimage\feature\corner.py", line 171, in corner_harris
 Axx, Axy, Ayy = _compute_auto_correlation(image, sigma)
 File "C:\Python27\lib\site-packages\skimage\feature\corner.py", line 54, in _compute_auto_correlation
 if image.ndim == 3:
 AttributeError: 'Dataset' object has no attribute 'ndim'

This error message points that I am not understanding the datatype between dataset and image variables.

type(dataset)

Gives:

<class 'osgeo.gdal.Dataset'>

and

type(image)

Gives:

(350,350) float64.

For your large source file use: http://www.lib.utexas.edu/maps/tpc/txu-pclmaps-oclc-22834566_a-2c.jpg to give it a try.

user914425
  • 16,303
  • 4
  • 30
  • 41

2 Answers2

2

All scikit-image algorithms require Numpy arrays as input. You therefore need to convert your dataset variable to an ndarray. The easiest way to do this is to use the gdal plugin to read the file (or look at the plugin source -- it shows how to do the conversion).

Stefan van der Walt
  • 7,165
  • 1
  • 32
  • 41
  • If GDAL can open the file with `ds = gdal.Open()` you can simply do `image = ds.ReadAsArray()`. It will give you a Numpy array, i think the layout will be `[bands, y, x]`, so perhaps some reshaping is needed. – Rutger Kassies Dec 23 '13 at 11:39
  • @Rutger Kassies Please provide a working example with Numpy array and gdal.Open(). I am new thins python image processing. – user914425 Nov 03 '15 at 19:53
0
import cv2
image = cv2.imread('txu-pclmaps-oclc-22834566_a-2c.jpg')

opencv can load the image without a problem. Although i suspect loading the image was not the problem. The first step of your algorithm was peeking at around 6 gigs of memory usage. So if you're not on a 64bit version of python it will crash. Also there looks to be something wrong with your code. When i tried to run it, it failed at the second function.

ValueError: operands could not be broadcast together with shapes (0,13) (13,13)

This was with the (8003, 10859, 3) image. I also tried with just one channel with the same error message.

M4rtini
  • 13,186
  • 4
  • 35
  • 42
  • I tried this on my machine with 8GB of memory. I am getting MemoryError. I need help using the GDAL library for large files. How do I check if my python is 32 bit vs 64 bit? – user914425 Dec 21 '13 at 01:10
  • open the command line and type python. The first lines printed out should tell you the python version and architecture. – M4rtini Dec 21 '13 at 12:43