I'm beginning to play with GeoPySpark and am implementing an example notebook.

I successfully retrieved the images:

!curl -o /tmp/B01.jp2 http://sentinel-s2-l1c.s3.amazonaws.com/tiles/32/T/NM/2017/1/4/0/B01.jp2
!curl -o /tmp/B09.jp2 http://sentinel-s2-l1c.s3.amazonaws.com/tiles/32/T/NM/2017/1/4/0/B09.jp2
!curl -o /tmp/B10.jp2 http://sentinel-s2-l1c.s3.amazonaws.com/tiles/32/T/NM/2017/1/4/0/B10.jp2

Here is the script:

import rasterio
import geopyspark as gps
import numpy as np

from pyspark import SparkContext

conf = gps.geopyspark_conf(master="local[*]", appName="sentinel-ingest-example")
pysc = SparkContext(conf=conf)

jp2s = ["/tmp/B01.jp2", "/tmp/B09.jp2", "/tmp/B10.jp2"]
arrs = []

for jp2 in jp2s:
    with rasterio.open(jp2) as f: #CRASHES HERE
        arrs.append(f.read(1))

data = np.array(arrs, dtype=arrs[0].dtype)
data

The script crashes at the line marked #CRASHES HERE with the following error:

RasterioIOError: '/tmp/B01.jp2' not recognized as a supported file format.

I copy-pasted the example code exactly, and the Rasterio docs even use .jp2 files in their examples.

I'm using the following version of Rasterio, installed with pip3. I do not have Anaconda installed (it messes up my Python environments), and I do not have GDAL installed (it refuses to install; that would be the topic of another question if installing it is my only solution):

Name: rasterio
Version: 1.1.0
Summary: Fast and direct raster I/O for use with Numpy and SciPy
Home-page: https://github.com/mapbox/rasterio
Author: Sean Gillies
Author-email: sean@mapbox.com
License: BSD
Location: /usr/local/lib/python3.6/dist-packages
Requires: click-plugins, snuggs, numpy, click, attrs, cligj, affine
Required-by: 

Why does it refuse to read .jp2 files? Is there maybe a way to convert them to something usable? Or do you know of any example files similar to these ones in an acceptable format?

Jessica Chambers
  • If you do not have GDAL installed then that's the problem because RasterIO is basically a wrapper around GDAL. Confirm GDAL is really not installed: `>>> gdal.VersionInfo()`. – LuisTavares Oct 30 '19 at 11:16
  • @LuísTavares I can confirm, I get this output: `NameError: name 'gdal' is not defined` – Jessica Chambers Oct 30 '19 at 13:59
  • Now that's really the topic of another question! GDAL should be installed before Rasterio. Follow the installation guidelines (I'm not sure whether you are using Linux or Mac): https://rasterio.readthedocs.io/en/stable/installation.html#dependencies – LuisTavares Oct 30 '19 at 14:17
  • @LuísTavares I had a feeling it would be. I did try to install it but encountered many errors. I'll try a different route for now, though. Thanks :) – Jessica Chambers Oct 30 '19 at 14:40
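Rasterio does its raster I/O through GDAL, so "not recognized as a supported file format" usually means the GDAL that rasterio is actually using (system-installed, or a copy bundled in a binary wheel) has no JPEG 2000 driver registered. Note also that `gdal.VersionInfo()` needs `from osgeo import gdal` first, so a `NameError` only shows the name was never imported. A minimal diagnostic sketch that asks rasterio itself, assuming rasterio 1.x (`rasterio.__gdal_version__`, `rasterio.Env().drivers()`); the helper `jp2_drivers` and the set of JPEG 2000 driver short names are my own assumptions, not from the thread:

```python
# Diagnostic sketch: does the GDAL used by rasterio have a JPEG 2000 driver?
# JP2_DRIVER_NAMES is an assumed set of GDAL short names for JPEG 2000 drivers.
JP2_DRIVER_NAMES = {"JP2OpenJPEG", "JP2KAK", "JP2ECW", "JP2MrSID", "JPEG2000"}


def jp2_drivers(registered):
    """Return the sorted JPEG 2000 driver names found in a {short_name: description} mapping."""
    return sorted(name for name in registered if name in JP2_DRIVER_NAMES)


def main():
    try:
        import rasterio
    except ImportError:
        print("rasterio is not installed")
        return

    print("GDAL version used by rasterio:", rasterio.__gdal_version__)
    # Env.drivers() returns the drivers registered in the active GDAL environment.
    with rasterio.Env() as env:
        found = jp2_drivers(env.drivers())

    if found:
        print("JPEG 2000 driver(s) available:", ", ".join(found))
    else:
        print("No JPEG 2000 driver registered; opening .jp2 files will likely fail")


if __name__ == "__main__":
    main()
```

If no JPEG 2000 driver shows up, the options are a rasterio/GDAL build that includes one (e.g. OpenJPEG support) or a different reader.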

1 Answer

I was stuck in the same situation.

I used the pyvips package instead, and that resolved it:

import pyvips

image = pyvips.Image.new_from_file("000240.jp2")  # load the JPEG 2000 file
image.write_to_file("000240.jpg")  # output format is chosen from the file extension
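Since the question's goal is a NumPy stack of bands rather than image files, the pyvips route can also skip the JPG round-trip. That matters because JPEG is typically 8 bits per channel while Sentinel-2 band files hold 16-bit values. A minimal sketch, assuming a libvips build with JPEG 2000 (OpenJPEG) support and single-band inputs like B01/B09/B10; `FORMAT_TO_DTYPE`, `vips_to_numpy` and `load_bands` are hypothetical names, and the conversion uses the common `write_to_memory()` pattern:

```python
import numpy as np

# Map libvips band formats to NumPy dtypes.
FORMAT_TO_DTYPE = {
    "uchar": np.uint8,
    "char": np.int8,
    "ushort": np.uint16,
    "short": np.int16,
    "uint": np.uint32,
    "int": np.int32,
    "float": np.float32,
    "double": np.float64,
    "complex": np.complex64,
    "dpcomplex": np.complex128,
}


def vips_to_numpy(image):
    """Copy a pyvips.Image into a NumPy array shaped (height, width, bands)."""
    dtype = FORMAT_TO_DTYPE[image.format]
    buffer = image.write_to_memory()  # raw pixels, row-major, bands interleaved
    return np.ndarray(
        buffer=buffer,
        dtype=dtype,
        shape=(image.height, image.width, image.bands),
    )


def load_bands(paths):
    """Read the first band of each file and stack them into (n_files, height, width)."""
    import pyvips  # needs libvips with JPEG 2000 support for .jp2 input

    arrays = [vips_to_numpy(pyvips.Image.new_from_file(p))[:, :, 0] for p in paths]
    return np.stack(arrays)
```

Mirroring the question's script, usage would be `data = load_bands(["/tmp/B01.jp2", "/tmp/B09.jp2", "/tmp/B10.jp2"])`. `np.stack` requires equal dimensions; B01, B09 and B10 are all 60 m bands, so that holds here.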
Edward Ji