Getting basic stats from np.array within a for loop in Python

Question

I don't have a lot of Python experience and I'm trying something rather complicated for me, so excuse my messy code. I have a few arrays that were generated with rasterio from raster layers (tif), and ultimately I want to get some basic statistics from each raster layer and append them to a data frame. I'm trying to automate it as much as possible since I have a lot of layers to go through. Another obstacle was getting the column name to change according to each raster. I managed to work almost everything out; the problem is that when I put it into a for loop, instead of stats values I get this: <built-in method values of dict object at 0x00.. I would appreciate help solving that.

import rasterio
from osgeo import gdal
import numpy as np
import pandas as pd

# open all files (I have a lot of folders like this one to open)
# Griffin data read
Gr_1A_hh_path = r"E:\SAOCOM\1A1B\Images\Griffin\130122\Source\Data\gtc-acqId0000705076-a-sm9-2201150146-hh-h.tif"
Gr_1A_hh = rasterio.open(Gr_1A_hh_path)

Gr_1A_vv_path = r"E:\SAOCOM\1A1B\Images\Griffin\130122\Source\Data\gtc-acqId0000705076-a-sm9-2201150146-vv-h.tif"
Gr_1A_vv = rasterio.open(Gr_1A_vv_path)

Gr_1A_vh_path = r"E:\SAOCOM\1A1B\Images\Griffin\130122\Source\Data\gtc-acqId0000705076-a-sm9-2201150146-vh-h.tif"
Gr_1A_vh = rasterio.open(Gr_1A_vh_path)

Gr_1A_hv_path = r"E:\SAOCOM\1A1B\Images\Griffin\130122\Source\Data\gtc-acqId0000705076-a-sm9-2201150146-hv-h.tif"
Gr_1A_hv = rasterio.open(Gr_1A_hv_path)

#reading all the rasters as arrays
array_1A_hh= Gr_1A_hh.read()
array_1A_vv= Gr_1A_vv.read()
array_1A_vh= Gr_1A_vh.read()
array_1A_hv= Gr_1A_hv.read()

#creating a dictionary so that each array would have a name that would be used as column name
A2 = {
   "HH":array_1A_hh,
   "VV":array_1A_vv,
   "VH":array_1A_vh,
   "HV":array_1A_hv}

df= pd.DataFrame(index=["min","max","mean","medien"])
for name, pol in A2.items():
   for band in pol:
       stats = {
       "min":band.min(),
       "max":band.max(),
       "mean":band.mean(),
       "median":np.median(band)}
       df[f"{name}"]=stats.values

OUTPUT:
df
                                                      HH  ...                                                 HV
min     <built-in method values of dict object at 0x00...  ...  <built-in method values of dict object at 0x00...
max     <built-in method values of dict object at 0x00...  ...  <built-in method values of dict object at 0x00...
mean    <built-in method values of dict object at 0x00...  ...  <built-in method values of dict object at 0x00...
medien  <built-in method values of dict object at 0x00...  ...  <built-in method values of dict object at 0x00...

You're storing the method instead of calling it. Something like `stats.values()` should already fix that. Note that GDAL also already has a `GetStatistics` method for each band, that will get you the min, max, mean & std very fast. — Rutger Kassies, Jul 08 '22 at 07:46
@Rutger Kassies, what do you mean? Where should I use the stats.values()? Thanks for your comment. — Naama, Jul 10 '22 at 08:13
There's only one place where you use `df[f"{name}"]=stats.values`. That line assigns the dict's method itself rather than calling it, which is why the result shows the method of the dict instead of the contents of the dict. — Rutger Kassies, Jul 11 '22 at 06:55
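
The fix described in the comments above can be sketched as follows. This is only an illustration under assumptions: the arrays are small synthetic stand-ins generated with NumPy (not the asker's rasters), and the index label is spelled "median" here. The only change to the loop is calling `stats.values()` instead of assigning the bare `stats.values` method.

```python
import numpy as np
import pandas as pd

# Assumption: synthetic single-band arrays of shape (bands, rows, cols)
# stand in for the arrays returned by rasterio's read().
rng = np.random.default_rng(0)
A2 = {name: rng.integers(0, 255, size=(1, 8, 8)) for name in ["HH", "VV", "VH", "HV"]}

df = pd.DataFrame(index=["min", "max", "mean", "median"])
for name, pol in A2.items():
    for band in pol:
        stats = {
            "min": band.min(),
            "max": band.max(),
            "mean": band.mean(),
            "median": np.median(band),
        }
        # stats.values is the bound method; stats.values() returns the numbers.
        df[name] = list(stats.values())

print(df)
```

Note that the values are assigned by position, so the order of the keys in `stats` has to match the order of the index labels. With multi-band rasters this loop would overwrite the column once per band, keeping only the last band.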

paime · Accepted Answer · 2022-07-11T06:07:20.587

Considering you have a dict of images:

import numpy as np
import pandas as pd

vmin, vmax = 0, 255
C, H, W = 2, 64, 64

images_names = ["HH", "VV", "VH", "HV"]
images = {
    im_name: np.random.randint(vmin, vmax, size=(C, H, W))
    for im_name in images_names
}

And a bunch of functions to compute stats on a per band basis:

stats_functions = {
    "min": lambda band: band.min(),
    "max": lambda band: band.max(),
    "mean": lambda band: band.mean(),
    "median": lambda band: np.median(band),
}

You can first construct a dict of statistics:

images_stats = {
    im_name: {
        band_idx: {
            stat_name: stat_func(band)
            for stat_name, stat_func in stats_functions.items()
        }
        for band_idx, band in enumerate(im)
    }
    for im_name, im in images.items()
}

And then convert it to a pandas DataFrame:

images_stats_df = pd.concat(
    {
        im_name: pd.DataFrame(im_stats)
        for im_name, im_stats in images_stats.items()
    },
    axis="columns",
)

Which gives:

>>> images_stats_df
                HH                      VV                      VH                     HV
                 0           1           0           1           0          1           0           1
min       0.000000    0.000000    0.000000    0.000000    0.000000    0.00000    0.000000    0.000000
max     254.000000  254.000000  254.000000  254.000000  254.000000  254.00000  254.000000  254.000000
mean    127.070557  126.082764  126.483643  127.737061  127.270996  128.89502  128.814209  124.610352
median  129.000000  127.000000  126.000000  127.000000  127.000000  130.00000  129.000000  122.000000
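
As a possible follow-up, here is a sketch of how individual values might be pulled out of a frame with this two-level column layout. It rebuilds a frame of the same shape from synthetic data; the variable names `hh_band0_mean`, `hh_only` and `flat`, and the `"{name}_band{idx}"` column naming, are illustrative assumptions rather than part of the answer.

```python
import numpy as np
import pandas as pd

# Assumption: synthetic two-band images, as in the answer's example.
rng = np.random.default_rng(0)
images = {name: rng.integers(0, 255, size=(2, 64, 64)) for name in ["HH", "VV", "VH", "HV"]}

stats_functions = {"min": np.min, "max": np.max, "mean": np.mean, "median": np.median}

# Same layout as above: image name on the first column level, band index on the second.
images_stats_df = pd.concat(
    {
        im_name: pd.DataFrame(
            {
                band_idx: {s_name: s_func(band) for s_name, s_func in stats_functions.items()}
                for band_idx, band in enumerate(im)
            }
        )
        for im_name, im in images.items()
    },
    axis="columns",
)

# A single value: row label, then a (image name, band index) tuple for the column.
hh_band0_mean = images_stats_df.loc["mean", ("HH", 0)]

# All bands of one image: selecting on the first column level.
hh_only = images_stats_df["HH"]

# Optional: flatten the two column levels into plain names such as "HH_band0".
flat = images_stats_df.copy()
flat.columns = [f"{name}_band{idx}" for name, idx in flat.columns]
```

Flattening the columns can be convenient when the frame is written to CSV, since a single header row is easier to read back than a two-level one.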

Edit: What constructing the images dict might look like in your particular case:

import rasterio

images_paths = {
    "HH": "path/to/image_HH.tif",
    "VV": "path/to/image_VV.tif",
    "VH": "path/to/image_VH.tif",
    "HV": "path/to/image_HV.tif",
}

# Iterate over (name, path) pairs; iterating the dict directly yields only the keys.
images = {
    im_name: rasterio.open(im_path).read()
    for im_name, im_path in images_paths.items()
}

That looks great. Thank you. However, I did run into trouble when trying to replace your `np.random.randint(vmin, vmax, size=(C, H, W))` in the first part with my arrays. I need to pull a different array for each im_name in the dictionary, but when I tried something like: `for pol in A1: Images = { im_name: pol for im_name in pol_name}` I get the same array for every key (im_name). Any suggestions on how to overcome that? — Naama, Jul 10 '22 at 12:13
From your line `im_name: pol for im_name ...` it is expected that you assign the same `pol` to every `im_name`. Doesn't your `A2` dict have the same structure as my `images` dict? If not, please provide a way for me to reproduce the `A2` data, because obviously I can't read your files. — paime, Jul 10 '22 at 19:03
I managed to overcome that by first creating a dict of my arrays with names as keys: `A2 = { "HH":array_1A_hh, "VV":array_1A_vv, "VH":array_1A_vh, "HV":array_1A_hv}` and then using the first part of your answer as: `Images = { im_name: pol for im_name, pol in A2.items()}` And now it works perfectly. Thank you so much! — Naama, Jul 11 '22 at 05:50
Alright, so `A2` was good already, because the last line of code is effectively just making a copy of `A2` into `Images`. I added an edit with a way for you to construct the `images` dict from file paths. — paime, Jul 11 '22 at 06:05
