I am trying to use Great Expectations for data quality checks.

My jobs run on an AWS EMR cluster, and I am trying to launch the Great Expectations job on EMR as well.

I have a bootstrap script that installs dependencies on the cluster. It looks like this:

#!/bin/bash
sudo yum install -y python3-devel
sudo python3 -m pip install --upgrade pip
sudo python3 -m pip install cython

sudo python3 -m pip install boto3==1.26.37
sudo python3 -m pip install great-expectations==0.15.36

I could see from the log output that all dependencies were installed correctly, but when the job started I got the following error:

ImportError: this version of pandas is incompatible with numpy < 1.17.3
your numpy version is 1.16.5.
Please upgrade numpy to >= 1.17.3 to use this pandas version

I tried uninstalling numpy and reinstalling it manually via pip in the bootstrap script, like this, but it didn't help:

sudo python3 -m pip uninstall --yes numpy

I don't understand why this happens.

Liu Piu

2 Answers

sudo python3 -m pip install numpy==1.17.3
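
Before or after pinning, it may help to confirm which interpreter the job runs under and which numpy it actually imports, since a pinned install only takes effect if it lands on that interpreter's import path. That is an assumption about the cause, not something confirmed in this thread. A minimal sketch follows; the helper name `describe_package` is hypothetical, and the script would need to run with the same Python the EMR job uses:

```python
import importlib
import sys


def describe_package(name):
    """Report the interpreter path plus the version and file location of an importable package."""
    info = {
        "python": sys.executable,
        "package": name,
        "version": None,
        "location": None,
        "error": None,
    }
    try:
        module = importlib.import_module(name)
    except ImportError as exc:
        # Covers both a missing package and a failing import such as
        # the pandas/numpy incompatibility quoted in the question.
        info["error"] = str(exc)
        return info
    info["version"] = getattr(module, "__version__", None)
    info["location"] = getattr(module, "__file__", None)
    return info


if __name__ == "__main__":
    # Package names taken from the question.
    for pkg in ("numpy", "pandas", "great_expectations"):
        print(describe_package(pkg))
```

If the reported numpy location points to a different site-packages directory than the one pip wrote to in the bootstrap log, the job is likely picking up an older preinstalled copy.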
Ahmed Kolsi
  • I tried the following: `sudo python3 -m pip uninstall --yes numpy`, then `sudo python3 -m pip install numpy==1.23.5`, then `sudo python3 -m pip install great-expectations==0.15.36`. It did not change the final result. – Liu Piu Dec 27 '22 at 11:16
Switching to a newer EMR release solved the problem.

Liu Piu