
I have data like this:

[
    {
        "name": "Apple",
        "price": 1,
        "type": "Food"
    },
    {
        "name": "Apple",
        "price": 0.90,
        "type": "Food"
    },
    {
        "name": "Apple",
        "price": 1000,
        "type": "Computer"
    },
    {
        "name": "Apple",
        "price": 900,
        "type": "Computer"
    }
]

Using the Great Expectations automatic profiler, a valid range for price would be 0.90 to 1,000. Is it possible to have it slice on the type dimension, so Food would be 0.90 to 1 and Computer would be 900 to 1,000? Or would I need to transform the data first using dbt? I know which column defines the dimension, but I don't know the particular values it will contain.
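
Roughly, the per-slice validation I'm hoping for would look something like the following sketch (using the PandasDataset-style ge.from_pandas API, with the type values hard-coded, which is exactly what I'd like to avoid):

import great_expectations as ge
import pandas as pd

df = pd.DataFrame([
    {"name": "Apple", "price": 1, "type": "Food"},
    {"name": "Apple", "price": 0.90, "type": "Food"},
    {"name": "Apple", "price": 1000, "type": "Computer"},
    {"name": "Apple", "price": 900, "type": "Computer"},
])

# One expectation per value of the "type" column; the ranges below are the
# ones the profiler would ideally discover per slice on its own.
food = ge.from_pandas(df[df["type"] == "Food"])
food.expect_column_values_to_be_between("price", min_value=0.90, max_value=1)

computers = ge.from_pandas(df[df["type"] == "Computer"])
computers.expect_column_values_to_be_between("price", min_value=900, max_value=1000)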

Also, the same question applies to differences between rows: if the rows had a timestamp, then instead of validating that the price falls between 900 and 1,000, it would validate the change in value (here, -100).
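
For that second part, the transform I have in mind would be something like this sketch (the timestamp column is hypothetical, and the -100 to 0 range is just an illustrative bound):

import great_expectations as ge
import pandas as pd

# Hypothetical feed with a timestamp column.
df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2022-06-01", "2022-06-02"]),
    "price": [1000, 900],
})

# Validate the row-to-row change rather than the absolute price.
df = df.sort_values("timestamp")
df["price_change"] = df["price"].diff()
df = df.dropna(subset=["price_change"])  # drop the leading NaN from diff()

changes = ge.from_pandas(df)
changes.expect_column_values_to_be_between("price_change", min_value=-100, max_value=0)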

steve76
  • How many types do you have that you'd need to group by? 2, 10, 100? – sgdata Jun 06 '22 at 15:26
  • You've tagged dbt; are you using Great Expectations in Python, or the dbt port, https://github.com/calogica/dbt-expectations? – tconbeer Jun 06 '22 at 19:19
  • @tconbeer GE in Python; dbt is part of the DAG – steve76 Jun 07 '22 at 00:36
  • @sgdata I don't know. It comes from a feed that I poll periodically, and I'm looking for drastic changes. Perhaps it would be best to do some more transforming, and create a table of percentage change. – steve76 Jun 07 '22 at 02:14

1 Answer


I used the approach described here to first load the data into a pandas DataFrame:

https://discuss.greatexpectations.io/t/how-can-i-use-the-return-format-unexpected-index-list-to-select-row-from-a-pandasdataset/70/2
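
For completeness, a minimal sketch of what that can look like once the data is in a DataFrame (feed.json is a placeholder path, and this uses the older ge.from_pandas / PandasDataset API rather than the newer Validator workflow). The distinct type values are discovered with groupby, so only the slicing column needs to be known up front:

import great_expectations as ge
import pandas as pd

df = pd.read_json("feed.json")  # placeholder path for the polled feed

for type_value, group in df.groupby("type"):
    dataset = ge.from_pandas(group)
    # Derive the allowed range per slice; here it is just the observed
    # min/max of the slice, as an illustration of per-type bounds.
    result = dataset.expect_column_values_to_be_between(
        "price",
        min_value=group["price"].min(),
        max_value=group["price"].max(),
    )
    print(type_value, result.success)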

steve76