
I have a question about AWS SageMaker. I must confess that I'm quite a newbie to the subject, so I was very happy with the SageMaker Canvas app. It is really easy to use and gives me some nice results.

First of all, my model. I am trying to predict solar power production (import) based on the time (dt), the AWS IoT thing name (thingname), cloud percentage (clouds) and temperature (temp). I have a CSV filled with data measured by IoT things:

clouds + temp + dt + thingname => import

dt,clouds,temp,import,thingname
2022-08-30 07:45:00+02:00,1.0,0.1577,0.03,***
2022-08-30 08:00:00+02:00,1.0,0.159,0.05,***
2022-08-30 08:15:00+02:00,1.0,0.1603,0.06,***
2022-08-30 08:30:00+02:00,1.0,0.16440000000000002,0.08,***
2022-08-30 08:45:00+02:00,,,0.09,***
2022-08-30 09:00:00+02:00,1.0,0.17,0.12,***
2022-08-30 09:15:00+02:00,1.0,0.1747,0.13,***
2022-08-30 09:30:00+02:00,1.0,0.1766,0.15,***
2022-08-30 09:45:00+02:00,0.75,0.1809,0.18,***
2022-08-30 10:00:00+02:00,1.0,0.1858,0.2,***
2022-08-30 10:15:00+02:00,1.0,0.1888,0.21,***
2022-08-30 10:30:00+02:00,0.75,0.1955,0.24,***

In SageMaker Canvas I upload the CSV and build the model. This is all very easy. On the Predict tab I then upload a CSV that lacks the import column and contains weather API data for some future moment:

dt,thingname,temp,clouds
2022-09-21 10:15:00+02:00,***,0.1235,1.0
2022-09-21 10:30:00+02:00,***,0.1235,1.0
2022-09-21 10:45:00+02:00,***,0.1235,1.0
2022-09-21 11:00:00+02:00,***,0.1235,1.0
2022-09-21 11:15:00+02:00,***,0.12689999999999999,0.86
2022-09-21 11:30:00+02:00,***,0.12689999999999999,0.86
2022-09-21 11:45:00+02:00,***,0.12689999999999999,0.86
2022-09-21 12:00:00+02:00,***,0.12689999999999999,0.86
2022-09-21 12:15:00+02:00,***,0.1351,0.69
2022-09-21 12:30:00+02:00,***,0.1351,0.69
2022-09-21 12:45:00+02:00,***,0.1351,0.69

From this data SageMaker Canvas predicts realistic numbers, from which I assume the model is well built. So I want to move this model to my Greengrass core device to do predictions on site. I found the location of the best model using the sharing link to the Jupyter notebook.

From reading in the AWS docs I seem to have a few options to run the model on an edge device:

  • Run the Greengrass SageMaker Edge component, deploy the model as a component, and write an inference component
  • Run the SageMaker Edge Agent yourself
  • Just download the model yourself and do your thing with it on the device

Now it seems that SageMaker used XGBoost to create the model and I found the xgboost-model file and downloaded it to the device.

But this is where the trouble started: SageMaker Canvas never gives any information on what it does with the CSV to format it, so I really have no clue how to make a prediction using the model. I get some results when I load the same CSV file I used for the Canvas prediction, but the values are completely different and not realistic at all:

# pip install xgboost==1.6.2
import xgboost as xgb

filename = 'solar-prediction-data.csv'
dpredict = xgb.DMatrix(f'{filename}?format=csv')
model = xgb.Booster()
model.load_model('xgboost-model')
result = model.predict(dpredict)
print('Prediction result::')
print(result)

I read that the column order matters and that the CSV must not contain a header. But even then the output does not get close to the SageMaker Canvas result.
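As a rough illustration of the column-order point, the sketch below reorders a prediction CSV to the training column order (minus the target) and writes it without a header. It assumes, without confirmation from Canvas, that the model expects features in the same order as the training CSV; `align_to_training` is a hypothetical helper, and the two rows are copied from the prediction file above.

```python
import io
import pandas as pd

# Column order of the training CSV shown above; "import" is the target.
TRAIN_COLUMNS = ["dt", "clouds", "temp", "import", "thingname"]
TARGET = "import"

# Two rows copied from the prediction CSV (thingname is masked as ***).
PREDICT_CSV = """dt,thingname,temp,clouds
2022-09-21 10:15:00+02:00,***,0.1235,1.0
2022-09-21 12:15:00+02:00,***,0.1351,0.69
"""

def align_to_training(df, train_columns, target):
    """Reorder df to the training feature order (hypothetical helper)."""
    feature_order = [c for c in train_columns if c != target]
    missing = [c for c in feature_order if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return df[feature_order]

aligned = align_to_training(
    pd.read_csv(io.StringIO(PREDICT_CSV)), TRAIN_COLUMNS, TARGET
)
# Serialize without index or header row.
headerless = aligned.to_csv(index=False, header=False)
print(headerless)
```

Reordering alone does not make the string columns (dt, thingname) usable by the model; it only rules out a column mismatch as the cause of the differing results.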

I also tried using pandas:

# pip install xgboost==1.6.2
import xgboost as xgb
import pandas as pd

filename = 'solar-prediction-data.csv'
df = pd.read_csv(filename, index_col=None, header=None)

dpredict = xgb.DMatrix(df, enable_categorical=True)

model = xgb.Booster()
model.load_model('xgboost-model')
result = model.predict(dpredict, pred_interactions=True)
print('Prediction result::')
print('===============')
print(result)

But this last one always gives me the following error:

ValueError: DataFrame.dtypes for data must be int, float, bool or category.  When
categorical type is supplied, DMatrix parameter `enable_categorical` must
be set to `True`. Invalid columns:dt, thingname
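The error indicates that dt and thingname are still plain string columns. A minimal sketch of how such columns could be turned into numeric or category dtypes with pandas follows. The derived features (hour, dayofyear) and the helper name `to_numeric_frame` are assumptions for illustration only; they are not what Canvas does internally, so predictions from these features would not be expected to match Canvas.

```python
import io
import pandas as pd

# Two rows copied from the prediction CSV (thingname is masked as ***).
SAMPLE = """dt,thingname,temp,clouds
2022-09-21 10:15:00+02:00,***,0.1235,1.0
2022-09-21 11:15:00+02:00,***,0.12689999999999999,0.86
"""

def to_numeric_frame(df):
    """Return a copy whose columns are all numeric or category dtype."""
    out = df.copy()
    # Parse the timestamp text; utc=True gives one timezone-aware dtype.
    ts = pd.to_datetime(out.pop("dt"), utc=True)
    # Hypothetical features: fractional hour of day and day of year (UTC).
    out["hour"] = ts.dt.hour + ts.dt.minute / 60.0
    out["dayofyear"] = ts.dt.dayofyear
    # Mark the device name as categorical instead of a plain string.
    out["thingname"] = out["thingname"].astype("category")
    return out

features = to_numeric_frame(pd.read_csv(io.StringIO(SAMPLE)))
print(features.dtypes)
```

A frame like this would satisfy the dtype check when passed to xgb.DMatrix with enable_categorical=True, but the model was trained on whatever transformation Canvas applied, so the same transformation would have to be reproduced for the numbers to be meaningful.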

To be honest, I'm completely stuck and hope someone here can give me some advice or a clue on how to proceed.

Thanks! Kind regards

Hacor

Hans Cornelis

1 Answer


Hacor, Canvas AutoML creates artifacts, including the Python feature engineering code and the feature engineering model. You can access them for the best model under the artifact tab.

Canvas artifacts

Canvas feature engineering python code (.py file) example

Danny
  • Answer includes only references and is not helping the user. Links could be just shared in the comments to the question. – KingJulian Sep 29 '22 at 21:09