I have imported my dataset into H2O Flow. One column is categorical, and I want to convert it to a numeric data type.

With pandas, I would do it like this:

df['category_column'] = df['category_column'].astype('category')
df['category_column'] = df['category_column'].cat.codes
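
To make the target of the conversion concrete, here is a minimal plain-Python sketch of the integer codes being asked for. It assumes the categories are ordered by sorting the distinct values and that missing values map to -1, which mirrors the pandas convention. The function name `category_codes` and the sample values are made up for illustration.

```python
def category_codes(values):
    """Return (codes, categories): each value mapped to the index of its category."""
    # Categories are the distinct non-missing values in sorted order.
    categories = sorted({v for v in values if v is not None})
    index = {cat: i for i, cat in enumerate(categories)}
    # Missing values (None here) get the code -1.
    codes = [index[v] if v is not None else -1 for v in values]
    return codes, categories


codes, categories = category_codes(["setosa", "virginica", "setosa", "versicolor"])
print(categories)  # ['setosa', 'versicolor', 'virginica']
print(codes)       # [0, 2, 0, 1]
```

The codes only record which category a row belongs to. Their numeric order carries no meaning, which matters if they are later fed to a regression model.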

How can I do this in H2O Flow?

I tried the following:

  1. While parsing the data, I changed the column's data type from enum to numeric, but the values are then displayed as ·.
  2. I tried the "Convert to numeric" option, but it didn't work as I wanted.

I don't know whether I'm heading in the right direction. Please help me solve this issue.

Update to the question, as suggested:

Why does GLM force me to use a numeric column?

Error evaluating cell

My dataset looks like this:

[screenshot of the dataset]

When I build a GLM model with the categorical column as my response_column, I get the following error:

Error calling POST /3/ModelBuilders/glm with opts {"model_id":"glm-e2ed0066-636c-4c71-bf8...

ERROR MESSAGE: Illegal argument(s) for GLM model: glm-e2ed0066-636c-4c71-bf8c-04525eb05002. Details: ERRR on field: _response: Regression requires numeric response, got categorical. For more information visit: http://jira.h2o.ai/browse/TN-2

Mohamed Thasin ah
  • The accepted answer seems to be showing the opposite of what was asked (numeric to enum, not enum to numeric-of-the-category-codes)? (Though I cannot think of a case where numeric-of-the-category-codes would be better than having it as the enum type, which is maybe why it cannot be done from Flow?) – Darren Cook Jun 21 '18 at 07:12
  • @DarrenCook - You are right. But when I try to use a GLM model, it won't accept the enum type. That's why I would like to convert this enum into numeric. – Mohamed Thasin ah Jun 21 '18 at 07:15
  • Can you show some examples of your data? I wonder if your question is actually: "H2O has mistakenly recognized my numeric column as enum when importing?" (If the data is genuinely categorical then maybe the question should be: "I'm being forced to use a GLM on categorical data, what are my options?") – Darren Cook Jun 21 '18 at 07:28
  • @DarrenCook - I have updated the question. I was using the simple IRIS dataset for the model. – Mohamed Thasin ah Jun 21 '18 at 07:53

2 Answers

If you are using H2O's Python API, you can convert numeric columns to enum using .asfactor(), for example: df['my_column'] = df['my_column'].asfactor()

In Flow, after you import the dataset, you will see a data type drop-down menu next to each column name, where you can convert the data type to enum by selecting enum. You can also do this after you have parsed the dataset: when you view the data, each row has a hyperlink that you can click to convert the data type from numeric to enum.

Please see the documentation for more details: http://docs.h2o.ai/h2o/latest-stable/h2o-docs/flow.html#parsing-data

Lauren

To run GLM on categorical data, set the family to "multinomial" (or "binomial" when there are only two classes).
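
For readers working from H2O's Python API rather than Flow, the same idea might look roughly like the sketch below. It assumes a running or startable H2O cluster. The helper names `pick_glm_family` and `train_glm_on_enum`, the file path, and the column name are hypothetical and not part of H2O.

```python
def pick_glm_family(n_levels):
    """Choose the GLM family for a categorical response with n_levels classes."""
    if n_levels < 2:
        raise ValueError("a categorical response needs at least two classes")
    return "binomial" if n_levels == 2 else "multinomial"


def train_glm_on_enum(path, response):
    """Train a GLM on a categorical response (illustrative helper, not an H2O API)."""
    # Imported here so pick_glm_family can be used without an H2O install.
    import h2o
    from h2o.estimators import H2OGeneralizedLinearEstimator

    h2o.init()
    frame = h2o.import_file(path)
    # Keep the response as enum instead of converting it to numeric.
    frame[response] = frame[response].asfactor()
    n_levels = frame[response].nlevels()[0]
    predictors = [c for c in frame.columns if c != response]
    model = H2OGeneralizedLinearEstimator(family=pick_glm_family(n_levels))
    model.train(x=predictors, y=response, training_frame=frame)
    return model
```

A call such as train_glm_on_enum("iris.csv", "species") (hypothetical path and column name) would then fit a multinomial GLM on the three iris classes, with no need for category codes.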


Darren Cook