Suppose I have a data file whose entries look like this:
0.00,2015-10-21,1,Y,798.78,323793701,6684,0.00,Q,H2512,PE0,1,0000
I would like to use this as input to an mxnet model (a basic feed-forward multi-layer perceptron). A single input record has the following data types, in the order shown above:
float,date,int,categorical,float,int,int,float,categorical,categorical,categorical,int,float
Each record is a meaningful representation of a specific entity. How do I represent this sort of data to mxnet? To complicate things a bit, suppose I also want to one-hot encode the categorical columns. And what if each record has these fields in the order shown, but in some cases repeated multiple times, so that records may have different lengths?
The docs are great for the basic cases, where the input data is all of the same type and can be loaded into a single input without any transformation, but how should this case be handled?
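To make the question concrete, here is a minimal sketch of the kind of pre-processing I have in mind, in plain Perl with no mxnet calls. The `%vocab` levels and the `encode_record` helper are hypothetical, made up for illustration; real vocabularies would have to be collected from the data. Encoding the date as days since the Unix epoch is just one possible choice.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Column types in file order, taken from the list above.
my @types = qw(float date int categorical float int int float
               categorical categorical categorical int float);

# Hypothetical vocabularies, keyed by column index. In practice these
# would be collected in a first pass over the whole file.
my %vocab = (
    3  => [qw(N Y)],
    8  => [qw(P Q)],
    9  => [qw(H2512 H2513)],
    10 => [qw(PE0 PE1)],
);

# Turn one CSV line into a flat list of numbers.
sub encode_record {
    my ($line) = @_;
    my @fields = split /,/, $line;
    my @out;
    for my $i (0 .. $#fields) {
        # The modulo lets a record repeat the 13-field group several times.
        my $col   = $i % @types;
        my $type  = $types[$col];
        my $value = $fields[$i];
        if ($type eq 'categorical') {
            # One slot per known level; an unknown level gives all zeros.
            push @out, map { $_ eq $value ? 1 : 0 } @{ $vocab{$col} };
        }
        elsif ($type eq 'date') {
            # Days since the Unix epoch (an assumption, not the only option).
            my ($y, $m, $d) = split /-/, $value;
            push @out, timegm(0, 0, 0, $d, $m - 1, $y) / 86400;
        }
        else {
            push @out, $value + 0;    # int and float columns become plain numbers
        }
    }
    return @out;
}

my @vec = encode_record(
    '0.00,2015-10-21,1,Y,798.78,323793701,6684,0.00,Q,H2512,PE0,1,0000');
print scalar(@vec), " features: @vec\n";
```

With two levels per categorical column, the 13 raw fields become 17 numeric features. What I don't know is whether this flattening is the right way to hand such data to mxnet, or whether there is a more idiomatic route.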
Update: adding some additional details. To keep this as simple as possible, let's say I just want to feed this into a simple network, something like:
use AI::MXNet qw(mx);
my $data = mx->symbol->Variable("data");
my $fc = mx->symbol->FullyConnected($data, num_hidden => 1);
my $softmax = mx->symbol->SoftmaxOutput(data => $fc, name => "softmax");
my $module = mx->mod->new(symbol => $softmax);
In the simple case, where the data is all one type and needs little pre-processing, I could then just do something along the lines of
$module->fit(
    $train_iter,
    eval_data        => $eval_iter,
    optimizer        => "adam",
    optimizer_params => { learning_rate => 0.001 },
    eval_metric      => "mse",
    num_epoch        => 25,
);
where $train_iter is a simple NDArray iterator over the training data. (Well, with the Perl API it's not exactly an NDArray, but it has complete parity with that interface, so it is conceptually identical.)
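Since an NDArray-style iterator wants every row to have the same width, my guess is that the variable-length records would need to be padded (or truncated) before they get anywhere near the iterator. A rough sketch in plain Perl, where `pad_rows` is a hypothetical helper and the rows are made-up stand-ins for already-encoded records:

```perl
use strict;
use warnings;
use List::Util qw(max);

# Pad each row with a filler value so that all rows share the width
# of the longest one. Returns the padded rows and that width.
sub pad_rows {
    my ($rows, $fill) = @_;
    my $width  = max(map { scalar @$_ } @$rows);
    my @padded = map { [ @$_, ($fill) x ($width - @$_) ] } @$rows;
    return (\@padded, $width);
}

# Made-up encoded records of different lengths.
my @rows = ([1, 2, 3], [4, 5, 6, 7, 8, 9], [10]);
my ($padded, $width) = pad_rows(\@rows, 0);
print join(',', @$_), "\n" for @$padded;
```

Whether zero is a sensible filler for a feed-forward network, or whether the repeated groups should be modelled some other way, is part of what I'm asking.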