GraphLab is mostly used for computing on tabular and graph-based datasets, and it offers high scalability and performance. In graphlab.linear_regression.create, GraphLab has a built-in ability to inspect the type of data and pick the most suitable solver for the linear regression. For example, when both the target and the features are numeric, GraphLab will most of the time choose Newton's method. Similarly, depending on the dataset, it recognizes what is needed and selects a solver accordingly.
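As a minimal sketch (the SFrame and its columns here are made up for illustration), the solver='auto' default lets GraphLab make that choice, and you can also pin the solver down explicitly:

```python
import graphlab

# Hypothetical dataset: numeric features and a numeric target.
sales = graphlab.SFrame({'sqft':     [1000., 1500., 2000., 2500.],
                         'bedrooms': [2., 3., 3., 4.],
                         'price':    [200000., 290000., 380000., 460000.]})

# solver='auto' (the default) lets GraphLab pick the optimizer from
# the data; for purely numeric problems it typically ends up using
# Newton's method.
model = graphlab.linear_regression.create(sales,
                                          target='price',
                                          features=['sqft', 'bedrooms'],
                                          solver='auto',
                                          validation_set=None)

# Forcing a particular solver is also possible.
newton_model = graphlab.linear_regression.create(sales,
                                                 target='price',
                                                 features=['sqft', 'bedrooms'],
                                                 solver='newton',
                                                 validation_set=None)
```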
Now, about preprocessing: GraphLab only accepts an SFrame for learning, and that data needs to be parsed correctly before any training. While creating an SFrame, unprocessed or malformed data is surfaced right away and raises an error, so in order to run any learning you need clean data. If the SFrame accepts the data, along with the target and features you want to learn on, you are good to go, but preprocessing and cleaning the data is always recommended. It is also good practice to do feature engineering before any learning algorithm, and redefining data types before learning is always recommended for accuracy.
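For instance (the file name and column names below are hypothetical), a typical cleaning pass before training might look like this:

```python
import graphlab

# read_csv parses the file into an SFrame and will complain about
# rows it cannot parse.
sf = graphlab.SFrame.read_csv('sales.csv')

# Drop rows with missing values so training doesn't trip over them.
sf = sf.dropna()

# Redefine column types before learning, e.g. a numeric column that
# came in as strings.
sf['price'] = sf['price'].astype(float)

# Simple feature engineering: derive a new column from existing ones.
sf['price_per_sqft'] = sf['price'] / sf['sqft']
```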
About your point on how data is treated in GraphLab, I would say: it depends! Some datasets are tabular and are treated accordingly, and some are in graph structure. GraphLab performs very well when it comes to regression trees and boosted classifiers, which follow the decision-tree concept and are quite time- and resource-consuming in libraries other than GraphLab.
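As a rough sketch (the toy data and column names are invented), training a boosted trees classifier on a tabular SFrame is essentially a one-liner:

```python
import graphlab

# Hypothetical labelled tabular data.
data = graphlab.SFrame({'sqft':  [800., 1200., 2000., 2600., 900., 3000.],
                        'age':   [30., 12., 5., 2., 40., 1.],
                        'label': [0, 0, 1, 1, 0, 1]})

train, test = data.random_split(0.8, seed=1)

# Boosted trees follow the decision-tree concept; max_iterations
# sets the number of boosting rounds (trees).
model = graphlab.boosted_trees_classifier.create(train,
                                                 target='label',
                                                 max_iterations=10,
                                                 validation_set=None)

print(model.evaluate(test))
```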
For me, GraphLab performed very well while creating a recommendation engine where I had a dataset of nodes and edges, and a boosted tree classifier with 18 iterations also worked flawlessly in quite scalable time. I must say that even for tree-structured data, GraphLab performs very well. I hope this answer helps.
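For reference, a minimal version of that kind of setup (the user and item values are hypothetical; for the boosted trees part you would simply pass max_iterations=18 as in the earlier sketch) could build both a recommender and an explicit graph from the same edge-like data:

```python
import graphlab

# Hypothetical user-item interactions; conceptually these are the
# edges of a bipartite graph.
obs = graphlab.SFrame({'user_id': ['a', 'a', 'b', 'b', 'c'],
                       'item_id': ['x', 'y', 'x', 'z', 'y']})

# recommender.create picks a suitable recommender model for the data.
rec = graphlab.recommender.create(obs, user_id='user_id', item_id='item_id')
print(rec.recommend(users=['a'], k=2))

# The same data can also be loaded as an explicit SGraph of nodes
# and edges.
g = graphlab.SGraph().add_edges(obs, src_field='user_id', dst_field='item_id')
print(g.summary())
```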