Q-1. How do I convert the data of a corpus into an appropriate format for training with the 'caret' package?
First, I will describe the environment for this question, and then show where I am stuck.
Environment
This is the corpus, called rt (R code):
library(tm)
library(tm.corpus.Reuters21578)  # data package that provides the Reuters21578 corpus
data(Reuters21578)
rt <- Reuters21578
The training document-term matrix, called dtmTrain, is created from the training corpus rtTrain (R code):
dtmTrain <- DocumentTermMatrix(rtTrain)
I have 10 classes in total for this project. The classes are stored in the metadata of each document.
c("earn","acq","money-fx","grain","crude","trade","interest","ship","wheat","corn")
I have created a data frame from rt with one row per document and one column per class. It is called docLabels.
Docs earn acq money-fx grain crude trade interest ship wheat corn
1 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0
5 0 0 0 1 0 0 0 0 1 1
6 0 0 0 1 0 0 0 0 1 1
I assume that everything is clear so far.
Problem
I have a document-term matrix that holds the data and a data frame that holds the classes, as shown above. How can I merge these two objects for training with the 'caret' package?
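To make the question concrete, here is a minimal sketch of the kind of merge I have in mind. It uses small toy stand-ins instead of the real objects, and it assumes that the rows of the feature matrix and of docLabels refer to the same documents in the same order. The names termCounts, makeTrainingFrame and classLabel are hypothetical and only used for illustration. With the real data, the feature matrix would presumably come from as.matrix(dtmTrain). Because a document can carry several labels (see rows 5 and 6 above), the sketch builds one yes/no outcome for a single class rather than one 10-level factor.

```r
# Toy stand-ins for the real objects (assumption: same documents, same row order).
# With the real data, the features would come from as.matrix(dtmTrain).
termCounts <- matrix(
  c(2, 0, 1,
    0, 3, 0,
    1, 1, 0,
    0, 0, 4),
  nrow = 4, byrow = TRUE,
  dimnames = list(as.character(1:4), c("profit", "wheat", "oil"))
)
docLabels <- data.frame(
  earn  = c(1, 0, 0, 0),
  grain = c(0, 1, 1, 0),
  crude = c(0, 0, 0, 1)
)

# Hypothetical helper: combine term counts with the 0/1 column of one class
# into a single data frame whose last column is a two-level factor.
makeTrainingFrame <- function(features, labels, className) {
  stopifnot(nrow(features) == nrow(labels), className %in% names(labels))
  df <- as.data.frame(features)
  # Make term names syntactically valid so they can be used in a formula.
  names(df) <- make.names(names(df), unique = TRUE)
  df$classLabel <- factor(ifelse(labels[[className]] == 1, "yes", "no"),
                          levels = c("no", "yes"))
  df
}

trainGrain <- makeTrainingFrame(termCounts, docLabels, "grain")
str(trainGrain)
```

Repeating this for each of the 10 classes would give one training frame per class (a one-vs-rest setup). Whether that is the right way to handle documents with several labels is part of what I am asking.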
Q-2. How do I train multiclass data with the 'caret' package?
Once the data is in an appropriate format, how do I train on it with the caret package?
This is from caret package documentation.
## S3 method for class 'formula'
train(form, data, ..., weights, subset, na.action, contrasts = NULL)
So, what should form be?
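My current guess, as a sketch only: if the outcome is a single factor column in the same data frame as the term counts, form would name that column on the left and use "." on the right to mean all remaining columns. The example below uses synthetic toy data, not the Reuters corpus. The column name classLabel is hypothetical, and method = "glm" with 5-fold cross-validation is just a placeholder choice, not a recommendation. The call to train is guarded so that the rest of the block still runs when caret is not installed.

```r
set.seed(1)  # toy data only, not the Reuters corpus
n <- 60
trainDf <- data.frame(
  profit = rpois(n, 2),
  wheat  = rpois(n, 2),
  oil    = rpois(n, 2)
)
# Hypothetical yes/no outcome for one class, loosely tied to the "wheat" count.
isGrain <- (trainDf$wheat + rnorm(n)) > 2
trainDf$classLabel <- factor(ifelse(isGrain, "yes", "no"), levels = c("no", "yes"))

# Outcome on the left; "." expands to every other column of the data frame.
form <- classLabel ~ .
predictors <- attr(terms(form, data = trainDf), "term.labels")
print(predictors)

if (requireNamespace("caret", quietly = TRUE)) {
  fit <- caret::train(
    form,
    data = trainDf,
    method = "glm",  # placeholder model
    trControl = caret::trainControl(method = "cv", number = 5)
  )
  print(fit)
}
```

train also has a non-formula interface, train(x, y, ...), which takes the predictors and the outcome factor as separate arguments. That might be more practical for a document-term matrix with thousands of term columns, but I have not verified it on this data.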