
I have a task where I need to predict a continuous variable, the odometer reading, from a text field that describes the issues a customer faced. This field is not a drop-down menu; it is filled in with the customer's verbatim comments. For example:

**Text**                     **Odometer Reading**
Clutch problem               20,000 
Axle Issue                   150,000

Edit:

I am building a linear model using unigrams, but I get this warning during data pre-processing:

> corp <- Corpus(VectorSource(ISSUES$CUSTOMER_VOICE))
> 
> corp <- tm_map(corp,tolower)
Warning message:
In tm_map.SimpleCorpus(corp, tolower) : transformation drops documents
> corp <- tm_map(corp,removePunctuation)
Warning message:
In tm_map.SimpleCorpus(corp, removePunctuation) :
transformation drops documents
> corp <- tm_map(corp,removeWords,stopwords('english'))
Warning message:
In tm_map.SimpleCorpus(corp, removeWords, stopwords("english")) :
transformation drops documents
> corp <- tm_map(corp,stemDocument)
Warning message:
In tm_map.SimpleCorpus(corp, stemDocument) : transformation drops documents

Could someone please tell me how to fix this warning?

Karthik S
  • You should narrow this question down and show what coding efforts you have made for the first part. Questions about which algorithm to use for analytics are out of scope for SO. Find another venue for that. – IRTFM Dec 14 '18 at 06:59

1 Answer


This is just one way to do it, and it may not be the optimal solution: apply text mining to the text column to extract unigrams and bigrams, convert them to a document-term matrix (DTM), and then use any linear model to predict the odometer reading.
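A minimal sketch of that pipeline, under some assumptions: the `issues` data frame and its `ODOMETER` column are hypothetical toy data (made up for illustration, not from the question), only unigrams are used, and stemming is left out so the only package needed is `tm`. As far as I know, the "transformation drops documents" warning is tied to the `SimpleCorpus` that `Corpus(VectorSource(...))` creates, so building the corpus with `VCorpus()` and wrapping base functions such as `tolower` in `content_transformer()` should avoid it.

```r
library(tm)

# Hypothetical toy data standing in for ISSUES; values are invented for illustration
issues <- data.frame(
  CUSTOMER_VOICE = c("Clutch problem", "Axle issue", "Clutch slipping",
                     "Axle noise, rear", "Clutch pedal stuck", "Axle seal leaking"),
  ODOMETER = c(20000, 150000, 25000, 140000, 30000, 160000),
  stringsAsFactors = FALSE
)

# VCorpus rather than Corpus (which yields a SimpleCorpus for a VectorSource);
# base R functions are wrapped in content_transformer()
corp <- VCorpus(VectorSource(issues$CUSTOMER_VOICE))
corp <- tm_map(corp, content_transformer(tolower))
corp <- tm_map(corp, removePunctuation)
corp <- tm_map(corp, removeWords, stopwords("english"))
corp <- tm_map(corp, stripWhitespace)

# Unigram document-term matrix: one row per document, one column per term
dtm <- DocumentTermMatrix(corp)
X <- as.data.frame(as.matrix(dtm))
names(X) <- make.names(names(X))  # make term names usable in a formula
X$ODOMETER <- issues$ODOMETER

# Linear model on a single term; with only six rows, using every term
# would give a rank-deficient fit
fit <- lm(ODOMETER ~ clutch, data = X)
pred <- predict(fit, newdata = data.frame(clutch = c(1, 0)))
print(pred)
```

With a real vocabulary there will be far more terms than this, so you would probably need to drop rare terms first (for example with `removeSparseTerms()` from `tm`) before fitting.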

I hope this solves your issue.

Rahul Varma