Classification using text mining - by values versus keywords

Question

I have a classification problem that is highly correlated with the economics of each city. I have unstructured free-text data containing values such as population, median income, employment, etc. Is it possible to use text mining to understand the values in the text and make a classification? Most text-mining articles I have read use keyword or phrase counts to make classifications. I would like to classify by the meaning of the text rather than by how often words appear in it. Is this possible?

BTW, I currently use RapidMiner and R. Would this work with either of them?

Thanks in advance, John

score 0 · Answer 1 · answered Sep 17 '13 at 21:45

Yes, this probably is possible.

But no, I cannot give you a simple solution; you will have to gain a lot of experience and experiment yourself. There is no push-button magic solution that works for everybody.

As your question is overly broad, I don't think there will be a better answer than "Yes, this might be possible", sorry.

score 0 · Answer 2 · answered Sep 27 '13 at 00:12

You could think of these as two separate problems:

1. Extracting information from unstructured data.
2. Classification.
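For the first problem, one simple option is to pull the numeric values out of the text with regular expressions, so that a classifier later sees numbers rather than word counts. The sketch below is Python (not R or RapidMiner) and assumes the text uses phrasings like "population of 1,234,567" or "unemployment rate of 6.5%"; the pattern dictionary and the function name `extract_features` are hypothetical and would need to be adapted to how your documents are actually worded.

```python
import re

# Hypothetical patterns: they ASSUME phrasings such as "population of 1,234,567",
# "median household income of $52,000" and "unemployment rate of 6.5%".
PATTERNS = {
    "population": re.compile(r"population\D{0,20}?(\d[\d,]*)", re.IGNORECASE),
    "median_income": re.compile(
        r"median (?:household )?income\D{0,20}?\$?(\d[\d,]*)", re.IGNORECASE
    ),
    "unemployment_rate": re.compile(
        r"unemployment rate\D{0,20}?(\d+(?:\.\d+)?)\s*%", re.IGNORECASE
    ),
}


def extract_features(text):
    """Return {feature name: float value}, or None where no value was found."""
    features = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        # Strip thousands separators before converting to a number.
        features[name] = float(match.group(1).replace(",", "")) if match else None
    return features


if __name__ == "__main__":
    sample = ("Springfield has a population of 1,234,567, a median household "
              "income of $52,000 and an unemployment rate of 6.5%.")
    print(extract_features(sample))
    # {'population': 1234567.0, 'median_income': 52000.0, 'unemployment_rate': 6.5}
```

The resulting per-city dictionaries can be turned into ordinary numeric columns for whatever classifier you already use. Hand-written patterns break easily on varied wording, which is one reason the answer says you will have to experiment.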

There are several approaches to mining specific features from the text. Alternatively, you could use a bag-of-words approach for classification directly and see what results you get. Depending on your problem, a classifier could potentially learn from the text features alone.
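To illustrate the "bag of words directly" option, here is a minimal pure-Python multinomial Naive Bayes classifier over word counts. The answer does not name a classifier; Naive Bayes is swapped in here only because it is short to write. The class name `BagOfWordsNB`, the training sentences, and the labels "growing"/"declining" are all hypothetical. Note that this model only counts tokens (a number like "1,234,567" becomes the tokens "1", "234", "567"), so it is exactly the frequency-based approach the question contrasts with understanding values.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase alphanumeric tokens; digits are kept as plain tokens, not parsed as values.
    return re.findall(r"[a-z0-9]+", text.lower())


class BagOfWordsNB:
    """Minimal multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / n_docs)  # log prior of the class
            for token in tokenize(text):
                if token not in self.vocab:
                    continue  # ignore words never seen in training
                # Add-one smoothing so a word unseen in this class does not zero the score.
                score += math.log(
                    (self.word_counts[label][token] + 1) / (self.total_words[label] + v)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    # Hypothetical toy training data.
    texts = ["booming tech sector rising employment",
             "factory closures high unemployment",
             "new startups and job growth",
             "shrinking population and layoffs"]
    labels = ["growing", "declining", "growing", "declining"]
    model = BagOfWordsNB().fit(texts, labels)
    print(model.predict("job growth in the tech sector"))
```

If a model like this already separates your classes well, the more elaborate value extraction may not be needed; if it does not, that suggests the numbers themselves carry the signal.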

You could also use PCA or something similar to find the important features, and then run a mining process that extracts just those features.
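A minimal PCA sketch, assuming the features have already been extracted into numeric rows (for example population, median income, unemployment rate per city). It standardizes the columns, builds the covariance matrix, and finds the first principal component by power iteration in pure Python; in R the built-in `prcomp` does this for you. Keep in mind that PCA returns linear combinations of features rather than picking a subset, so the loadings show which extracted values vary together rather than directly naming "the" important ones. The function names and the toy rows are hypothetical.

```python
import math


def standardize(rows):
    """Center each column and scale it to unit (population) variance."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    # "or 1.0" avoids dividing by zero for a constant column.
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
            for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]


def covariance(rows):
    """Covariance matrix of already-centered rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / n for j in range(d)] for i in range(d)]


def first_component(cov, iterations=200):
    """Power iteration for the eigenvector with the largest eigenvalue.

    Assumes the start vector is not orthogonal to that eigenvector and that
    the top eigenvalue is well separated from the next one.
    """
    d = len(cov)
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # zero covariance: no direction of variance to find
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    return v, eigenvalue


if __name__ == "__main__":
    # Toy rows invented for illustration: [population (k), income (k$), unemployment %].
    rows = [[100, 40, 8], [200, 50, 7], [300, 60, 6], [400, 70, 5]]
    cov = covariance(standardize(rows))
    loadings, eigenvalue = first_component(cov)
    explained = eigenvalue / sum(cov[i][i] for i in range(len(cov)))
    print(loadings, explained)
```

Large absolute loadings on the first component point to the features worth extracting carefully; further components can be found by deflating the covariance matrix and repeating the iteration.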

All of this depends on your problem, which, as stated, is too broad and vague to answer more specifically.
