I am trying to build an SVM model on a text corpus. For this I built a DocumentTermMatrix with the following control parameters:
    library(tm)  # provides DocumentTermMatrix and weightTfIdf
    control <- list(stopwords = TRUE,
                    removePunctuation = TRUE,
                    removeNumbers = TRUE,
                    minDocFreq = 2,
                    stemming = TRUE,
                    weighting = function(x) weightTfIdf(x, normalize = FALSE))
After this, when I look at the resulting DTM I can still see terms like 'you', 'you-' and 'youll'. This surprises me, because I have already enabled stemming and punctuation removal. Can someone tell me why I am still seeing these irrelevant words?
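One way to see how a term like 'youll' could survive both stopword removal and punctuation removal is to run the steps by hand in a different order. The sketch below is a hypothetical base-R pipeline, not tm's internal code: the function name `preprocess` and the four-word `stop_list` are made up for illustration, and it assumes the stopword list stores contractions with their apostrophe (as in "you'll"). Whether tm applies the steps in this order is an assumption I have not verified.

```r
# Hypothetical base-R pipeline (NOT tm's internal code) that makes the order of
# punctuation removal and stopword filtering explicit.
stop_list <- c("i", "you", "you'll", "the")  # tiny stand-in stopword list

preprocess <- function(text, stop_words, punctuation_first = TRUE) {
  # Lowercase, then split on runs of whitespace
  tokens <- strsplit(tolower(text), "[[:space:]]+")[[1]]
  strip_punct <- function(x) gsub("[[:punct:]]", "", x)
  if (punctuation_first) {
    tokens <- strip_punct(tokens)              # "you'll" becomes "youll"
    tokens <- tokens[!tokens %in% stop_words]  # "youll" matches no stopword
  } else {
    tokens <- tokens[!tokens %in% stop_words]  # "you'll" matches and is dropped
    tokens <- strip_punct(tokens)
  }
  tokens[tokens != ""]  # drop tokens that consisted only of punctuation
}

txt <- "I think you'll like the doll"
preprocess(txt, stop_list, punctuation_first = TRUE)   # "think" "youll" "like" "doll"
preprocess(txt, stop_list, punctuation_first = FALSE)  # "think" "like" "doll"
```

If punctuation is stripped first, the contraction no longer matches its stopword entry, so a token such as 'youll' is kept.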
Output of sessionInfo():

    > sessionInfo()
    R version 3.0.1 (2013-05-16)
    Platform: x86_64-w64-mingw32/x64 (64-bit)

    locale:
    [1] LC_COLLATE=English_India.1252
    [2] LC_CTYPE=English_India.1252
    [3] LC_MONETARY=English_India.1252
    [4] LC_NUMERIC=C
    [5] LC_TIME=English_India.1252

    attached base packages:
    [1] stats graphics grDevices utils
    [5] datasets methods base

    other attached packages:
    [1] ggplot2_0.9.3.1 e1071_1.6-1
    [3] class_7.3-7 tm_0.5-8.3

    loaded via a namespace (and not attached):
    [1] colorspace_1.2-2 dichromat_2.0-0
    [3] digest_0.6.3 grid_3.0.1
    [5] gtable_0.1.2 labeling_0.1
    [7] MASS_7.3-28 munsell_0.4
    [9] plyr_1.8 proto_0.3-10
    [11] RColorBrewer_1.0-5 reshape2_1.2.2
    [13] rJava_0.9-4 RWeka_0.4-18
    [15] RWekajars_3.7.9-1 scales_0.2.3
    [17] slam_0.1-28 Snowball_0.0-9
    [19] stringr_0.6.2 tools_3.0.1
Update: I attached both the Snowball_0.0-9 and SnowballC_0.5 packages by calling library(packageName), but I still don't see any improvement. I also tried changing the order of the control parameters in the list, still with no luck. Or maybe I am misunderstanding how stemming works (for English)? Any help is highly appreciated.
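On the stemming question: a stemmer maps inflected forms of a word to a shorter common stem, but it returns one token per input token, so stemming by itself would not remove 'you' or 'youll' from the vocabulary. The function `toy_stem` below is a hypothetical suffix stripper written only to illustrate that behaviour; it is not the Porter/Snowball algorithm that SnowballC implements, and its stems differ from real Snowball output.

```r
# Toy suffix stripper (NOT the Porter/Snowball algorithm used by SnowballC):
# it only illustrates that stemming rewrites each token's ending and never
# removes a token.
toy_stem <- function(words) {
  # Strip one trailing suffix from a small fixed list
  sub("(ing|ed|es|s)$", "", words)
}

words <- c("tears", "loved", "passing", "you", "youll")
stems <- toy_stem(words)
stems                           # "tear" "lov" "pass" "you" "youll"
length(stems) == length(words)  # TRUE: one stem per input word
```

Words without a matching suffix, such as "you" and "youll", pass through unchanged.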
Update 2:
Sample text 1 from Charles Dickens's "Bleak House"
"Dear, dear, to think how much time we passed alone together afterwards, and how often I repeated to the doll the story of my birthday and confided to her that I would try as hard as ever I could to repair the fault I had been born with (of which I confessedly felt guilty and yet innocent) and would strive as I grew up to be industrious, contented, and kind-hearted and to do some good to some one, and win some love to myself if I could. I hope it is not self-indulgent to shed these tears as I think of it. I am very thankful, I am very cheerful, but I cannot quite help their coming to my eyes.
There! I have wiped them away now and can go on again properly."
Sample text 2 from Henry James's "The Altar of the Dead"
"He had a mortal dislike, poor Stransom, to lean anniversaries, and loved them still less when they made a pretence of a figure. Celebrations and suppressions were equally painful to him, and but one of the former found a place in his life. He had kept each year in his own fashion the date of Mary Antrim’s death. It would be more to the point perhaps to say that this occasion kept him: it kept him at least effectually from doing anything else. It took hold of him again and again with a hand of which time had softened but never loosened the touch. He waked to his feast of memory as consciously as he would have waked to his marriage-morn. Marriage had had of old but too little to say to the matter: for the girl who was to have been his bride there had been no bridal embrace. She had died of a malignant fever after the wedding-day had been fixed, and he had lost before fairly tasting it an affection that promised to fill his life to the brim."
Sample text 3 from Mark Twain's "The Adventures of Tom Sawyer"
" The old lady pulled her spectacles down and looked over them about the room; then she put them up and looked out under them. She seldom or never looked through them for so small a thing as a boy; they were her state pair, the pride of her heart, and were built for "style," not service—she could have seen through a pair of stove-lids just as well. She looked perplexed for a moment, and then said, not fiercely, but still loud enough for the furniture to hear:
"Well, I lay if I get hold of you I'll—"
She did not finish, for by this time she was bending down and punching under the bed with the broom, and so she needed breath to punctuate the punches with. She resurrected nothing but the cat.
"I never did see the beat of that boy!"
She went to the open door and stood in it and looked out among the tomato vines and "jimpson" weeds that constituted the garden. No Tom. So she lifted up her voice at an angle calculated for distance and shouted:
"Y-o-u-u TOM!"
There was a slight noise behind her and she turned just in time to seize a small boy by the slack of his roundabout and arrest his flight.
"There! I might 'a' thought of that closet. What you been doing in there?"
"Nothing."
"Nothing! Look at your hands. And look at your mouth. What is that truck?" "
Sample text 4 from Oscar Wilde's "The Happy Prince and Other Tales"
"Then he saw the statue on the tall column.
“I will put up there,” he cried; “it is a fine position, with plenty of fresh air.” So he alighted just between the feet of the Happy Prince.
“I have a golden bedroom,” he said softly to himself as he looked round, and he prepared to go to sleep; but just as he was putting his head under his wing a large drop of water fell on him. “What a curious thing!” he cried; “there is not a single cloud in the sky, the stars are quite clear and bright, and yet it is raining. The climate in the north of Europe is really dreadful. The Reed used to like the rain, but that was merely her selfishness.”
Then another drop fell.
“What is the use of a statue if it cannot keep the rain off?” he said; “I must look for a good chimney-pot,” and he determined to fly away.
But before he had opened his wings, a third drop fell, and he looked up, and saw— Ah! what did he see?
The eyes of the Happy Prince were filled with tears, and tears were running down his golden cheeks. His face was so beautiful in the moonlight that the little Swallow was filled with pity.
“Who are you?” he said.
“I am the Happy Prince.”
“Why are you weeping then?” asked the Swallow; “you have quite drenched me.” "
Sample text 5 from Plato's "Laws"
" MEGILLUS: I think that I can get as far as the fourth head, which is the frequent endurance of pain, exhibited among us Spartans in certain hand-to-hand fights; also in stealing with the prospect of getting a good beating; there is, too, the so-called Crypteia, or secret service, in which wonderful endurance is shown,—our people wander over the whole country by day and by night, and even in winter have not a shoe to their foot, and are without beds to lie upon, and have to attend upon themselves. Marvellous, too, is the endurance which our citizens show in their naked exercises, contending against the violent summer heat; and there are many similar practices, to speak of which in detail would be endless.
ATHENIAN: Excellent, O Lacedaemonian Stranger. But how ought we to define courage? Is it to be regarded only as a combat against fears and pains, or also against desires and pleasures, and against flatteries; which exercise such a tremendous power, that they make the hearts even of respectable citizens to melt like wax? "