I'm thinking of putting a stop words in my similarity program and then a stemmer (going for porters 1 or 2 depends on what easiest to implement)
I was wondering that since I read my text from files as whole lines and save them as a long string, so if I got two strings ex.
String one = "I decided buy something from the shop.";
String two = "Nevertheless I decidedly bought something from a shop.";
Now that I got those strings
Stemming: Can I just use the stemmer algoritmen directly on it, save it as a String and then continue working on the similarity like I did before implementing the stemmer in the program, like running one.stem(); kind of thing?
Stop word: How does this work out? O.o Do I just use; one.replaceall("I", ""); or is there some specific way to use for this proces? I want to keep working with the string and get a string before using the similarity algorithms on it to get the similarity. Wiki doesn't say a lot.
Hope you can help me out! Thanks.
Edit: It is for a school-related project where I'm writing a paper on similarity between different algorithms so I don't think I'm allowed to use lucene or other libraries that does the work for me. Plus I would like to try and understand how it works before I start using the libraries like Lucene and co. Hope it's not too much a bother ^^