I want to parse words from a text file. Apostrophes should be preserved, but single quotes should be removed. Here is some test data:
john's apostrophe is a 'challenge'
I am experimenting with grep as follows:
grep -o "[a-z'A-Z]*" file.txt
and it produces:
john's
apostrophe
is
a
'challenge'
I need to get rid of the quotes around the word challenge.
The correct/desired output should be:
john's
apostrophe
is
a
challenge
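One pattern that seems to give this output, as a sketch rather than a full solution: treat a word as a run of letters that may be followed by groups of "apostrophe plus letters". An apostrophe is then only matched when it has letters on both sides. The function name extract_words is made up for illustration, and the sample line is piped in with printf instead of being read from file.txt.

```shell
# Hypothetical helper: read text on stdin, print one word per line.
# A word is a run of letters, optionally followed by one or more
# groups of "apostrophe + letters", so a quote at the edge of a word
# is never part of a match.
extract_words() {
    grep -oE "[A-Za-z]+('[A-Za-z]+)*"
}

# Sample line from the question
printf '%s\n' "john's apostrophe is a 'challenge'" | extract_words
```

The limitation is that a genuine leading or trailing apostrophe (as in 'tis or dogs') looks the same as a single quote to this pattern, so it is dropped as well.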
EDIT: Since the consensus seems to be that apostrophes cannot be reliably told apart from single quotes, I am now looking for a way to strip every apostrophe (leading, trailing, or embedded) out of all words. The words are to be added to a vocabulary index, and phrase searches should strip apostrophes in the same way. This may need another question.
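A minimal sketch of the strip-everything approach, assuming only the ASCII apostrophe (') needs handling and that words are plain A-Z letters: delete the apostrophes with tr first, then extract runs of letters with grep. The name index_words is made up for illustration.

```shell
# Hypothetical helper: read text on stdin, print one word per line
# with every ASCII apostrophe removed (leading, trailing, embedded).
index_words() {
    # tr deletes each ' character; grep then prints each run of letters
    tr -d "'" | grep -oE "[A-Za-z]+"
}

# Sample line from the question; "john's" comes out as "johns"
printf '%s\n' "john's apostrophe is a 'challenge'" | index_words
```

Passing search phrases through the same function before the lookup would keep the index and the queries consistent. Typographic apostrophes are not covered by this sketch.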