I'm trying to run word2vec on text that contains phrase delimiters, like this:
I <phrase>like green beans</phrase> in my tortillas.
Before feeding the text to word2vec, I need the input to look like this:
I __like_green_beans__ in my tortillas.
I've been trying to do the replacement with sed. With this command:
sed -e 's@<phrase>\(.*\)</phrase>@__\1__@g' myfile.txt
I can get rid of the delimiters, but I haven't found a way to replace the spaces inside the capture group with underscores.
Is this possible with sed?
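
One possible approach, as a rough sketch rather than a definitive answer: sed can loop with a label and the t (branch if a substitution succeeded) command, replacing one space inside a phrase per pass until none are left, and only then swapping the tags for underscores. The sketch assumes that a phrase never contains a "<" character and never spans more than one line; join_phrases is just a made-up wrapper name for illustration. It uses [^<]* instead of .* so that a match cannot run across several phrases on the same line.

```shell
#!/bin/sh
# join_phrases is a hypothetical helper name, not part of sed or word2vec.
# Assumptions: a phrase contains no '<' and does not span multiple lines.
join_phrases() {
    # :a  defines a label to loop back to.
    # The first s command replaces ONE space inside <phrase>...</phrase>
    # with an underscore.
    # ta  jumps back to :a if that substitution changed anything, so the
    # loop repeats until no phrase contains a space.
    # The last s command swaps the tags for the __ markers.
    sed -e ':a' \
        -e 's@\(<phrase>[^<]*\) \([^<]*</phrase>\)@\1_\2@' \
        -e 'ta' \
        -e 's@<phrase>\([^<]*\)</phrase>@__\1__@g' "$@"
}

printf '%s\n' 'I <phrase>like green beans</phrase> in my tortillas.' | join_phrases
# prints: I __like_green_beans__ in my tortillas.
```

To process a file instead of standard input, the file name can be passed as an argument, e.g. join_phrases myfile.txt.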