I am writing code that opens a link, collects the words surrounding a substring j into Res, and then collects all the nouns in Res, as follows:
import re
import requests
import nltk

j = "Green Index"  # defining word to be looked for
sub = r'(\w*)\W*(\w*)\W*(%s)\W*(\w*)\W*(\w*)' % j  # defining substring including word
Results = []
allnouns = []
link = "http://greenindex.timberland.com/"  # defining link to search for word
f = requests.get(link)
str1 = f.text
for i in re.findall(sub, str1, re.I):  # collecting all terms found together
    print(" ".join([x for x in i if x != ""]))
    Res = " ".join([x for x in i if x != ""])  # creating each sentence Res
    Results.append(Res)  # putting all sentences Res in one list Results
    sentences = nltk.sent_tokenize(Res)  # here is where I hit an error
    nouns = []
    for sentence in sentences:
        for word, pos in nltk.pos_tag(nltk.word_tokenize(str(sentence))):
            if pos in ('NN', 'NNP', 'NNS', 'NNPS'):
                nouns.append(word)
    allnouns.append(nouns)
I hit an error right before my second loop:
TypeError: Can't convert 'list' object to str implicitly
I checked, and type(Res) is <class 'str'>. I also tried splitting Res, thinking it might help:
sentences = nltk.sent_tokenize(Res.split)
but I get the same error. How can I get around it?
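
To help narrow this down, here is a minimal sketch of just the matching step, run on a hard-coded string instead of the downloaded page, with no NLTK involved. The sample text and the names sample, results, and res are made up for illustration, and re.escape is an addition that is not in my code above. It only shows that each joined match is a plain str, so it does not by itself explain the error.

```python
import re

j = "Green Index"  # word to look for
# Up to two words before and after the match; re.escape guards against
# regex metacharacters in j (an addition, not in the original code).
sub = r'(\w*)\W*(\w*)\W*(%s)\W*(\w*)\W*(\w*)' % re.escape(j)

# Made-up sample text standing in for the page returned by requests.get(link).text
sample = ("Our Green Index rating measures the footprint. "
          "The green index score is shown on every box.")

results = []
for match in re.findall(sub, sample, re.I):
    # Each match is a tuple of captured groups; drop the empty ones and join.
    res = " ".join(x for x in match if x != "")
    results.append(res)
    print(type(res).__name__, repr(res))
```

If every line printed here starts with str, the value reaching nltk.sent_tokenize is a string, and the list mentioned in the traceback is presumably coming from somewhere else.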