
I would like to replace specific tokens with their POS tags using spaCy, but I am encountering this error. Is there a way to overcome it?

lemma_token = [sent_doc.replace(w, w.pos_) for w in sent_doc if w.pos_ in list_postag]
AttributeError: 'spacy.tokens.doc.Doc' object has no attribute 'replace'


Script

list_postag = ["ADP","NUM","INTJ","DET","PREP","CCONJ","SCONJ"]

sent_clean = ["je mange bien", "je l'aime bien", "il est trop beau"]
sent_doc = nlp(sent_clean)

mytokens = [w.lemma_.strip() for w in sent_doc if w.pos_ != "SPACE" and w.pos_ != "PUNCT"]
        sentences_final.append(" ".join(mytokens))


lemma_token = [sent_doc.replace(w, w.pos_) for w in sent_doc if w.pos_ in list_postag]
        sentences_final.append(" ".join(lemma_token))

Expected output

lemma_token = ["PRON mange bien", "PRON PRON aime bien", "PRON est trop beau"]

Wiktor Stribiżew
emma

1 Answer


You can't modify a Doc in place. You could build another Doc, but that's usually not what you want; it's usually easier to build a list of strings instead. That's roughly what you're doing towards the end, but you are trying to do too much in a single list comprehension, which makes it hard to follow. Here's a simpler version:

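# `doc` is the spaCy Doc for one sentence; `replace_tags` holds the POS tags to substitute
out = []
for tok in doc:
    # Drop whitespace and punctuation tokens entirely
    if tok.pos_ in ("SPACE", "PUNCT"):
        continue
    # Use tok.pos_ (the string); tok.pos without the underscore is an integer ID
    if tok.pos_ in replace_tags:
        out.append(tok.pos_)
        continue
    out.append(tok.text)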
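
To apply the loop above to several sentences, one option is to wrap it in a function and call it once per Doc. The sketch below is only illustrative: `replace_with_pos` and the `Tok` class are hypothetical names, `Tok` is a stand-in for a spaCy token so the example runs without a model, and the POS tags in the demo are assigned by hand rather than produced by a tagger. Note that `nlp(...)` expects a single string, so a list of sentences would go through `nlp.pipe(...)` instead.

```python
from dataclasses import dataclass


@dataclass
class Tok:
    """Hypothetical stand-in for a spaCy Token (only .text and .pos_)."""
    text: str
    pos_: str


def replace_with_pos(doc, replace_tags, skip_tags=("SPACE", "PUNCT")):
    """Join token texts, swapping tokens whose tag is in replace_tags for the tag."""
    out = []
    for tok in doc:
        if tok.pos_ in skip_tags:
            continue  # drop whitespace and punctuation
        out.append(tok.pos_ if tok.pos_ in replace_tags else tok.text)
    return " ".join(out)


# Hand-tagged demo sentence; a real pipeline would assign these tags itself.
demo = [Tok("je", "PRON"), Tok("mange", "VERB"), Tok("bien", "ADV"), Tok(".", "PUNCT")]
result = replace_with_pos(demo, {"PRON"})
print(result)

# With spaCy, assuming a French model such as fr_core_news_sm is installed:
# import spacy
# nlp = spacy.load("fr_core_news_sm")
# results = [replace_with_pos(d, {"PRON"}) for d in nlp.pipe(sent_clean)]
```

Because the function only reads `.text` and `.pos_`, it accepts a real Doc as well as the stand-in list. Add "PRON" to the tag set if pronouns should be replaced, as in the expected output.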
polm23