How to extract all (word, vector) pairs from the spaCy Vocab?

Iterating over the vocab like this:

import numpy as np
np.sort([w.text for w in nlp.vocab])

array(['\t', '\n', ' ', '"', "'", "''", "'Cause", "'Cos", "'Coz", "'Cuz", "'S", "'bout", "'cause", "'cos", "'coz", "'cuz", "'d", "'em", "'ll", "'m", "'nuff", "'re", "'s",
   "'ve", "'y", '(', '(*_*)', '(-8', '(-:', '(-;', '(-_-)', ...

returns only about 800 odd-looking items (whitespace, punctuation, contractions, emoticons) rather than the full vocabulary.

sten

1 Answer

With en_core_web_sm, iterating over nlp.vocab yields only 764 entries, and they are only basic ones like those shown in the question:

import spacy
nlp = spacy.load("en_core_web_sm")
len([ w.text for w in nlp.vocab ])

output: 764

None of these entries has a vector, because the small model (en_core_web_sm) ships without static word vectors:

[ (w.text, w.has_vector) for w in nlp.vocab if w.has_vector ]

output: []
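
If the goal is every (word, vector) pair, one option is to iterate over the vector table (nlp.vocab.vectors) rather than over nlp.vocab itself. The following is a minimal sketch, not a definitive recipe: the helper name word_vector_pairs and the toy two-word vocab are made up for illustration, so the snippet runs without downloading a model.

```python
import numpy as np
from spacy.vocab import Vocab


def word_vector_pairs(vocab):
    """Return a list of (word, vector) pairs for every key in the vector table.

    Hypothetical helper, not part of spaCy. Keys in vocab.vectors are
    hashes, so each one is mapped back to its text via vocab.strings.
    """
    return [
        (vocab.strings[key], vocab.vectors[key])
        for key in vocab.vectors.keys()
    ]


# Toy vocab with two hand-made 3-dimensional vectors (illustrative data only).
vocab = Vocab()
vocab.set_vector("cat", np.array([1.0, 0.0, 0.0], dtype="float32"))
vocab.set_vector("dog", np.array([0.0, 1.0, 0.0], dtype="float32"))

pairs = word_vector_pairs(vocab)
for word, vector in pairs:
    print(word, vector)
```

With a real pipeline you would pass nlp.vocab instead of the toy vocab. This assumes a package that actually includes word vectors, for example en_core_web_md or en_core_web_lg. With en_core_web_sm the vector table has no entries, so the result would be an empty list.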

You can find an overview of the spaCy API at: https://spacy.io/api