Questions tagged [vocabulary]

For questions related to dictionary-like structures in programming, mainly to Semantic Web vocabularies. Please do not use in place of the "terminology" tag.

In Semantic Web

  • In the Semantic Web field, a controlled vocabulary is a set of URIs used to identify things, relations or classes.

  • A vocabulary with well-developed subsumption (subclass-superclass) relations is often called a taxonomy.

  • A taxonomy with well-developed non-subsumption relations is often called an ontology.
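
As a rough illustration of the three definitions above, here is a minimal plain-Python sketch. The `http://example.org/vocab#` namespace, the class names, and the `superclasses` helper are all hypothetical and chosen only for illustration. A real project would more likely use an RDF library and an established vocabulary.

```python
# Hypothetical namespace and terms, for illustration only (not a real vocabulary).
EX = "http://example.org/vocab#"

# Controlled vocabulary: a set of URIs identifying classes and relations.
vocabulary = {
    EX + "Animal",
    EX + "Mammal",
    EX + "Dog",
    EX + "Person",
    EX + "ownedBy",
}

# Taxonomy: the vocabulary plus subsumption (subclass -> superclass) links.
subclass_of = {
    EX + "Dog": EX + "Mammal",
    EX + "Mammal": EX + "Animal",
}

# Ontology: the taxonomy plus non-subsumption relations, stored here as
# (subject, predicate, object) triples.
relations = {(EX + "Dog", EX + "ownedBy", EX + "Person")}


def superclasses(term):
    """Return every ancestor of `term`, nearest first, by following subclass_of."""
    ancestors = []
    while term in subclass_of:
        term = subclass_of[term]
        ancestors.append(term)
    return ancestors


if __name__ == "__main__":
    for uri in superclasses(EX + "Dog"):
        print(uri)
```

The subsumption links are kept separate from the other relations so that the step from taxonomy to ontology stays visible. In RDF, both kinds would simply be triples.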

190 questions
2
votes
1 answer

SimpleTerm title not being set

I have a form with a SelectFieldWidget that currently uses a static vocabulary, which is basically this: from zope.schema.vocabulary import SimpleVocabulary, SimpleTerm primary_contacts = SimpleVocabulary([ SimpleTerm( unicode(token),…
Matthew Trevor
  • 14,354
  • 6
  • 37
  • 50
2
votes
3 answers

How do I move terms from one vocabulary to another in Drupal 7 without losing the node reference?

I've inherited a site with a very big hierarchical taxonomy:
Vocabulary name: categories
-- term: company name
---- many child terms
-- term: country
---- many child terms
-- term: issue
---- many child terms
I realized it would be easier to create…
Mac Duck
  • 96
  • 1
  • 7
1
vote
1 answer

Creating Vocabulary in python

I have a number of text files. I would like to use NLTK to preprocess them and print the vocabulary in a plain-text .text format, so that I can distribute those files for people to use. I did the following to do it. I started with taking single…
thetna
  • 6,903
  • 26
  • 79
  • 113
1
vote
0 answers

How to generate a merge file and a vocab file in NLP field

I want to use the Megatron framework for Chinese NLP pre-training tasks. Currently, I have Chinese corpus resources and a vocab.txt file. However, for most frameworks, it seems that vocab.json and merge.txt are needed. Can I generate the above two…
Zhang_kg
  • 11
  • 1
1
vote
1 answer

Apply python package (spaCy) word list only covering the specific language vocabulary

I need to filter out non-core German words from a text using spaCy. However, I couldn't find a suitable approach or word list that covers only the essential vocabulary of the German language. I have tried different approaches using the spacy tools…
Levin
  • 13
  • 4
1
vote
2 answers

How to add taxonomy in drupal; not a vocabulary?

I am following the Drupal documentation. In chapter 6.5, they explain how a taxonomy is different from a vocabulary. On this page it is suggested that on a website which lists farmers and the recipes listed by them, we should have a taxonomy for the ingredients…
user31782
  • 7,087
  • 14
  • 68
  • 143
1
vote
0 answers

How do I find the length of a vocabulary computed during TFX Transform?

I'm currently building a project in TFX and during the Transform step I compute the "vocabulary" for a categorical variable. For later steps (but still during preprocessing), I want to use the length of that vocabulary (i.e. the number of distinct…
Sarah Messer
  • 3,592
  • 1
  • 26
  • 43
1
vote
1 answer

Using spaCy with archaic/Old English words?

I am using en_core_web_lg to compare some texts for similarity and I am not getting the expected results. The issue I guess is that my texts are mostly religious, for example: "Thus hath it been decreed by Him Who is the Source of Divine…
Chicago1988
  • 970
  • 3
  • 14
  • 35
1
vote
1 answer

Is there a limit to the size of target word vocabulary that should be used in seq2seq models?

In a machine translation seq2seq model (using RNN/GRU/LSTM) we provide a sentence in a source language and train the model to map it to a sequence of words in another language (e.g., English to German). The idea is that the decoder part generates a…
anurag
  • 1,715
  • 1
  • 8
  • 28
1
vote
2 answers

How to tokenize new vocab in spaCy?

I am using spaCy to benefit from its dependency parsing, but I am having trouble making the spaCy tokenizer tokenize the new vocab I am adding. This is my code: nlp = spacy.load("en_core_web_md") nlp.vocab['bone morphogenetic protein…
Leena
  • 11
  • 1
1
vote
1 answer

How to make usernames as vocabulary in Drupal 7?

I am making a website using Drupal 7, in which a user may assign tasks to another user while editing a content node. I was thinking of doing this by making the Username list appear as a tag list with autocomplete. But I am unable to find any modules…
tejzpr
  • 945
  • 8
  • 19
1
vote
1 answer

Build vocabulary only from training data or entire data?

Should I build the vocabulary only from the training data or from all the data? Wouldn't that affect the test data either way? I mean: if we only build the vocab from the training data, the model wouldn't recognize a lot of the words in the validation and testing data, if…
1
vote
1 answer

Should and tags be explicitly added to vocabulary after using keras.preprocessing.text Tokenizer?

In Keras we have keras.preprocessing.text to tokenize the text to our requirements and generate a vocabulary. tokenizer = tf.keras.preprocessing.text.Tokenizer(split=' ', oov_token=1) tokenizer.fit_on_texts(["Hello world"]) seqs =…
wmIbb
  • 125
  • 1
  • 4
  • 19
1
vote
2 answers

How to create vocabulary graph with word vectors using neo4j?

I want to create a vocabulary graph with word vectors. The aim is to query for the nearest word in the vocabulary graph based on word similarity. How can we achieve this in neo4j? The following is an example: Suppose the vocabulary consists of the…
buddy
  • 189
  • 2
  • 16
1
vote
1 answer

Does NLTK provide a lib to measure vocabulary ordinary level?

Do NLTK or any other NLP tools provide a lib to measure vocabulary ordinary level? By ordinary level, I mean that certain words are simple and more frequently used, like "and, age, yes, this, those, kind", which any elementary school student must…
David
  • 39
  • 4