Questions tagged [vocabulary]

For questions related to dictionary-like structures in programming, mainly to Semantic Web vocabularies. Please do not use in place of the "terminology" tag.

In Semantic Web

  • In the Semantic Web field, a controlled vocabulary is a set of URIs used to identify things, relations or classes.

  • A vocabulary with well-developed subsumption (subclass-superclass) relations is often called a taxonomy.

  • A taxonomy with well-developed non-subsumption relations is often called an ontology.
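As a rough illustration of the three definitions above, here is a minimal sketch in plain Python. The `http://example.org/vocab#` namespace, the terms, and the `superclasses` helper are all hypothetical, chosen only for this example; a real project would more likely use an RDF library and a published vocabulary.

```python
# Hypothetical namespace and terms, used only for illustration.
EX = "http://example.org/vocab#"

# Controlled vocabulary: a set of URIs identifying things, relations, or classes.
vocabulary = {EX + "Animal", EX + "Mammal", EX + "Dog", EX + "Bone", EX + "eats"}

# Taxonomy: the vocabulary plus subsumption (subclass -> superclass) links.
subclass_of = {
    EX + "Mammal": EX + "Animal",
    EX + "Dog": EX + "Mammal",
}

# Ontology: the taxonomy plus non-subsumption relations,
# stored here as (subject, predicate, object) triples.
relations = {(EX + "Dog", EX + "eats", EX + "Bone")}


def superclasses(term):
    """Return every ancestor of `term`, nearest first, by following subclass links."""
    found = []
    while term in subclass_of:
        term = subclass_of[term]
        found.append(term)
    return found


print(superclasses(EX + "Dog"))
```

The point of the sketch is only that each layer adds structure to the previous one: the set alone is a vocabulary, the subclass links make it a taxonomy, and the extra triples move it toward an ontology.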

190 questions
0
votes
1 answer

ETL Mapping LOINC vocabulary on OMOP Common Data Model

I am working on the lab test values mapping (MEASUREMENT table of the OMOP CDM). My local mapping table (handmade) has my measurement name (in French) and the associated LOINC code. The LOINC vocabulary has been loaded from Athena (OHDSI community…
MF_
  • 36
  • 4
0
votes
1 answer

Extract all (word, vector) pairs from vocab

How can I extract all (word, vector) pairs from the spaCy Vocab? Iterating like: sort([ w.text for w in nlp.vocab ]) array(['\t', '\n', ' ', '"', "'", "''", "'Cause", "'Cos", "'Coz", "'Cuz", "'S", "'bout", "'cause", "'cos", "'coz", "'cuz", "'d", "'em",…
sten
  • 7,028
  • 9
  • 41
  • 63
0
votes
0 answers

How to check that a String contains English words using frequency of words?

I have a couple of strings and I have to say which one contains English words. How can I find it? The suggestion is to use the frequency of words in the English vocabulary, but I can't figure out a way to turn it into an algorithm. I calculated the…
0
votes
0 answers

raise ValueError( ValueError: empty vocabulary; perhaps the documents only contain stop words

Please provide me your insights to resolve the below issue. Error Details Application Name : SCPortal Minimum data per functionality : 25 Input File Name: /Users/Document/Desktop/PythonSpace/SC_Portal_Training_8thSep22.csv Output file path:…
0
votes
1 answer

NLP data processing between `BucketIterator` and `build_vocab_from_iterator`

I am using the AG News Dataset to train a model for text classification. This part uses TabularDataset to generate the dataset from a CSV file. import torchtext import torch from torchtext.legacy.data import Field, TabularDataset, BucketIterator,…
jackson
  • 11
  • 3
  • 9
0
votes
2 answers

Pandas DataFrame give frequency of Column occurrence whilst maintaining the Doc ID

I have a dataframe and I'm trying to create a vocabulary of terms from it (I have already tokenized and preprocessed it to just a list of all words with the Doc ID attached to each), for example I have Word Doc ID 0 Big XX 1 Big …
0
votes
1 answer

find words out of vocabulary

I have some texts in a pandas dataframe df['mytext']. I also have a vocabulary vocab (a list of words). I am trying to list and count the out-of-vocabulary words for each document. I have tried the following, but it is quite slow for 10k…
00__00__00
  • 4,834
  • 9
  • 41
  • 89
0
votes
1 answer

How to identify/detect a vocabulary in a text (Node JS)

I'm currently working on an app in which I have blocks of text and would like to know if they're related to cooking / recipe vocabulary. I've seen and tried a few things, but I'm starting to wonder if I'm going overboard with that (I don't…
sblr
  • 1
  • 1
0
votes
0 answers

Use pre-trained model vocabulary in an appropriate way with allennlp

When using a huggingface pre-trained model, I passed a tokenizer and indexer for my TextField in the DatasetReader, and I also want to use the same tokenizer and indexer in my model. What is the appropriate way to do this in allennlp? (Using a config file?) Here is my…
0
votes
2 answers

Repository of SKOS Vocabularies

Is there any repository or list of freely available SKOS (or even non-SKOS) vocabs? I need some special vocabs and want to know whether they already exist. I found some well-known ones like http://www.eionet.europa.eu/GEMET or agrovoc (which…
Hamzeh
  • 541
  • 7
  • 15
0
votes
2 answers

What is *.subwords file in natural language processing to use as vocabulary file?

I have been trying to create a vocab file for an NLP task, to use with the tokenize method of trax to tokenize words, but I can't find which module/library to use to create the *.subwords file. Please help me out.
0
votes
1 answer

spacy nightly (3.0.0rc) load without vocab how to add word2vec vectorspace?

In spacy 2 I use this to add a vocab to an empty spacy model with vectorspace (spacy init) : nlp3=spacy.load('nl_core_news_sm') #standard model without vectors spacy.load("spacyinitnlmodelwithvectorspace",vocab=nlp3.vocab) In spacy nightly version…
0
votes
2 answers

Vocabulary scale of machine translation

When doing machine translation, if you segment words, such as using BPE, how big is the processed vocabulary?
jiangxoo
  • 13
  • 1
0
votes
1 answer

ibm-cloud speech-to-text: Is it possible to specify phonemes for custom vocabulary?

We need to build a custom model with a lot of already phonemically transcribed custom vocabulary, but the current API for specifying custom words has no published option for specifying a phonemic string rather than a manually generated, ad-hoc…
W. Sadkin
  • 261
  • 3
  • 8
0
votes
0 answers

Word to speech on click (without having to click on any button)

I'm new to StackOverflow and I need help. I am developing a new vocabulary app for beginners. I want my app users to be able to hear the pronunciation of any word from the list of vocabulary in my app without needing to click any button ( just…