Highest Voted 'oov' Questions

4

votes

2 answers

Part-of-speech tagging: tagging unknown words

In a part-of-speech tagger, the most probable tag sequence for a given sentence is determined using an HMM by T* = argmax_T P(Word|Tag) * P(Tag|TagPrev). But when 'Word' did not appear in the training corpus, P(Word|Tag) is zero…

nlp pos-tagger oov

asked Sep 27 '12 at 02:37

user1599171
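
For context on the zero-probability problem in this question, one common workaround is to smooth the emission probability of a word given a tag so that unseen words keep a small non-zero mass. The sketch below uses add-alpha smoothing on a toy corpus. The function names, the toy data, and the choice of reserving one extra vocabulary slot for unknown words are assumptions made for illustration; they are not taken from the question or its answers.

```python
from collections import Counter, defaultdict

def train_emissions(tagged_words):
    """Count how often each word occurs with each tag."""
    counts = defaultdict(Counter)
    vocab = set()
    for word, tag in tagged_words:
        counts[tag][word] += 1
        vocab.add(word)
    return counts, vocab

def emission_prob(word, tag, counts, vocab, alpha=1.0):
    """Add-alpha smoothed emission probability of word given tag.

    The denominator reserves one extra slot (len(vocab) + 1) for
    any word that never appeared in training.
    """
    total = sum(counts[tag].values())
    return (counts[tag][word] + alpha) / (total + alpha * (len(vocab) + 1))

# Toy training data (hypothetical).
corpus = [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB"),
          ("the", "DET"), ("cat", "NOUN")]
counts, vocab = train_emissions(corpus)

p_known = emission_prob("dog", "NOUN", counts, vocab)
p_unknown = emission_prob("zebra", "NOUN", counts, vocab)
print(p_known, p_unknown)
```

With smoothing, the unknown word no longer forces the whole sequence probability to zero, although it still scores lower than a word seen with that tag.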

3

votes

4 answers

Efficient way of resolving unknown words to known words?

I am designing a text processing program that will generate a list of keywords from a long itemized text document, and combine entries for words that are similar in meaning. There are metrics out there; however, I have a new issue of dealing with…

python language-agnostic machine-learning nlp oov

asked Jun 13 '12 at 18:48

Slater Victoroff

21,376
21
85
144

2

votes

1 answer

How to deal with very uncommon terms in tf-idf?

I'm implementing a naive "keyword extraction algorithm". I'm self-taught though so I lack some terminology and maths common in the online literature. I'm finding "most relevant keywords" of a document thus: I count how often each term is used in…

feature-extraction relevance tf-idf noise-reduction oov

asked Oct 18 '12 at 07:54

hippietrail

15,848
18
99
158

1

vote

1 answer

Find most similar words for OOV word

I am looking for the most similar words for out-of-vocabulary (OOV) words using gensim. Something like this: def get_word_vec(self, model, word): try: if word not in model.wv.vocab: mostSimWord = model.wv.similar_by_word(word) …

python nlp gensim similarity oov

asked May 22 '20 at 10:29

N0rA

612
1
7
27

1

vote

1 answer

Voice recognition on iOS - convert OOV words to phonemes?

I’ve tried OpenEars successfully, as suggested on StackOverflow, and generated custom vocabularies from arrays of NSString objects. However, we also need to recognize names from the address book, and here the fallback method inevitably fails miserably very…

ios speech-recognition openears oov

asked Mar 01 '14 at 22:41

ranavision

11
1

0

votes

1 answer

How to tune FastText parameter for OOV word?

I have heard that FastText generates OOV word vectors using its n-grams. Is this built into the FastText architecture automatically, or do we need to tune specific parameters for it, like oov_token in the Keras tokenizer? I already…

parameters word-embedding fasttext oov

asked Jul 26 '21 at 02:13

Eva Agustine

3
1
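
As background for the n-gram mechanism this question mentions, the sketch below lists the character n-grams of a word in the way fastText-style models describe them: the word is wrapped in `<` and `>` boundary markers and every substring of length 3 to 6 is taken (3 and 6 are the commonly documented defaults). The helper name `char_ngrams` is hypothetical, and the sketch omits the hashing of n-grams into buckets that real implementations perform.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return the character n-grams of a word wrapped in boundary markers."""
    wrapped = f"<{word}>"
    grams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of width n across the wrapped word.
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams

trigrams = char_ngrams("where", min_n=3, max_n=3)
print(trigrams)
```

An OOV word still shares many of these n-grams with in-vocabulary words, which is why a vector can be assembled for it without any extra token setting.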

0

votes

1 answer

How to handle out of vocab words with bag of words

I am attempting to use BoW before ML on my text-based dataset. However, I do not want my training set to influence my test set (i.e., data leakage). I want to apply BoW to the train set before the test set, but then my test set has different features…

pandas machine-learning text nlp oov

asked May 12 '21 at 13:14

Kim S.

47
5
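
To illustrate the setup this question describes, here is a minimal sketch of a bag-of-words representation whose vocabulary is fitted on the training documents only. Tokens in the test documents that are not in that vocabulary are simply dropped, so train and test rows always have the same columns. The function names and the whitespace tokenization are assumptions for illustration, not the asker's code.

```python
from collections import Counter

def fit_vocabulary(train_docs):
    """Build a token-to-column index from the training documents only."""
    tokens = sorted({tok for doc in train_docs for tok in doc.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}

def transform(docs, vocab):
    """Turn documents into count vectors; out-of-vocabulary tokens are ignored."""
    rows = []
    for doc in docs:
        counts = Counter(tok for tok in doc.lower().split() if tok in vocab)
        rows.append([counts.get(tok, 0) for tok in vocab])
    return rows

train_docs = ["the cat sat", "the dog ran"]
test_docs = ["the zebra sat"]

vocab = fit_vocabulary(train_docs)
train_rows = transform(train_docs, vocab)
test_rows = transform(test_docs, vocab)
print(list(vocab), test_rows)
```

Because the columns come from the training set alone, nothing from the test set leaks into the features; the unseen token "zebra" contributes no column.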

0

votes

1 answer

Cannot reproduce pre-trained word vectors from its vector_ngrams

Just out of curiosity, I was debugging gensim's FastText code to replicate its handling of Out-of-Vocabulary (OOV) words, and I have not been able to do it. The process I'm following is training a tiny model with a toy corpus, and…

python-3.x gensim fasttext oov

asked Mar 04 '20 at 11:02

threepwood

13
3

0

votes

3 answers

Handling OOV words in GoogleNews-vectors-negative300.bin

I need to calculate the word vectors for each word of a sentence that is tokenized as follows: ['my', 'aunt', 'give', 'me', 'a', 'teddy', 'ruxpin']. If I were using the pretrained fastText embeddings (cc.en.300.bin.gz by Facebook), I could get…

word2vec oov

asked Sep 16 '19 at 04:18

chikitin

762
6
28

0

votes

2 answers

fastText: is there a way to export n-grams?

I'm new to DL and NLP, and recently started using a pre-trained fastText embedding model (cc.en.300.bin) through gensim. I would like to be able to calculate vectors for out-of-vocabulary words myself, by splitting the word into n-grams and looking up…

export gensim n-gram fasttext oov

asked Mar 12 '19 at 12:06

R Sorek

3
2

0

votes

2 answers

Part of speech for unknown and known words

What is the difference between part-of-speech tagging for unknown words and part-of-speech tagging for known words? Is there any tool that can predict part-of-speech tags for words ..

nlp stanford-nlp oov

asked May 20 '13 at 05:15

S Gaber

1,536
7
24
43

-1

votes

1 answer

Find the list of Out Of Vocabulary (OOV) words in my domain-specific PDF while using a FastText model

How can I find the list of Out Of Vocabulary (OOV) words in my domain-specific PDF while using a FastText model? I need to fine-tune FastText with my domain-specific words.

nlp data-science fasttext oov

asked Jul 26 '21 at 07:25

Srijita Saha Roy

1

Questions tagged [oov]