Questions tagged [trigram]

Trigrams are a special case of the N-gram where N is 3: sequences of three consecutive items, such as characters or words. They are often used in natural language processing for statistical analysis of texts.
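As a rough illustration of the definition above, here is a minimal Python sketch that slides a window of three over a sequence. The helper name `ngrams` and the sample strings are made up for this example and do not come from any particular library; the same function handles words and characters because it only relies on slicing.

```python
from collections import Counter


def ngrams(items, n=3):
    """Return every run of n consecutive items as a tuple (hypothetical helper)."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


# Word-level trigrams: split the text into words first.
word_trigrams = ngrams("the quick brown fox jumps".split())

# Character-level trigrams: a string is already a sequence of characters.
char_trigrams = ngrams("hello")

# Counting how often each trigram occurs is the usual next step.
trigram_counts = Counter(ngrams("banana"))
```

A sequence shorter than three items simply yields an empty list, so no special-casing is needed for short inputs.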

100 questions
0
votes
1 answer

Is there a way to do trigram similarity with whole phrases in Postgres?

Right now with Postgres and pg_trgm, if I use any similarity function (strict word, word, or just standard), it matches based on words, so the query 'ruined by' gets a rank of 1 for the term 'ruined' but less for something like 'ruined by design'. Is there a…
0
votes
1 answer

How to perform Kneser-Ney smoothing in NLTK at word-level for tri-gram language model?

I am trying to train a tri-gram language model on a text corpus and want to perform KN smoothing. Apparently, the 'nltk.trigrams' does this at character-level. I was wondering how I would be able to do this at word-level and also perform KN…
0
votes
0 answers

Trigram: only tokenize part of the dataset

Exception in thread "main" java.util.NoSuchElementException. The document has 500 lines, but only 300+ of them get tokenized. s3 = itr.nextToken(); // This is the line that throws the error. The console reports the error at this line. public…
0
votes
1 answer

Counting trigrams (3-letter sequences) in C?

I'm attempting to count the number of trigrams, or three-letter sequences, in a block of text. I have some code already that successfully counts the number of bigrams (2-letter sequences) using a 2D array, but I'm having some trouble altering it to…
nhlyoung
  • 67
  • 2
  • 8
0
votes
1 answer

Text search using PostgreSQL

I have a use case in which I want to search for a company by its name or by the name of one of its divisions. If I put a division name in the search query, it should return the names of the companies that have that division. To achieve this I created a table with…
vasu gaur
  • 63
  • 2
  • 5
0
votes
1 answer

How to train ngram model on my own corpus

I have a corpus that is a list of strings: corpus = ["Hello I am Sam", "This is a white desk","I ate cereals", ...] I want to build a language model (preferably using nltk) on this corpus, to get the probability of a word in a sentence. So, my later…
Cranjis
  • 1,590
  • 8
  • 31
  • 64
0
votes
1 answer

Postgres similarity function not appropriately using trigram index

I have a simple person table with a last_name column, to which I've added a GiST index with CREATE INDEX last_name_idx ON person USING gist (last_name gist_trgm_ops); According to the docs at https://www.postgresql.org/docs/10/pgtrgm.html, the <->…
archeezee
  • 411
  • 1
  • 4
  • 17
0
votes
1 answer

Match parts of a query with trigram index

I have a table of events where each event has a title and description. Searches should search both columns: title: Dick(ens) and Jane description: Meet weekly to discuss classic books! Given the above, I would like a query of book club to…
Matt
  • 2,953
  • 3
  • 27
  • 46
0
votes
0 answers

How can I create a series of trigrams in C++ where each following trigram starts off on the second letter of the first?

I've been working on this for a while now, and I haven't been able to solve the issue. Essentially, if a user were given a ".txt" file in which the following was written, "tHe@doG#wENt&uP$tHE!hiLL!", the end result would need to look like…
user8341287
0
votes
1 answer

Multi-column index with unaccent and pg_trgm (matching dirty data)

I have a table of customer data with 12M+ records. I want to query it based on a few fields, for example: first_name, last_name, birth_place. But the data is really dirty, so I also want the results to include records that don't match completely. I'm using…
Wlad
  • 410
  • 1
  • 9
  • 27
0
votes
0 answers

Django and approximative category

I work at a company that develops a price comparator with Django (1.11). We have uploaded catalogs from merchants, and my job is to find the related category for each product. On our website we have about 22 categories (home, dress, glasses,…
smorele
  • 147
  • 12
0
votes
1 answer

Postgres name search

I have a tiny dataset (~1000 rows). Each row has a username, first name and last name. Can I do a fuzzy search on these three fields by using pg_trgm and concatenating the three fields together with two spaces between each? Alternatively, is there a…
Shien
  • 495
  • 5
  • 13
0
votes
1 answer

Brown Corpus Doesn't Yield Results with HMM and TnT taggers

The most dynamic corpus with respect to POS taggers is the treebank corpus. However, the Brown Corpus just refuses to yield results with both HMM and TnT taggers. Any explanation for that? size = int(len(brown.tagged_sents())*0.9) train =…
djokester
  • 567
  • 9
  • 20
0
votes
2 answers

How do I nest a current dictionary into another one in Python?

I have a defaultdict with 3 levels of nesting that is to be used later for a trigram. counts = defaultdict(lambda:defaultdict(lambda:defaultdict(lambda:0))) Then, I have a for loop that goes through a document and creates counts of each…
Katie Tetzloff
  • 55
  • 1
  • 1
  • 6
0
votes
1 answer

pg_search trigram extension not working

Rails 5. I have it installed on the database (pg_trgm | 1.1 | public | text similarity measurement and index searching based on trigrams) and in the initializer: PgSearch.multisearch_options = { :using => [:tsearch, :trigram], } I've…
daveasdf
  • 83
  • 1
  • 12