Questions tagged [trigram]

Trigrams are a special case of the N-gram, where N is 3. They are often used in natural language processing for statistical analysis of texts.
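
As a minimal illustration of the definition above, the sketch below builds word-level and character-level trigrams in plain Python and counts how often each word trigram occurs. The helper name `ngrams` and the sample sentence are assumptions made for this example; they do not come from any particular library.

```python
from collections import Counter


def ngrams(items, n=3):
    """Return every run of n consecutive items as a tuple (hypothetical helper)."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


# Word-level trigrams over a whitespace-tokenized sample sentence.
tokens = "the quick brown fox jumps over the quick brown dog".split()
word_trigrams = ngrams(tokens)

# Frequency of each word trigram, the usual starting point for statistics.
trigram_counts = Counter(word_trigrams)

# Character-level trigrams of a single word.
char_trigrams = ["".join(t) for t in ngrams("trigram")]
```

Passing `n=2` to the same helper yields bigrams, so one function covers the general N-gram case.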

100 questions
0
votes
1 answer

How to calculate conditional_frequency_distribution and conditional_probability_distribution for trigrams in nltk python

I want to calculate a conditional probability distribution for my language model, but I cannot, because it requires a conditional frequency distribution that I am unable to generate. This is my code: # -*- coding: utf-8 -*- import io import…
Jai Prak
  • 2,855
  • 4
  • 29
  • 37
0
votes
2 answers

plpgsql function calling trigram similarity function inside does not utilize GIN or GIST indexes

I wanted to combine PostgreSQL Levenshtein and trigram similarity functions. The main advantage of the trigram similarity function is that it can utilize GIN or GIST indexes and thus can return fuzzy match results quickly. However, if it is called…
zlatko
  • 596
  • 1
  • 6
  • 23
0
votes
2 answers

Python reverting bigrams and trigrams

I have a list of bigrams and trigrams: string = 'do not be sad' a_list = ['do', 'not', 'do not', 'be', 'not be', 'do not be', 'sad', 'be sad', 'not be sad'] I was wondering if there is a function to reverse the bigrams and trigrams in a_list? I…
user47467
  • 1,045
  • 2
  • 17
  • 34
0
votes
0 answers

How to specify conditions before performing trigram search in PostgreSQL?

I'm fairly new to fuzzy searching and trigrams in PostgreSQL. I have a few hundred thousand products in the database and I want to be able to select products whose name is closest to another product's name. After a few hours of experimenting and…
Tadeáš Peták
  • 567
  • 2
  • 13
0
votes
2 answers

Unicode Error when using NLTK to find trigrams for entire corpus and print to csv

I'm trying to go through a whole folder of text files, collect a cumulative frequency of trigrams across all of those files, and print them to a csv file. It does the whole corpus thing but then when it starts to do the work for the trigrams I get…
Jolijt Tamanaha
  • 333
  • 2
  • 9
  • 23
0
votes
2 answers

String clustering (using trigrams?)

I have a list of strings, which I want to classify into groups. I then want to show one string from each group. Say my list looks like this: The quick brown fox jumps over the lazy dog The quick brown fox jumps over the lazy dog!!!! The brown fox…
Markus
  • 2,526
  • 4
  • 28
  • 35
0
votes
1 answer

Rails 4 pg_search - multisearchable Issue

I'm using the pg_search gem for Rails (Ruby 2.1.1, Rails 4.1.4) and trying to implement a global multisearch for some of my models but I keep getting PG::UndefinedFunction: ERROR: operator does not exist: text % unknown when I run the multisearch…
user3029619
  • 109
  • 1
  • 10
0
votes
3 answers

How can I get words after and before a specific token?

I am currently working on a project that simply creates basic corpus databases and tokenizes texts, but I am stuck on one problem. Assume that we have the following: import os, re texts = [] for i in os.listdir(somedir): # Somedir contains…
Eray Erdin
  • 2,633
  • 1
  • 32
  • 66
-1
votes
1 answer

Finding the List of words in List of Sentences and return the matching sentences

Given a list of sentences and a list of word lists, how can I return only the sentences that contain all three words from one of the word lists (trigrams)? Please suggest an approach. Below are example lists. listwords = [['people','suffering','acute'],…
mlds522
  • 55
  • 5
-1
votes
1 answer

PostgreSQL text search with an index within a Jsonb array

Given the following dataset: -- Create random table CREATE TABLE sample ( id serial primary key, content jsonb ); -- Insert sample rows INSERT INTO sample (content) VALUES ('{"text": ["Lorem ipsum dolor sit amet","consectetur…
Gab
  • 3,404
  • 1
  • 11
  • 22