Questions tagged [trigram]

Trigrams are a special case of the N-gram, where N is 3. They are often used in natural language processing for statistical analysis of text.
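As a minimal sketch of the definition above, the snippet below extracts trigrams at both the word level and the character level. The function names `word_trigrams` and `char_trigrams` and the sample inputs are illustrative assumptions, not part of any library, and character-level tools such as pg_trgm may additionally pad and normalize strings in ways this sketch does not.

```python
def word_trigrams(tokens):
    """Return every run of three consecutive tokens as a tuple."""
    # zip stops at the shortest input, so this yields len(tokens) - 2 tuples
    # (or none when there are fewer than three tokens).
    return list(zip(tokens, tokens[1:], tokens[2:]))


def char_trigrams(text):
    """Return every run of three consecutive characters as a string."""
    # No padding or case folding is applied here (an assumption of this sketch).
    return [text[i:i + 3] for i in range(len(text) - 2)]


tokens = "the quick brown fox jumps".split()
print(word_trigrams(tokens))
print(char_trigrams("laptop"))
```

Word-level trigrams are the usual choice for language modeling, while character-level trigrams are the usual choice for fuzzy string matching.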

100 questions
1
vote
0 answers

How to create a Trigram index in Django 3 with multiple columns

I've implemented Trigram Similarity search with annotate which gives me exactly what I want in terms of results; but I have 220,000 records in my database and the search takes 5+ seconds per query which is too long. The search is with 3 columns and…
SEMICS
  • 181
  • 3
  • 5
1
vote
0 answers

How to implement spelling insensitive search in django & PostgreSQL?

What I want to achieve is this: if a user searches for "laptp", the database should return results containing the actual word "Laptop". Similarly, if a user enters "ambroidery", the database should return results containing both the "embroidery" and "embroidred" words…
Maha Waqar
  • 585
  • 1
  • 10
  • 24
1
vote
1 answer

data.table error and warnings for finding trigram probability

I'm trying the same code as in https://thiloshon.wordpress.com/2018/03/11/build-your-own-word-sentence-prediction-application-part-02/ to do word-level prediction. The input text data is also available at that link, and I use the en_US.news.txt file as…
ch.elahe
  • 289
  • 4
  • 18
1
vote
1 answer

Ngrams from pandas column

I have a pandas DataFrame with the following columns: Column 1 ['if', 'you', 'think', 'she', "'s", 'cute', 'now', ',', 'you', 'should', 'have', 'see', 'her', 'a', 'couple', 'of', 'year', 'ago', '.'] ['uh', ',', 'yeah', '.', 'just', 'a', 'fax',…
Alex Dana
  • 1,076
  • 3
  • 14
  • 30
1
vote
0 answers

Elasticsearch: How can I influence the "directionality" of a trigram match?

We use Elasticsearch to search address data, and for non-exact matches we include a field variant of the street name that is analyzed with an ngram tokenizer (trigrams, to be specific). We also use a minimum-should-match clause of…
Mario Köhler
  • 192
  • 1
  • 10
1
vote
3 answers

Trigram distance operations in sqlalchemy

I'm currently trying to use the pg_trgm operators % and <->. The GIN indexes on the columns are already in place, but I can't find the SQLAlchemy equivalent of these operators. What would be the best approach to solve this…
Nukesor
  • 945
  • 1
  • 7
  • 13
1
vote
1 answer

Finding verbs in a list of trigrams (with partial match?)

I know that there are many questions out there about partial matches and I've read as many as I've been able to, but I have still not managed to extract what I need using R. In a nutshell, my problem is that I have a data set with over a million…
PrisLB
  • 23
  • 6
1
vote
1 answer

How to extract all the ngrams from a text dataframe column in different order in a pandas dataframe?

Below is the input DataFrame I have. id description 1 **must watch avoid** **good acting** 2 average movie bad acting 3 good movie **acting good** 4 pathetic avoid 5 **avoid watch must** I want to extract the ngrams, i.e. bigram, trigram…
Ash
  • 11
  • 1
  • 4
1
vote
1 answer

postgres trigram index is too slow with ilike search

I'm doing a pattern-matching search with ILIKE in our system, but it gets too slow on some tables because of the number of records they hold. So I'm implementing a trigram index following the instructions in this post…
thedummy
  • 11
  • 3
1
vote
0 answers

PostgreSQL scoring distance with trigrams

I would like to understand why the distance between two words is so large when just two letters are swapped. Example: SELECT name, searchable, searchable <-> 'fluerie' AS dist FROM shd_appellations WHERE region_id=3 ORDER BY dist LIMIT 1; Result: fleurie =>…
alex.bour
  • 2,842
  • 9
  • 40
  • 66
1
vote
1 answer

How to combine full text search and trigrams in postgres

I am working on a search system for a database of git commits. I am currently using full text search to enable a user to search by author, commit date, the log message, and commit hash. Currently, the commit hash is only useful if the user provides…
Evan
  • 508
  • 1
  • 5
  • 18
1
vote
0 answers

Infer optimum threshold values for normalized levenshtein distance and trigram similarity functions

Records from two datasets are compared for fuzzy match string similarity, by using normalized levenshtein distance function and trigram similarity function. 4 different similarity metrics are calculated: LevCmpSimilarity - normalized Levenshtein…
zlatko
  • 596
  • 1
  • 6
  • 23
1
vote
3 answers

PostgreSQL full text search abbreviations

I created a PostgreSQL full text search using 'german'. How can I configure it so that when I search for "Bezirk", lines containing "Bez." also match (and vice versa)?
JohnDoe
  • 2,422
  • 5
  • 29
  • 44
1
vote
2 answers

Getting 'trigrams' in Java

I am having a bit of an issue getting trigrams in Java. My program can currently get bigrams fine, but when I reuse the same method structure and change it to get trigrams, it does not work as well. I want the trigrams to get…
John Lewis
  • 139
  • 1
  • 2
  • 15
0
votes
0 answers

trigram model getting IndexError: list index out of range when choosing random word

I'm new to Python and need help with NLTK language modeling. I'm trying to generate a sentence starting with "he said" using a trigram model, but I get the following error: Traceback (most recent call last): File "C:\Users\PycharmProjects\homework3…