Questions tagged [collocation]

Anything related to collocations, i.e. sequences of words in text that often appear together. The term is widely used in linguistics, and this tag should be used for related questions.

See Wikipedia on collocations.
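As a rough illustration of "words that often appear together", the sketch below scores adjacent word pairs by pointwise mutual information (PMI) using only the standard library. The function name `pmi_bigrams` and the `min_count` cutoff are illustrative choices, not part of any library; NLTK's `BigramCollocationFinder` combined with `BigramAssocMeasures.pmi` offers a ready-made equivalent.

```python
import math
from collections import Counter


def pmi_bigrams(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_words = len(tokens)
    n_pairs = max(n_words - 1, 1)
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # skip pairs below min_count: PMI is unstable for rare pairs
        # PMI = log2(P(w1 w2) / (P(w1) * P(w2))), using relative frequencies
        p_pair = count / n_pairs
        p_w1 = unigrams[w1] / n_words
        p_w2 = unigrams[w2] / n_words
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


tokens = "the new york times said the new york mayor spoke to the city".split()
for pair, score in pmi_bigrams(tokens):
    print(" ".join(pair), round(score, 2))
# new york 2.82
# the new 2.23
```

"new york" outranks "the new" even though both occur twice, because "the" is more frequent on its own, so its co-occurrence with "new" is less surprising.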

43 questions
1 vote, 0 answers

Implement collocation in ElasticSearch

I am new to Elasticsearch. I am trying to build an analyzer or an ingest pipeline that would create collocations of words (unigrams, bigrams, trigrams with a step of up to 2). I am aware it's feasible in Python, but I am only interested in an ES…
Axel
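Elasticsearch ships a `shingle` token filter (configured through `min_shingle_size`, `max_shingle_size` and `output_unigrams`) that produces word n-grams inside an analyzer, which is the usual building block for this. To show the kind of token stream such an analyzer aims for, here is a small Python sketch that emits unigrams, bigrams and trigrams at each position. The function `shingles` is hypothetical, and it does not model the question's "step of up to 2", whose exact meaning is not specified.

```python
def shingles(tokens, min_size=1, max_size=3, separator=" "):
    """Return word n-grams (shingles) of every size from min_size to max_size.

    At each token position, the n-grams starting there are emitted in order of
    increasing size; n-grams that would run past the end of the text are skipped.
    """
    out = []
    for start in range(len(tokens)):
        for size in range(min_size, max_size + 1):
            if start + size <= len(tokens):
                out.append(separator.join(tokens[start:start + size]))
    return out


print(shingles(["quick", "brown", "fox"]))
# ['quick', 'quick brown', 'quick brown fox', 'brown', 'brown fox', 'fox']
```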
1 vote, 1 answer

How do I use collocation() to store the result as a list?

I have to save the bigram collocations as a list in a variable, but I am unable to do it with collocation(). I need to return the variable from a function later.
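NLTK's `Text.collocations()` prints its results and returns `None`, so assigning or iterating over its return value does not work (which is likely also behind the `'NoneType' object is not iterable` error in the next question). Newer NLTK releases also provide `Text.collocation_list()`, which returns the collocations instead of printing them; check your installed version. Alternatively, the hypothetical helper `top_bigrams` below returns a plain list that can be stored and returned from a function. It ranks pairs by raw frequency, a simpler criterion than the association measure NLTK uses, so its results will differ.

```python
from collections import Counter


def top_bigrams(tokens, num=20, min_count=2, stopwords=frozenset()):
    """Return up to `num` frequent adjacent word pairs as a list of strings.

    Pairs are ranked by raw frequency; pairs containing punctuation, digits
    or stopwords are ignored.
    """
    words = [t.lower() for t in tokens]
    pairs = Counter(
        (w1, w2)
        for w1, w2 in zip(words, words[1:])
        if w1.isalpha() and w2.isalpha()
        and w1 not in stopwords and w2 not in stopwords
    )
    return [f"{w1} {w2}" for (w1, w2), count in pairs.most_common(num) if count >= min_count]


tokens = "Red wine is good . Red wine is cheap .".split()
collocations = top_bigrams(tokens, stopwords={"is"})
print(collocations)  # ['red wine']
```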
1 vote, 5 answers

'NoneType' object is not iterable for collocation function

I am new to NLTK and trying to return the collocation output. I get the output, but along with it I also get None. Below are my code, input and output. import nltk from nltk.corpus import stopwords def…
sk2020
1 vote, 0 answers

How to tabulate Conditional Frequency Distribution for collocations in texts

I have several texts for which I found collocations, and now I'd like to create a table that shows how many times each collocation appears in each text of the corpus. When I generate a table or a plot from the ConditionalFreqDist, it shows only 1…
Gavrk
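One way to get a collocation-by-text table is to count bigrams per text and look up a fixed list of collocations in each, so that collocations missing from a text show up as 0 rather than disappearing. The standard-library sketch below uses hypothetical function names. In NLTK, a `ConditionalFreqDist` can likewise be built from (text name, collocation) pairs, and its `tabulate()` method accepts `conditions` and `samples` arguments that select which conditions and samples appear.

```python
from collections import Counter


def collocation_table(texts, collocations):
    """Count each (w1, w2) collocation in each text.

    texts: dict mapping a text name to its list of tokens.
    Returns {text name: {collocation: count}}, with 0 for collocations
    that never occur in that text.
    """
    table = {}
    for name, tokens in texts.items():
        bigram_counts = Counter(zip(tokens, tokens[1:]))
        # Indexing a Counter with a missing key returns 0, so absent pairs stay in the table
        table[name] = {pair: bigram_counts[pair] for pair in collocations}
    return table


def print_table(table, collocations):
    names = list(table)
    print("collocation".ljust(16) + "".join(name.rjust(8) for name in names))
    for pair in collocations:
        row = "".join(str(table[name][pair]).rjust(8) for name in names)
        print(" ".join(pair).ljust(16) + row)


texts = {
    "text1": "red wine and red wine".split(),
    "text2": "white wine not red".split(),
}
collocations = [("red", "wine"), ("white", "wine")]
print_table(collocation_table(texts, collocations), collocations)
```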
1 vote, 0 answers

Trapezoidal collocation using cvxpy

I tried using trapezoidal collocation to solve a double-integrator control problem in CVXPY. This is a two-point boundary value problem. After solving, there is a jump in the control beyond its limits to meet the end condition. Could not find…
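For reference, trapezoidal collocation of a double integrator replaces the dynamics (position derivative = velocity, velocity derivative = control) with equality constraints between neighbouring grid nodes. With position p_k, velocity v_k and control u_k at nodes k = 0, …, N and step h (notation chosen here, not taken from the question), the constraints read:

```latex
\begin{aligned}
p_{k+1} &= p_k + \frac{h}{2}\left(v_k + v_{k+1}\right), \\
v_{k+1} &= v_k + \frac{h}{2}\left(u_k + u_{k+1}\right), \qquad k = 0, \dots, N-1, \\
u_{\min} &\le u_k \le u_{\max}, \qquad k = 0, \dots, N.
\end{aligned}
```

Because each interval uses the control at both of its end nodes, the bounds have to cover every node including k = N; if they are only imposed for k < N, the final control value is unconstrained, which is one thing worth checking in the CVXPY model.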
1 vote, 1 answer

How to train a Phrases model from a huge corpus of articles (Wikipedia)?

I'd like to create a big gensim dictionary for the French language to try to get better results in topic detection, similarities between texts and other things like that. So I've planned to use a Wikipedia dump and process it the following…
fbparis
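gensim's `Phrases` accepts any iterable of token lists, so rather than loading a whole Wikipedia dump into memory, one option is a restartable iterable that streams an already-extracted plain-text file. The class name `LineSentences`, the one-article-per-line layout and the lowercase/whitespace tokenization below are assumptions for illustration; gensim also ships `gensim.models.word2vec.LineSentence` for a similar layout.

```python
class LineSentences:
    """Restartable iterable that streams one pre-tokenized article per line.

    Each pass re-opens the file, so the corpus never has to fit in memory,
    and the object can be iterated more than once (unlike a plain generator).
    """

    def __init__(self, path, encoding="utf-8"):
        self.path = path
        self.encoding = encoding

    def __iter__(self):
        with open(self.path, encoding=self.encoding) as handle:
            for line in handle:
                tokens = line.lower().split()  # naive tokenization: lowercase + whitespace
                if tokens:  # skip blank lines
                    yield tokens
```

Training would then look something like `Phrases(LineSentences("frwiki.txt"), min_count=5, threshold=10)`, where the file name and parameter values are placeholders. Being restartable matters because training may read the corpus more than once.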
1 vote, 1 answer

Searching for the nearest value by lat-lon in a list of data files with a Python for loop and saving the results as an array

I am trying to find the nearest values for 34 locations, using their given lat-lon as a reference. The data come as a set of files for 30 days (thousands of data points), arranged as arrays of lat-lon and the desired data. How do I collocate the lat-lon to find…
Azam
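A minimal sketch of the nearest-point lookup, assuming each data file has already been read into a list of `(lat, lon, value)` records (the file reading is left out, and the function names are hypothetical). Distances use the haversine great-circle formula; the linear scan costs stations × records per file, which is fine for thousands of records but would call for a spatial index on much larger grids.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def nearest_values(stations, records):
    """For each (lat, lon) station, return the value of the closest record.

    records: list of (lat, lon, value) tuples from one data file.
    """
    result = []
    for lat, lon in stations:
        best = min(records, key=lambda rec: haversine_km(lat, lon, rec[0], rec[1]))
        result.append(best[2])
    return result
```

Calling `nearest_values` once per daily file and collecting the returned lists gives one row per day and one column per station (a 30 × 34 array for the question's data).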
1 vote, 1 answer

Extracting collocates for a given word from a text corpus - Python

I am trying to find out how to extract the collocates of a specific word out of a text. As in: what are the words that make a statistically significant collocation with e.g. the word "hobbit" in the entire text corpus? I am expecting a result…
A-B
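A common way to find collocates of a node word is to count every word within a fixed span around each occurrence of the node and compare that with the count expected by chance. The sketch below uses a span-based PMI, which is only one association measure (PMI tends to favour rare words, hence the `min_count` cutoff; t-score and log-likelihood are frequent alternatives). The function name and defaults are illustrative, and the expected count ignores edge effects at the start and end of the text.

```python
import math
from collections import Counter


def collocates(tokens, node, window=4, min_count=2):
    """Rank words found within `window` tokens of `node` by a span-based PMI."""
    words = [t.lower() for t in tokens]
    node = node.lower()
    n = len(words)
    freq = Counter(words)
    co_counts = Counter()
    for i, word in enumerate(words):
        if word != node:
            continue
        # Every token up to `window` positions left or right of the node counts once
        for j in range(max(0, i - window), min(n, i + window + 1)):
            if j != i:
                co_counts[words[j]] += 1
    scores = {}
    for word, observed in co_counts.items():
        if observed < min_count or word == node:
            continue
        # Expected co-occurrences if `word` were spread evenly over the text:
        # (node frequency) * (2 * window slots) * (share of tokens that are `word`)
        expected = freq[node] * 2 * window * freq[word] / n
        scores[word] = math.log2(observed / expected)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```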
1 vote, 0 answers

Finding Collocation using Apache OpenNLP

I would like to find collocated words using the Apache OpenNLP framework. Looking at the API, there seems to be no API for a collocation finder. How can I find the collocated words in a given sentence using OpenNLP? For example, in the below given…
1 vote, 0 answers

How to find word collocation in WordNet from a sentence with Stanford NLP CollocationFinder?

Given a paragraph or a sentence, the task is to find all word collocations that are already hard-coded in WordNet. To clarify, I'm trying to find, in a given text, the word collocations that already exist in WordNet; I'm not trying to extract word collocation…
V.Tran
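Independently of Stanford's CollocationFinder, one simple approach is a dictionary lookup: WordNet stores multiword lemmas with underscores (for example `hot_dog`), and NLTK's WordNet reader lists all lemma names through `wordnet.all_lemma_names()`, so the multiword ones can be collected into a set and matched against the text. The sketch below does a greedy longest match against such a set. The lexicon in the usage line is a hypothetical stand-in, and there is no lemmatization, so inflected forms such as "hot dogs" will not match.

```python
def find_multiword_expressions(tokens, lexicon, max_len=4):
    """Greedy longest-match lookup of multiword lemmas in a token list.

    lexicon: set of lowercase lemmas with words joined by "_" (WordNet's
    convention, e.g. "hot_dog"). Returns the matched lemmas in text order.
    """
    words = [t.lower() for t in tokens]
    found = []
    i = 0
    while i < len(words):
        match = None
        # Try the longest span first so "new_york_city" wins over "new_york"
        for size in range(min(max_len, len(words) - i), 1, -1):
            candidate = "_".join(words[i:i + size])
            if candidate in lexicon:
                match = candidate
                break
        if match:
            found.append(match)
            i += match.count("_") + 1  # jump past the matched words
        else:
            i += 1
    return found


lexicon = {"hot_dog", "new_york", "new_york_city"}  # hypothetical sample lexicon
print(find_multiword_expressions("I ate a hot dog in New York City".split(), lexicon))
# ['hot_dog', 'new_york_city']
```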
1 vote, 0 answers

Collocations within a document, window size

I am trying to calculate collocations within a document. Is there a way to define a window size within the NLTK function (as there is with the function "from_words")? Or, instead of a window size, the boundary of a sentence or the whole…
student
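In NLTK, `BigramCollocationFinder.from_words()` accepts a `window_size` argument (default 2, meaning adjacent words), and the finders' `from_documents()` constructor takes a list of token lists without forming n-grams across them, so passing sentences there is one way to stop at sentence borders (check your NLTK version's documentation for the exact options). The hypothetical helper below shows both ideas together with the standard library: ordered pairs are counted only inside each sentence, and only when they are fewer than `window_size` positions apart, mirroring NLTK's convention.

```python
from collections import Counter


def windowed_pairs(sentences, window_size=2):
    """Count ordered word pairs (w1 before w2) less than window_size positions apart.

    Pairs are formed only inside each sentence, so sentence boundaries act as
    hard borders; pass the whole document as a single list to ignore them.
    """
    counts = Counter()
    for sentence in sentences:
        for i, w1 in enumerate(sentence):
            for j in range(i + 1, min(i + window_size, len(sentence))):
                counts[(w1, sentence[j])] += 1
    return counts


sentences = [["a", "b", "c"], ["d", "e"]]
print(windowed_pairs(sentences, window_size=3))
```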
1 vote, 2 answers

List the collocations for a txt file

I want to list the collocations, as reported by NLTK, for Dracula.txt. How do I do this? I am able to find word frequencies by adding it to my corpus. I also have a variable dracWords = mycorpus.words('Dracula.txt'), which has the words…
1 vote, 1 answer

How to count documents in which two words appear in close proximity in R?

I would like to count the documents in which two strings appear within a set distance, i.e. within 10 words of each other. Let's say 'German*' and 'War'. I do not want to count the times they appear in total, but only the number of documents in which the…
Melvin Wevers
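The question asks about R, but the logic is language-independent, so here it is sketched in Python: tokenize each document, record the positions of tokens matching each pattern, and count the document once if any pair of positions is at most 10 apart. Treating 'German*' as the regular expression `German\w*`, measuring distance in token positions, and matching case-sensitively are all assumptions; the function name is hypothetical.

```python
import re


def count_docs_near(docs, pattern_a, pattern_b, max_distance=10):
    """Count documents where a token matching pattern_a lies within
    max_distance tokens of a token matching pattern_b.

    Each document is counted at most once, however many close pairs it has.
    Matching is case-sensitive and applies to whole tokens.
    """
    rx_a, rx_b = re.compile(pattern_a), re.compile(pattern_b)
    hits = 0
    for doc in docs:
        tokens = re.findall(r"\w+", doc)
        pos_a = [i for i, tok in enumerate(tokens) if rx_a.fullmatch(tok)]
        pos_b = [i for i, tok in enumerate(tokens) if rx_b.fullmatch(tok)]
        if any(abs(i - j) <= max_distance for i in pos_a for j in pos_b):
            hits += 1
    return hits


docs = ["The German army lost the War", "Germany and the War", "A war story"]
print(count_docs_near(docs, r"German\w*", r"War"))  # 2
```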
1 vote, 1 answer

CouchDB: Collocated views and keys

In my couch I have document pairs like this: { _id: "DOCID", type: "Task", info: { k1: "v1", k2: "v2" } } { _id: "ANOTHER DOCID", type: "Final", task: "DOCID", author: "Authorname" } For an author, several of these pairs can exist. I…
enpenax
1 vote, 1 answer

Converting mysql chars

I have a database that seems to be in latin1_swedish. I need to add some more text to it. The new text contains some Brazilian words, for example: tilápia Cachaça ... The old text that is in the db has these words too, but it's like this:…
Lucca Zenobio
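The question's example of the old text is cut off, so the exact symptom is unknown. One common cause is UTF-8 bytes that were stored in or read through a latin1 column, which turns "tilápia" into "tilÃ¡pia". Assuming that is what happened here, the damage can be reversed by re-encoding the garbled string as Latin-1 and decoding the bytes as UTF-8, as in the hypothetical helper below. MySQL's `latin1` is actually Windows cp1252, so for some characters `"cp1252"` may be the codec to use instead.

```python
def fix_double_encoding(text):
    """Undo UTF-8 text that was decoded as Latin-1 (e.g. 'tilÃ¡pia' -> 'tilápia').

    Returns the input unchanged if it does not round-trip cleanly, which also
    leaves already-correct strings such as 'tilápia' alone.
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


print(fix_double_encoding("CachaÃ§a"))  # Cachaça
```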