Questions tagged [collocation]

Anything related to collocations, i.e. sequences of words that often appear together in text. The term is widely used in linguistics, and this tag should be used for questions about it.

See Wikipedia on collocations.
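
As a rough illustration of the definition above, the sketch below counts adjacent word pairs in a toy sentence using only the standard library. The sample text and the `bigram_counts` helper are made up for this example. Raw counts are only a starting point: collocation work usually ranks pairs with an association measure such as PMI.

```python
from collections import Counter

def bigram_counts(tokens):
    """Count adjacent word pairs (bigrams) in a list of tokens."""
    return Counter(zip(tokens, tokens[1:]))

# Toy text, invented for this illustration.
text = "strong tea is nice and strong tea is cheap"
counts = bigram_counts(text.split())

# Pairs that repeat are candidate collocations.
repeated = [pair for pair, n in counts.items() if n > 1]
print(repeated)
```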

43 questions
1
vote
1 answer

Can I get BigramCollocationFinder (nltk) to honour document boundaries?

I am using NLTK to do some analysis of a number of distinct documents. The content of these documents means that they all tend to end and start with the same tokens. I tokenize the documents into a list of lists and then use…
Jennifer
  • 5,148
  • 2
  • 21
  • 19
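
One way to keep bigrams from spanning document boundaries, sketched here without NLTK, is to count pairs inside each document separately and merge the counts. The `bigrams_within_documents` helper and the sample documents are hypothetical, and this is not necessarily what the asker or the accepted answer used. NLTK's collocation finders also provide a `from_documents` constructor that may be worth checking for the same purpose.

```python
from collections import Counter

def bigrams_within_documents(documents):
    """Count bigrams per document so no pair crosses a document boundary."""
    counts = Counter()
    for tokens in documents:
        # zip only pairs tokens from the same document
        counts.update(zip(tokens, tokens[1:]))
    return counts

# Hypothetical documents that start and end with the same tokens.
docs = [
    ["dear", "sir", "thank", "you"],
    ["dear", "madam", "thank", "you"],
]
doc_counts = bigrams_within_documents(docs)
print(doc_counts.most_common(2))
```

Flattening the list of lists first would instead create the spurious pair `("you", "dear")` at the seam between the two documents.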
0
votes
1 answer

C collections and pointers error - assignment from incompatible pointer type

I'm learning C. I tried to solve the following problem but ran into several issues, as I am not yet comfortable with linked lists and pointers. Here's the problem: We want to write functions for managing the employees of a company. An employee is defined…
0
votes
2 answers

Sorting an array in which each element contains an NSString and an NSNumber (double) entry

I created an NSMutableArray with two elements: the name of a city (string at index 0) and the distance (double at index 1) from my present position. for (i=0;i<[City count];++i) { distanceFromMe = [Location distanceFromLocation:[cityLocation]]; [a…
cavuco
  • 99
  • 1
  • 12
0
votes
1 answer

Filter trigram tags with nltk

I want to find out the trigrams of a corpus but with the restriction that at least two words of the trigrams are not proper nouns. This is my code so far. def collocation_finder(text,window_size): ign = stopwords.words('english') …
gd13
  • 55
  • 1
  • 7
0
votes
1 answer

Extracting collocates from texts/sentences

I have a large number of sentences, each of them containing at least one occurrence of 'well'. I'd like to get a list of the two words occurring immediately to the left of 'well' and the two words immediately to the right of 'well'. For example, in…
Chris Ruehlemann
  • 20,321
  • 4
  • 12
  • 34
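
A minimal sketch of the kind of extraction described above, assuming whitespace tokenization and a window of two words on each side. The `collocates` helper and the sample sentence are made up for this illustration.

```python
def collocates(tokens, node, span=2):
    """Return (left, right) context word lists for each occurrence of node."""
    results = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node:
            left = tokens[max(0, i - span):i]    # up to `span` words before
            right = tokens[i + 1:i + 1 + span]   # up to `span` words after
            results.append((left, right))
    return results

sentence = "she sings very well in the choir"
contexts = collocates(sentence.split(), "well")
print(contexts)  # [(['sings', 'very'], ['in', 'the'])]
```

Near the start or end of a sentence the slices simply return fewer words, so no padding is needed.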
0
votes
2 answers

Ignite loading similar data to particular instance

So I'm really new to Apache Ignite. What I'm trying to do is load data having similar properties to a single instance rather than having it loaded to random instances. For example, say that some data of this…
Marek
  • 245
  • 1
  • 4
  • 15
0
votes
2 answers

Segmentation and Collocation

I am looking for new ideas for two features I am implementing. 1.) Text segmentation feature: Ex: User Query: Resolved Query: ----------- --------------- It has…
starkk92
  • 5,754
  • 9
  • 43
  • 59
0
votes
1 answer

Looking for an elegant way of finding the intersection between two lists of words' tuples in different orders

I think it's best to show an example of what I'm trying to do. The point is that I'm looking for an elegant way of doing so. Let's say I have two lists of tuples: x = [('a', 'c', 'e'), ('k', 'n')] y = [('e', 'd', 'w'), ('c', 'a', 'e'), ('n', 'k')] z…
Lior Magen
  • 1,533
  • 2
  • 15
  • 33
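
One possible approach, assuming the goal is to keep the tuples of `x` whose words also appear together in some tuple of `y` regardless of order, is to compare them as frozensets. The expected `z` is cut off in the excerpt, so the result here is a guess at the intent, and `unordered_intersection` is a name invented for this sketch.

```python
x = [('a', 'c', 'e'), ('k', 'n')]
y = [('e', 'd', 'w'), ('c', 'a', 'e'), ('n', 'k')]

def unordered_intersection(first, second):
    """Keep tuples from `first` whose set of words matches a tuple in `second`."""
    seen = {frozenset(t) for t in second}
    return [t for t in first if frozenset(t) in seen]

z = unordered_intersection(x, y)
print(z)  # [('a', 'c', 'e'), ('k', 'n')]
```

A frozenset ignores repeated words inside a tuple; if duplicates matter, comparing `tuple(sorted(t))` instead would preserve them.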
0
votes
2 answers

Bi-grams in python with lots of txt files

I have a corpus which includes 70,429 files (296.5 MB). I am trying to find bigrams using the whole corpus. I have written the following code: allFiles = "" for dirName in os.listdir(rootDirectory): for subDir in os.listdir(dirName): for…
yns
  • 440
  • 2
  • 8
  • 28
0
votes
1 answer

N-Gram with ArrayList

I am working on a project in which I am analysing 'ngrams'. I have a method in my program that creates bigrams and trigrams. However, it only groups consecutive adjacent words, whereas I want it to produce all combinations of words... For…
John Lewis
  • 139
  • 1
  • 2
  • 15
0
votes
1 answer

Getting unrecognizable words while finding Trigrams from NLTK Collocations

I am using NLTK Collocations to find trigrams and 'training_set' is a string with many lines of text. finder = TrigramCollocationFinder.from_words(str(training_set)) print finder.nbest(trigram_measures.pmi, 5) But I am getting the output as …
Shivendra
  • 1,076
  • 2
  • 12
  • 26
-1
votes
1 answer

High order derivatives of chebyshev polynomials needed

I need to evaluate high order (up to 4) derivatives of Chebyshev polynomials at points of the so called Chebyshev grid, x(j)=cos(πj/N), j=0,...,N Anyone know how to do that? I tried iterative methods but they are too clumsy. I remember seeing…
-1
votes
2 answers

Most common sentences extractions with count using Python

I want to write a Python script that searches all Excel rows and returns the top 10 most common sentences. I have written the basics of ngrams for a txt file. The file contains CSV text with "dj is best" 4 times and "gd is cool" 3 times. import nltk import…
DJKarma
  • 172
  • 9