Questions tagged [linguistics]

Linguistics is the scientific study of language and its structure, including the study of morphology, syntax, phonetics, and semantics.

Specific branches of linguistics include sociolinguistics, dialectology, psycholinguistics, computational linguistics, historical-comparative linguistics, and applied linguistics.

323 questions
4
votes
2 answers

Match two columns from two dataframes and add items from a third column if cells match

I have two pandas dataframes with linguistic data: oset, with the full data, and miscset, which is a subset of the full data. I am looking for a way of comparing two columns of strings from two different dataframes, and identifying those rows that…
Coedwig
  • 41
  • 5
4
votes
1 answer

Duplicate elimination of similar company names

I have a table with company names. There are many duplicates because of human input errors: people disagree about whether the subdivision should be included, there are typos, etc. I want all these duplicates to be marked as one company…
EliteRaceElephant
  • 7,744
  • 4
  • 47
  • 62
4
votes
5 answers

Create short human-readable string from longer string

I have a requirement to contract a string such as... "Would you consider becoming a robot? You would be provided with a free annual oil change." ...to something much shorter yet still humanly identifiable (it will need to be found from a select…
David Neale
  • 16,498
  • 6
  • 59
  • 85
4
votes
4 answers

Techniques other than RegEx to discover 'intent' in sentences

I'm embarking on a project for a non-profit organization to help process and classify thousands of reports annually from their field workers / contractors the world over. I'm relatively new to NLP and as such wanted to seek the group's guidance on the…
Akhu Nam
  • 41
  • 3
4
votes
2 answers

Proper way of eliminating letter repetitions from English words?

As the title clearly describes, I wonder what is the right way to eliminate character repetitions in English that are commonly used in social media to exaggerate the feeling. Since I am developing a software solution to correct mistyped words, I…
talha06
  • 6,206
  • 21
  • 92
  • 147
4
votes
1 answer

SimpleNLG: how we specify the quantity?

My question is how to specify the quantity in a noun phrase. For example: NPPhraseSpec np = nlgFactory.createNounPhrase("", "apple"); How can I generate "5 apples", for example? One solution is to add a preModifier; the code would be: Lexicon…
Karim
  • 133
  • 5
4
votes
3 answers

Using RNN tensorflow language model to predict the probabilities of test sentences

I was able to train a language model using the TensorFlow tutorials; the models are saved as checkpoint files as per the code given here: save_path = saver.save(sess, "/tmp/model.epoch.%03d.ckpt" % (i + 1)) Now I need to restore the checkpoint and…
stackit
  • 3,036
  • 9
  • 34
  • 62
4
votes
1 answer

How to implement a good Pronoun Resolver algorithm in OpenNLP?

I use OpenNLP's coreference package for anaphora resolution. So basically I have this input string: "Harry writes a letter to his brother. He told him that he met Mary in London. They had a lunch together."; The set of mentions output are as…
sw2
  • 357
  • 6
  • 13
4
votes
1 answer

Oracle linguistic index not used when SQL contains parameter with LIKE

My schema (simplified): CREATE TABLE LOC ( LOC_ID NUMBER(15,0) NOT NULL, LOC_REF_NO VARCHAR2(100 CHAR) NOT NULL ) / CREATE INDEX LOC_REF_NO_IDX ON LOC ( NLSSORT("LOC_REF_NO",'nls_sort=''BINARY_AI''') ASC ) / My query (in…
VinceJS
  • 1,254
  • 3
  • 18
  • 38
4
votes
2 answers

How to conjugate English words in Java?

Say I have a base form of a word and a tag from the Penn Treebank Tag Set. How can I get the conjugated form? For example for "do" and "VBN" how can I get "done"? I think this task is already implemented in some NLP library, so I'd rather not…
Fluffy
  • 27,504
  • 41
  • 151
  • 234
4
votes
3 answers

The distance between the meaning of two sentences

I am looking for a way to measure the semantic distance between two sentences. Suppose we have the following sentences: (S1) The beautiful cherry blossoms in Japan. (S2) The beautiful Japan. S2 is created from S1 by eliminating the words "cherry",…
Riadh Belkebir
  • 797
  • 1
  • 12
  • 34
4
votes
3 answers

How can I get the possessive form of a noun?

Here's an algorithm for adding an apostrophe to a given input noun. How would you construct a string to show ownership? /** * apostrophizes the string properly * curtis = curtis' * shaun = shaun's * * @param input string to…
Shaun
  • 4,057
  • 7
  • 38
  • 48
4
votes
0 answers

Transliteration between different writing systems

I need to learn how to change a transliteration of a text to another writing system. Apparently the best way would somehow involve regular expressions and perl, probably from command line? I've been using regular expressions earlier in Notepad++ and…
nikopartanen
  • 577
  • 8
  • 15
4
votes
2 answers

Training Hidden Markov Models without Tagged Corpus Data

For a linguistics course we implemented part-of-speech (POS) tagging using a hidden Markov model, where the hidden variables were the parts of speech. We trained the system on some tagged data, and then tested it and compared our results with the…
Claudiu
  • 224,032
  • 165
  • 485
  • 680
4
votes
8 answers

Theory: "Lexical Encoding"

I am using the term "Lexical Encoding" for lack of a better one. A Word is arguably the fundamental unit of communication as opposed to a Letter. Unicode tries to assign a numeric value to each Letter of all known Alphabets. What is a Letter to…
Ande Turner
  • 7,096
  • 19
  • 80
  • 107