Questions tagged [linguistics]

Linguistics is the scientific study of language and its structure, including the study of morphology, syntax, phonetics, and semantics.

Specific branches of linguistics include sociolinguistics, dialectology, psycholinguistics, computational linguistics, historical-comparative linguistics, and applied linguistics.

323 questions
6
votes
2 answers

How to get logical parts of a sentence with Java?

Let's say there is a sentence: On March 1, he was born. Changing it to He was born on March 1. doesn't break the sense of the sentence and it is still valid. Shuffling words in any other way would produce weird to invalid sentences. So basically,…
Fluffy
  • 27,504
  • 41
  • 151
  • 234
6
votes
1 answer

How can I generate parse trees of English sentences on iOS?

I would like to generate constituency-based parsed trees of English sentences within an iOS application. http://en.wikipedia.org/wiki/Parse_tree My current options appear to be: Write my own tree generation on top of POS tagging from…
Giles
  • 1,428
  • 11
  • 21
5
votes
1 answer

Word usage database?

Is there any free database/place out there with commonality/usage ratios of English words? (British or U.S. English, doesn't matter) I don't care about the exact numbers, only relative to each other. Something like: the | 0.2 car | 0.08 chroma |…
manixrock
  • 2,533
  • 4
  • 24
  • 29
5
votes
3 answers

Anaphora resolution in stanford-nlp using Python

I am trying to do anaphora resolution, and my code for it is below. First I navigate to the folder where I have downloaded the Stanford module. Then I run the command in the command prompt to initialize the Stanford NLP module: java -mx4g -cp…
StatguyUser
  • 2,595
  • 2
  • 22
  • 45
5
votes
2 answers

Where to find wordlists with gender and plural for German?

I'm trying to write a simple text mining application to try to tell a German word's gender and plural form. So, first of all, I need a big wordlist for training. I've searched around but could not find any list having either gender or plural.
erickrf
  • 2,069
  • 5
  • 21
  • 44
5
votes
1 answer

Case insensitive comparisons across locales in Java

Considering the following Java code comparing a small string containing the German grapheme ß String a = "ß"; String b = a.toUpperCase(); assertTrue(a.equalsIgnoreCase(b)); The comparison fails, because "ß".toUpperCase() is actually equal to "SS",…
Oleksi
  • 12,947
  • 4
  • 56
  • 80
5
votes
1 answer

Verb tense conversion in Python

I'm trying to convert certain verbs to other tenses for some NLP task. I'm trying to use the NodeBox::Linguistics library as suggested here: Using NLTK and WordNet; how do I convert simple tense verb into its present, past or past participle…
kerouac
  • 360
  • 5
  • 14
5
votes
1 answer

Why are Cosine Similarity and TF-IDF used together?

TF-IDF and Cosine Similarity is a commonly used combination for text clustering. Each document is represented by vectors of TF-IDF weights. This is what my text book says. With Cosine Similarity you can then compute the similarities between…
Evgenij Reznik
  • 17,916
  • 39
  • 104
  • 181
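
The combination described in the excerpt above can be sketched in a few lines of plain Python. This is a minimal illustration, not the textbook's method: the function names `tf_idf_vectors` and `cosine_similarity`, the whitespace tokenization, the unsmoothed `log(N / df)` weighting, and the three sample sentences are all assumptions made for the example. It assumes every document is non-empty.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one sparse {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        # Term frequency (relative) times unsmoothed inverse document frequency.
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an all-zero vector has no direction
    return dot / (norm_u * norm_v)


# Hypothetical sample documents, tokenized on whitespace.
docs = [
    "the cat sat on the mat".split(),
    "the cat lay on the rug".split(),
    "stock prices fell sharply today".split(),
]
vectors = tf_idf_vectors(docs)
sim_related = cosine_similarity(vectors[0], vectors[1])
sim_unrelated = cosine_similarity(vectors[0], vectors[2])
```

Because cosine similarity depends only on the angle between the vectors, document length drops out, while the TF-IDF weights keep terms shared by many documents from dominating the score. Here the two cat sentences share weighted terms and score above zero, and the first and third documents share no terms at all.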
5
votes
2 answers

Software to inflect English

Is there any software out there which can do the following? Given an English sentence like "He likes baked beans", I change "he" to "I" and the sentence changes to "I like baked beans" (note the S) or "She has her hair in a ponytail" I change…
user181548
5
votes
3 answers

Handling count of characters with diacritics in R

I'm trying to get the number of characters in strings with characters with diacritics, but I can't manage to get the right result. > x <- "n̥ala" > nchar(x) [1] 5 What I want to get is 4, since n̥ should be considered one character (i.e.…
Stefano
  • 1,405
  • 11
  • 21
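
The question above concerns R, but the underlying issue is language-independent: the string holds five code points because the ring below is a separate combining mark. The following is a rough Python sketch of one way to count only base characters. The helper name `count_base_characters` is hypothetical, and skipping code points in the Unicode "Mark" categories is a simplifying assumption rather than full grapheme-cluster segmentation.

```python
import unicodedata


def count_base_characters(text):
    """Count code points that are not Unicode marks (categories Mn, Mc, Me)."""
    return sum(
        1 for ch in text
        if not unicodedata.category(ch).startswith("M")
    )


# "n" followed by U+0325 COMBINING RING BELOW, then "a", "l", "a".
x = "n\u0325ala"
code_points = len(x)                   # counts the combining mark separately
base_chars = count_base_characters(x)  # ignores the combining mark
```

This approach works for a base letter plus combining diacritics, as in the example, but it would miscount sequences such as emoji joined with zero-width joiners, which need proper grapheme-cluster rules.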
5
votes
4 answers

Dual-line bilingual paragraph in LaTeX

An interlinear gloss can be used to lay out a translation of a document. http://en.wikipedia.org/wiki/Interlinear_gloss Usually this is done word-by-word or morpheme-by-morpheme. However, I would like to do this in a different way, translating…
D W
  • 2,979
  • 4
  • 34
  • 45
5
votes
0 answers

Aligned glosses in orgmode (for export to odt/doc)

I'm contemplating writing linguistics articles in orgmode in cases where I know I will need to provide a doc(x) file (normally I would otherwise use LaTeX). One of the things I will need to be able to do is produce numbered examples with aligned…
emacsomancer
  • 597
  • 1
  • 5
  • 14
5
votes
1 answer

Stanford CoreNLP not working

I'm using Windows 8, and running Python in Eclipse with PyDev. I installed Stanford CoreNLP (Python version) from the site: https://github.com/relwell/stanford-corenlp-python When I try to import corenlp, I get the following error message. Traceback…
ghantauke
  • 61
  • 1
  • 5
5
votes
7 answers

NLP: Building (small) corpora, or "Where to get lots of not-too-specialized English-language text files?"

Does anyone have a suggestion for where to find archives or collections of everyday English text for use in a small corpus? I have been using Gutenberg Project books for a working prototype, and would like to incorporate more contemporary language.…
unmounted
  • 33,530
  • 16
  • 61
  • 61
4
votes
5 answers

Best practices for searching for alternate forms of a word with Lucene

I have a site which is searchable using Lucene. I've noticed from logs that users sometimes don't find what they're looking for because they enter a singular term, but only the plural version of that term is used on the site. I would like the…
Kip
  • 107,154
  • 87
  • 232
  • 265