Questions tagged [nlp]

Natural language processing (NLP) is a subfield of artificial intelligence that involves transforming or extracting useful information from natural language data. Methods include machine-learning and rule-based approaches. It is often regarded as the engineering arm of Computational Linguistics.

NOTE: If you want to use this tag for a question not directly concerning implementation, then consider posting on Data Science, or Artificial Intelligence instead; otherwise you're probably off-topic. Please choose one site only and do not cross-post to more than one - see Is cross-posting a question on multiple Stack Exchange sites permitted if the question is on-topic for each site? (tl;dr: no).

NLP tasks

Beginner books on Natural Language Processing

Popular software packages

20185 questions
6
votes
2 answers

How to remove English text from an Arabic string in Python?

I have an Arabic string that also contains English text and punctuation. I need to keep only the Arabic text, so I tried removing the punctuation and English words using string. However, I lost the spacing between the Arabic words. Where am I going wrong? import string exclude =…
Anish
  • 1,920
  • 11
  • 28
  • 48
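A minimal sketch of one way to approach the problem described in this question, not the asker's code or an accepted answer. It assumes that "Arabic text" means the letters and diacritics in the range U+0621 to U+0652; the helper name `keep_arabic` is hypothetical. Replacing each unwanted run with a space, rather than deleting it, is what keeps the Arabic words apart.

```python
import re

# Assumption: Arabic letters and diacritics are U+0621..U+0652.
# Everything else except whitespace is treated as unwanted
# (Latin letters, ASCII and Arabic punctuation, digits).
NON_ARABIC = re.compile(r"[^\u0621-\u0652\s]+")


def keep_arabic(text):
    """Return only the Arabic words of `text`, separated by single spaces."""
    # Substitute a space (not an empty string) so neighbouring words stay apart.
    cleaned = NON_ARABIC.sub(" ", text)
    # Collapse the runs of whitespace left behind by the substitution.
    return " ".join(cleaned.split())


print(keep_arabic("مرحبا hello, بالعالم! world"))
```

Text that uses Arabic-script characters outside the assumed range (for example Persian or Urdu letters) would need a wider character class.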
6
votes
1 answer

Chunking NP, VP and PP phrases in Java (CoreNLP)

I'm using Stanford CoreNLP and I'm aware it doesn't support chunking of sentences. What I'm looking for is, given an input sentence, to have something like this as output: [NP He ] [VP reckons ] [NP the current account deficit ] [VP will narrow ]…
The Coding Monk
  • 7,684
  • 12
  • 41
  • 56
6
votes
2 answers

Natural language grammar and user-entered names

Some languages, particularly Slavic languages, change the endings of people's names according to the grammatical context. (For those of you who know grammar or studied languages that do this to words, such as German or Russian, and to help with…
Owen Blacker
  • 4,117
  • 2
  • 33
  • 70
6
votes
2 answers

Memory-efficient way to union a sequence of RDDs from files in Apache Spark

I'm currently trying to train a set of Word2Vec vectors on the UMBC Webbase Corpus (around 30 GB of text in 400 files). I often run into out-of-memory situations, even on 100 GB-plus machines. I run Spark in the application itself. I tried to tweak a…
dice89
  • 459
  • 5
  • 10
6
votes
3 answers

Text summarization: how to choose the right n-gram size

I am working on summarizing texts. Using the NLTK library, I am able to extract unigrams, bigrams, and trigrams and order them by frequency. As I am very new to this area (NLP), I was wondering if I can use a statistical model that will allow me to…
sel
  • 942
  • 1
  • 12
  • 25
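For readers unfamiliar with the step this question describes (extracting n-grams and ordering them by frequency), here is a rough sketch in plain Python. It uses only the standard library rather than NLTK, the helper name `ngram_counts` and the sample sentence are made up for illustration, and it does not address the actual question of which n to choose.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count contiguous n-grams (as tuples) in a list of tokens."""
    # zip over n shifted views of the token list yields every n-gram.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(grams)


# Hypothetical sample text, tokenized naively on whitespace.
tokens = "the cat sat on the mat and the cat slept".split()

for n in (1, 2, 3):
    # most_common orders the n-grams by descending frequency.
    print(n, ngram_counts(tokens, n).most_common(2))
```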
6
votes
3 answers

Stanford parser Java error

I am doing research on NLP and would like to use the Stanford parser to extract noun phrases from text. The parser version I used is 3.4.1. This is the sample code I used: package stanfordparser; import java.util.Collection; import…
Karim Harazin
  • 1,463
  • 2
  • 16
  • 34
6
votes
1 answer

Algorithm for Determining Word Type using WordNet Database

I'm working on a project that requires scanning through paragraphs of natural English text and detecting the type of each word. The application works with AJAX, PHP, and MySQL. My application doesn't need to be 100% accurate and simply…
Arcana
  • 239
  • 5
  • 13
6
votes
1 answer

Simplifying the French POS Tag Set with NLTK

How can one simplify the part of speech tags returned by Stanford's French POS tagger? It is fairly easy to read an English sentence into NLTK, find each word's part of speech, then use map_tag() to simplify the tag set: #!/usr/bin/python # -*-…
duhaime
  • 25,611
  • 17
  • 169
  • 224
6
votes
10 answers

Is it better to use a "natural" language to write code?

I recently came across a programming language called Supernova, and its web page says: The Supernova Programming language is a modern scripting language and the First one presents the concept of programming with direct Fiction Description…
Mohamad Alhamoud
  • 4,881
  • 9
  • 33
  • 44
6
votes
1 answer

Using WordNet to generate superlatives, comparatives and adjectives

I have a WordNet database set up, and I'm trying to generate synonyms for various words. For example, the word "greatest". I'll look through and find several different synonyms, but none of them really fit the definition - for example, one is…
Steven Matthews
  • 9,705
  • 45
  • 126
  • 232
6
votes
1 answer

In natural language processing (NLP), how do you make an efficient dimension reduction?

In NLP, the feature dimension is almost always very large. For example, in one project at hand, there are almost 20 thousand features (p = 20,000), and each feature is a 0-1 integer showing whether a specific word or…
6
votes
2 answers

How to get logical parts of a sentence with Java?

Let's say there is a sentence: On March 1, he was born. Changing it to He was born on March 1. doesn't break the sense of the sentence and it is still valid. Shuffling words in any other way would produce weird to invalid sentences. So basically,…
Fluffy
  • 27,504
  • 41
  • 151
  • 234
6
votes
2 answers

Generating easy-to-remember random identifiers

As all developers do, we constantly deal with some kind of identifiers as part of our daily work. Most of the time, it's about bugs or support tickets. Our software, upon detecting a bug, creates a package that has a name formatted from a timestamp…
Carl Seleborg
  • 13,125
  • 11
  • 58
  • 70
6
votes
3 answers

Missing Spanish wordnet from NLTK

I am trying to use the Spanish Wordnet from the Open Multilingual Wordnet in NLTK 3.0, but it seems that it was not downloaded with the 'omw' package. For example, with code like the following: from nltk.corpus import wordnet as wn print…
papafe
  • 2,959
  • 4
  • 41
  • 72
6
votes
3 answers

Are there any C# libraries for Named Entity Recognition?

I am looking for any free libraries for Named Entity Recognition in C# or any other .NET language.
Tasawer Khan
  • 5,994
  • 7
  • 46
  • 69