Questions tagged [nlp]

Natural language processing (NLP) is a subfield of artificial intelligence that involves transforming or extracting useful information from natural language data. Methods include machine-learning and rule-based approaches. It is often regarded as the engineering arm of Computational Linguistics.

NOTE: If you want to use this tag for a question that does not directly concern implementation, consider posting on Data Science or Artificial Intelligence instead; otherwise your question is probably off-topic here. Please choose one site only and do not cross-post; see "Is cross-posting a question on multiple Stack Exchange sites permitted if the question is on-topic for each site?" (tl;dr: no).

NLP tasks

Beginner books on Natural Language Processing

Popular software packages

20185 questions
6 votes, 1 answer

NLP - Embeddings selection of `start` and `end` of sentence tokens

Suppose we're training a neural network model to learn the mapping from the following input to output, where the output is Named Entity (NE). Input: EU rejects German call to boycott British lamb . Output: ORG O MISC O O O MISC O O A sliding window…
GabrielChu • 6,026 • 10 • 27 • 42
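
For readers wondering what the `start` and `end` tokens in this question are for, here is a minimal sketch, not the asker's code. It assumes a symmetric context window and uses hypothetical boundary tokens `<s>` and `</s>`; the helper name `sliding_windows` and the parameter `half_width` are made up for illustration. Padding the sentence means every real token, including the first and last, gets a full-width window.

```python
from typing import List

# Hypothetical boundary tokens; a real model would reserve its own symbols.
START, END = "<s>", "</s>"


def sliding_windows(tokens: List[str], half_width: int = 1) -> List[List[str]]:
    """Return one context window per token, padding the sentence edges."""
    padded = [START] * half_width + tokens + [END] * half_width
    size = 2 * half_width + 1
    # Window i is centred on the original token i.
    return [padded[i:i + size] for i in range(len(tokens))]


sentence = "EU rejects German call to boycott British lamb .".split()
windows = sliding_windows(sentence, half_width=1)
for window in windows:
    print(window)
```

Each window lines up with one output label, so the nine tokens above produce nine windows. How the embeddings for the two boundary tokens are chosen (learned, random, or zero vectors) is exactly what the question asks, and this sketch does not settle it.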
6 votes, 3 answers

Spacy Japanese Tokenizer

I am trying to use Spacy's Japanese tokenizer. import spacy Question= 'すぺいんへ いきました。' nlp(Question.decode('utf8')) I am getting the below error, TypeError: Expected unicode, got spacy.tokens.token.Token Any ideas on how to fix this? Thanks!
AKSHAYAA VAIDYANATHAN • 2,715 • 7 • 30 • 51
6 votes, 6 answers

Natural language parser for dates (.NET)?

I want to be able to let users enter dates (including recurring dates) using natural language (e.g. "next friday", "every weekday"), much like the examples at http://todoist.com/Help/timeInsert. I found this post, but it's a bit old and offered only…
Crescent Fresh • 115,249 • 25 • 154 • 140
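
The question asks for a .NET library, which this listing does not name. Purely to illustrate the kind of parsing involved, here is a rough Python sketch that resolves phrases of the form "next <weekday>". The function name `next_weekday` is hypothetical, and it assumes that "next friday" means the first Friday strictly after the reference date, which is only one of several possible readings. Recurring phrases such as "every weekday" are not handled.

```python
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def next_weekday(phrase: str, today: date) -> date:
    """Resolve 'next <weekday>' to the first such day strictly after today."""
    words = phrase.lower().split()
    if len(words) != 2 or words[0] != "next" or words[1] not in WEEKDAYS:
        raise ValueError(f"unsupported phrase: {phrase!r}")
    target = WEEKDAYS.index(words[1])  # Monday is 0, as in date.weekday()
    # Days ahead, in the range 1..7, so the result is never today itself.
    delta = (target - today.weekday() - 1) % 7 + 1
    return today + timedelta(days=delta)


reference = date(2024, 1, 1)  # a Monday
print(next_weekday("next friday", reference))
```

A real parser also has to deal with locales, relative offsets and recurrence rules, which is why a dedicated library is usually preferable to hand-written rules like these.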
6 votes, 1 answer

How do I solve the following error? Input must be a character vector of any length or a list of character vectors, each of which has a length of 1

I am working on an R project. The data set I used is available at the following link: https://www.kaggle.com/ranjitha1/hotel-reviews-city-chennai/data The code I have used is: df1 = read.csv("chennai.csv", header = TRUE) library(tidytext) tidy_books…
Varun Raghav B • 73 • 1 • 1 • 6
6 votes, 1 answer

NLTK was unable to find the java file! for Stanford POS Tagger

I have been stuck trying to get the Stanford POS Tagger to work for a while. From an old SO post I found the following (slightly modified) code: stanford_dir = 'C:/Users/.../stanford-postagger-2017-06-09/' from nltk.tag import…
jss367 • 4,759 • 14 • 54 • 76
6 votes, 2 answers

How to filter tokens from spaCy document

I would like to parse a document using spaCy and apply a token filter so that the final spaCy document does not include the filtered tokens. I know that I can take the sequence of tokens filtered, but I am interested in having the actual Doc…
Kon Pal • 546 • 1 • 3 • 13
6 votes, 1 answer

State-of-the-art method for large-scale near-duplicate detection of documents?

To my understanding, the scientific consensus in NLP is that the most effective method for near-duplicate detection in large-scale scientific document collections (more than 1 billion documents) is the one found here:…
Alex • 117 • 8
6 votes, 1 answer

Which database can be used to store processed data from NLP engine

I am looking at taking unstructured data in the form of files, processing it and storing it in a database for retrieval. The data will be in natural language and the queries to get information will also be in natural language. Ex: the data could be…
6 votes, 1 answer

Automatic labeling of LDA generated topics

I'm trying to categorize customer feedback and I ran an LDA in python and got the following output for 10 topics: (0, u'0.559*"delivery" + 0.124*"area" + 0.018*"mile" + 0.016*"option" + 0.012*"partner" + 0.011*"traffic" + 0.011*"hub" +…
Arman • 827 • 3 • 14 • 28
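
As a rough illustration of what "automatic labeling" could mean for topic strings like the one quoted above, the sketch below parses the `weight*"word"` terms and joins the highest-weighted words into a label. This is a naive heuristic chosen for illustration, not a method taken from the question or its answer; the helper names `parse_topic` and `label_topic` are hypothetical, and the sample string is only the visible part of the truncated output.

```python
import re
from typing import List, Tuple

# Matches terms such as 0.559*"delivery" in the printed topic string.
TERM = re.compile(r'([0-9.]+)\*"([^"]+)"')


def parse_topic(topic: str) -> List[Tuple[str, float]]:
    """Turn a printed topic string into (word, weight) pairs."""
    return [(word, float(weight)) for weight, word in TERM.findall(topic)]


def label_topic(topic: str, top_n: int = 2) -> str:
    """Naive label: the top_n highest-weighted words joined by underscores."""
    terms = sorted(parse_topic(topic), key=lambda pair: pair[1], reverse=True)
    return "_".join(word for word, _ in terms[:top_n])


topic = '0.559*"delivery" + 0.124*"area" + 0.018*"mile" + 0.016*"option"'
label = label_topic(topic)
print(label)
```

Top-word labels are easy to compute but often hard to read, so in practice they tend to serve as a starting point for manual naming.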
6 votes, 1 answer

Load Custom NER Model Stanford CoreNLP

I have created my own NER model with Stanford's "Stanford-NER" software and by following these directions. I am aware that CoreNLP loads three NER models out of the box in the following…
Fraizier Reiland • 147 • 1 • 11
6 votes, 1 answer

Natural language processing library for auto-tagging (.NET)

Does anyone know of any good libraries out there for .NET that could help pull keywords out of blocks of natural language? I'm basically trying to strip out stop words and ignore tenses, plurals and generally find words that are essentially the…
Ben • 1,767 • 16 • 32
6 votes, 1 answer

Add word embedding to word2vec gensim model

I'm looking for a way to dynamically add pre-trained word vectors to a word2vec gensim model. I have a pre-trained word2vec model in a txt (words and their embedding) and I need to get Word Mover's Distance (for example via…
eardil • 86 • 1 • 6
6 votes, 1 answer

python charmap codec can't decode byte X in position Y character maps to <undefined>

I'm experimenting with Python libraries for data analysis. The problem I'm facing is this exception: UnicodeDecodeError was unhandled by user code. Message: 'charmap' codec can't decode byte 0x81 in position 165: character maps to <undefined> I…
DayTimeCoder • 4,294 • 5 • 38 • 61
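
The error quoted in this question can be reproduced in a few lines. The sketch below assumes the data was decoded with the Windows `cp1252` code page, in which byte `0x81` has no mapping; the sample bytes are made up, and the right fix in practice depends on the file's real encoding, which the excerpt does not show.

```python
# Made-up sample data containing the byte from the error message.
raw = b"abc\x81"

failed = False
message = ""
try:
    # cp1252 is a charmap-based codec and has no character for byte 0x81.
    raw.decode("cp1252")
except UnicodeDecodeError as exc:
    failed = True
    message = str(exc)

print(message)

# One lenient option: substitute U+FFFD for bytes that cannot be decoded.
lenient = raw.decode("cp1252", errors="replace")
print(repr(lenient))
```

When reading files, the same choices are available through the `encoding` and `errors` parameters of the built-in `open()`. Passing the file's actual encoding (often `encoding="utf-8"`) is usually better than silently replacing bytes.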
6 votes, 1 answer

Using predict on new text with kmeans (sklearn)?

I have a very small list of short strings which I want to (1) cluster and (2) use that model to predict which cluster a new string belongs to. Running the first part works fine, getting a prediction for the new string does not. First Part from…
Itay Livni • 2,143 • 24 • 38
6 votes, 1 answer

Tensorflow : ValueError: Shape must be rank 2 but is rank 3

I'm new to tensorflow and I'm trying to update some code for a bidirectional LSTM from an old version of tensorflow to the newest (1.0), but I get this error: Shape must be rank 2 but is rank 3 for 'MatMul_3' (op: 'MatMul') with input shapes:…
D. Clem • 85 • 1 • 6