Questions tagged [text-processing]

Mechanizing the creation or manipulation of electronic text.

Text processing covers basic jobs such as filtering, tokenizing, or normalizing text. It is often a pre-processing step.

1959 questions
0
votes
2 answers

Remove special character at the beginning of words in Unix

I need help removing special characters from the beginning of words in a Unix shell. For example, I have a list of words like this: 'aaa 'bbb 'ccc 'ddd. I want to remove the quotes and get output like this: aaa bbb ccc ddd. How can I remove…
Narmatha
  • 3
  • 1
0
votes
0 answers

What is the best approach for text processing in AWS?

This is kind of a "big data with Amazon Web Services" question: consider a massive set of txt files (all with the same content format inside: [title;body;author]). I want to store them in AWS and be able to search for a substring across the whole set. What…
0
votes
2 answers

Text Cleaning and Combining

I am trying to combine a variable date with a piece of text so that it looks as follows: time <- c(end_date_override="20180531") This is my code: bb <- Sys.Date()-1 b1 <- paste("c(end_date_override","",sep = "=") b1<-noquote(b1) b2 <-…
asathe1
  • 31
  • 4
0
votes
2 answers

Term Matching using SQL

I would like to split the text in the database and check whether all the terms I searched for are in the text. For example, "this is a cat" is the text in the database. If I search for "a cat" or "is cat" it should return the data, but it shouldn't…
Chit Khine
  • 830
  • 1
  • 13
  • 34
0
votes
1 answer

How to remove repeated lines without sorting (keep the initial order)

I have this input: Host: ping.chartbeat.net Host: extra.test.co userblablabla Host: extra.test.co Host: extra.test.co Host: extra.test.co Host: extra.test.co Host: extra.test.co Host: extra.test.co Host: extra.test.co Host: extra.test.co Host:…
aDoN
  • 1,877
  • 4
  • 39
  • 55
0
votes
1 answer

How to get dependency information about a word?

I have already successfully parsed sentences to get dependency information using the Stanford Parser (version 3.9.1, run in the Eclipse IDE) with the command "TypedDependencies", but how can I get dependency information about a single word (its parent,…
user8420001
  • 25
  • 1
  • 4
0
votes
4 answers

Regular Expression - Applied to a Text File

I have a text file with the following structure: KEYWORD0 DataKey01-DataValue01 DataKey02-DataValue02 ... DataKey0N-DataValue0N KEYWORD1 DataKey11-DataValue11 DataKey12-DataValue12 …
Imaginativeone
  • 107
  • 1
  • 2
  • 8
0
votes
0 answers

How can I tokenize a text efficiently?

Given a text (T) and a dictionary (D), how can I find all words that occur in the text? A1. One can assume that there are only few repetitions of characters in T, for example, when T is in Chinese. A2. Iterating over D, as one may suspect, is…
Imago
  • 521
  • 6
  • 29
0
votes
1 answer

Why won't my program recognize all names from a text file?

I'm a CS210 student and I am having trouble checking a text file for multiple names to pull data from it. I'm given the following data to store in a text file. It is a name, followed by 11 integers to read data from: -Sally 0 0 0 0 0 0 0 0 0 0 886…
dat_cube
  • 13
  • 1
0
votes
0 answers

TypeError: 'list' object is not callable when the input is from a .txt file (Python 3)

I am trying to create a function that removes stop words from a document. Instead, I get this error: TypeError: 'list' object is not callable. Here is the code: import re def stopwords(text): reg = re.compile(r"\n") …
Willze Fortner
  • 141
  • 2
  • 2
  • 12
0
votes
1 answer

How can I process Persian texts using Rapid Miner?

I am working on a Persian classification project. Persian text is very similar to Arabic text. When I use Tokenize, it does not show any words in its wordlist page, and in the Example Set page the image below is shown: I need to classify Persian…
0
votes
1 answer

Remove a string found in column 1 from column 2

I have a very large Excel file (150000 rows). For each row I have a string in column 1 that I need to find and remove from column 2. Input column 1 Input column 2 Output…
chives
  • 9
  • 1
0
votes
2 answers

Extract text inside brackets and store in dictionary

I am trying to separate all the functions within square brackets and store them in a dictionary. However, the output strips the closing bracket from all the outputs except the last one. import re line="[f(x,y),g(y,z),f1(x1,y1)]" matches =…
Raj
  • 23
  • 4
0
votes
1 answer

awk: change a field's value conditionally based on the value of another column

I have a table snp150Common.txt, where the second and third fields $2 and $3 can be equal or not. If they are equal, I want $2 to become $2-1, so that: chr1 10177 10177 rs367896724 - - -/C insertion near-gene-5 chr1 10352 10352 …
0
votes
1 answer

How to remove the empty documents from the Document Term Matrix in R

I have empty documents in my document-term matrix and need to remove them. This is the code that I used to build the DocumentTermMatrix: tweets_dtm_tfidf <- DocumentTermMatrix(tweet_corpus, control = list(weighting = weightTfIdf)) And this is the…
AdeeThyag
  • 125
  • 3
  • 15