Questions tagged [text-processing]

Mechanizing the creation or manipulation of electronic text.

Text processing covers basic jobs that filter, tokenize, or normalize text, often as a pre-processing step.

1959 questions
70
votes
7 answers

How does uʍop-ǝpᴉsdn text work?

Here's a website I found that will produce upside-down versions of any English text. How does it work? Does Unicode have upside-down chars? Or what? How can I write my own text-flipping function?
flybywire
  • 261,858
  • 191
  • 397
  • 503
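
One way to approach the text-flipping question above is a lookup table from ordinary letters to Unicode characters that merely look like rotated versions of them, combined with reversing the string. The sketch below is illustrative only: `FLIP_MAP` and `flip_text` are hypothetical names, the table is deliberately partial, and unmapped characters pass through unchanged.

```python
# Partial, illustrative mapping from lowercase ASCII to look-alike "rotated"
# characters. These are ordinary Unicode letters that happen to resemble
# upside-down glyphs; Unicode has no dedicated "upside-down" alphabet.
FLIP_MAP = {
    "a": "ɐ", "b": "q", "c": "ɔ", "d": "p", "e": "ǝ", "f": "ɟ",
    "g": "ƃ", "h": "ɥ", "i": "ᴉ", "j": "ɾ", "k": "ʞ", "l": "l",
    "m": "ɯ", "n": "u", "o": "o", "p": "d", "q": "b", "r": "ɹ",
    "s": "s", "t": "ʇ", "u": "n", "v": "ʌ", "w": "ʍ", "x": "x",
    "y": "ʎ", "z": "z",
}


def flip_text(text: str) -> str:
    """Lowercase the text, reverse it, and swap each character for its look-alike."""
    return "".join(FLIP_MAP.get(ch, ch) for ch in reversed(text.lower()))


if __name__ == "__main__":
    print(flip_text("upside-down"))
```

Reversing the string matters because rotating a line of text by 180 degrees also reverses its reading order.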
61
votes
8 answers

Expanding English language contractions in Python

The English language has a number of contractions. For instance: you've -> you have, he's -> he is. These can sometimes cause headaches when you are doing natural language processing. Is there a Python library that can expand these contractions?
Maarten
  • 4,549
  • 4
  • 31
  • 36
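
For the contraction question above, a minimal sketch using only the standard library is a dictionary lookup driven by one regular expression. The table `CONTRACTIONS` and the function `expand_contractions` are hypothetical names with a tiny, illustrative word list. Forms such as "he's" are left out on purpose, because they can mean either "he is" or "he has" and a plain lookup cannot tell which.

```python
import re

# Tiny illustrative table; a real list would be much longer.
CONTRACTIONS = {
    "you've": "you have",
    "i'm": "i am",
    "can't": "cannot",
    "won't": "will not",
    "don't": "do not",
}

# One alternation that matches any known contraction as a whole word.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(key) for key in CONTRACTIONS) + r")\b",
    flags=re.IGNORECASE,
)


def expand_contractions(text: str) -> str:
    """Replace known contractions, keeping a leading capital letter."""

    def _replace(match: re.Match) -> str:
        found = match.group(0)
        expanded = CONTRACTIONS[found.lower()]
        if found[0].isupper():
            expanded = expanded[0].upper() + expanded[1:]
        return expanded

    return _PATTERN.sub(_replace, text)


if __name__ == "__main__":
    print(expand_contractions("I'm sure you've read it"))
```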
57
votes
3 answers

Text Summarization Evaluation - BLEU vs ROUGE

With the results of two different summary systems (sys1 and sys2) and the same reference summaries, I evaluated them with both BLEU and ROUGE. The problem is: All ROUGE scores of sys1 were higher than sys2 (ROUGE-1, ROUGE-2, ROUGE-3, ROUGE-4,…
Chelsea_cole
  • 1,055
  • 3
  • 15
  • 21
54
votes
4 answers

How to find text files not containing text on Linux?

How do I find files not containing some text on Linux? Basically I'm looking for the inverse of the following: find . -print | xargs grep -iL "somestring"
user481572
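
As a rough illustration of the question above, the same search can be sketched in Python by walking a directory tree and keeping every file whose contents lack the search string. The function name `files_without` is hypothetical, the match is case-insensitive, and each file is read whole as text, which is an assumption that only suits reasonably small files.

```python
import os


def files_without(root: str, needle: str) -> list[str]:
    """Return sorted paths under root whose text does not contain needle."""
    needle = needle.lower()
    result = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Undecodable bytes are replaced rather than raising an error.
                with open(path, encoding="utf-8", errors="replace") as handle:
                    if needle not in handle.read().lower():
                        result.append(path)
            except OSError:
                # Unreadable files are skipped in this sketch.
                continue
    return sorted(result)


if __name__ == "__main__":
    for path in files_without(".", "somestring"):
        print(path)
```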
52
votes
5 answers

Add text to file at certain line in Linux

I want to add a specific line, let's say avatar, to the files that start with MakeFile, and avatar should be added at the 15th line in the file. This is how to add text to files: echo 'avatar' >> MakeFile.websvc and this is how to add text to files…
user2123459
  • 523
  • 1
  • 4
  • 6
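
A minimal Python sketch of the idea in the question above: read the lines, insert the new one at a 1-based position, and write the result back. The names `insert_line` and `insert_line_in_file` are hypothetical, and appending when the file is shorter than the requested position is an assumption, since the excerpt does not say what should happen in that case.

```python
def insert_line(lines: list[str], text: str, line_number: int) -> list[str]:
    """Return a copy of lines with text inserted so it becomes line line_number."""
    result = list(lines)
    # Clamp the index: short inputs get the new line appended at the end.
    index = min(max(line_number - 1, 0), len(result))
    result.insert(index, text)
    return result


def insert_line_in_file(path: str, text: str, line_number: int) -> None:
    """Rewrite the file at path with text inserted at the given line."""
    with open(path, encoding="utf-8") as handle:
        lines = handle.read().splitlines()
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(insert_line(lines, text, line_number)) + "\n")


if __name__ == "__main__":
    sample = [f"line {n}" for n in range(1, 21)]
    print(insert_line(sample, "avatar", 15)[13:16])
```

Applying it to every file whose name starts with MakeFile could be done with `glob.glob("MakeFile*")` and a loop over `insert_line_in_file`.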
45
votes
5 answers

Algorithms to detect phrases and keywords from text

I have around 100 megabytes of text, without any markup, divided into approximately 10,000 entries. I would like to automatically generate a 'tag' list. The problem is that there are word groups (i.e. phrases) that only make sense when they are…
Kimvais
  • 38,306
  • 16
  • 108
  • 142
43
votes
4 answers

Running a macro till the end of text file in Emacs

I have a text file with some sample content as shown here: Sno = 1p Sno = 2p Sno = 3p What I want is to remove the p from each of the columns. With this intention I write a macro: M-x //go to buffer C-x (//start the macro C-s = // search for…
whatf
  • 6,378
  • 14
  • 49
  • 78
39
votes
7 answers

How can I delete all lines that do not begin with certain characters?

I need to figure out a regular expression to delete all lines that do not begin with either "+" or "-". I want to print a paper copy of a large diff file, but it shows 5 or so lines before and after the actual diff.
mager
  • 4,813
  • 8
  • 29
  • 30
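
The filter described in the question above does not strictly need a regular expression. A hedged Python sketch, with the hypothetical function name `keep_diff_lines`, keeps only the lines whose first character is "+" or "-". In a unified diff the "+++" and "---" file header lines also start with those characters, so this sketch keeps them too.

```python
def keep_diff_lines(text: str) -> str:
    """Keep only lines that begin with '+' or '-'; drop context lines."""
    kept = [line for line in text.splitlines() if line.startswith(("+", "-"))]
    return "\n".join(kept)


if __name__ == "__main__":
    sample = " unchanged\n+added line\n-removed line\n unchanged again"
    print(keep_diff_lines(sample))
```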
35
votes
12 answers

How do I convert a multi-line public key into a single line?

I'm trying to make a txt file with a generated key into 1 line. Example: <----- key start -----> lkdjasdjskdjaskdjasdkj skdhfjlkdfjlkdsfjsdlfk kldshfjlsdhjfksdhfksdj jdhsfkjsdhfksdjfhskdfh jhdfkjsdhfkjsdhfkjsdhf <----- key stop -----> I want it to…
john
  • 1,330
  • 3
  • 20
  • 34
35
votes
7 answers

What are the available tools to summarize or simplify text?

Is there any library, preferably in Python but at least open source, that can summarize and/or simplify natural-language text?
captainandcoke
  • 1,085
  • 2
  • 13
  • 16
33
votes
8 answers

How to get Git log with short stat in one line?

The following command outputs the following lines of text on the console: git log --pretty=format:"%h;%ai;%s" --shortstat ed6e0ab;2014-01-07 16:32:39 +0530;Foo 3 files changed, 14 insertions(+), 13 deletions(-) cdfbb10;2014-01-07 14:59:48 +0530;Bar 1 file…
Ankush
  • 2,454
  • 2
  • 21
  • 27
32
votes
10 answers

How to delete all blank lines in the file with the help of Python?

For example, we have a file like this: first line second line third line And as a result we need to get: first line second line third line Use ONLY Python
user285070
  • 761
  • 2
  • 12
  • 21
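
A short sketch of one way to do what the question above asks, using only built-in Python. The function name `remove_blank_lines` is hypothetical, and treating whitespace-only lines as blank is an assumption the excerpt does not settle.

```python
def remove_blank_lines(text: str) -> str:
    """Drop lines that are empty or contain only whitespace."""
    return "\n".join(line for line in text.splitlines() if line.strip())


if __name__ == "__main__":
    sample = "first line\n\nsecond line\n\nthird line\n"
    print(remove_blank_lines(sample))
```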
30
votes
1 answer

Measuring width of text (Python/PIL)

I'm using the following two methods to calculate a sample string's rendered width for a set font-type and size: font = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf", 14) sample = "Lorem ipsum dolor sit amet, partem periculis…
Hassan Baig
  • 15,055
  • 27
  • 102
  • 205
29
votes
7 answers

How to add double quotes to a line with SED or AWK?

I have the following list of words: name,id,3 I need to have it double quoted like this: "name,id,3" I have tried sed 's/.*/\"&\"/g' and got: "name,id,3 Which has only one double quote and is missing the closing double quote. I've also tried awk…
minerals
  • 1,195
  • 4
  • 15
  • 22
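
The symptom in the question above, a missing closing quote, would be consistent with a carriage return at the end of each input line, although the excerpt does not confirm that cause. Under that assumption, a Python sketch with the hypothetical function name `quote_line` strips any trailing line-ending characters before wrapping the text.

```python
def quote_line(line: str) -> str:
    """Wrap a line in double quotes after removing trailing CR/LF characters."""
    return '"' + line.rstrip("\r\n") + '"'


if __name__ == "__main__":
    print(quote_line("name,id,3\r\n"))
```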
27
votes
3 answers

NLTK for Named Entity Recognition

I am trying to use the NLTK toolkit to extract place, date and time from text messages. I just installed the toolkit on my machine and I wrote this quick snippet to test it out: sentence = "Let's meet tomorrow at 9 pm"; tokens =…