Highest Voted 'text-processing' Questions

6

votes

4 answers

Low performance with BufferedReader

I am processing a number of text files line by line using BufferedReader.readLine(). Two files have the same size (130 MB), but one takes 40 sec to process while the other takes 75 sec. I noticed one file has 1.8 million lines while the other has 2.1…

java text-processing readline bufferedreader seek

asked Aug 24 '11 at 16:59

samarth

3,866
7
45
60

6

votes

0 answers

What is the purpose of the Configurations2 directory inside of an ODF-Document?

The directory is not mentioned in the OASIS-Specification of ODF. Does anyone know the purpose of this directory? Its structure is as…

openoffice.org specifications text-processing odf odt

asked Jul 23 '11 at 22:22

AlexTheBird

677
4
16

6

votes

1 answer

Parallel Computation for Create_Matrix 'RTextTools' package

I am creating a DocumentTermMatrix using create_matrix() from RTextTools and then creating a container and model based on it. It is for extremely large datasets. I do this for each category (factor level). So for each category it has to run matrix,…

r foreach parallel-processing text-processing doparallel

asked Jan 09 '19 at 10:31

Prasanna Nandakumar

4,295
34
63

6

votes

1 answer

I cannot understand the skipgrams() function in keras

I am trying to understand the skipgrams() function in keras by using the following code:

from keras.preprocessing.text import *
from keras.preprocessing.sequence import skipgrams
text = "I love money" #My test sentence
tokenizer =…

python machine-learning nlp keras text-processing

asked May 15 '18 at 10:06

Raven Cheuk

2,903
4
27
54

6

votes

8 answers

"Absolute" string metric

I have a huge (but finite) set of natural language strings. I need a way to convert each string to a numeric value. For any given string the value must be the same every time. The more "different" two given strings are, the more different two…

algorithm string text-processing hilbert-curve string-metric

asked Jan 30 '09 at 22:46

Alexander Gladysh

39,865
32
103
160

6

votes

3 answers

How to efficiently parse large text files in Ruby

I'm writing an import script that processes a file that has potentially hundreds of thousands of lines (log file). Using a very simple approach (below) took enough time and memory that I felt like it would take out my MBP at any moment, so I killed…

ruby text-processing

asked Jan 30 '11 at 23:43

localshred

2,244
1
21
33

6

votes

2 answers

Can "perl -a" somehow re-join @F using the original whitespace?

My input has a mix of tabs and spaces for readability. I want to modify a field using perl -a, then print out the line in its original form. (The data is from findup, showing me a count of duplicate files and the space they waste.) Input is: 2 *…

perl text-processing

asked Jul 23 '17 at 08:07

piojo

6,351
1
26
36

6

votes

4 answers

Big text file processing

I need to implement lazy loading in Mathematica. I have a 600 MB CSV text file which I need to process. This file contains a lot of duplicated records: 1;0;0;13;6 1;0;0;13;6 .......... 2;0;0;13;6 2;0;0;13;6 .......... etc. So instead of loading…

import wolfram-mathematica text-processing

asked Nov 26 '10 at 12:00

Max

19,654
13
84
122

6

votes

8 answers

Read line by line and print matches line by line

I am new to shell scripting, and it would be great to get some help with the question below. I want to read a text file line by line and print all matched patterns in each line to a line in a new text file. For example: $ cat input.txt SYSTEM…

linux bash shell grep text-processing

asked Dec 09 '16 at 19:03

Dinesh Kumar

105
1
8

6

votes

10 answers

How can I loop through blocks of lines in a file?

I have a text file that looks like this, with blocks of lines separated by blank lines:

ID: 1
Name: X
FamilyN: Y
Age: 20

ID: 2
Name: H
FamilyN: F
Age: 23

ID: 3
Name: S
FamilyN: Y
Age: 13

ID: 4
Name: M
FamilyN: Z
Age: 25

How can I loop through…

python text-processing

asked Oct 12 '10 at 12:06

Adia

1,171
5
16
33
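For the question above about looping through blocks of lines, one possible approach is sketched below. It is not taken from the answers on the page. It assumes that blocks are separated by one or more blank lines and that every non-blank line has the form `Key: value`; the helper names `iter_blocks` and `parse_block` are hypothetical and exist only for this illustration.

```python
from itertools import groupby


def iter_blocks(lines):
    """Yield each run of consecutive non-blank lines as a list of stripped strings."""
    # groupby splits the input into alternating runs of blank and non-blank lines.
    for is_blank, group in groupby(lines, key=lambda line: not line.strip()):
        if not is_blank:
            yield [line.strip() for line in group]


def parse_block(block):
    """Turn 'Key: value' lines into a dict (assumes one such pair per line)."""
    return dict(line.split(": ", 1) for line in block)


# Sample data copied from the question excerpt (first two blocks only).
sample = """ID: 1
Name: X
FamilyN: Y
Age: 20

ID: 2
Name: H
FamilyN: F
Age: 23
"""

records = [parse_block(block) for block in iter_blocks(sample.splitlines())]
for record in records:
    print(record["ID"], record["Name"], record["Age"])
```

Because `iter_blocks` accepts any iterable of lines, an open file object can be passed in place of `sample.splitlines()`, so the file does not need to be read into memory at once.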

6

votes

4 answers

Parse string into a tree structure?

I'm trying to figure out how to parse a string in this format into a tree-like data structure of arbitrary depth. "{{Hello big|Hi|Hey} {world|earth}|{Goodbye|farewell} {planet|rock|globe{.|!}}}" [[["Hello big" "Hi" "Hey"] ["world" "earth"]] …

parsing clojure tree text-processing text-parsing

asked Sep 29 '10 at 22:35

erikcw

10,787
15
58
75

6

votes

8 answers

Efficiently parsing a large text file in C#

I need to read a large space-separated text file and count the number of instances of each code in the file. Essentially, these are the results of running some experiments hundreds of thousands of times. The system spits out a text file that looks…

c# algorithm parsing text-processing

asked Aug 27 '10 at 11:54

ChrisCa

10,876
22
81
118

6

votes

3 answers

Randomizing text between delimiters

I have this simple input: I have {red;green;orange} fruit and cup of {tea;coffee;juice}. I use Perl to identify patterns between two external brace delimiters { and }, and randomize the fields inside with the internal delimiter ;. I'm getting this…

perl shell text-processing text-parsing

asked Dec 24 '15 at 13:02

kempinski

63
3

6

votes

6 answers

Fast Text Preprocessing

In my project I work with text in general. I found that preprocessing can be very slow. So I would like to ask you if you know how to optimize my code. The flow is like this: get HTML page -> (To plain text -> stemming -> remove stop words) ->…

c# regex text-processing

asked Jul 29 '10 at 17:29

Ventus

2,482
4
35
41

6

votes

1 answer

Count word frequencies in list-of-lists-of-words

I have this large corpus data in dataframe res (dataframe) text.1 1 …

r nested-lists text-processing word-frequency

asked Apr 09 '15 at 05:38

KRU

291
4
18
