Questions tagged [text-parsing]

Text parsing is a variation of parsing that refers to breaking a stream of text into components and capturing the relationships between those components.

When the stream of text is arbitrary, parsing is often used to mean breaking the stream into constituent atoms (words or lexemes).
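As a rough illustration of breaking arbitrary text into atoms, here is a minimal Python sketch of a regex-based tokenizer. The three token classes (numbers, words, single punctuation characters) are an assumption chosen for the example, not a standard definition of a lexeme.

```python
import re

# One alternative per kind of atom (assumed classes): numbers with an optional
# fractional part, runs of word characters, and single punctuation characters.
TOKEN_RE = re.compile(r"\d+(?:\.\d+)?|\w+|[^\w\s]")

def tokenize(text: str) -> list[str]:
    """Break an arbitrary text stream into atoms, discarding whitespace."""
    return TOKEN_RE.findall(text)

print(tokenize("x1 = 3.5 + y;"))  # ['x1', '=', '3.5', '+', 'y', ';']
```

Whitespace never appears in the result because no alternative matches it, so `re.findall` simply skips over it.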

When the stream of text corresponds to natural language, parsing is used to mean breaking the stream into natural language elements (words and punctuation) and discovering the structure of the text as phrases or sentences.

When the stream of text corresponds to a computer source language (or other formal language), parsing consists of applying one of a variety of parsing algorithms (ad hoc, recursive descent, LL, LR, Packrat, Earley, or others) to the source text (often first broken into lexemes by a lower-level component called a "lexer") to verify that the text is valid in the source language, and often to construct a parse tree representing the grammar productions used to tile the text.
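The lexer-then-parser pipeline described above can be sketched in Python for a toy arithmetic grammar, using recursive descent to build a parse tree. The grammar, the tuple-based tree shape, and the names `lex` and `Parser` are hypothetical, chosen only to keep the example small.

```python
import re
from typing import Optional, Union

# Parse tree: an integer leaf, or a tuple (operator, left_subtree, right_subtree).
Tree = Union[int, tuple]

def lex(src: str) -> list[str]:
    """Lexer: split the source into integer literals, operators and parentheses."""
    tokens = re.findall(r"\d+|[-+*/()]|\S", src)
    for tok in tokens:
        if not re.fullmatch(r"\d+|[-+*/()]", tok):
            raise SyntaxError(f"unexpected character {tok!r}")
    return tokens

class Parser:
    """Recursive-descent parser for the hypothetical toy grammar:
         expr   := term   (('+' | '-') term)*
         term   := factor (('*' | '/') factor)*
         factor := NUMBER | '(' expr ')'
    """

    def __init__(self, tokens: list[str]):
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected: Optional[str] = None) -> str:
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected or 'a token'}, got {tok!r}")
        self.pos += 1
        return tok

    def parse(self) -> Tree:
        tree = self.expr()
        if self.peek() is not None:  # leftover tokens: the text is not valid
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    def expr(self) -> Tree:
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            node = (op, node, self.term())  # loop makes the operator left-associative
        return node

    def term(self) -> Tree:
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            node = (op, node, self.factor())
        return node

    def factor(self) -> Tree:
        tok = self.peek()
        if tok == "(":
            self.take("(")
            node = self.expr()
            self.take(")")
            return node
        if tok is not None and tok.isdigit():
            return int(self.take())
        raise SyntaxError(f"unexpected token {tok!r}")

print(Parser(lex("2 * (3 + 4) - 5")).parse())  # ('-', ('*', 2, ('+', 3, 4)), 5)
```

Each grammar rule becomes one method, and the `while` loops make `-` and `/` left-associative, so `1 - 2 - 3` parses as `('-', ('-', 1, 2), 3)`. Input that the grammar cannot tile, such as `2 +`, raises `SyntaxError`.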

1268 questions
12 votes, 3 answers

How do I tokenize this string in Ruby?

I have this string: %{Children^10 Health "sanitation management"^5} and I want to tokenize it into an array of hashes: [{:keywords=>"children", :boost=>10}, {:keywords=>"health", :boost=>nil}, {:keywords=>"sanitation management",…
Radamanthus
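The question asks for Ruby; to keep all examples here in one language, this is a Python sketch of the same tokenization using a single regular expression. It assumes boosts are integers, phrases use plain double quotes with no escaping, and keywords are lower-cased as in the desired output; the name `tokenize_query` is hypothetical.

```python
import re

# A term is a double-quoted phrase or a bare word, optionally followed by ^N.
# Assumptions: boosts are non-negative integers and quotes are never escaped.
TERM_RE = re.compile(r'(?:"([^"]*)"|([^\s"^]+))(?:\^(\d+))?')

def tokenize_query(query: str) -> list[dict]:
    terms = []
    for m in TERM_RE.finditer(query):
        phrase, word, boost = m.groups()
        keywords = phrase if phrase is not None else word
        terms.append({
            "keywords": keywords.lower(),            # desired output is lower-cased
            "boost": int(boost) if boost else None,  # None plays the role of Ruby's nil
        })
    return terms

print(tokenize_query('Children^10 Health "sanitation management"^5'))
# [{'keywords': 'children', 'boost': 10}, {'keywords': 'health', 'boost': None},
#  {'keywords': 'sanitation management', 'boost': 5}]
```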
12 votes, 2 answers

Why is there no std::from_string()?

Why is there no template T std::from_string(const std::string& s); in the C++ standard? (Seeing how there's an std::to_string() function, I mean.) PS - If you have an idea for the reason this was not adopted/considered, just…
einpoklum
12 votes, 8 answers

Expand array of numbers and hyphenated number ranges to array of integers

I'm trying to normalize/expand/hydrate/translate a string of numbers as well as hyphen-separated numbers (as range expressions) so that it becomes an array of integer values. Sample input: $array = ["1","2","5-10","15-20"]; should become: $array =…
muffin
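The question is about PHP; a Python sketch of the same expansion splits each entry on its hyphen and expands it with an inclusive `range`. It assumes non-negative integers and ascending ranges (a leading minus sign or a reversed range such as `10-5` is not handled), and `expand_ranges` is a hypothetical name.

```python
def expand_ranges(items: list[str]) -> list[int]:
    """Expand entries such as "5-10" into 5, 6, ..., 10; pass plain numbers through."""
    result = []
    for item in items:
        start, sep, end = item.strip().partition("-")
        if sep:
            result.extend(range(int(start), int(end) + 1))  # +1: upper bound is inclusive
        else:
            result.append(int(start))
    return result

print(expand_ranges(["1", "2", "5-10", "15-20"]))
# [1, 2, 5, 6, 7, 8, 9, 10, 15, 16, 17, 18, 19, 20]
```

A comma-separated string works too once it is split first: `expand_ranges("1,3-5".split(","))` gives `[1, 3, 4, 5]`.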
12 votes, 3 answers

Making links clickable in Javascript?

Is there a simple way of turning a string from "Then go to http://example.com/ and foo the bar!" into "Then go to example.com and foo the bar!" (with example.com as a clickable link) in Javascript within an existing HTML page?
max
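The question targets JavaScript in the browser; the underlying idea (find URLs with a regular expression, HTML-escape the surrounding text, wrap each URL in an `<a>` element) is shown here as a Python sketch. The URL pattern is a deliberate simplification and an assumption: it only recognizes `http://` and `https://` links and leaves trailing sentence punctuation outside the link. The name `linkify` is hypothetical.

```python
import html
import re

# Simplified URL pattern (an assumption): http(s)://, then non-space characters,
# where the last character may not be sentence punctuation such as "." or "!".
URL_RE = re.compile(r"""https?://[^\s<>"']*[^\s<>"'.,!?;:)]""")

def linkify(text: str) -> str:
    """Return HTML in which every URL found in `text` is a clickable link."""
    parts = []
    last = 0
    for m in URL_RE.finditer(text):
        parts.append(html.escape(text[last:m.start()]))  # plain text before the URL
        url = html.escape(m.group(0))                     # also safe inside href="..."
        parts.append(f'<a href="{url}">{url}</a>')
        last = m.end()
    parts.append(html.escape(text[last:]))                # text after the last URL
    return "".join(parts)

print(linkify("Then go to http://example.com/ and foo the bar!"))
# Then go to <a href="http://example.com/">http://example.com/</a> and foo the bar!
```

Escaping each plain-text piece separately, rather than the whole string up front, keeps the URL pattern from having to deal with entities such as `&amp;`.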
12 votes, 4 answers

Should I use cut or awk to extract fields and field substrings?

I have a file with pipe-separated fields. I want to print a subset of field 1 and all of field 2: cat tmpfile.txt # 10 chars.|variable length num|text ABCDEFGHIJ|99|U|HOMEWORK JIDVESDFXW|8|C|CHORES DDFEXFEWEW|73|B|AFTER-HOURS I'd like the output to…
user3486154
12 votes, 1 answer

How can I parse a string to a function in Haskell?

I want a function that looks something like this readFunc :: String -> (Float -> Float) which operates something like this >(readFunc "sin") (pi/2) >1.0 >(readFunc "(+2)") 3.0 >5.0 >(readFunc "(\x -> if x > 5.0 then 5.0 else x)")…
user2407038
12 votes, 13 answers

Code Golf: Quickly Build List of Keywords from Text, Including # of Instances

I've already worked out this solution for myself with PHP, but I'm curious how it could be done differently - better even. The two languages I'm primarily interested in are PHP and Javascript, but I'd be interested in seeing how quickly this could…
Sampson
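The question is interested in PHP and JavaScript; for comparison, a Python sketch of the same keyword count relies on `collections.Counter`. The definition of a word (lower-cased runs of ASCII letters, digits and apostrophes) is an assumption, no stop words are removed, and `keyword_counts` is a hypothetical name.

```python
import re
from collections import Counter

def keyword_counts(text: str, top: int = 10) -> list[tuple[str, int]]:
    """Return the `top` most frequent words with their number of occurrences."""
    # Assumption: a word is a run of ASCII letters, digits or apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(words).most_common(top)

print(keyword_counts("The cat and the hat, and the bat."))
# [('the', 3), ('and', 2), ('cat', 1), ('hat', 1), ('bat', 1)]
```

`most_common` lists words with equal counts in the order they were first encountered, which keeps the output deterministic.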
11 votes, 2 answers

Splitting large text file by a delimiter in Python

I imagine this is going to be a simple task, but I can't find exactly what I am looking for in previous StackOverflow questions, so here goes... I have large text files in a proprietary format that look something like this: :Entry - Name John Doe -…
Kevin
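Since the question is already about Python, here is a minimal sketch that streams the file and yields one record at a time, so a large file is never held in memory at once. The record format is cut off in the excerpt, so the rule that a new record starts at every line beginning with `:Entry` is an assumption, as are the names `split_records` and `data.txt`.

```python
from collections.abc import Iterable, Iterator

def split_records(lines: Iterable[str], marker: str = ":Entry") -> Iterator[list[str]]:
    """Group lines into records; a new record starts at each line beginning
    with `marker` (assumed delimiter). Lines before the first marker, if any,
    form a record of their own."""
    record: list[str] = []
    for line in lines:
        if line.startswith(marker) and record:
            yield record          # the previous record is complete
            record = []
        record.append(line.rstrip("\n"))
    if record:
        yield record              # last record in the input

# Usage with a hypothetical file name; the file object is read lazily, line by line.
# with open("data.txt") as f:
#     for rec in split_records(f):
#         print(rec[0])

sample = [":Entry - Name A\n", "field 1\n", ":Entry - Name B\n", "field 2\n"]
print(list(split_records(sample)))
# [[':Entry - Name A', 'field 1'], [':Entry - Name B', 'field 2']]
```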
11 votes, 2 answers

Split & Trim in a single step

In PS 5.0 I can split and trim a string in a single line, like this $string = 'One, Two, Three' $array = ($string.Split(',')).Trim() But that fails in PS 2.0. I can of course do a foreach to trim each item, or replace ', ' with ',' before doing the…
Gordon
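The question concerns PowerShell 2.0 and is not answered here; as a language-neutral illustration of splitting and trimming in one step, a Python sketch can split on a regular expression that also consumes the whitespace around each separator. The name `split_trim` is hypothetical.

```python
import re

def split_trim(text: str, sep: str = ",") -> list[str]:
    """Split on `sep` and drop surrounding whitespace in a single regex split."""
    # The pattern swallows any spaces on either side of the separator; the outer
    # strip() handles whitespace at the very start and end of the string.
    return re.split(rf"\s*{re.escape(sep)}\s*", text.strip())

print(split_trim("One, Two, Three"))  # ['One', 'Two', 'Three']
```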
11 votes, 2 answers

Which Perl modules are good for data munging?

Nine years ago, when I started parsing HTML and free text with Perl, I read the classic Data Munging with Perl. Does anyone know if David is planning to update the book, or if there are similar books or web pages where the new parsing modules like…
11 votes, 17 answers

Convert "1d2h3m" to ["day" => 1, "hour" => 2, "minutes" => 3]

I am trying to parse a time expression string into an associative array with full-word keys. My input: $time = "1d2h3m"; My desired output: array( "day" => 1, "hour" => 2, "minutes" => 3 ) I have tried to extract the numbers with…
Jack jdeoel
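The question is about PHP; the same idea as a Python sketch captures each number together with its unit letter in one regular expression, then maps the letter to the key names from the desired output. Accepting only `d`, `h` and `m` is an assumption based on the example input, and `parse_duration` is a hypothetical name.

```python
import re

# Unit letters mapped to the key names used in the question's desired output.
UNIT_NAMES = {"d": "day", "h": "hour", "m": "minutes"}

def parse_duration(expr: str) -> dict[str, int]:
    """Turn a compact duration such as "1d2h3m" into a dict keyed by unit name."""
    if not re.fullmatch(r"(?:\d+[dhm])+", expr):  # reject anything else up front
        raise ValueError(f"unrecognised duration: {expr!r}")
    return {UNIT_NAMES[unit]: int(num) for num, unit in re.findall(r"(\d+)([dhm])", expr)}

print(parse_duration("1d2h3m"))  # {'day': 1, 'hour': 2, 'minutes': 3}
```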
10 votes, 1 answer

Chunking with rule-based grammar in spacy

I have this simple example of chunking in nltk. My data: data = 'The little yellow dog will then walk to the Starbucks, where he will introduce them to Michael.' ...pre-processing ... data_tok = nltk.word_tokenize(data) #tokenisation data_pos =…
ben_aaron
9 votes, 6 answers

Extract floating point numbers from a delimited string in PHP

I would like to convert a string of delimited dimension values into floating-point numbers. For example, convert 152.15 x 12.34 x 11mm into 152.15, 12.34 and 11 and store them in an array such that: $dim[0] = 152.15; $dim[1] = 12.34; $dim[2] = 11; I would also need…
Tian Bo
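The question is about PHP; a Python sketch pulls every number out of the string with one regular expression and converts each to a float, which ignores the `x` separators and the trailing `mm` unit. The pattern is an assumption: it does not handle negative numbers, exponents, numbers without a leading digit such as `.5`, or comma decimal separators. The rest of the question is cut off, so any further requirement is not addressed.

```python
import re

# Assumed number format: digits with an optional fractional part.
NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")

def parse_dimensions(text: str) -> list[float]:
    """Extract all numbers from a dimension string such as '152.15 x 12.34 x 11mm'."""
    return [float(n) for n in NUMBER_RE.findall(text)]

dim = parse_dimensions("152.15 x 12.34 x 11mm")
print(dim)  # [152.15, 12.34, 11.0]
```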
9 votes, 8 answers

Populate array of integers from a comma-separated string of numbers and hyphenated number ranges

I want to translate/hydrate/expand/parse a comma-separated string of integers and hyphenated integer range expressions and populate an array with its equivalent values as individual integer elements. Input strings might look like the…
Matthew Higgins
9 votes, 2 answers

Parsing ASCII file efficiently in Haskell

I wanted to reimplement some of my ASCII parsers in Haskell since I thought I could gain some speed. However, even a simple "grep and count" is much slower than a sloppy Python implementation. Can someone explain to me why and how to do it…
tamasgal