
I need to detect URLs efficiently in an input stream during typesetting.

The URL detector will be part of the typesetting flow. It should accept one character at a time as input and emit one character at a time, each paired with the URL it belongs to (or an empty string if it is not part of a URL). It may buffer text for lookahead in order to do this.

For example, if the input stream is "Hello http://foo.com World", the output should be:

"H": "" 
"e": "" 
"l": "" 
"l": "" 
"o": "" 
" ": "" 
"h": "http://foo.com" 
"t": "http://foo.com" 
"t": "http://foo.com" 
"p": "http://foo.com" 
":": "http://foo.com" 
"/": "http://foo.com" 
"/": "http://foo.com" 
"f": "http://foo.com" 
"o": "http://foo.com" 
"o": "http://foo.com" 
".": "http://foo.com" 
"c": "http://foo.com" 
"o": "http://foo.com" 
"m": "http://foo.com" 
" ": "" 
"W": "" 
"o": "" 
"r": "" 
"l": "" 
"d": ""
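
To make the interface concrete, here is a rough Python sketch of the behaviour I am after. It is hand-written, not Ragel, and it makes a big simplifying assumption: a URL is any whitespace-delimited token that starts with `http://` or `https://`. The names `UrlTagger`, `feed`, `flush` and `tag_stream` are made up for illustration.

```python
from typing import List, Tuple

SCHEMES = ("http://", "https://")  # assumption: only these schemes count as URLs

Pair = Tuple[str, str]  # (character, URL it belongs to, or "")


class UrlTagger:
    """Hypothetical streaming tagger: feed one character, get back (char, url) pairs."""

    def __init__(self) -> None:
        self._token: List[str] = []  # lookahead buffer for a possible URL
        self._plain = False          # True once the current word is ruled out

    @staticmethod
    def _could_be_url(text: str) -> bool:
        # True while text is a prefix of a scheme, or already starts with one.
        return any(s.startswith(text) or text.startswith(s) for s in SCHEMES)

    def _drain(self) -> List[Pair]:
        # Called at a token boundary: decide what the buffered text was.
        text = "".join(self._token)
        self._token.clear()
        is_url = any(text.startswith(s) and len(text) > len(s) for s in SCHEMES)
        url = text if is_url else ""
        return [(c, url) for c in text]

    def feed(self, ch: str) -> List[Pair]:
        if ch.isspace():
            # Whitespace ends the current token and is never part of a URL.
            self._plain = False
            return self._drain() + [(ch, "")]
        if self._plain:
            # Already known not to be a URL: pass straight through.
            return [(ch, "")]
        self._token.append(ch)
        if not self._could_be_url("".join(self._token)):
            # Ruled out: release the buffer untagged and stop buffering this word.
            self._plain = True
            out = [(c, "") for c in self._token]
            self._token.clear()
            return out
        return []  # still ambiguous, keep buffering

    def flush(self) -> List[Pair]:
        # Call at end of input to release anything still buffered.
        return self._drain()


def tag_stream(text: str) -> List[Pair]:
    tagger = UrlTagger()
    out: List[Pair] = []
    for ch in text:
        out.extend(tagger.feed(ch))
    out.extend(tagger.flush())
    return out


if __name__ == "__main__":
    for ch, url in tag_stream("Hello http://foo.com World"):
        print(repr(ch), repr(url))
```

The point of the sketch is the shape of the interface: characters that might belong to a URL are held back until the token ends, and everything else passes through immediately. What I would like is for a Ragel-generated machine to make the "is this a URL" decision instead of the prefix check above.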

Can Ragel be made to stream the input and output as needed?

Incidentally, there is a Ragel URL parser (in Java) here, which I'm thinking of using as a starting point.

bright