-4

I have been trying to develop a regex to match a block's argument, and then all the instances of that argument.

Using this example:

File.open(inFile).each do |line|
  line.chomp!
  if line.empty? then
    next
  elsif line =~ /^>/
    line.slice!(/>/)
    names.push(line)
  elsif line !~ /^>/
    seqs.push(line)
  end
end

I would like to match the word between the pipes, `line`, and then all instances of `line`.

Matching the argument is simple:

(?<=\|)(\w*?)(?=\|)

But I am really unsure how to use this match as a pattern for the rest of the document.

Any thoughts on how to proceed are welcome.

(Edit 2: I am now not concerned with limiting the scope of the regex to the block. I would like to match all instances in the whole document. Please consider re-examining this simpler question.)

(Edit: I am trying to incorporate this regex into a tmLanguage file for TextMate/Sublime Text, so that the argument and all of its instances are the same color. I am sure there is a way to build a plugin to do this, but I haven't tried yet, beyond looking at how the Sublime plugin BracketHighlighter works.)

AGS
  • You need to provide some test data and an expected output to get a good answer to this question. – ian Feb 24 '13 at 22:29
  • Expected output is clearly stated in my original question. – AGS Feb 24 '13 at 22:32
  • I don't see any, hence the request. – ian Feb 24 '13 at 22:45
  • Seriously? "I would like to match the word between the pipes, `line`, and then all instances of `line` within the block," – AGS Feb 24 '13 at 22:51
  • That's not an _example_ of the output, that's a _description_ of the output. Seriously. – ian Feb 24 '13 at 22:52
  • I think you are purposefully being obtuse. Both the description and example are the same thing. A regex match. – AGS Feb 24 '13 at 22:55
  • A regular expression _match_ is the part of the _input_ data matched by applying a regex _pattern_. So there are 3 things, the _input_, the _pattern_, and the _output_ (i.e. match). In the comment above, you gave a written _description_ of the thing to _match_ i.e. a portion of the (ungiven) _input_. You've also provided a pattern (`(?<=\|)(\w*?)(?=\|)`) and unless this is the _input_ or the resulting _match_ then you haven't provided the _example(s)_ I requested. -1. – ian Feb 24 '13 at 23:06

2 Answers

2

Don't do that. Seriously. It won't work reliably, and you'll spend more time proofreading and fixing the results than you would making the changes by hand.
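
To illustrate the reliability problem, here is a minimal Ruby sketch of the naive two-pass approach: capture the name between the pipes, then search the whole document for that word. The sample source is hypothetical; the string literal and the second block are assumptions added only to show where the search over-matches.

```ruby
# Hypothetical sample source. The "bad line" string and the second block
# are not from the question; they are here to expose false positives.
source = <<~'RUBY'
  File.open(inFile).each do |line|
    line.chomp!
    puts "bad line" if line.empty?
  end
  other.each { |line| p line }
RUBY

# Pass 1: capture the first word that sits between two pipes.
arg = source[/(?<=\|)(\w+)(?=\|)/, 1]

# Pass 2: build a whole-word pattern from the captured name and scan.
instances = source.scan(/\b#{Regexp.escape(arg)}\b/)

puts arg             # prints "line"
puts instances.size  # prints 6
```

Only three of those six hits are the first block's argument. One is inside a string literal and two belong to an unrelated block that happens to reuse the name, which is the kind of result you would have to proofread by hand.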

glenn mcdonald
  • I agree, it's not ideal. I'll edit my post to explain my rationale for trying. Thanks glenn. – AGS Feb 24 '13 at 22:28
  • Think about all the contexts in which `line` could appear. Now all the places `end` could appear, never mind potential `{}` pairs. You're trying to write a Ruby language parser inside a regular expression. Unless you have some vastly simplifying assumptions you can make due to your particular situation, it's hopeless. – glenn mcdonald Feb 24 '13 at 22:33
  • Thanks for the comments. Please consider looking at the edited question, which now just looks to match the argument across the whole document. – AGS Feb 24 '13 at 23:04
  • But the "whole document" will presumably have lots of other `|x|`s. As might even a single block. So I don't think you've really simplified things. – glenn mcdonald Feb 25 '13 at 00:34
0

If you look inside the Ruby language bundle and see which names have already been assigned matches, you'll find the first part has already been done:

name: variable.other.block.ruby
match: [_a-zA-Z][_a-zA-Z0-9]*

You can use that name to refer to block local vars, probably as source.ruby.variable.other.block.ruby. I'm not sure how you'll refer to the different matches of a multi argument list.
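
Outside the grammar file, the multi-argument case can at least be explored in plain Ruby. The following is only a sketch: `IDENT` reuses the match pattern quoted above, while `block_args` is a hypothetical helper, and it assumes the first `|...|` pair on the line really is a block argument list (it does not distinguish the `|` operator).

```ruby
# Identifier pattern quoted from the bundle's match rule above.
IDENT = /[_a-zA-Z][_a-zA-Z0-9]*/

# Hypothetical helper: returns the names found in the first |...| list.
# Assumption: the pipes delimit block arguments, not a bitwise OR.
def block_args(code)
  list = code[/\|([^|]*)\|/, 1]
  list ? list.scan(IDENT) : []
end

p block_args("hash.each do |key, value|")          # => ["key", "value"]
p block_args("File.open(inFile).each do |line|")   # => ["line"]
p block_args("puts 'no block here'")               # => []
```

Each name comes back as a separate element, which is the part a single `match` rule in a grammar does not obviously give you.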

I'll keep the downvote on the question because it's incredibly unclear that you're trying to parse the language itself, rather than the input from the file.

ian