I would like to search a string for various words. When I find one, I want to split the string at that point into three parts (left, match, right), exclude the matched text, and continue the process with the new string left+right.

Once all the matches are done, I need to reverse the process by reinserting the matched words (or a replacement for them) at the point where they were removed. I have never found quite what I wanted in any of my searches, so I thought I would ask for input here on SO.

Please let me know if this question needs further description.

BTW, at the moment I have a very poor algorithm that replaces matched text with a unique string token and then, after all the matches have been done, replaces each token with the replacement text for the appropriate match.
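For reference, the token approach described above might look roughly like the following Python sketch. The function name `replace_via_tokens` and the choice of Unicode Private Use Area characters as tokens are assumptions for illustration, not something taken from the question; the sketch also assumes plain-string patterns and that neither the input nor the replacements contain Private Use Area characters.

```python
PUA_START = 0xE000  # first code point of the Unicode Private Use Area (assumed unused in the input)


def replace_via_tokens(text, rules):
    """Apply ordered (pattern, replacement) pairs without re-matching replacements.

    Hypothetical helper: each match is swapped for a single placeholder
    character, and the placeholders are expanded only after every pattern
    has been tried.
    """
    stash = []  # stash[i] is the replacement text for placeholder number i
    for pattern, replacement in rules:
        if not pattern:
            continue  # an empty pattern would match everywhere
        index = text.find(pattern)
        while index != -1:
            token = chr(PUA_START + len(stash))
            stash.append(replacement)
            text = text[:index] + token + text[index + len(pattern):]
            index = text.find(pattern, index + 1)
    # Expand all placeholders in a single pass over the final string.
    return "".join(
        stash[ord(ch) - PUA_START] if 0 <= ord(ch) - PUA_START < len(stash) else ch
        for ch in text
    )


print(replace_via_tokens("one two three", [("three", "two"), ("two", "2")]))
```

Using one character per token means no pattern can accidentally match part of a token, but it also means a pattern can no longer match across a place where something was taken out, which is different from the behaviour asked for in the diagram.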

This is the goal:

one two three four five six 

Match "three" and replace it with "foo" (remember that we found "three", and where we found it):

one two four five six
       |
     three

Match "two four" and prevent it from being matched by anything else (edited for clarity):

one five six
   |
 two four 
       |
     three

At this point you cannot match, for example, "one two".

All the matches have been found; now put their replacements back in (in reverse order):

one two four five six
       |
     three


one two foo four five six

What's the point? Preventing one match's replacement text from being matched by another pattern. (All the patterns are run at the same time and in the same order for every string that is processed.)
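If the only requirement is that replacement text is never matched again, and the patterns are plain strings, a different and commonly used technique is a single pass over the input with one combined regular expression. The Python sketch below is that swapped-in technique, not the remove-and-reinsert process described above; `replace_all_once` is a hypothetical name, and trying longer patterns first is an assumption about how ties should be resolved.

```python
import re


def replace_all_once(text, replacements):
    """Replace every key of `replacements` in one left-to-right pass.

    Hypothetical helper: because the input is scanned only once, text that
    was produced by a replacement is never examined again.
    """
    if not replacements:
        return text
    # Longer patterns first, so "two four" is preferred over "two".
    keys = sorted(replacements, key=len, reverse=True)
    combined = re.compile("|".join(re.escape(key) for key in keys))
    return combined.sub(lambda match: replacements[match.group(0)], text)


print(replace_all_once("one two three", {"three": "two", "two": "2"}))
```

This does not honour a fixed pattern order, and it cannot match a pattern such as "two four" across text that another pattern has removed, so it only fits when those two properties are not needed.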

I'm not sure the language matters, but I'm using Lua in this case.

I'll try rephrasing: I have a list of patterns I want to find in a given string. If I find one, I want to remove that part of the string so it isn't matched by anything else, but I want to keep track of where I found it so I can insert the replacement text there once I am done trying to match my list of patterns.

Here's a related question:

Shell script - search and replace text in multiple files using a list of strings

sylvanaar
  • So after the algorithm is done, the string is just as you left it? Why do you need to remove the strings in the first place? What are you *doing* with the results of this? There may be an easier solution. Please post what language you are using. – Brian Schroth Oct 30 '09 at 19:19
  • What exactly do you mean by continue with left+right? Say the original text was "abcdefgh", and your two 'words' are "cd" and "bef", would you first split into "ab"-"cd"-"efgh", and then search in "abefgh", and find "bef", and split into "a"-"bef"-"gh" and then continue with "agh", and not find anything? – Lasse V. Karlsen Oct 30 '09 at 19:23
  • OK, I'll improve the question with a diagram – sylvanaar Oct 30 '09 at 19:26
  • How do you know whether you get "one bar foo five six" or "one foo bar five six" in your example? Is there an unambiguous rule as to where matches inside other matches go? – jprete Oct 30 '09 at 19:50
  • How is the search term matching going to proceed? The term "two four" skips a word in the original input. (I think I may have to delete my original answer... lol) – Jon Seigel Oct 30 '09 at 19:53
  • @jprete you are right - I didn't give one, but let's say it's 'keep the original position unless a replacement overwrites it, in which case insert after the replacement text' – sylvanaar Oct 30 '09 at 20:27
  • I simplified the example and removed the replacement in the string which already had a matched section. – sylvanaar Oct 30 '09 at 20:35

3 Answers

3

Your algorithm description is unclear. There's no exact rule where the extracted tokens should be re-inserted.

Here's an example:

  1. Find 'three' in 'one two three four five six'
  2. Choose one of these two to get 'foo bar' as result:

    a. replace 'one two' with 'foo' and 'four five six' with 'bar'

    b. replace 'one two four five six' with 'foo bar'

  3. Insert 'three' back in the step 2 resulting string 'foo bar'

At step 3, does 'three' go before 'bar' or after it?

Once you've come up with clear rules for reinserting, you can easily implement the algorithm as a recursive method or as an iterative method with a replacements stack.
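As a sketch of what the recursive variant could look like once a rule is fixed, the Python below takes the rule given in the question's comments ("keep the original position unless a replacement overwrites it, in which case insert after the replacement text"). Everything else is an assumption made for illustration: the names `render` and `remove_and_reinsert`, plain-string patterns, and the convention that a replacement of `None` means "put the matched text back unchanged, it was only shielded from later patterns".

```python
def render(node):
    """Rebuild the text for one removed match, including anything nested in it."""
    matched, replacement, children = node
    if replacement is not None:
        # The span was overwritten: nested matches go right after the replacement.
        return replacement + "".join(render(child) for _, child in children)
    parts, last = [], 0
    for offset, child in children:  # children are stored in left-to-right order
        parts.append(matched[last:offset])
        parts.append(render(child))
        last = offset
    parts.append(matched[last:])
    return "".join(parts)


def remove_and_reinsert(text, rules):
    """rules: ordered (pattern, replacement-or-None) pairs (hypothetical format)."""
    anchors = []  # (position in the working string, node), kept left to right
    for pattern, replacement in rules:
        start = text.find(pattern) if pattern else -1
        while start != -1:
            end = start + len(pattern)
            before = [(p, n) for p, n in anchors if p <= start]
            # Earlier matches swallowed by this one become its children.
            inside = [(p - start, n) for p, n in anchors if start < p < end]
            after = [(p - len(pattern), n) for p, n in anchors if p >= end]
            anchors = before + [(start, (pattern, replacement, inside))] + after
            text = text[:start] + text[end:]
            start = text.find(pattern)
    # The remaining text acts as a root node whose children are the anchors.
    return render((text, None, anchors))


print(remove_and_reinsert("one two three four five six",
                          [("three ", "foo "), ("two four", None)]))
```

The first pattern is written as "three " (with its trailing space) so that removing it leaves "one two four five six", as in the question's diagram. After "two four" is removed, the working string no longer contains "one two", so that phrase cannot be matched.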

Franci Penov
1

Given the structure of the problem, I'd probably try an algorithm based on a binary tree.

Jon Seigel
  • My answer was posted based on the original edition of the question... I'd still like to solve the problem, but what I've written so far may not be the best way to do it (as no one seems to fully understand the problem yet). – Jon Seigel Nov 02 '09 at 12:29
0

pseudocode:

foundStack = empty stack of (text, location) pairs

for( String snippet in snippets )
{
    int location = inputData.indexOf(snippet);
    if( location != -1 )
    {
        String replacement = getReplacementFor(snippet);
        int lengthChange = replacement.length - snippet.length;
        for each pair in foundStack
        {
            // If an earlier match sits to the right of the location just found,
            // shift its stored location by lengthChange, to account for the fact
            // that replacing a string with one of a different length moves
            // everything after it
            if( pair.location > location )
                pair.location += lengthChange;
        }

        // store the replacement text for the found snippet on the stack,
        // along with the location where the snippet was found
        foundStack.push( (replacement, location) );

        // remove the snippet (reassign, since replace returns a new string)
        inputData = inputData.replace(snippet, "");
    }
}

// popping the stack reinserts in reverse order of discovery
while( foundStack is not empty )
{
    pair = foundStack.pop();
    inputData.insert( pair.text, pair.location );
}

This basically does exactly what you described in your problem. Step through the snippets, putting every match on a stack along with the location it was found at. You use a stack so that the reinsertion in the second half happens in reverse order, which means each stored "location" applies to the current state of inputData.

Edited with a potential fix for commenter's criticism. Does the commented for block within the first one account for your criticisms, or is it still buggy in certain scenarios?
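One possible way around the position drift raised in the comments is to keep every stored location in the coordinates of the current working string and to reinsert from right to left, so that an insertion never moves a location that is still waiting. The Python below is a sketch of that variant, not a direct port of the pseudocode above: `remove_then_reinsert` is a hypothetical name, only the first occurrence of each snippet is handled, and a location swallowed by a later match is placed right after that match's replacement, following the rule suggested in the question's comments.

```python
def remove_then_reinsert(text, rules):
    """rules: ordered (snippet, replacement) pairs (hypothetical format)."""
    found = []  # (location in the current working string, replacement), left to right
    for snippet, replacement in rules:
        location = text.find(snippet) if snippet else -1
        if location == -1:
            continue
        end = location + len(snippet)
        before = [(p, r) for p, r in found if p <= location]
        # Locations swallowed by this match collapse to just after its replacement.
        inside = [(location, r) for p, r in found if location < p < end]
        # Locations to the right move left by the length of the removed snippet.
        after = [(p - len(snippet), r) for p, r in found if p >= end]
        found = before + [(location, replacement)] + inside + after
        text = text[:location] + text[end:]
    # Rightmost first, so earlier (left) locations stay valid.
    for location, replacement in reversed(found):
        text = text[:location] + replacement + text[location:]
    return text


print(remove_then_reinsert("B-A-C", [("A", "a!"), ("B", "BBBBB"), ("C", "CCC")]))
```

The three-match input in the usage line is the kind of case where comparing a location that was already shifted by one replacement against a freshly found location can go wrong; here it should come out as "BBBBB-a!-CCC".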

Brian Schroth
  • Except as a result of subsequent replacements location can be outside of the string. Or it could be in the middle of a replacement string. – Franci Penov Oct 30 '09 at 19:57
  • I edited with a potential solution that might address your criticism. Do you think it would work? – Brian Schroth Oct 30 '09 at 20:32
  • Even if this does work, on further consideration I think it would be better to do this recursively. – Brian Schroth Oct 30 '09 at 20:35
  • I tend to agree - recursion seems like a good way to solve this assuming the number of matches stays small. – sylvanaar Oct 30 '09 at 22:07
  • Well I tried to do it recursively and couldn't find a good way around the same problem of later replacements screwing up the insert positions of earlier replacements, so I'm inclined to go back to this, my original approach. My initial back of the envelope testing found it worked, at least. – Brian Schroth Nov 02 '09 at 16:27