
Suppose my search criterion is: She likes to watch tv

The input file, text.txt, contains some sentences, e.g.:

I don't know what to do. She doesn't know that it's not good for her health. She likes to watch tv but really don't know what to say. I don't blame her, but it's not her fault. This was just a test text. This is the end.

I want to search for the string within the text file and return the sentence that contains it, plus the sentences before and after it.

The output should look like this:

She doesn't know that it's not good for her health. She likes to watch tv but really don't know what to say. I don't blame her, but it's not her fault.

So it should output the sentence before the one containing the search phrase, the sentence containing it, and the sentence after it.

Jeffrey Kemp
icebox19
  • [What have you tried?](http://whathaveyoutried.com) – Cole Tobin Jun 10 '12 at 17:08
  • Well, the idea is that I don't know how to start. I just found here how to split a text on . or ! or ?. Actually, I want to delimit the sentences by ". ? !". I think I know how to save the matching sentence, but not the sentences before and after it. – icebox19 Jun 10 '12 at 17:10
  • What do you mean by "a phrase after and before it"? Do you want the entire sentence containing this phrase, or the sentences that surround it? – Ricardo Souza Jun 10 '12 at 17:15

4 Answers


How about something like this:

    string @in = @"I don't know what to do. She doesn't know that it's not good for her health. She likes to watch tv but really don't know what to say. I don't blame her, but it's not her fault. This was just a test text. This is the end.";
    string phrase = @"She likes to watch tv";

    int startIndex = @in.IndexOf(phrase);
    if (startIndex < 0)
    {
        Console.WriteLine("Phrase not found.");
        return;
    }
    int endIndex = startIndex + phrase.Length;
    int tmpIndex;

    // Look behind the match: first for the start of the sentence containing the
    // phrase, then for the start of the sentence before it. If no ". " is left,
    // that sentence begins at the start of the text.
    tmpIndex = @in.Substring(0, startIndex).LastIndexOf(". ");
    startIndex = tmpIndex > -1 ? tmpIndex + 1 : 0;
    tmpIndex = @in.Substring(0, startIndex).LastIndexOf(". ");
    startIndex = tmpIndex > -1 ? tmpIndex + 1 : 0;

    // Look ahead of the match: first for the end of the sentence containing the
    // phrase, then for the end of the sentence after it (or the end of the text).
    tmpIndex = @in.IndexOf(".", endIndex);
    endIndex = tmpIndex > -1 ? tmpIndex + 1 : @in.Length;
    tmpIndex = @in.IndexOf(".", endIndex);
    endIndex = tmpIndex > -1 ? tmpIndex + 1 : @in.Length;

    // startIndex may point at the space after a full stop, hence Trim().
    Console.WriteLine(@in.Substring(startIndex, endIndex - startIndex).Trim());

I'm assuming that sentences are delimited by '.'. The code finds the index of the phrase, looks behind the match for the start of the previous sentence, and looks ahead of the match for the end of the sentence that follows.
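A self-contained sketch of the same look-behind/look-ahead idea, wrapped in a local function so the edge cases can be checked: a phrase in the first or last sentence, and a phrase that does not occur. `ContextAround` is a hypothetical name, and reading `text.txt` from the working directory (falling back to the sample text) is an assumption about how the file is supplied. Sentences are still assumed to end with '.'.

```csharp
#nullable enable
using System;
using System.IO;

// Sample text from the question; used when text.txt is not present (an assumption).
string sample = "I don't know what to do. She doesn't know that it's not good for her health. She likes to watch tv but really don't know what to say. I don't blame her, but it's not her fault. This was just a test text. This is the end.";

string text = File.Exists("text.txt") ? File.ReadAllText("text.txt") : sample;
Console.WriteLine(ContextAround(text, "She likes to watch tv") ?? "Phrase not found.");

// Hypothetical helper: returns the sentence containing `phrase` plus the sentences
// before and after it, or null if the phrase does not occur. Sentences are assumed
// to end with '.', and (looking backwards) to be separated by ". ".
static string? ContextAround(string text, string phrase)
{
    int start = text.IndexOf(phrase, StringComparison.Ordinal);
    if (start < 0) return null;
    int end = start + phrase.Length;

    // Two steps back: to the start of the matching sentence, then to the start of
    // the sentence before it. With no ". " left, fall back to the start of the text.
    for (int step = 0; step < 2; step++)
    {
        int dot = text.Substring(0, start).LastIndexOf(". ", StringComparison.Ordinal);
        start = dot > -1 ? dot + 1 : 0;
    }

    // Two steps forward: to the end of the matching sentence, then to the end of
    // the sentence after it. With no '.' left, fall back to the end of the text.
    for (int step = 0; step < 2; step++)
    {
        int dot = text.IndexOf('.', end);
        end = dot > -1 ? dot + 1 : text.Length;
    }

    // start may point at the space after a full stop, so trim the result.
    return text.Substring(start, end - start).Trim();
}
```

Clamping to the start or end of the text (instead of leaving the index where it was) is what makes a match in the first or last sentence still return its single neighbour.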

Ian Newson
  • WOW! Thanks for this, buddy. You saved me. I still have to figure out what you wrote there step by step, but it's working! Thanks again – icebox19 Jun 10 '12 at 17:33
  • This is actually my first post ... can't vote up, but I guess I can accept. – icebox19 Jun 10 '12 at 17:37

One way is presented here:

string content = @"I don't know what to do. She doesn't know that it's not good for her health. She likes to watch tv but really don't know what to say. I don't blame her, but it's not her fault. This was just a test text. This is the end.";

string input = @"She likes to watch tv";
string curPhrase = string.Empty, prevPhrase = string.Empty, nextPhrase = string.Empty;

char[] delim = new char[] { '.' };
string[] phrases = content.Split(delim, StringSplitOptions.RemoveEmptyEntries);

for (int i = 0; i < phrases.Length; i++) {
    if (phrases[i].IndexOf(input) != -1) {
        // Split leaves the leading space after each '.', so trim each phrase.
        curPhrase = phrases[i].Trim();
        // Check the array bounds: the match may be the first or last phrase.
        if (i > 0)
            prevPhrase = phrases[i - 1].Trim();
        if (i + 1 < phrases.Length)
            nextPhrase = phrases[i + 1].Trim();

        break;
    }
}

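It first splits the entire text at each period (.) and stores the pieces in an array; then it searches the array for the input string and takes out the current, previous and next phrases. Note that Split drops the periods themselves, so each phrase comes back without its full stop.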
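If sentences can also end with '!' or '?' (as the asker mentions in the comments on the question), one variant of this split-then-index approach is to split with a regular expression on the whitespace that follows one of those characters. The lookbehind `(?<=[.!?])` is zero-width, so the punctuation stays with its sentence. This is a sketch that assumes sentences are separated by whitespace; abbreviations such as "e.g." would still be split in the wrong place.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string content = "I don't know what to do. She doesn't know that it's not good for her health. She likes to watch tv but really don't know what to say. I don't blame her, but it's not her fault. This was just a test text. This is the end.";
string input = "She likes to watch tv";

// Split on the whitespace that follows '.', '!' or '?'. The lookbehind does not
// consume the punctuation, so each sentence keeps its own full stop.
string[] sentences = Regex.Split(content.Trim(), @"(?<=[.!?])\s+");

// Index of the first sentence containing the search string, or -1.
int i = Array.FindIndex(sentences, s => s.Contains(input));

// Bounds checks: the match may be the first or the last sentence (or missing).
string prevPhrase = i > 0 ? sentences[i - 1] : string.Empty;
string curPhrase = i >= 0 ? sentences[i] : string.Empty;
string nextPhrase = i >= 0 && i + 1 < sentences.Length ? sentences[i + 1] : string.Empty;

// Join whichever of the three sentences exist.
string result = string.Join(" ", new[] { prevPhrase, curPhrase, nextPhrase }.Where(p => p.Length > 0));
Console.WriteLine(result);
```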

Software Engineer
  • I get these errors: Error 1 'System.Array' does not contain a definition for 'length' and no extension method 'length' accepting a first argument of type 'System.Array' could be found (are you missing a using directive or an assembly reference?) Error 2 'string' does not contain a definition for 'indexOf' and no extension method 'indexOf' accepting a first argument of type 'string' could be found (are you missing a using directive or an assembly reference?) – icebox19 Jun 10 '12 at 18:05
  • @icebox19 He just needs to start the method names with uppercase letters. – Ricardo Souza Jun 10 '12 at 18:08
  • Thanks! I understood this concept, and I like it too. Thanks to everybody! – icebox19 Jun 10 '12 at 18:11

Use String.IndexOf(), which returns the index of the first occurrence of the string within the text. Using this value you can then extract the containing sentence:

int index = paragraph.IndexOf("She likes to watch tv");

Then use index to set the boundaries and split the text (perhaps with a regular expression based on capital letters and full stops) to pull out the sentences on either side.
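One way to fill in this suggestion, as a sketch: enumerate the sentences with `Regex.Matches`, using the assumed pattern `[A-Z][^.!?]*[.!?]` (a capital letter, then everything up to the next full stop, '!' or '?'), find the sentence whose span contains `index`, and cut the text from the start of the previous sentence to the end of the next one. Sentences that do not start with a capital letter would be missed by this pattern.

```csharp
using System;
using System.Text.RegularExpressions;

string paragraph = "I don't know what to do. She doesn't know that it's not good for her health. She likes to watch tv but really don't know what to say. I don't blame her, but it's not her fault. This was just a test text. This is the end.";

// Find the search phrase first, as suggested above.
int index = paragraph.IndexOf("She likes to watch tv", StringComparison.Ordinal);

// Assumed sentence pattern: a capital letter, then everything up to the next '.', '!' or '?'.
MatchCollection sentences = Regex.Matches(paragraph, @"[A-Z][^.!?]*[.!?]");

string result = string.Empty;
for (int i = 0; index >= 0 && i < sentences.Count; i++)
{
    Match sentence = sentences[i];
    if (index < sentence.Index || index >= sentence.Index + sentence.Length)
        continue;

    // Found the sentence containing the phrase; widen by one sentence on each
    // side, clamped to the first and last sentence.
    Match first = sentences[Math.Max(i - 1, 0)];
    Match last = sentences[Math.Min(i + 1, sentences.Count - 1)];
    int end = last.Index + last.Length;
    result = paragraph.Substring(first.Index, end - first.Index);
    break;
}

Console.WriteLine(result.Length > 0 ? result : "Phrase not found.");
```

Because each Match carries its own Index and Length, the original spacing between sentences is preserved and no trimming is needed.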

m.edmondson

You can use Regex to grab the text:

string text = "I don't know what to do. She doesn't know that it's not good for her health. She likes to watch tv but really don't know what to say. I don't blame her, but it's not her fault. This was just a test text. This is the end.";

string target = "She likes to watch tv";

string result = Regex.Replace(text, "(?:.*?\\.\\s)?((?:[^.]*?)" + target + "[^.]*?\\.)(?:.*)", "$1");

//result = "She likes to watch tv but really don't know what to say."

Reference: http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.replace(v=vs.90).aspx
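The asker also wanted the sentences before and after the match. A sketch of that with a single `Regex.Match` (rather than `Regex.Replace`): an optional previous sentence, the sentence containing the target, and an optional next sentence, again assuming sentences end with '.'. `Regex.Escape` is an addition so that characters in the target are matched literally, and `WithContext` is a hypothetical helper name.

```csharp
using System;
using System.Text.RegularExpressions;

string text = "I don't know what to do. She doesn't know that it's not good for her health. She likes to watch tv but really don't know what to say. I don't blame her, but it's not her fault. This was just a test text. This is the end.";
Console.WriteLine(WithContext(text, "She likes to watch tv"));

// Hypothetical helper: the previous sentence, the sentence containing `target`
// and the next sentence (either neighbour may be absent), or "" if no match.
static string WithContext(string text, string target)
{
    string pattern =
        @"(?:[^.]*\.\s*)?" +                 // optional previous sentence
        @"[^.]*" + Regex.Escape(target) +    // start of the matching sentence, then the target
        @"[^.]*\." +                         // rest of the matching sentence
        @"(?:\s*[^.]*\.)?";                  // optional next sentence
    Match m = Regex.Match(text, pattern);
    // The match can begin on the space after a full stop, so trim it.
    return m.Success ? m.Value.Trim() : string.Empty;
}
```

Since `[^.]*` cannot cross a full stop, the earliest place the match can start is the beginning of the sentence just before the target's sentence, which is what limits the context to one sentence on each side.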

Ricardo Souza
  • thanks for the answer, but I want the "string phrase" and the one before and after it. Thanks – icebox19 Jun 10 '12 at 17:46
  • // result="She doesn't know that it's not good for her health. She likes to watch tv but really don't know what to say. I don't blame her, but it's not her fault." In this case – icebox19 Jun 10 '12 at 17:47
  • and ... I get this error "Error 1 Unrecognized escape sequence " – icebox19 Jun 10 '12 at 17:51
  • This is why I asked what you meant. – Ricardo Souza Jun 10 '12 at 17:51
  • Sorry ... I just saw the comment now, I am new here :) – icebox19 Jun 10 '12 at 17:52
  • I just forgot the extra backslashes (\) that C# string literals require. Edited now. – Ricardo Souza Jun 10 '12 at 17:52
  • hmmm ... I added "System.Text.RegularExpressions" too, but I get another error now "Error 1 An object reference is required for the non-static field, method, or property 'System.Text.RegularExpressions.Regex.Replace(string, string)' " – icebox19 Jun 10 '12 at 17:55
  • Ouch... I just forgot to add the replacement text. Edited again. I'm doing it from memory, so forgive me for this. – Ricardo Souza Jun 10 '12 at 17:59
  • thanks! I love this method. But one more thing (if you can) to add the before phrase and after one. Would really appreciate :) – icebox19 Jun 10 '12 at 18:00
  • I think regular expressions are an overkill in this case. Take the Coder's answer or Ian Newson's answer. I prefer Coder's. – Ricardo Souza Jun 10 '12 at 18:02