for (int i = 0; i < newText.Count; i++)
{
    for (int x = 0; x < WordsList.words.Length; x++)
    {
        lineToPost = newText[i];
        if (!lineToPost.Contains(WordsList.words[x]))
        {
            newText.Remove(lineToPost);
        }
    }
}

words is a string[] array and newText is a List<string>.

I want to remove from newText the lines that do not contain any word from words. This is how I built the words array, in a separate class:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace ScrollLabelTest
{
    class WordsList
    {
        public static string[] words = 
           { "testing", "world", "ראשוני", "new", "hello", "test" };
    }
}

This is what the newText list looks like:

[screenshot of the newText list]

The first line is a text line, followed by a date-and-time line, then an empty line. The same pattern then repeats: text line, date-and-time line, empty line.

What I want to do is keep this newText format and remove any text lines that do not contain any word from words.

I tried to do it like this:

newText.Remove(lineToPost);

But that removes every line that has none of the words, including the date-and-time lines and the empty lines. I want to remove only the text lines that do not contain any word. The way I am doing it now throws an index-out-of-range exception after a few iterations, on lineToPost = newText[i];, because it removes lines from the list while the loops are still indexing into it.

The main goal:

  1. To remove/filter out any text line that contains none of the words.

  2. To keep the newText list in the same format as the original (see the screenshot).

In the screenshot:

Index 0 is a text line, index 1 is a date-and-time line, and index 2 is an empty line.

If the line at index 0 does not contain any of the words, remove it and also remove the lines at index 1 and index 2.

Next, if the line at index 3 contains none of the words, remove indexes 3, 4 and 5.

In the end, newText should have the same format as in the screenshot, just without the lines that did not contain any of the words.

st4hoo
user3681442
  • "If in the line in index 0 the line not contain any of the words remove this line and remove also line index 1 and index 2." - Does "remove" mean remove element from list OR make list element equal to empty string? – st4hoo Jun 12 '14 at 07:02

3 Answers

You can use a regex in your loop to find the lines that do not contain any word from the word list. For this, you can use an expression like ^((?!word).)*$.

Refer to this post for further explanation.
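
As a rough sketch of how that expression could be built for the whole word list (not necessarily what the linked post does): the class and method names below are made up for illustration, and the pattern simply puts every escaped word into one alternation inside the negative lookahead.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
static class RegexFilterSketch
{
    // Builds a pattern of the form ^((?!(?:w1|w2|...)).)*$ which matches
    // only lines that contain none of the given words.
    public static Regex BuildNoWordRegex(IEnumerable<string> words)
    {
        // Escape each word so regex metacharacters are treated literally.
        string alternation = string.Join("|", words.Select(Regex.Escape));
        return new Regex("^((?!(?:" + alternation + ")).)*$");
    }

    public static void Main()
    {
        string[] words = { "testing", "world", "ראשוני", "new", "hello", "test" };
        Regex noWord = BuildNoWordRegex(words);

        Console.WriteLine(noWord.IsMatch("nothing relevant here")); // True
        Console.WriteLine(noWord.IsMatch("say hello"));             // False
    }
}
```

A line for which IsMatch returns true would be a candidate for removal. Note that this is a substring test, like string.Contains, so "newest" counts as containing "new".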

Anil Soman

Here is an example solution using the MoreLinq Batch extension method:

var filtered = 
    newText.Batch(3)   //creates batches(each batch contains 3 following lines)
    .Where(b => WordsList.words.Any(w => b.First().Contains(w))) //filters out batches with first line that is not acceptable
    .SelectMany(b => b) //flattens list of batches to list of string
    .ToList();
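
If adding MoreLinq is not an option, the same idea can probably be expressed with plain LINQ by grouping lines on index / 3. This is only a sketch: KeepMatchingGroups and its groupSize parameter are names invented here, and it assumes the list really is made of consecutive three-line groups.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, for illustration only.
static class GroupFilterSketch
{
    public static List<string> KeepMatchingGroups(
        IEnumerable<string> lines, string[] words, int groupSize = 3)
    {
        return lines
            .Select((line, index) => new { line, index })
            // Lines 0-2 form group 0, lines 3-5 form group 1, and so on.
            .GroupBy(x => x.index / groupSize, x => x.line)
            // Keep a group only if its first (text) line contains a word.
            .Where(g => words.Any(w => g.First().Contains(w)))
            // Flatten the remaining groups back into one list of lines.
            .SelectMany(g => g)
            .ToList();
    }

    public static void Main()
    {
        var newText = new List<string>
        {
            "no match here", "date1", "",
            "says hello", "date2", "",
        };
        string[] words = { "hello", "world" };

        List<string> filtered = KeepMatchingGroups(newText, words);
        Console.WriteLine(filtered.Count); // 3
        Console.WriteLine(filtered[0]);    // says hello
    }
}
```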
st4hoo

When you are iterating a list by index and removing items from that list, you need to adjust your index every time you remove an item. For example, if you have a list with 10 items in it and you remove the item at index 3, the indexes and length of the list change. The list will now have .Count = 9 instead of 10, and what was at index 4 is now at index 3, what was at index 5 is now at 4, and so forth.
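
One common way to sidestep the index shifting is to walk the list from the end, so that a removal only moves items you have already visited. The sketch below is an alternative to adjusting the index by hand; RemoveNonMatchingGroups is a made-up name, and it assumes the list consists of complete three-line groups (any trailing partial group is left untouched).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, for illustration only.
static class ReverseRemovalSketch
{
    public static void RemoveNonMatchingGroups(List<string> lines, string[] words)
    {
        // Index of the text line of the last complete three-line group.
        int lastGroupStart = (lines.Count / 3 - 1) * 3;

        // Going backwards means removals never shift a group we have not checked yet.
        for (int i = lastGroupStart; i >= 0; i -= 3)
        {
            string textLine = lines[i];
            if (!words.Any(w => textLine.Contains(w)))
                lines.RemoveRange(i, 3); // text line, date line, empty line
        }
    }

    public static void Main()
    {
        var newText = new List<string>
        {
            "no keyword", "date1", "",
            "line with hello", "date2", "",
            "still nothing", "date3", "",
            "whole world", "date4", "",
        };
        string[] words = { "hello", "world" };

        RemoveNonMatchingGroups(newText, words);
        Console.WriteLine(newText.Count); // 6
        Console.WriteLine(newText[0]);    // line with hello
    }
}
```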

You can also take advantage of LINQ to find the intersection between the words in the line and the words you are searching for. The code below should do what you want. NOTE: the parsing of a line into an array of words is weak and is for illustrative purposes only. If a line contained the characters "hello?", this code would not find that as a match.

string[] words = { "testing", "world", "ראשוני", "new", "hello", "test" };

List<string> newText = new List<string>() {         
    "This line doesn't match",
    "date1",
    "",
    "This line does match with the word: hello",
    "date2",
    "",
    "This line doesn't match either",
    "date3",
    "",
    "This line does match with the word: world",
    "date4",
    "",
};

int i = 0;

while (i < newText.Count)
{            
    // Get an array of words
    string[] lineWords = newText[i].Split(' ');

    if (lineWords.Intersect(words).Count() == 0)
    {
        // This line has no matching words. Remove 3 lines.
        for (int n = 0; n < 3; n++)
            newText.RemoveAt(i);
    }
    else
    {
        // This line has matching words. Move forward 3 lines.
        i += 3;
    }
}

foreach (string line in newText)
    Console.WriteLine(line);

Fiddle showing the code working: https://dotnetfiddle.net/jBWsYq
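
Regarding the weak parsing mentioned in the note above: one possible improvement is to split on runs of non-word characters instead of on a single space, so that "hello?" yields the token "hello". This is a sketch rather than a complete tokenizer, and ContainsAnyWord is a name invented for the example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
static class TokenizeSketch
{
    public static bool ContainsAnyWord(string line, string[] words)
    {
        // \W+ matches one or more non-word characters (spaces, punctuation),
        // so "well, hello?" splits into "well", "hello" and an empty string.
        string[] tokens = Regex.Split(line, @"\W+");
        return tokens.Intersect(words).Any();
    }

    public static void Main()
    {
        string[] words = { "testing", "world", "ראשוני", "new", "hello", "test" };

        Console.WriteLine(ContainsAnyWord("well, hello?", words)); // True
        Console.WriteLine(ContainsAnyWord("helloworld", words));   // False
    }
}
```

Unlike string.Contains, this matches whole words only, which is why "helloworld" is not treated as a match. The comparison is also case-sensitive.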

Mike Hixson