5

I am trying to filter a List of strings based on the number of words in each string. I assume you would trim any whitespace at the ends of the string and then count the spaces left in it, so that WordCount = NumberOfSpaces + 1. Is that the most efficient way to do this? Filtering by character count, as below, works fine; I just can't figure out how to write the word-count version succinctly using C#/LINQ.

if (checkBox_MinMaxChars.Checked)
{
    int minChar = int.Parse(numeric_MinChars.Text);
    int maxChar = int.Parse(numeric_MaxChars.Text);

    myList = myList.Where(x => 
                              x.Length >= minChar && 
                              x.Length <= maxChar).ToList();
}

Any ideas for counting words?

UPDATE: This worked like a charm. Thanks, Matthew:

int minWords = int.Parse(numeric_MinWords.Text);
int maxWords = int.Parse(numeric_MaxWords.Text);

sortBox1 = sortBox1.Where(x => x.Trim().Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Count() >= minWords &&
                               x.Trim().Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Count() <= maxWords).ToList();
Jeagr

4 Answers

8

I would take a simpler approach, since you have indicated that a space can reliably be used as the delimiter:

var str = "     the string to split and count        ";
var wordCount = str.Trim().Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Count();

EDIT:

If optimal performance is necessary and memory usage is a concern, you could write your own method and leverage IndexOf(). There are many ways to implement something like this; I just prefer reusing existing methods over from-scratch code design:

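    public int WordCount(string s) {
        const int DONE = -1;
        var str = s.Trim();
        // An empty or whitespace-only string has no words. Without this guard,
        // IndexOf would throw when asked to start past the end of an empty string.
        if (str.Length == 0) return 0;
        var wordCount = 0;
        var index = 0;
        while (index != DONE) {
            wordCount++;
            index = str.IndexOf(' ', index + 1);
            // Treat a run of consecutive spaces as a single separator.
            // str[index + 1] is safe: after Trim() the last character is never a space.
            while (index != DONE && str[index + 1] == ' ') index++;
        }
        return wordCount;
    }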
Matthew Cox
  • int minWords = int.Parse(numeric_MinWords.Text); int maxWords = int.Parse(numeric_MaxWords.Text); sortBox1 = sortBox1.Where(x => x.Trim().Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Count() >= minWords && x.Trim().Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Count() <= maxWords).ToList(); – Jeagr Dec 19 '12 at 08:05
  • Is a method like that more efficient than using LINQ queries? – Jeagr Dec 19 '12 at 08:09
  • Generally speaking, LINQ has more overhead than something like this. Keep in mind that behind the scenes, LINQ is just taking your predicate functions and using them in a glorified foreach loop. How much performance gain you would see depends heavily on the problem. – Matthew Cox Dec 19 '12 at 08:12
  • http://stackoverflow.com/questions/1185209/do-linq-queries-have-a-lot-of-overhead – Matthew Cox Dec 19 '12 at 08:13
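
To make the "glorified foreach" remark concrete, here is a rough sketch of what a Where-style filter boils down to. This is not the framework's actual source, which adds argument checks and specialised iterators; `WhereSketch` and `MyWhere` are names made up for this illustration. The extra cost relative to a hand-written loop comes mainly from the delegate call per item and the iterator object.

```csharp
using System;
using System.Collections.Generic;

public static class WhereSketch
{
    // Hypothetical stand-in for Enumerable.Where: walk the source lazily
    // and yield only the items for which the predicate returns true.
    public static IEnumerable<T> MyWhere<T>(IEnumerable<T> source, Func<T, bool> predicate)
    {
        foreach (T item in source)
        {
            if (predicate(item))
            {
                yield return item;
            }
        }
    }
}
```

Calling `WhereSketch.MyWhere(myList, x => x.Length >= minChar)` would then filter the same way the query in the question does for that condition.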
3

Your approach to counting words is OK. String.Split will give a similar result at the cost of more memory.

Then just implement your int WordCount(string text) function and pass it to Where:

myList.Where(s => WordCount(s) > minWordCount)
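
One way this could look in full is sketched below. It assumes that any whitespace character separates words, which is slightly broader than the single-space rule in the question. The class name `WordFilter` and the helper `FilterByWordCount` are invented for this example. The counter scans the characters once and counts the starts of words, so it allocates no arrays.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class WordFilter
{
    // Counts words by counting transitions from whitespace (or the start
    // of the string) into a non-whitespace character.
    public static int WordCount(string text)
    {
        if (string.IsNullOrEmpty(text)) return 0;

        int count = 0;
        bool inWord = false;
        foreach (char c in text)
        {
            if (char.IsWhiteSpace(c))
            {
                inWord = false;
            }
            else if (!inWord)
            {
                inWord = true;
                count++;
            }
        }
        return count;
    }

    // Hypothetical helper: keeps strings whose word count lies in
    // the inclusive range [minWords, maxWords].
    public static List<string> FilterByWordCount(IEnumerable<string> source, int minWords, int maxWords)
    {
        return source
            .Select(s => new { Text = s, Words = WordCount(s) }) // count once per string
            .Where(x => x.Words >= minWords && x.Words <= maxWords)
            .Select(x => x.Text)
            .ToList();
    }
}
```

With the variables from the question, usage would be along the lines of `myList = WordFilter.FilterByWordCount(myList, minWords, maxWords);`.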
Alexei Levenkov
1

How about splitting the string into an array on whitespace and counting the elements?

s.Split().Count()

removed the space :)
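
One caveat with this, sketched below: Split() with no arguments splits on whitespace but keeps empty entries, so leading, trailing, or repeated spaces inflate the count. Passing StringSplitOptions.RemoveEmptyEntries avoids that. The class name `SplitDemo` and its two methods are made up for this example.

```csharp
using System;

public static class SplitDemo
{
    // Plain Split(): every whitespace character is a separator, and the
    // empty strings between adjacent separators are kept.
    public static int NaiveCount(string s)
    {
        return s.Split().Length;
    }

    // A null separator array also means "split on whitespace", but
    // RemoveEmptyEntries drops the empty strings from the result.
    public static int CleanCount(string s)
    {
        return s.Split((char[])null, StringSplitOptions.RemoveEmptyEntries).Length;
    }
}
```

For the input "  a  b " (two leading spaces, two in the middle, one trailing), NaiveCount returns 6 while CleanCount returns 2. Using the array's Length property also avoids the extra LINQ Count() call.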

ufosnowcat
1

You want all strings with word-count in a given range?

int minCount = 10;
int maxCount = 15;
IEnumerable<string> result = list
    .Select(String => new { String, Words = String.Split() })
    .Where(x => x.Words.Length >= minCount
             && x.Words.Length <= maxCount)
    .Select(x => x.String);
Tim Schmelter