
I wrote a program that counts the words in a text that contain at least a given number of matching characters. The input is a multi-line string, and I need to consider only every n-th line. Within those filtered lines, I need to select every m-th word and then check, by intersection, whether each selected word contains enough of the characters I am looking for.

E.g. let's say I have an input string as below and need to find words with 2 or more vowels in them:

string myString = @"1.In order to get a Disneyland ticket
2.that includes the new Star Wars land
3.you must stay at a Disney hotel 
4.the night of or night before your visit. 
5.Day passes without a hotel 
6.reservation will be available after June 23";

Now let's say I need to take every 3rd word in every 2nd line and count how many of these filtered words have 2 or more vowels. The result should be the number of matching words and the number of lines that contain at least one matching word.

E.g. with the criterion of selecting every 2nd line, the filtered lines will be {2, 4, 6}. Similarly, every 3rd word in these filtered lines gives {"the", "Wars"} for line 2, {"of", "before"} for line 4 and {"be", "June"} for line 6.

Of these filtered words, the ones with 2 or more vowels are {"before"} in line 4 and {"June"} in line 6. So the final output will be wordCount = 2, and since these words come from lines 4 and 6, lineCount = 2.

I wrote the below code using nested for loops that produces the desired output.

public static void Main(string[] args)
    {
        int vowelCount = 2; // match words with 2 or more vowels
        int skipWord = 3; // Consider every 3rd word only
        int skipLine = 2; // Consider every 2nd line only
        int wordCount = 0;
        int lineCount = 0;

        string myString = @"1.In order to get a Disneyland ticket
2.that includes the new Star Wars land
3.you must stay at a Disney hotel 
4.the night of or night before your visit. 
5.Day passes without a hotel 
6.reservation will be available after June 23";

        List<string> myList = myString.Split(Environment.NewLine).ToList();
        List<string> lineWords = new List<string>();
        char[] vowels = {'a', 'e', 'i', 'o', 'u'};

        for (int i = skipLine; i <= myList.Count; i += skipLine)
        {
            int origWordCount = wordCount;
            lineWords = myList[i - 1].Split(' ').ToList();
            for (int j = skipWord; j <= lineWords.Count; j += skipWord)
            {
                char[] wordArr = lineWords[j-1].ToLower().ToCharArray();
                int match = vowels.Intersect(wordArr).Count();
                if (match >= vowelCount)
                    wordCount++;                 
            }
            if (wordCount > origWordCount)
                lineCount++;
        }

        Console.WriteLine("WordCount : {0}, LineCount : {1}", wordCount, lineCount);
    }

The code above works, but I am wondering whether there is a way to do it without nested loops. I have read about LINQ and lambda expressions but am not sure how to use them here.

All comments are appreciated.

Swets

2 Answers


First, filter the lines with a predicate in 'Where'. The overload that also passes the zero-based index lets you keep every 2nd line:

List<string> lines = myString.Split(Environment.NewLine).Where((l,index) => index % 2 != 0).ToList();
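
The index passed to the predicate is zero-based, so "every n-th item" corresponds to index % n == n - 1. A minimal sketch of that rule follows; the helper name EveryNth is made up for illustration and is not part of LINQ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EveryNthDemo
{
    // Hypothetical helper: keeps the n-th, 2n-th, 3n-th ... item (1-based counting).
    public static IEnumerable<T> EveryNth<T>(IEnumerable<T> source, int n)
    {
        // Where passes a zero-based index, so the n-th item has index n - 1.
        return source.Where((item, index) => index % n == n - 1);
    }

    public static void Main()
    {
        string[] items = { "a", "b", "c", "d", "e", "f" };
        Console.WriteLine(string.Join(",", EveryNth(items, 2))); // b,d,f
        Console.WriteLine(string.Join(",", EveryNth(items, 3))); // c,f
    }
}
```

With n = 2 this is the same filter as index % 2 != 0 above.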

Then you can get the results as below:

foreach (var line in lines)
{
    // Get every 3rd word in the line
    var thirdWords = line.Split(' ').Where((w,index) => index % 3 == 2).ToList();

    // Get words with 2 or more vowels in it.
    // Note: Intersect counts distinct vowels, so a word that repeats
    // one vowel (e.g. "good") counts as having a single vowel.
    var matchWords = thirdWords.Where(w => w.ToLower().Intersect(vowels).Count() >= vowelCount).ToList();

    //if words with vowels found, update 'wordCount' and 'lineCount' 
    if (matchWords.Any()) {
        wordCount = wordCount + matchWords.Count;
        lineCount++;
    }
}
Console.WriteLine("WordCount : {0}, LineCount : {1}", wordCount, lineCount);
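
On the question of words that contain the same vowel twice: Intersect yields distinct elements, so it counts different vowels rather than vowel occurrences. The sketch below contrasts the two interpretations; the method names DistinctVowels and TotalVowels are invented for this illustration, and which one is wanted depends on the requirement.

```csharp
using System;
using System.Linq;

public static class VowelCounting
{
    private static readonly char[] Vowels = { 'a', 'e', 'i', 'o', 'u' };

    // Number of different vowels in the word (what Intersect measures).
    public static int DistinctVowels(string word)
    {
        return word.ToLowerInvariant().Intersect(Vowels).Count();
    }

    // Number of vowel characters in the word, repeats included.
    public static int TotalVowels(string word)
    {
        return word.ToLowerInvariant().Count(c => Vowels.Contains(c));
    }

    public static void Main()
    {
        foreach (var word in new[] { "good", "before", "Star" })
        {
            Console.WriteLine("{0}: distinct={1}, total={2}",
                word, DistinctVowels(word), TotalVowels(word));
        }
        // good: distinct=1, total=2
        // before: distinct=2, total=3
        // Star: distinct=1, total=1
    }
}
```

With a threshold of 2, "good" matches only under the second interpretation.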
Ruviru

Is it wise to put everything in one big LINQ query? This will make it more difficult to understand your code, test it, implement changes or reuse parts of it.

Consider writing separate LINQ-like functions, each of which does only one part of the job. Functions that can be called this way are extension methods. See "Extension Methods Demystified".

I need to count every 3rd word in every 2nd line and count if these filtered words have 2 or more vowels in it. If this condition is matched, return the matching word count and total line count containing these matching words.

So create functions:

  • Input: a string; Output: the string divided into a sequence of lines
  • Input: a sequence of lines; Output: a sequence of every 2nd line (or every n-th line)
  • Input: a line; Output: the words in the line
  • Input: a sequence of words; Output: every 3rd word in the sequence (or every n-th word)
  • Input: a word; output: the number of vowels in the word

Most of them are pretty easy:

// Extension methods must be static and declared in a static class.
public static IEnumerable<string> ToLines(this string text)
{
    // TODO: check text not null
    using (var reader = new StringReader(text))
    {
        var line = reader.ReadLine();
        while (line != null)
        {
            yield return line;
            line = reader.ReadLine();
        }
    }
}

public static IEnumerable<string> KeepEvery2ndLine(this IEnumerable<string> lines)
{
    // TODO: check lines not null
    int lineNr = 0;
    foreach (var line in lines)
    {
        ++lineNr;
        if (lineNr%2 == 0)
            yield return line;
    }
}

public static IEnumerable<string> ToWords(this string line)
{
    // TODO: check line not null
    // when is a word a word? do you need to consider tabs? semicolons?
    // is a number like 12 a word?
    throw new NotImplementedException(); // the body depends on the answers above
}

Advice: use Regex to split the line into words. See "how to split string in words".
By the way: if "12" is a word, how many vowels does it have: zero or two? ("twelve" has two vowels)
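
As one possible reading of that advice, here is a minimal Regex-based sketch. It assumes that a word is any run of non-whitespace characters, which keeps "4.the" and "visit." as single words just as Split(' ') does in the question, but drops empty entries. A different definition of "word" would need a different pattern, and the word positions would shift accordingly.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordSplitting
{
    // Assumption: a word is a maximal run of non-whitespace characters.
    public static IEnumerable<string> ToWords(this string line)
    {
        if (line == null) throw new ArgumentNullException(nameof(line));
        return Regex.Matches(line, @"\S+").Cast<Match>().Select(m => m.Value);
    }

    public static void Main()
    {
        string line = "4.the night of or night before your visit. ";
        Console.WriteLine(string.Join("|", line.ToWords()));
        // 4.the|night|of|or|night|before|your|visit.
    }
}
```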

I won't write all the functions. You'll get the gist.

Because you split the task into smaller tasks, it is fairly easy to understand what each function should do, and each one is easy to implement, test and change.

Once you've got them, your query is fairly simple:

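// ToWords works on one line, so apply it per line and count the matches in each line.
var matchesPerLine = inputText.ToLines()
                              .KeepEvery2ndLine()
                              .Select(line => line.ToWords()
                                                  .KeepEvery3rdWord()
                                                  .Count(word => word.CountVowels() >= 2))
                              .ToList();
int wordCount = matchesPerLine.Sum();
int lineCount = matchesPerLine.Count(count => count > 0);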
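
For completeness, here is a sketch of how the pieces might fit together in one runnable program. Everything beyond ToLines is an assumption made for illustration: KeepEveryNth generalises the two "keep every n-th" functions, ToWords splits on spaces and tabs, CountVowels counts distinct vowels as in the question, and CountMatches is an invented wrapper that returns both counts.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class TextExtensions
{
    private static readonly char[] Vowels = { 'a', 'e', 'i', 'o', 'u' };

    public static IEnumerable<string> ToLines(this string text)
    {
        using (var reader = new StringReader(text))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                yield return line;
        }
    }

    // Hypothetical generalisation: keeps the n-th, 2n-th, ... item.
    public static IEnumerable<T> KeepEveryNth<T>(this IEnumerable<T> source, int n)
    {
        return source.Where((item, index) => index % n == n - 1);
    }

    // Assumption: words are separated by spaces or tabs only.
    public static IEnumerable<string> ToWords(this string line)
    {
        return line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
    }

    // Counts distinct vowels, like Intersect in the question.
    public static int CountVowels(this string word)
    {
        return word.ToLowerInvariant().Intersect(Vowels).Count();
    }

    public static (int WordCount, int LineCount) CountMatches(
        this string text, int everyNthLine, int everyNthWord, int minVowels)
    {
        // One entry per selected line: the number of matching words in it.
        var matchesPerLine = text.ToLines()
            .KeepEveryNth(everyNthLine)
            .Select(line => line.ToWords()
                                .KeepEveryNth(everyNthWord)
                                .Count(word => word.CountVowels() >= minVowels))
            .ToList();
        return (matchesPerLine.Sum(), matchesPerLine.Count(count => count > 0));
    }
}

public static class CountDemo
{
    public static readonly string Sample = string.Join("\n",
        "1.In order to get a Disneyland ticket",
        "2.that includes the new Star Wars land",
        "3.you must stay at a Disney hotel",
        "4.the night of or night before your visit.",
        "5.Day passes without a hotel",
        "6.reservation will be available after June 23");

    public static void Main()
    {
        var result = Sample.CountMatches(2, 3, 2);
        Console.WriteLine("WordCount : {0}, LineCount : {1}",
            result.WordCount, result.LineCount);
        // WordCount : 2, LineCount : 2
    }
}
```

Collecting the per-line counts first is what makes it possible to report both the word count and the line count from a single pass over the text.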
Harald Coppoolse