I’m close, but my program is still not working properly. I am trying to count how many times each word in a set of search words appears in a text file, list those words with their individual counts, and then print the sum of all matches.
If there are 3 instances of “lorem” and 2 instances of “ipsum”, then the total should be 5. My sample text file is simply a “Lorem ipsum” paragraph repeated a few times.
My problem is that the code I have so far only counts the first occurrence of each word, even though each word is repeated several times throughout the text file.
I am using a paid parser called “GroupDocs.Parser” that I added through the NuGet package manager. I would prefer not to depend on a paid library if possible.
Is there an easier way to do this in C#?
Here’s a screenshot of my desired results.
Here is the full code that I have so far.
using GroupDocs.Parser;
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

namespace ConsoleApp5
{
    class Program
    {
        static void Main(string[] args)
        {
            using (Parser parser = new Parser(@"E:\testdata\loremIpsum.txt"))
            {
                // Extract a text into the reader
                using (TextReader reader = parser.GetText())
                {
                    // Define the search terms.
                    string[] wordsToMatch = { "Lorem", "ipsum", "amet" };
                    Dictionary<string, int> stats = new Dictionary<string, int>();
                    string text = reader.ReadToEnd();
                    char[] chars = { ' ', '.', ',', ';', ':', '?', '\n', '\r' };
                    // split words
                    string[] words = text.Split(chars);
                    int minWordLength = 2; // to count words having more than 2 characters
                    // iterate over the word collection to count occurrences
                    foreach (string word in wordsToMatch)
                    {
                        string w = word.Trim().ToLower();
                        if (w.Length > minWordLength)
                        {
                            if (!stats.ContainsKey(w))
                            {
                                // add new word to collection
                                stats.Add(w, 1);
                            }
                            else
                            {
                                // update word occurrence count
                                stats[w] += 1;
                            }
                        }
                    }
                    // order the collection by word count
                    var orderedStats = stats.OrderByDescending(x => x.Value);
                    // print occurrence of each word
                    foreach (var pair in orderedStats)
                    {
                        Console.WriteLine("Total occurrences of {0}: {1}", pair.Key, pair.Value);
                    }
                    // print total word count
                    Console.WriteLine("Total word count: {0}", stats.Count);
                    Console.ReadKey();
                }
            }
        }
    }
}
Any suggestions on what I'm doing wrong?
Thanks in advance.
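For reference, two things in the posted code seem to explain the symptom. The counting loop iterates over wordsToMatch (the three search terms) rather than over words (the tokens split from the file), so each term is seen exactly once. The final line then prints stats.Count, which is the number of distinct keys, not the sum of their values. Below is a minimal sketch of one way to do the same thing without GroupDocs.Parser, assuming the input is a plain .txt file that File.ReadAllText can read directly. The WordCounter class and CountMatches method are hypothetical names introduced for illustration, and the built-in sample string is made up so the program runs without the file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class WordCounter
{
    // Hypothetical helper: counts case-insensitive occurrences of each search term in the text.
    public static Dictionary<string, int> CountMatches(string text, IEnumerable<string> wordsToMatch)
    {
        // Case-insensitive keys, so "Lorem" and "lorem" land in the same bucket.
        var stats = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);

        // Start every search term at zero so terms that never appear are still listed.
        foreach (string term in wordsToMatch)
        {
            stats[term.Trim()] = 0;
        }

        char[] separators = { ' ', '.', ',', ';', ':', '?', '!', '\n', '\r', '\t' };
        string[] words = text.Split(separators, StringSplitOptions.RemoveEmptyEntries);

        // Iterate over the words from the text (not over the search terms).
        foreach (string word in words)
        {
            if (stats.ContainsKey(word))
            {
                stats[word] += 1;
            }
        }

        return stats;
    }
}

public static class Program
{
    public static void Main(string[] args)
    {
        // Pass the file path as the first argument, e.g. E:\testdata\loremIpsum.txt.
        // Without an argument, a small made-up sample string is used instead.
        string text = args.Length > 0
            ? File.ReadAllText(args[0])
            : "Lorem ipsum dolor sit amet. Lorem ipsum.";

        string[] wordsToMatch = { "Lorem", "ipsum", "amet" };
        Dictionary<string, int> stats = WordCounter.CountMatches(text, wordsToMatch);

        foreach (var pair in stats.OrderByDescending(x => x.Value))
        {
            Console.WriteLine("Total occurrences of {0}: {1}", pair.Key, pair.Value);
        }

        // Sum the per-word counts; stats.Count would only give the number of distinct terms.
        Console.WriteLine("Total word count: {0}", stats.Values.Sum());
    }
}
```

This sketch matches whole tokens only, so "ipsums" would not count toward "ipsum". If substring or more flexible matching is wanted, a regular expression would be a different technique from the one shown here.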