
I am building a Sinhala-English dictionary. I have a file that contains the Sinhala meaning of every English word, and I want to load it while the form is loading. To read the whole file content into a string variable, I used the following code in the FormLoad method:

private string DictionaryWords = "";

private string ss = null;

...

private void Form1_Load(object sender, EventArgs e)
{
    this.BackColor = ColorTranslator.FromHtml("#AFC3E0");

    string fileName = @"SI-utf8.Txt";

    using (StreamReader sr = File.OpenText(fileName))
    {
        while ((ss = sr.ReadLine()) != null)
        {
            DictionaryWords += ss;
        }
    }
}

Unfortunately, the txt file has 130,000+ lines and is more than 5 MB in size, so my WinForm does not load.


I need to load this file faster so that the WinForm can use a regex to get the right meaning for every English word. Could anybody suggest a way to do this? I have tried everything.

I need to load this huge file into my project within 15, more or less, and then use Regex to find each English word.

  • What does "not loading" mean? – shingo Feb 14 '23 at 12:49
  • What is `DictionaryWords`, please? `DictionaryWords += ss;` looks very suspicious. – Dmitry Bychenko Feb 14 '23 at 12:50
  • Yet another possibility is to start an *async Task* in `Form1_Load`; then, whenever you want to obtain the dictionary, you can `await` the previously started task – Dmitry Bychenko Feb 14 '23 at 12:52
  • What is the expected *regular expression* then? Please note that combining all the words into a single regex is a bad idea. – Dmitry Bychenko Feb 14 '23 at 12:54
  • From the looks of it, "DictionaryWords" is a string? If it is, I highly recommend changing it to an actual dictionary. Strings are immutable in C#, which means that every time you add something to a string, a new string is created. You are essentially reading 130k strings, and for every one of them you duplicate the current string. The longer it gets, the slower it gets. TL;DR: change to a dictionary or List – Joost00719 Feb 14 '23 at 13:16
  • Please edit the question and include the definition for `DictionaryWords`. I suspect the issue here isn't the loading of the text file, which isn't that big, but the processing of its contents. – John Alexiou Feb 14 '23 at 14:13
  • @DmitryBychenko `DictionaryWords` is a string variable that stores the content of the txt file – Ryan Gabriel Feb 14 '23 at 14:41
  • @Ryan Gabriel: so we know the culprit (`DictionaryWords += ss`); let's get rid of him: please see my edit. Please note that the felon can have an assistant, which can be the *regular expression* – Dmitry Bychenko Feb 14 '23 at 15:11

1 Answer


Well, there is too little code to analyze. I suspect that

DictionaryWords += ss;

is the felon: appending to a string 130,000 times means re-creating an ever longer string over and over again, which can well bring the system to its knees, but I have no rigorous proof (I've asked about DictionaryWords in the comments). Another possible culprit is your regular expression, which I haven't seen.

So let me try to solve the problem from scratch.

  • We have a (long) dictionary in SI-utf8.Txt.
  • We should load the dictionary without freezing the UI.
  • We should use the dictionary loaded to translate the English texts.

I have got something like this:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

...

// Loading dictionary (async, since dictionary can be quite long)
// static: we want just one dictionary for all the instances
private static readonly Task<IReadOnlyDictionary<string, string>> s_Dictionary = 
  Task<IReadOnlyDictionary<string, string>>.Run(() => {
    char[] delimiters = { ' ', '\t' };

    IReadOnlyDictionary<string, string> result = File
      .ReadLines(@"SI-utf8.Txt")
      .Where(line => !string.IsNullOrWhiteSpace(line))
      .Select(line => line.Split(delimiters, StringSplitOptions.RemoveEmptyEntries))
      .Where(items => items.Length == 2)
      .ToDictionary(items => items[0], 
                    items => items[1], 
                    StringComparer.OrdinalIgnoreCase);

    return result;
  });
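Note that the `items.Length == 2` filter silently skips any line whose meaning contains a space. The question does not show the file format, so whether such lines exist is an assumption. If they do, a rough sketch like the one below could split each line on the first delimiter only and keep the rest as the meaning. The names `EntryParser` and `TrySplitEntry` are hypothetical.

```csharp
using System;

public static class EntryParser
{
    private static readonly char[] Delimiters = { ' ', '\t' };

    // Returns true and the (word, meaning) pair when the line has both parts
    public static bool TrySplitEntry(string line, out string word, out string meaning)
    {
        word = null;
        meaning = null;

        if (string.IsNullOrWhiteSpace(line))
            return false;

        string trimmed = line.Trim();
        int index = trimmed.IndexOfAny(Delimiters);

        if (index <= 0)
            return false; // no delimiter: the line holds a single token

        // Everything before the first delimiter is the word, the rest is the meaning
        word = trimmed.Substring(0, index);
        meaning = trimmed.Substring(index + 1).Trim();

        return meaning.Length > 0;
    }
}
```

Lines for which `TrySplitEntry` returns false would simply be skipped, as the `Where` filters above do.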

Then we need a translation part:

// Let it be the simplest regex: English letters and apostrophes;
// you can improve it if you like
private static readonly Regex s_EnglishWords = new Regex("[A-Za-z']+");

// Translation is async, since we have to wait for the dictionary to be loaded
private static async Task<string> Translate(string englishText) {
  if (string.IsNullOrWhiteSpace(englishText))
    return englishText;

  var dictionary = await s_Dictionary;

  return s_EnglishWords.Replace(englishText,
    match => dictionary.TryGetValue(match.Value, out var translation) 
      ? translation   // if we know the translation
      : match.Value); // if we don't know the translation
}

Usage:

// Note, that button event should be async as well
private async void button1_Click(object sender, EventArgs e) {
  TranslationTextBox.Text = await Translate(OriginalTextBox.Text);
}

Edit: So, DictionaryWords is a string and thus

DictionaryWords += ss;

is the felon. Please don't append to a string in a long loop: each append re-creates the whole string, which is slow. If you insist on looping, use StringBuilder:

// Let's pre-allocate a buffer for 6 million chars
StringBuilder sb = new StringBuilder(6 * 1024 * 1024);

using (StreamReader sr = File.OpenText(fileName))
{
    while ((ss = sr.ReadLine()) != null)
    {
        sb.Append(ss);
    }
}

DictionaryWords = sb.ToString();            

Or, why should you loop at all? Let .NET do the work for you (note that, unlike the loop, which drops line breaks, this keeps them):

DictionaryWords = File.ReadAllText(@"SI-utf8.Txt");

Edit 2: If the actual file is not that huge (and it is DictionaryWords += ss; alone that spoils the fun), you can stick to a simple synchronous solution:

private static readonly Regex s_EnglishWords = new Regex("[A-Za-z']+");

private static readonly IReadOnlyDictionary<string, string> s_Dictionary = File
  .ReadLines(@"SI-utf8.Txt")
  .Where(line => !string.IsNullOrWhiteSpace(line))
  .Select(line => line.Split(new char[] { ' ', '\t' },
     StringSplitOptions.RemoveEmptyEntries))
  .Where(items => items.Length == 2)
  .ToDictionary(items => items[0], 
                items => items[1], 
                StringComparer.OrdinalIgnoreCase);

private static string Translate(string englishText) {
  if (string.IsNullOrWhiteSpace(englishText))
    return englishText;

  return s_EnglishWords.Replace(englishText,
    match => s_Dictionary.TryGetValue(match.Value, out var translation) 
      ? translation 
      : match.Value);
}

And then the usage is quite simple:

// No async needed here: Translate is synchronous
private void button1_Click(object sender, EventArgs e) {
  TranslationTextBox.Text = Translate(OriginalTextBox.Text);
}
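One caveat: `ToDictionary` throws an `ArgumentException` when the same key occurs twice, which is plausible in a 130,000-line file compared case-insensitively. Here is a minimal sketch that keeps the first translation seen instead of failing. The names `DictionaryLoader` and `Parse` are hypothetical, and the first-entry-wins policy is an assumption about what you want.

```csharp
using System;
using System.Collections.Generic;

public static class DictionaryLoader
{
    private static readonly char[] Delimiters = { ' ', '\t' };

    // Builds a case-insensitive lookup; on duplicate keys the first entry wins
    public static IReadOnlyDictionary<string, string> Parse(IEnumerable<string> lines)
    {
        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

        foreach (string line in lines)
        {
            if (string.IsNullOrWhiteSpace(line))
                continue;

            string[] items = line.Split(Delimiters, StringSplitOptions.RemoveEmptyEntries);

            // Same rule as above: only "word meaning" pairs are accepted
            if (items.Length != 2)
                continue;

            // Skip later duplicates instead of throwing
            if (!result.ContainsKey(items[0]))
                result.Add(items[0], items[1]);
        }

        return result;
    }
}
```

It could be plugged in as `DictionaryLoader.Parse(File.ReadLines(@"SI-utf8.Txt"))` in place of the LINQ chain.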
Dmitry Bychenko
  • This is the most correct solution here. For simplicity's sake, can you provide a non-async version of this, assuming the actual reading of the text file is fast enough for the user not to notice? – John Alexiou Feb 14 '23 at 15:13
  • Wow, Mr. Dmitry Bychenko, sir, you saved me a lot of time. Thanks a lot, your method is working. – Ryan Gabriel Feb 14 '23 at 17:41