
I made a WPF application that uses a trie (based on this one) to store a Polish dictionary (37.9 MB). Building the trie from dictionary.txt takes too long (30 seconds on my laptop).

I thought that if I created some kind of binary file containing the already-built trie and loaded that instead, it might speed things up.

Jecke

1 Answer


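You could try saving it as a serialized object, either XML or binary. For binary serialization you need to mark the classes that will be serialized with the [Serializable] attribute; the generic collections are already serializable. XmlSerializer does not need the attribute, but it requires public classes with a public parameterless constructor, and it cannot serialize a Dictionary<TKey, TValue>.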

[Serializable]
public class Node
{
...
}

[Serializable]
public class Trie
{
...
}

XML Save

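// Requires: using System.IO; using System.Xml.Serialization;
var trie = new Trie();

// FileMode.Create overwrites an existing file; CreateNew would throw if "path" already exists.
using (var fs = new FileStream("path", FileMode.Create, FileAccess.Write, FileShare.None))
{
    var xmls = new XmlSerializer(typeof(Trie));
    xmls.Serialize(fs, trie);
}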

XML Load

// Inside a method that returns Trie; filePath is the file written by the save step.
var xmls = new XmlSerializer(typeof(Trie));
using (var fs = new FileStream(filePath, FileMode.Open, FileAccess.Read))
    return xmls.Deserialize(fs) as Trie;
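
To make the XML save and load steps concrete, here is a minimal, self-contained sketch of a round trip. The `DemoNode`, `DemoTrie`, and `XmlRoundTrip` names are hypothetical and are not the trie from the question; the shape is chosen only so that XmlSerializer can handle it (public types, parameterless constructors, children in a List rather than a Dictionary).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Hypothetical node shape, assumed for illustration only.
public class DemoNode
{
    public char Letter { get; set; }
    public bool IsWord { get; set; }
    // A List is used because XmlSerializer cannot serialize Dictionary<,>.
    public List<DemoNode> Children { get; set; } = new List<DemoNode>();
}

public class DemoTrie
{
    public DemoNode Root { get; set; } = new DemoNode();

    public void Add(string word)
    {
        var node = Root;
        foreach (char c in word)
        {
            var next = node.Children.Find(n => n.Letter == c);
            if (next == null)
            {
                next = new DemoNode { Letter = c };
                node.Children.Add(next);
            }
            node = next;
        }
        node.IsWord = true;
    }

    public bool Contains(string word)
    {
        var node = Root;
        foreach (char c in word)
        {
            node = node.Children.Find(n => n.Letter == c);
            if (node == null) return false;
        }
        return node.IsWord;
    }
}

public static class XmlRoundTrip
{
    public static void Save(DemoTrie trie, string path)
    {
        var xmls = new XmlSerializer(typeof(DemoTrie));
        // FileMode.Create overwrites any previous file at this path.
        using (var fs = new FileStream(path, FileMode.Create, FileAccess.Write, FileShare.None))
            xmls.Serialize(fs, trie);
    }

    public static DemoTrie Load(string path)
    {
        var xmls = new XmlSerializer(typeof(DemoTrie));
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
            return (DemoTrie)xmls.Deserialize(fs);
    }
}
```

Calling `XmlRoundTrip.Save(trie, path)` followed by `XmlRoundTrip.Load(path)` should give back a trie that answers `Contains` the same way. Every node becomes its own XML element, so for a dictionary of this size the file is likely to be much larger than the source text.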

Binary Save

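// Requires: using System.IO; using System.Runtime.Serialization.Formatters.Binary;
// Note: BinaryFormatter is obsolete in .NET 5 and later and is unsafe on untrusted files.
var trie = new Trie();

// FileMode.Create overwrites an existing file; CreateNew would throw if "path" already exists.
using (var fs = new FileStream("path", FileMode.Create, FileAccess.Write))
{
    var bf = new BinaryFormatter();
    bf.Serialize(fs, trie);
}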

Binary Load

// Inside a method that returns Trie; "path" must be the file written by the binary save step.
using (var fs = new FileStream("path", FileMode.Open, FileAccess.Read))
{
    var bf = new System.Runtime.Serialization.Formatters.Binary.BinaryFormatter();
    return bf.Deserialize(fs) as Trie;
}
  • What's the difference between the two? And how (un)portable are these solutions? If I make a release version of my app, will it work on all versions of Windows (with .NET)? – Jecke Jul 27 '17 at 17:51
  • In my experience with serialization, XML is safer between versions of the program, but binary is faster and will most likely produce a much smaller file. A binary-serialized file is pretty much tied to one version of the class: the program can be updated, but changing the serialized class usually breaks loading unless you implement ISerializable and handle the serialization yourself. XML is safer between revisions but carries a lot of markup overhead in the saved file. Both are portable. – geoffb-csharpguy Jul 27 '17 at 18:00
  • I tried using binary serialization (I simply copied your code, changing `trie` to my trie with the loaded Polish dictionary and `"path"` to `"dict.xml"`) and the file I got is quite big: 54.2 MB (as opposed to 37.9 MB for dictionary.txt). I tried loading it (again, I simply copied your code, changing `return` to `Trie my_dictionary =`) but it's either stuck or has been loading for a couple of minutes now... Am I doing something wrong or is there another problem? – Jecke Jul 27 '17 at 18:16
  • Two things. Your comment says binary, but the file was saved with an .xml extension, so I am going to assume you meant binary. When saving a file as binary, don't use an extension like .xml, because that will break any other system that assumes it is an XML file. Use something like dict.dat, or some other non-standard extension. I wouldn't expect loading to take any longer than it takes your system to read the 54 MB, a few seconds at most. Are the save and the read in the same method? Are you sure the writer is closed, and that the read statement is not waiting for the file to become available? – geoffb-csharpguy Jul 27 '17 at 18:30
  • First I saved it, then I rewrote the code to load it. Here's how the function changed: https://pastebin.com/RqGuAeGu to save and https://pastebin.com/iu88sh6y to load. By the way, when I save it as .dat, the size is 266 MB. – Jecke Jul 27 '17 at 18:41
  • Changing the file extension in this way should have no bearing on the final size of the file. The program is just dumping the object (as it exists in memory) to the disk. So if the final file is 266 MB, that's what your 37 MB text file uses in memory after it has been converted into your dictionary class. Hypothetical question: what if there is no way to load the file any faster? Is your real problem that you don't like the 30 seconds the program is unresponsive (invisible) after it's opened/launched? We could move the load to another thread, then pass the class to your form after it's loaded. – geoffb-csharpguy Jul 27 '17 at 19:09
  • It's already loading in another thread, so the application is responsive, but since it revolves around this dictionary, there just isn't much that the user can do before it loads. I was hoping that since it takes 30 seconds to create the trie from scratch, loading an already-made one would be faster. – Jecke Jul 27 '17 at 19:13
  • The only other thing I can think of is that you are running this method on startup: game_dictionary.AcceptableWord(word). Just create a new file that doesn't contain any of the words that return false and drop this check/statement. – geoffb-csharpguy Jul 27 '17 at 19:15
  • I was running it from the public `MainWindow()` constructor, like this: `ThreadPool.QueueUserWorkItem(GenerateDictionary, "pack://application:,,,/PolishDictionary.txt");`. But now I created a button to run it and it's still the same. – Jecke Jul 27 '17 at 19:26
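
The thread above reports a large file (266 MB) and a slow load when the whole object graph is dumped by a general-purpose serializer. One alternative, which is a different technique from the one in the answer, is to write the trie by hand in pre-order with BinaryWriter and rebuild it with BinaryReader. The sketch below assumes a hypothetical `CompactNode` with a `Dictionary<char, CompactNode>` of children and an `IsWord` flag; the trie in the question may be shaped differently, and no timings were measured for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical node type, assumed for illustration only.
public sealed class CompactNode
{
    public bool IsWord;
    public readonly Dictionary<char, CompactNode> Children = new Dictionary<char, CompactNode>();
}

public static class CompactTrieFile
{
    public static void Add(CompactNode root, string word)
    {
        var node = root;
        foreach (char c in word)
        {
            if (!node.Children.TryGetValue(c, out var next))
            {
                next = new CompactNode();
                node.Children.Add(c, next);
            }
            node = next;
        }
        node.IsWord = true;
    }

    public static bool Contains(CompactNode root, string word)
    {
        var node = root;
        foreach (char c in word)
        {
            if (!node.Children.TryGetValue(c, out node)) return false;
        }
        return node.IsWord;
    }

    public static void Save(CompactNode root, string path)
    {
        using (var fs = new FileStream(path, FileMode.Create, FileAccess.Write, FileShare.None))
        using (var writer = new BinaryWriter(fs, Encoding.UTF8))
            WriteNode(writer, root);
    }

    // Pre-order layout per node: word flag, child count, then (letter, subtree) pairs.
    private static void WriteNode(BinaryWriter writer, CompactNode node)
    {
        writer.Write(node.IsWord);
        writer.Write((ushort)node.Children.Count);
        foreach (var pair in node.Children)
        {
            writer.Write(pair.Key); // a single char, encoded as UTF-8
            WriteNode(writer, pair.Value);
        }
    }

    public static CompactNode Load(string path)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read))
        using (var reader = new BinaryReader(fs, Encoding.UTF8))
            return ReadNode(reader);
    }

    // Mirrors WriteNode: reads the fields back in the same order.
    private static CompactNode ReadNode(BinaryReader reader)
    {
        var node = new CompactNode { IsWord = reader.ReadBoolean() };
        int count = reader.ReadUInt16();
        for (int i = 0; i < count; i++)
        {
            char key = reader.ReadChar();
            node.Children.Add(key, ReadNode(reader));
        }
        return node;
    }
}
```

Each node costs three bytes plus one to three bytes per incoming letter, so the file size tracks the number of trie nodes rather than the in-memory object layout. The recursion depth equals the longest word, which should be well within the default stack for a natural-language dictionary. Whether loading this way actually beats rebuilding from dictionary.txt would need to be measured on the real data.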