SCENARIO: Given two input strings, I need to find the minimum number of insertions, deletions, and substitutions required to convert one string into the other. The strings are the text of two files, and the comparison has to be done at the word level.

What I have done is implement the edit distance algorithm, which does the job well, using a 2-dimensional array of size m*n, where m and n are the sizes of the input strings.

The PROBLEM I am facing is that when m and n become large, say more than 16,000, I get an OutOfMemory exception because of the size of the m*n array. I am also running into memory fragmentation and Large Object Heap issues.

QUESTION: I am looking for C# code that solves the edit distance problem for two very large inputs (each containing more than 20k words) without throwing an OutOfMemory exception.

MapReduce, database, or MemoryMappedFile-based solutions are not feasible. Only pure C# code will work.
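For reference, each row of the edit distance table depends only on the row directly above it, so if only the distance value is needed (not the actual sequence of edits), two rows are enough. The following is a minimal sketch under that assumption, not a tested production implementation. The names `WordEditDistance` and `Compute` are hypothetical, and words are assumed to be compared with ordinal (case-sensitive) equality. For 20,000 words, each row buffer holds 20,001 `Int32` values, which is about 80 KB instead of roughly 1.5 GB for the full table.

```csharp
using System;
using System.Collections.Generic;

public static class WordEditDistance
{
    // Returns the minimum number of word insertions, deletions and
    // substitutions needed to turn `source` into `target`.
    public static int Compute(IList<string> source, IList<string> target)
    {
        // Keep the shorter sequence along the row so the buffers stay small.
        // The distance is symmetric, so swapping does not change the result.
        if (target.Count > source.Count)
        {
            IList<string> tmp = source;
            source = target;
            target = tmp;
        }

        int n = target.Count;
        int[] previous = new int[n + 1]; // row i - 1 of the full table
        int[] current = new int[n + 1];  // row i of the full table

        // Row 0: turning an empty prefix into j words takes j insertions.
        for (int j = 0; j <= n; j++)
        {
            previous[j] = j;
        }

        for (int i = 1; i <= source.Count; i++)
        {
            // Column 0: turning i words into an empty prefix takes i deletions.
            current[0] = i;

            for (int j = 1; j <= n; j++)
            {
                // Assumption: ordinal equality decides whether two words match.
                bool same = string.Equals(source[i - 1], target[j - 1], StringComparison.Ordinal);
                int substitution = previous[j - 1] + (same ? 0 : 1);
                int deletion = previous[j] + 1;
                int insertion = current[j - 1] + 1;
                current[j] = Math.Min(Math.Min(deletion, insertion), substitution);
            }

            // Reuse the two buffers instead of allocating a new row each time.
            int[] swap = previous;
            previous = current;
            current = swap;
        }

        // After the final swap, `previous` holds the last computed row.
        return previous[n];
    }
}
```

As a quick check, `WordEditDistance.Compute(new[] { "the", "quick", "brown", "fox" }, new[] { "the", "brown", "dog" })` returns 2 (delete "quick", substitute "fox" with "dog"). Running time is still proportional to m*n; only the memory use drops. This variant cannot report which edits were made, because the rows needed for backtracking are discarded.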

uzair_syed
  • If `Int32` is the array item's type, it requires 1.5 GB. Not that much. – Dmitry Bychenko Mar 11 '16 at 13:13
  • Do you have `String` and edit `Char` (e.g. *DNA*) or `List` (e.g. *Text*) and edit words? – Dmitry Bychenko Mar 11 '16 at 13:18
  • I'm not sure whether that breaks the comparison algorithm, but is it feasible to split the strings and compare piece by piece? So that the comparison array can be garbage-collected when you proceed to the next piece. – Antoine Mar 11 '16 at 13:19
  • On a 4 GB 32-bit system it throws an OutOfMemory exception. Since it is an array, it needs contiguous memory. In .NET, huge objects are allocated on the Large Object Heap, which gives rise to memory fragmentation problems. – uzair_syed Mar 11 '16 at 13:21
  • Breaking the input into pieces gives the wrong output. I have already tried that. – uzair_syed Mar 11 '16 at 13:23
  • So you should change your algorithm to be able to work with large strings. I guess you should show your code and explain the algorithm you currently have, so people can think about it and suggest a solution. – M.kazem Akhgary Mar 11 '16 at 13:24
  • I have a list of strings; each string is a word in the file. – uzair_syed Mar 11 '16 at 13:24

0 Answers