
I need to read huge Unicode files into my program and convert them to ANSI for parsing. After parsing, some files should be stored again as Unicode, while others should be stored in an ANSI code page.

As I understand it, a simple Read/Write doesn't support Unicode text, and for the biggest files (some maybe as big as 300 MB or even bigger) using twidestring.loadfromfile is out of the question, both because of memory usage and because of the time it takes to load.

I have been wondering whether loading the file in blocks could be a way forward, but as far as I know, block reading doesn't handle the Unicode BOM?

Any suggestions?

mjn
  • Try writing a procedure that splits the file into smaller parts; after indexing those parts you can read each one. You could possibly keep those chunks in memory. – Mihai8 Mar 16 '13 at 14:03
  • 1
    Just read the file one bit at a time. Process each part and move on to the next. – David Heffernan Mar 16 '13 at 16:02
  • 1
    Why are you using a non-unicode Delphi version 2006? Move to a unicode delphi version, if you care about unicode. Next after you load it, what do you plan to do? Display only in TNT components? Waste of time and effort. – Warren P Mar 16 '13 at 21:43

2 Answers


There is an excellent and very fast text reader in the German 'Delphi Forum'. It uses memory-mapped files.

You will probably be able to modify it to read Unicode text files. However, you might have to test the BOM yourself.
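
For the BOM check, something along these lines should work (a rough sketch; the TTextEncoding enumeration is invented here purely for illustration):

    // Rough sketch: peek at the first bytes of a stream to detect a BOM and
    // leave the stream positioned just past it. Requires Classes in the uses clause.
    type
      TTextEncoding = (teAnsi, teUtf8, teUtf16LE, teUtf16BE);

    function DetectBom(Stream: TStream): TTextEncoding;
    var
      Bytes: array[0..2] of Byte;
      Count: Integer;
    begin
      Result := teAnsi;
      Stream.Position := 0;
      Count := Stream.Read(Bytes, SizeOf(Bytes));
      if (Count >= 3) and (Bytes[0] = $EF) and (Bytes[1] = $BB) and (Bytes[2] = $BF) then
        Result := teUtf8
      else if (Count >= 2) and (Bytes[0] = $FF) and (Bytes[1] = $FE) then
        Result := teUtf16LE
      else if (Count >= 2) and (Bytes[0] = $FE) and (Bytes[1] = $FF) then
        Result := teUtf16BE;
      case Result of
        teUtf8:               Stream.Position := 3;
        teUtf16LE, teUtf16BE: Stream.Position := 2;
      else
        Stream.Position := 0;
      end;
    end;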

alzaimar

In Delphi, you can also use memory-mapped files.

The primary benefit of memory mapping a file is increasing I/O performance, especially when used on large files. ... A possible benefit of memory-mapped files is a "lazy loading", thus using small amounts of RAM even for a very large file.

Memory-mapped file. (2013, February 26). In Wikipedia, The Free Encyclopedia. Retrieved 15:14, March 17, 2013, from http://en.wikipedia.org/w/index.php?title=Memory-mapped_file&oldid=540609840
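
As a rough illustration of this approach, mapping a UTF-16 LE text file read-only with the Win32 API could look roughly like the sketch below. Error handling is reduced to RaiseLastOSError, the character-scanning loop is only indicated, and a file larger than the available 32-bit address space would have to be mapped in smaller views rather than all at once.

    // Minimal sketch: map a UTF-16 LE text file read-only and walk its characters.
    // Requires Windows and SysUtils in the uses clause.
    procedure ScanMappedUnicodeFile(const FileName: string);
    var
      FileHandle, MapHandle: THandle;
      Size: Cardinal;
      View: PWideChar;
      CharCount, I: Cardinal;
    begin
      FileHandle := CreateFile(PChar(FileName), GENERIC_READ, FILE_SHARE_READ,
        nil, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0);
      if FileHandle = INVALID_HANDLE_VALUE then
        RaiseLastOSError;
      try
        Size := GetFileSize(FileHandle, nil);
        MapHandle := CreateFileMapping(FileHandle, nil, PAGE_READONLY, 0, 0, nil);
        if MapHandle = 0 then
          RaiseLastOSError;
        try
          View := MapViewOfFile(MapHandle, FILE_MAP_READ, 0, 0, 0);
          if View = nil then
            RaiseLastOSError;
          try
            // Pages are brought in lazily as they are touched ("lazy loading")
            CharCount := Size div SizeOf(WideChar);
            I := 0;
            if (CharCount > 0) and (View[0] = WideChar($FEFF)) then
              Inc(I); // skip the UTF-16 LE BOM
            while I < CharCount do
            begin
              // ... inspect View[I] here, e.g. collect characters into lines ...
              Inc(I);
            end;
          finally
            UnmapViewOfFile(View);
          end;
        finally
          CloseHandle(MapHandle);
        end;
      finally
        CloseHandle(FileHandle);
      end;
    end;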

mjn