10

I want to use ADO Stream to read lines from a local large text file with UTF-8 encoding so I try

Set objStream = CreateObject("ADODB.Stream")
objStream.Charset = "utf-8"
objStream.Type = 2
objStream.Open
objStream.LoadFromFile = strFile
objStream.LineSeparator = 10
Do Until objStream.EOS
    strLine = objStream.ReadText(-2)
Loop

However the result is that the script takes lots of RAM and CPU usages. So is there any way to tell the script not to load all the file contents into memory, but just open it and read until it encounters any line separator?

Teiv
  • 2,605
  • 10
  • 39
  • 48
  • 1
    if you want to start at the first lineseparator you shouldn't loop until EOS (end of stream), I'm also not sure if you should have changed your lineseparator from the default CrLf to Lf like you did. – AardVark71 Mar 01 '13 at 09:04

2 Answers2

14

As you work with Stream object, I think it's obvious, however, .LoadFromFile fill current stream with the whole file content, and no any cutomize option to load parial data from file.

As for reading 1 line, you done this already with .ReadText(-2), (-2 = adReadLine).

Set objStream = CreateObject("ADODB.Stream")
With objStream
    .Charset = "utf-8"
    .Type = 2
    .Open
    'objStream.LoadFromFile = strFile ''I see a typo here
    .LoadFromFile strFile
    .LineSeparator = 10      ''that's Ok
    'Do Until objStream.EOS           ''no need this
    strLine = .ReadText(-2)
    'Loop
    .Close ''add this though!
End with

Set objStream = Nothing

For .LineSeparator you can use just 3 constants:

Constant Value Description

adCRLF   -1    Default. Carriage return line feed 
adLF     10    Line feed only 
adCR     13    Carriage return only 

If you need to break your Do..Loop at other letter, as .ReadText is the only choice for reading text stream, you may use it in conjunction with InStr function and Exit Do then you find your custom separator.

Const cSeparator = "_" 'your custom separator
Dim strLine, strTotal, index
Do Until objStream.EOS
    strLine = objStream.ReadText(-2)
    index = InStr(1, strLine, cSeparator)
    If index <> 0 Then
        strTotal = strTotal & Left(strLine, index-1)
        Exit Do
    Else
        strTotal = strTotal & strLine
    End If
Loop

Shortly, this is the whole optimization you can do (or at least as far as I know).

double-beep
  • 5,031
  • 17
  • 33
  • 41
Panayot Karabakalov
  • 3,109
  • 3
  • 19
  • 28
  • Thanks for your correction however what I need is to loop over each line of the file not getting just first line, and I want to optimize the the code so it won't take large amount of memory. – Teiv Mar 02 '13 at 01:12
  • Ok, probably I misunderstood your question, I'll update my answer then. – Panayot Karabakalov Mar 02 '13 at 14:00
3

If you look at this snippet from J. T. Roff's ADO book, you'll see that in theory you can read from a file line by line (without loading it completely into memory). I tried using the file: protocoll in the source parameter, but did not succeed.

So let's try another approach: To treat the .txt file as a UTF8 encoded trivial (one column) ADO database table, you need a schema.ini file in the source directory:

[linesutf8.txt]
ColNameHeader=False
CharacterSet=65001
Format=TabDelimited
Col1=SampleText CHAR WIDTH 100

Then you can do:

  Dim sTDir  : sTDir   = "M:/lib/kurs0705/testdata"
  Dim sFName : sFName  = "[linesutf8.txt]"
  Dim oDb    : Set oDb = CreateObject("ADODB.Connection")
  Dim sCs    : sCs     = Join(Array( _
          "Provider=MSDASQL" _
        , "Driver={Microsoft Text Driver (*.txt; *.csv)}" _
        , "DBQ=" + sTDir _
  ), ";")
  oDb.open sCs
  WScript.Stdin.Readline
  Dim oRs    : Set oRs = oDb.Execute("SELECT * FROM " & sFName)
  WScript.Stdin.Readline
  Do Until oRS.EOF
     WScript.Echo oRS.Fields(0).Value
     oRs.MoveNext
  Loop
  oRs.Close
  oDb.Close

For some background look here.

Ekkehard.Horner
  • 38,498
  • 2
  • 45
  • 96
  • Very interesting, but seems I cannot test this properly, as I get "Provider cannot be found". Can you elaborate with example file in "C:\lib\linesutf8.txt"? I tried all possible variations for `sTDir`, "C:/lib/linesutf8.txt", "C:/lib", "C:/lib/", "C:\lib\linesutf8.txt", "C:\lib", "C:\lib\" and all fail. Thanks – Panayot Karabakalov Mar 02 '13 at 20:41
  • @PanayotKarabakalov - If you get a "Provider not found" error, then something is wrong with the "Provider=..." property or your ADO install. sTDir is the folder the .txt file (and the schema.ini file) reside in. – Ekkehard.Horner Mar 02 '13 at 20:59
  • @Ekkehard.Horner The path to my text file is c:\test.txt and here was my attempt http://pastebin.com/r4Zm1N4d . I opened the command prompt and ran the script but all I got was "Not enough storage is available to process this command" (80078008) on line 10 which is "WScript.Stdin.Readline" . Do you have any idea on this error? – Teiv Mar 03 '13 at 05:52
  • @PanayotKarabakalov - use "\" for path separator; line 10 is the ".Open" statement; does it 'work' with a small file? – Ekkehard.Horner Mar 03 '13 at 09:38
  • @Ekkehard.Horner - I think your last comment is to user433531, right? ;-) – Panayot Karabakalov Mar 03 '13 at 13:39
  • @PanayotKarabakalov - yes, my fault, sorry! – Ekkehard.Horner Mar 03 '13 at 13:42
  • @user433531 - please see my comment wrongly addressed to Panayot. – Ekkehard.Horner Mar 03 '13 at 13:44
  • @Ekkehard.Horner Thanks but when I had changed the path separator to \ and tried a smaller file (about 100KB), the command prompt just stayed still and wouldn't output anything. – Teiv Mar 03 '13 at 17:57
  • @user433531 - the stop is by intention (ReadLine) to check resources in task manager. – Ekkehard.Horner Mar 03 '13 at 18:12
  • @Ekkehard.Horner - I found solution, the connection to MSDASQL should be done in 32bit environment, i.e. using `C:\Windows\SysWOW64\CScript.exe` Thanks for that idea (+1). And if you don't mind I thought to update my answer with example code based on your idea, with all the credits of course. – Panayot Karabakalov Mar 04 '13 at 02:31
  • @PanayotKarabakalov - Though it goes without saying that you don't need my 'ok' to use/improve an idea (Text driver table) I also got from something I saw long ago - by all means, go ahead. Giving evidence for *not* pre-loading the 10 GB into memory before the output/processing of the first line would be a much appreciated boon. – Ekkehard.Horner Mar 04 '13 at 13:08