
I am trying to download files to disk from Squeak. My method worked fine for small text/HTML files, but because it does no buffering, it was very slow for the large binary file https://mirror.racket-lang.org/installers/6.12/racket-6.12-x86_64-win32.exe. Also, after it finished, the file on disk was much larger (113 MB) than the size shown on the download page (75 MB).

My code looks like this:

download: anURL 
    "download a file over HTTP and save it to disk under a name extracted from url."
    | ios name |
    name := ((anURL findTokens: '/') removeLast findTokens: '?') removeFirst.
    ios := FileStream oldFileNamed: name.
    ios  nextPutAll: ((HTTPClient httpGetDocument: anURL) content).
    ios close.
    Transcript show: 'done'; cr.

I have tried copying fixed-size blocks from the HTTP response's contentStream with a [stream atEnd] whileFalse: loop whose body was [bytes := stream next: bufSize. bytes printTo: ios], but that garbled the output file: each block was wrapped in single quotes, and after the blocks came extra content that looked like every character of the stream, each in single quotes.
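The quotes come from printing instead of writing: a print method emits a printable, literal representation of the object (a String prints wrapped in single quotes), whereas nextPutAll: writes the elements themselves. printTo: presumably behaves like the standard printOn:, which is what this small workspace comparison uses:

```smalltalk
| chunk printed written |
chunk := 'abc'.
"printOn: writes the literal form of the string, including the quotes"
printed := WriteStream on: String new.
chunk printOn: printed.
"nextPutAll: writes just the characters a, b and c"
written := WriteStream on: String new.
written nextPutAll: chunk.
Transcript show: printed contents; cr; show: written contents; cr.
```

So inside a chunked loop, the body should be ios nextPutAll: bytes rather than bytes printTo: ios.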

How can I implement buffered writing of an HTTP response to a disk file? Also, is there a way to do this in Squeak while showing download progress?

pii_ke
  • For the size mismatch, have you tried sending `#binary` to the `FileStream` before storing binary contents on it? – Leandro Caniglia Mar 21 '18 at 12:56
  • @Leandro I had thought of doing that, but I did not have time to test it. Squeak froze for well over five minutes on my computer as soon as the download started. I was also unsure about other parts of the code, so I decided to try that after learning how to write the HTTP response to disk chunk by chunk. – pii_ke Mar 21 '18 at 13:21

2 Answers


As Leandro already wrote, the issue is the missing #binary.
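The size mismatch fits this diagnosis. Assuming the default Squeak file stream is a MultiByteFileStream with a UTF-8 converter (the usual setup in Squeak 4 and later), a stream that is not in binary mode treats each byte as a character and encodes every value above 127 as two bytes. A compressed installer has roughly half of its bytes above 127, so 75 MB × 1.5 ≈ 113 MB, in line with the sizes in the question. This snippet uses String>>squeakToUtf8 as a stand-in for what the text-mode stream does:

```smalltalk
| raw asText |
"Two bytes as they come off the network: 65 ($A) and 200 (above 127)."
raw := ByteArray with: 65 with: 200.
"The same values as characters, UTF-8 encoded the way a text-mode stream would write them."
asText := (String with: (Character value: 65) with: (Character value: 200)) squeakToUtf8.
Transcript show: raw size printString, ' bytes in, ', asText size printString, ' bytes out'; cr.
```

Sending #binary to the file stream skips this conversion, so each byte is written exactly once.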

Your code is nearly correct. I took the liberty of running it, and with #binary (and a fresh file from newFileNamed:) it now downloads the whole file correctly:

| ios name anURL |
anURL := 'https://mirror.racket-lang.org/installers/6.12/racket-6.12-x86_64-win32.exe'.
name := ((anURL findTokens: '/') removeLast findTokens: '?') removeFirst.
ios := FileStream newFileNamed: 'C:\Users\user\Downloads\_squeak\', name.
ios binary.    "write raw bytes instead of encoding them as text"
ios nextPutAll: ((HTTPClient httpGetDocument: anURL) content).
ios close.
Transcript show: 'done'; cr.

As for the freezing, I think the issue is that the download runs in the same process as the rest of the environment (a do-it runs in the UI process), which means you won't be able to use Squeak until the whole file has been downloaded.
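A common way around the frozen UI is to fork the work into a separate Smalltalk process with a priority below the UI's. This is a sketch of the corrected download above wrapped in forkAt: Processor userBackgroundPriority; the switch to forceNewFileNamed: is my assumption, chosen to avoid a "file exists" dialog popping up from a background process:

```smalltalk
| anURL name |
anURL := 'https://mirror.racket-lang.org/installers/6.12/racket-6.12-x86_64-win32.exe'.
name := ((anURL findTokens: '/') removeLast findTokens: '?') removeFirst.
"Run the download in its own process below the UI priority, so the
 user interface keeps responding while the work happens."
[| ios |
	"forceNewFileNamed: replaces an existing file without asking"
	ios := FileStream forceNewFileNamed: 'C:\Users\user\Downloads\_squeak\', name.
	[ios binary.
	 ios nextPutAll: (HTTPClient httpGetDocument: anURL) content]
		ensure: [ios close].
	Transcript show: 'done'; cr]
		forkAt: Processor userBackgroundPriority
```

This only keeps the image usable; HTTPClient still builds the whole response in memory before anything is written to disk.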

I just tested this in Pharo (easier to install), and the following code works as you want:

ZnClient new
  url: 'https://mirror.racket-lang.org/installers/6.12/racket-6.12-x86_64-win32.exe';
  downloadTo: 'C:\Users\user\Downloads\_squeak'.
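For the progress part of the question, Zinc can signal HTTPProgress notifications during a download. This is a sketch for Pharo, assuming ZnClient>>signalProgress: and HTTPProgress>>isEmpty / #percentage behave as in the Zinc documentation's progress example; it simply logs a percentage to the Transcript instead of driving a progress bar:

```smalltalk
"Ask ZnClient to signal HTTPProgress while downloading; each notification
 is logged and then resumed so the download continues."
[ZnClient new
	url: 'https://mirror.racket-lang.org/installers/6.12/racket-6.12-x86_64-win32.exe';
	signalProgress: true;
	downloadTo: 'C:\Users\user\Downloads\_squeak']
	on: HTTPProgress
	do: [:progress |
		progress isEmpty
			ifFalse: [Transcript show: progress percentage rounded printString, '%'; cr].
		progress resume]
```

This logs one line per notification, which can be many for a 75 MB file; in practice you would feed the percentage to a progress bar instead.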
tukan
  • Is there a way to write incoming HTTP chunks to disk as soon as they reach Squeak, so that I can see the size of the file on disk change while the download is in progress? Currently, this downloads the whole file over a few minutes, during which I see high network traffic, followed by high CPU usage while it processes the response or writes it to disk. – pii_ke Mar 26 '18 at 15:02
  • @pii_ke: I think the best option would be to use *Zinc HTTP Components* at http://zn.stfx.eu/zn/index.html. Support for Squeak 4.2+ is not 100%, but most of it works. – tukan Mar 26 '18 at 15:19

When building the response content, the WebResponse class creates a buffer large enough to hold the entire response, even for huge responses. I think this happens in WebMessage>>#getContentWithProgress:.

Instead, I copy data from the WebResponse's input SocketStream directly to an output FileStream. To do that, I subclassed WebClient and WebResponse and wrote two methods. Now the following code works as required:

| client link |
client := PkWebClient new.
link := 'http://localhost:8000/racket-6.12-x86_64-linux.sh'.
client download: link toFile: '/home/yo/test'.

I have verified that the file on disk grows block by block during the download and that the downloaded file is intact.

I include the source below. The method streamContentDirectToFile: is the one that does things differently and solves the problem.

WebClient subclass: #PkWebClient
    instanceVariableNames: ''
    classVariableNames: ''
    poolDictionaries: ''
    category: 'PK'!
!PkWebClient commentStamp: 'pk 3/28/2018 20:16' prior: 0!
Trying to download http directly to file.!


!PkWebClient methodsFor: 'as yet unclassified' stamp: 'pk 3/29/2018 13:29'!
download: urlString toFile: aFilePathString 
    "Try to download large files sensibly"
    | res |
    res := self httpGet: urlString.
    res := PkWebResponse new copySameFrom: res.
    res streamContentDirectToFile: aFilePathString! !


WebResponse subclass: #PkWebResponse
    instanceVariableNames: ''
    classVariableNames: ''
    poolDictionaries: ''
    category: 'PK'!
!PkWebResponse commentStamp: 'pk 3/28/2018 20:49' prior: 0!
To make getContentWithProgress better.!


!PkWebResponse methodsFor: 'as yet unclassified' stamp: 'pk 3/29/2018 13:20'!
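streamContentDirectToFile: aFilePathString 
    "Stream the response's content directly to a file, one chunk at a time,
    instead of first collecting the whole body in memory."
    | buffer ostream |
    stream binary.
    "forceNewFileNamed: truncates an existing file; oldFileNamed: would not."
    ostream := FileStream forceNewFileNamed: aFilePathString.
    ostream binary.
    [[stream atEnd]
        whileFalse: [
            "Take up to 4096 bytes already buffered, then pull in
            whatever has arrived on the socket since."
            buffer := stream nextInBuffer: 4096.
            stream receiveAvailableData.
            ostream nextPutAll: buffer]]
        ensure: [
            stream close.
            ostream close]! !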
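The method above does not report progress, which the question also asked about. Here is a minimal sketch of the same chunked loop with a progress callback, written as a plain workspace block; copyChunks is a hypothetical name, not part of Squeak or WebClient, and a ReadStream over a ByteArray stands in for the response's SocketStream so it can be tried offline:

```smalltalk
| copyChunks source target progress |
"copyChunks (hypothetical helper) copies inStream to outStream in chunks of
 at most chunkSize elements, writes each chunk with nextPutAll:, and passes
 the running total to progressBlock after every chunk. It answers the total."
copyChunks := [:inStream :outStream :chunkSize :progressBlock |
	| copied chunk |
	copied := 0.
	[inStream atEnd] whileFalse: [
		"next: may answer fewer than chunkSize elements near the end"
		chunk := inStream next: chunkSize.
		outStream nextPutAll: chunk.
		copied := copied + chunk size.
		progressBlock value: copied].
	copied].

"Offline demo: 10000 bytes copied in 4096-byte chunks."
source := ReadStream on: (ByteArray new: 10000 withAll: 7).
target := WriteStream on: (ByteArray new: 0).
progress := OrderedCollection new.
copyChunks value: source value: target value: 4096 value: [:n | progress add: n].
Transcript show: progress asArray printString; cr.
```

With a real response, the running total could be compared against the Content-Length header to show a percentage. Note that on a SocketStream, next: waits for a full chunk (or the end of the connection), whereas the method above uses nextInBuffer: plus receiveAvailableData to write whatever has already arrived.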
pii_ke
  • Out of curiosity, did you measure the effect of different buffer sizes, or the performance difference between your implementation and Zinc? Did you just want to reimplement it, or did you have another reason to do so? – tukan Apr 10 '18 at 10:21