Sorry for such a vague title; I really don't know what to title this issue. Basically, when I get a stream that's chunked, as indicated by Transfer-Encoding, I run the following code:

private IEnumerable<byte[]> ReceiveMessageBodyChunked() {
    readChunk:
    #region Read a line from the Stream which should be a Block Length (Chunk Body Length)
    string blockLength = _receiverHelper.ReadLine();
    #endregion
    #region If the end of the block is reached, re-read from the stream
    if (blockLength == Http.NewLine) {
        goto readChunk;
    }
    #endregion
    #region Trim it so it should end up with JUST the number
    blockLength = blockLength.Trim(' ', '\r', '\n');
    #endregion
    #region If the end of the message body is reached
    if (blockLength == string.Empty) {
        yield break;
    }
    #endregion
    int blockLengthInt = 0;
    #region Convert the Block Length String to an Int32 base16 (hex)
    try {
        blockLengthInt = Convert.ToInt32(blockLength, 16);
    } catch (Exception ex) {
        if (ex is FormatException || ex is OverflowException) {
            throw new Exception(string.Format(ExceptionValues.HttpException_WrongChunkedBlockLength, blockLength), ex);
        }
        throw;
    }
    #endregion
    // If the end of the message body is reached.
    if (blockLengthInt == 0) {
        yield break;
    }
    byte[] buffer = new byte[blockLengthInt];
    int totalBytesRead = 0;
    while (totalBytesRead != blockLengthInt) {
        int length = blockLengthInt - totalBytesRead;
        int bytesRead = _receiverHelper.HasData ? _receiverHelper.Read(buffer, 0, length) : _request.ClientStream.Read(buffer, 0, length);
        if (bytesRead == 0) {
            WaitData();
            continue;
        }
        totalBytesRead += bytesRead;
        System.Windows.Forms.MessageBox.Show("Chunk Length: " + blockLengthInt + "\nBytes Read/Total:" + bytesRead + "/" + totalBytesRead + "\n\n" + Encoding.ASCII.GetString(buffer));
        yield return buffer;
    }
    goto readChunk;
}

What this does is read one line of data from the stream, which should be the chunk's length, run a few checks, and eventually convert that line to an Int32 using radix 16 (hex).
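For readers following along, here is a minimal sketch of just the chunk-size parsing step, using the same Convert.ToInt32(..., 16) call as the code above. The class name ChunkSize and the handling of chunk extensions (anything after a ';' on the size line, which HTTP/1.1 permits) are assumptions added for illustration and are not part of the original code.

```csharp
using System;

static class ChunkSize
{
    // Hypothetical helper: parses a chunk-size line such as "1a\r\n"
    // or "1A;name=value\r\n" into the number of bytes in the chunk body.
    public static int Parse(string line)
    {
        // Same trimming as in the question's code.
        string text = line.Trim(' ', '\r', '\n');

        // Assumption: drop an optional chunk extension after ';'.
        int semicolon = text.IndexOf(';');
        if (semicolon >= 0)
        {
            text = text.Substring(0, semicolon).TrimEnd();
        }

        // The chunk size is written in hexadecimal.
        return Convert.ToInt32(text, 16);
    }

    static void Main()
    {
        Console.WriteLine(Parse("1a\r\n"));            // 26
        Console.WriteLine(Parse("1A;name=value\r\n")); // 26
        Console.WriteLine(Parse("0\r\n"));             // 0, the terminating chunk
    }
}
```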

From there it creates a byte buffer with that Int32 as its length.

It then just keeps reading from the stream until it has read the same number of bytes as the Int32 we converted.

This works splendidly; however, for whatever reason, it responds incorrectly on the last read.

It reads exactly as many bytes as the chunk length, and all the data I expect is read. BUT it ALSO reads again a small piece of data that was ALREADY read, and puts it at the very end, resulting in, let's say, all data from <!DOCTYPE html> down to </html> AS WELL as some data from somewhere inside, like <form> etc.

Here's an example of what occurred:

[Screenshot: message box output, with the repeated trailing text highlighted in red]

As you can see, the text highlighted in red should NOT have been returned from the read! It should have ended at </html>. Why is the chunk's length lying to me, and how can I find the proper size to read?

Ma Dude
  • If anyone wants to test this and knows how to manually set up a stream for this test, test against any website that ALWAYS requires a Cloudflare block check. Cloudflare's block page is always a chunked (Transfer-Encoding: chunked) response – Ma Dude Sep 09 '18 at 18:41
  • *"...test against any website that ALWAYS requires a cloudflare block check..."* - so you expect everybody who wants to help you to first find such a site? Or could you provide an actual URL where you see this error? – Steffen Ullrich Sep 09 '18 at 19:11
  • @SteffenUllrich torrentleech.org – Ma Dude Sep 09 '18 at 19:18
  • What's odd AF is that if I hand the request to another TCP stream through a proxy (127.0.0.1:8888, which is Fiddler), it works perfectly fine... – Ma Dude Sep 09 '18 at 19:33

1 Answer

I'm not familiar with C#, but if I understand your code and the semantics of Read in C# correctly (they seem to be similar to read in C), then the problem is that you are using the same buffer again and again without resetting it first:

byte[] buffer = new byte[blockLengthInt];
int totalBytesRead = 0;
while (totalBytesRead != blockLengthInt) {
    int length = blockLengthInt - totalBytesRead;
    int bytesRead = _receiverHelper.HasData ? _receiverHelper.Read(buffer, 0, length) : _request.ClientStream.Read(buffer, 0, length);
    ...
    totalBytesRead += bytesRead;
    ...
    yield return buffer;
}

To give an example of what goes wrong here: assume that the chunk size is 10, the content to read is 0123456789, the first read returns 6 bytes, and the second read returns the remaining 4 bytes. In this case your buffer will start with 012345 after the first read and with 678945 after the second read. The 45 at the end remains from the previous read: because every Read call writes at offset 0, the second read only replaced the first 4 bytes in the buffer and kept the rest.
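The effect can be reproduced without a network connection. The following is a minimal sketch, not the asker's actual classes: TrickleStream and ChunkDemo are hypothetical names, and TrickleStream is an assumed stand-in for a socket stream that delivers a chunk in pieces. The first method mirrors the question's loop (always reading at offset 0, but without the yield return inside the loop); the second passes the running total as the offset, which is one possible fix.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical stream that returns at most maxPerRead bytes per Read call,
// imitating a network stream that delivers one chunk in several pieces.
class TrickleStream : MemoryStream
{
    private readonly int _maxPerRead;

    public TrickleStream(byte[] data, int maxPerRead) : base(data)
    {
        _maxPerRead = maxPerRead;
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        return base.Read(buffer, offset, Math.Min(count, _maxPerRead));
    }
}

static class ChunkDemo
{
    // Mirrors the loop in the question: every Read writes at offset 0,
    // so a later partial read overwrites the start of the buffer.
    public static string ReadAtOffsetZero(Stream stream, int chunkLength)
    {
        byte[] buffer = new byte[chunkLength];
        int total = 0;
        while (total != chunkLength)
        {
            int read = stream.Read(buffer, 0, chunkLength - total);
            if (read == 0) throw new EndOfStreamException();
            total += read;
        }
        return Encoding.ASCII.GetString(buffer);
    }

    // Possible fix: write each piece right after the bytes already received.
    public static string ReadAtRunningOffset(Stream stream, int chunkLength)
    {
        byte[] buffer = new byte[chunkLength];
        int total = 0;
        while (total != chunkLength)
        {
            int read = stream.Read(buffer, total, chunkLength - total);
            if (read == 0) throw new EndOfStreamException();
            total += read;
        }
        return Encoding.ASCII.GetString(buffer);
    }

    static void Main()
    {
        byte[] data = Encoding.ASCII.GetBytes("0123456789");

        // Untouched bytes stay zero; show them as dots for readability.
        string broken = ReadAtOffsetZero(new TrickleStream(data, 6), 10);
        Console.WriteLine(broken.Replace('\0', '.'));   // 678945....

        string repaired = ReadAtRunningOffset(new TrickleStream(data, 6), 10);
        Console.WriteLine(repaired);                    // 0123456789
    }
}
```

With a 6-byte read followed by a 4-byte read, the offset-zero version ends up with 678945 followed by four zero bytes, while the running-offset version returns the chunk intact. In the question's code, the buffer should also be yielded once, after the loop has filled it, rather than after every partial read.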

"Whats odd AF is that if I hand the request to another TCPStream proxied (127.0.0.1:8888 as a proxy which is fiddler) it works perfectly fine..."

Fiddler is a proxy and might change how the response gets transferred. For example, it might use Content-Length instead of chunked encoding, or it might use smaller chunks so that you always get the full chunk with the first read.

Steffen Ullrich
  • OHHH After reading the example under my snippet I actually understand what you mean! – Ma Dude Sep 09 '18 at 20:25
  • If I understand the reason for my issue correctly, I should just move the buffer declaration inside the while() loop so that it's reset on every read? – Ma Dude Sep 09 '18 at 20:27
  • OK, really odd: when I'm foreach-ing over the yielded returns, appending them to a list, and then converting the byte[]s with Encoding.ASCII.GetString(), for whatever reason when I mbox it to see the text data, it only shows the content of the first yield return, yet it reports a proper length. WHAT? – Ma Dude Sep 09 '18 at 20:43
  • If I convert to string and mbox it while looping in the foreach, it shows that it's going through all yields, yet after the foreach, when I take all the list content and show it, it fails. It's as if the .Read put a weird broken character at the end, causing it to completely break stuff – Ma Dude Sep 09 '18 at 20:44
  • In Notepad++ with show all chars enabled, to see CR/LF etc., all I see is CR/LF and nothing else (apart from the actual text, of course). So maybe CR/LF is weird with MBOX? But it's odd too, because if I Clipboard.SetText it, it does the same thing: it cuts off at the end of the first read – Ma Dude Sep 09 '18 at 20:49