
I'm working on a split downloader in C#. It downloads fine (so the logic works), but every file it downloads ends up corrupted. I have no idea how to fix it. Here's the code:

private void mergeClean()
    {
        const int chunkSize = 1 * 1024; // 2KB
        using (var output = File.Create("output.jpg"))
        {
            foreach (var file in Files)
            {
                using (var input = File.OpenRead(file))
                {
                    var buffer = new byte[chunkSize];
                    int bytesRead;
                    while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
                    {
                        output.Write(buffer, 0, bytesRead);
                    }
                }
            }
        }

        foreach (var file in Files)
        {
            File.Delete(file);
        }
    }

    private void SaveFileStream(String path, Stream stream)
    {
        var fileStream = new FileStream(path, FileMode.Create, FileAccess.Write);
        stream.CopyTo(fileStream);
        fileStream.Dispose();
    }

    public void SplitDownload(string URL)
    {
        System.Net.WebRequest req = System.Net.HttpWebRequest.Create(URL);
        req.Method = "HEAD";
        System.Net.WebResponse resp = req.GetResponse();
        var responseLength = double.Parse(resp.Headers.Get("Content-Length"));
        var partSize = Math.Ceiling(responseLength / 10);
        var previous = 0;

        for (int i = (int)partSize; i <= responseLength; i = i + (int)partSize)
        {
            Thread t = new Thread(() => Download(URL, previous, i));
            t.Start();
            t.Join();
            previous = i;
        }

        mergeClean();
    }

    private void Download(string URL, int Start, int End)
    {
        Console.WriteLine(String.Format("{0},{1}", Start, End));

        HttpWebRequest myHttpWebRequest = (HttpWebRequest)WebRequest.Create(URL);
        myHttpWebRequest.AddRange(Start, End);
        HttpWebResponse myHttpWebResponse = (HttpWebResponse)myHttpWebRequest.GetResponse();
        Stream streamResponse = myHttpWebResponse.GetResponseStream();
        String name = GenerateTempName();
        SaveFileStream(name, streamResponse);
        Files.Add(name);
    }

Here is an example of what it does:

[image: the original file next to the corrupted downloaded copy]

UPDATED CODE:

static string GenerateTempName(int start)
    {
        String name = String.Format("{0:D6}.tmp", start);
        return name;
    }

    static public List<string> Files = new List<string>();

    static private void mergeClean()
    {
        Files.Sort();
        const int chunkSize = 1 * 1024; // 1KB
        using (var output = File.Create("output.jpg"))
        {
            foreach (var file in Files)
            {
                using (var input = File.OpenRead(file))
                {
                    var buffer = new byte[chunkSize];
                    int bytesRead;
                    while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
                    {
                        output.Write(buffer, 0, bytesRead);
                    }
                }
            }
        }

        foreach (var file in Files)
        {
            File.Delete(file);
        }
    }
    How are you ensuring that the files are read/rebuilt in the proper order? – Ron Beyer Apr 22 '15 at 16:05
  • After each file is downloaded, the temp name is added to an array of names. The mergeClean function iterates through this array and joins the files, so it merges in the order of the files in the array. Since it is downloading in the correct order, I'm guessing it's merging in the correct order. @RonBeyer – 0xSingularity Apr 22 '15 at 16:10
  • Create a checksum of the file before you send, then confirm it after downloading – Michael B Apr 22 '15 at 16:12
  • But with the t.join() doesn't it wait for the previous thread to end before next thread starts? – 0xSingularity Apr 22 '15 at 16:14
  • One problem may be that your reads overlap by one byte, you should be sending previous+1, not previous. Plus I don't know why you are using threads if you .Start and then .Join right away. – Ron Beyer Apr 22 '15 at 16:14
  • I also notice that you are using several objects (WebResponse, Stream) which are not in `using` blocks. Exceptions will leave those in a bad state. – John Saunders Apr 22 '15 at 16:15
  • I have edited your title. Please see, "[Should questions include “tags” in their titles?](http://meta.stackexchange.com/questions/19190/)", where the consensus is "no, they should not". – John Saunders Apr 22 '15 at 16:16
  • Can you guys also help me optimize my usage of threads? I think the code I have is not actually doing what I want it to do. – 0xSingularity Apr 22 '15 at 16:20
  • Try it without threads first (remove the threading calls), get it to work, and then add threads back in. – Ron Beyer Apr 22 '15 at 16:21
  • `const int chunkSize = 1 * 1024; // 2KB` **;-)** – Jeroen van Langen Apr 22 '15 at 16:23
  • @JeroenvanLangen haha I forgot to change it – 0xSingularity Apr 22 '15 at 16:27

1 Answer


You need to recombine the file from its pieces in the correct order. The current code creates random file names, and even though the names are added to the list of files, they end up in a random order because the time at which each segment finishes downloading is unpredictable.

Possible fix: use the block's start offset as part of the file name, e.g. `String name = String.Format("file{0:D6}.tmp", Start)`, and sort the files by name before combining them.

Note that the `{0:D6}` format pads the offset with leading zeros, so a plain lexicographic sort by name matches numeric order and you avoid the need for a natural-sort comparison.
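A minimal sketch of that idea (the helper name `GenerateTempName` and the simulated chunk offsets are illustrative, not taken from the asker's exact code):

```csharp
using System;
using System.Collections.Generic;

class ChunkNaming
{
    // Name each temp file after the byte offset where its chunk starts,
    // zero-padded so lexicographic order equals numeric order.
    static string GenerateTempName(int start)
    {
        return String.Format("{0:D6}.tmp", start);
    }

    static void Main()
    {
        var files = new List<string>();

        // Simulate chunks finishing in an unpredictable order.
        foreach (int start in new[] { 2048, 0, 1024 })
            files.Add(GenerateTempName(start));

        // Sorting by name restores byte order before merging.
        files.Sort(StringComparer.Ordinal);
        Console.WriteLine(String.Join(", ", files));
        // 000000.tmp, 001024.tmp, 002048.tmp
    }
}
```

Six digits of padding covers files up to about 1 MB of offset; for larger downloads, widen the pad (e.g. `{0:D12}`) so sorting stays correct.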

Alexei Levenkov
  • Ok, I took your advice and I am now sorting the array before I merge it. Yet the problem still persists and the images it downloads are still corrupted. Any other ideas? – 0xSingularity Apr 22 '15 at 22:27
  • @Chocoloco "you should be sending previous+1, not previous" - {0,10}, {10,20},... instead of {0,9}, {10,19},... – Alexei Levenkov Apr 24 '15 at 00:20
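The off-by-one overlap flagged in the comments can be sketched as a standalone range calculation (the length and part count are hypothetical). `HttpWebRequest.AddRange(start, end)` is inclusive on both ends, so each chunk must start at the previous chunk's end plus one, not at the previous end itself:

```csharp
using System;

class Ranges
{
    static void Main()
    {
        long length = 100;   // hypothetical Content-Length
        int parts = 10;
        long partSize = (length + parts - 1) / parts; // ceiling division

        for (long start = 0; start < length; start += partSize)
        {
            // AddRange(start, end) is inclusive, so the last byte of this
            // chunk is start + partSize - 1, clamped to the file's end.
            long end = Math.Min(start + partSize - 1, length - 1);
            Console.WriteLine("{0},{1}", start, end);
        }
        // 0,9  10,19  ...  90,99 — adjacent, non-overlapping, covering all bytes
    }
}
```

With the original scheme ({0,10}, {10,20}, ...) every boundary byte is downloaded twice, which shifts all subsequent bytes in the merged file and corrupts it even when the chunks are merged in the right order.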