People are downloading corrupted PDFs from my website

Question

I'm currently working on a website that I did not develop myself. It is WordPress-based, but a lot of custom development has been done on it by external people.

We have a problem with some PDFs. When people download these PDFs, or even images, the files are corrupted. I investigated by opening the files in Notepad and found that the HTML of the web page is at the beginning of the document. If I delete this HTML, the PDF is no longer broken.

I know what the problem is, but I can't seem to find how to fix it. Here is the HTML link for the "download" button.

<a href="<?php echo get_bloginfo('url');?>/?download_process=<?php echo $_GET['dl'];?>" target="_blank" id="#downloadfile_atag" class="downloadfile_atag" style="display:none;">Download</a>

When I click on this button, the page is refreshed, and the download starts.

I found this line, which should be the part of the code that handles the "GET" parameter:

if (isset($_GET['download_process'])) {

The code in this "if" is a bit more than 100 lines long and I'm not experienced enough to know what to do. I would need some advice on where to look. For example, there is

ob_start();

at the beginning, and an

ob_clean();

in the middle of the code. Is it normal that there is no

ob_end_clean(); // or
ob_end_flush();

or something like that ?

Or maybe it does not come from this buffer. What kind of instruction should I be checking for?

Or maybe it does not come from that part of the code, and if so, I'm really lost...

Anyway, thank you in advance for your answers.

Guillaume.

score 0 · Answer 1 · edited Jun 01 '12 at 11:09


Is there anything about header() in the if statement that you have not shown us?

If so, is there anything about the content size? (This is something I am working on at the moment, so I have a couple of ideas.)

$fsize is the size in bytes of the file to download, and $fullPath is its full path on the server.

$fsize = filesize($fullPath); // $fullPath is the file's full path on the server; filesize() returns its size in bytes
header("Content-Length: ".$fsize);  // tells the browser how many bytes of body to expect
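
Since the symptom is the page's HTML sitting in front of the PDF bytes, the length header is probably not the whole story: anything echoed or buffered before the file is sent ends up in the download. Below is a minimal sketch of a handler that avoids that, assuming $fullPath already holds a validated path to the file. discard_output_buffers() and send_download() are hypothetical helper names, not part of WordPress or of the code you posted.

```php
<?php
// Hypothetical helper: throw away everything buffered so far
// (theme HTML, stray whitespace) and close every buffer level.
function discard_output_buffers(): void
{
    // Stop if a buffer cannot be removed, to avoid looping forever.
    while (ob_get_level() > 0 && ob_end_clean()) {
        // nothing to do; ob_end_clean() did the work
    }
}

// Hypothetical helper: send one file as a download.
// Returns false if the file is missing or unreadable.
function send_download(string $fullPath): bool
{
    if (!is_file($fullPath) || !is_readable($fullPath)) {
        return false;
    }
    header('Content-Type: application/octet-stream');
    header('Content-Disposition: attachment; filename="' . basename($fullPath) . '"');
    header('Content-Length: ' . filesize($fullPath));
    // readfile() writes the raw bytes of the file to the output.
    return readfile($fullPath) !== false;
}

// Sketch of how it could be used inside the existing branch:
// if (isset($_GET['download_process'])) {
//     discard_output_buffers();
//     if (send_download($fullPath)) {
//         exit; // stop here so no page HTML follows the file
//     }
// }
```

The exit after a successful send matters: without it, the rest of the page keeps rendering and its HTML is appended after the file. If headers_sent() already returns true when this branch is reached, some output escaped before the buffer was started, and that is the place to look.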

edited Jun 01 '12 at 11:09 by halfer

answered Jun 01 '12 at 08:37 by Adsy2010

Yes there is something about headers. I cannot seem to post enough characters in a comment so I'll just post another answer for you to see. – user1430142 Jun 01 '12 at 08:51
Here is the code (did not find out how to post it in a better way..) http://hpics.li/c600569 – user1430142 Jun 01 '12 at 09:05
I don't think it's only a problem with the size header, since the whole HTML page is copied into the PDF: head, body, and footer. Could it be because there are instructions before the header instruction? – user1430142 Jun 01 '12 at 09:23
Well, the reason I said that is that you said the downloaded files were corrupt, not that they weren't downloading. Are you streaming the file download or are you linking to the file to download? – Adsy2010 Jun 01 '12 at 11:36
Sorry if I did not make myself clear: people can download the PDFs, but when they try to open them, they cannot because the files are corrupted. As for whether the files are linked or streamed, how do I check that? It uses cURL, so I would think they are streamed, but maybe I am mistaken. The code I have for the PDF download is what I took a screenshot of and posted above in my previous comment. – user1430142 Jun 01 '12 at 11:57
Sorry about that. By linked I mean: do the files to be downloaded actually exist? The little bit of info about cURL is missing at the end of the line in the image you posted, so I missed that bit. It is quite possible that a file-size-related issue is what causes the file to be corrupted. See http://php.net/manual/en/function.header.php and look at the post there from Yasser Khan on 3/7/2008 (UK date, dd/mm/yy). That may help to explain a bit more. – Adsy2010 Jun 01 '12 at 12:21
Yes, the files are on the server. I can download them myself over FTP and they work fine. The little bit of code I posted is not cut, only the comments are. Before this part there are other instructions, but I don't think they are about the PDF download. And after it there is nothing else. I'll try to do as described in your link, thank you. – user1430142 Jun 01 '12 at 13:42
There is another bit on that link about clearing cache for download. In other words remove the line that adds the HTML and then send file to the browser – Adsy2010 Jun 01 '12 at 17:01
Hello, I still have the same problem... I narrowed it down a little bit: I know what code is executed, but I still don't get why... Maybe I'm too far down the road and the file I have here is already compromised, but I'm not sure how to check for that. Here is the code: http://hpics.li/410059a I put a red line where I'm sure the execution goes and I crossed out the other part. How can I check that the file is not yet corrupted here? What does "fopen" with the "rb" option mean? If I only put "r" or "r+", the PDF only contains the HTML code, with no trace of the PDF itself inside. – user1430142 Jun 05 '12 at 08:25
