1

I am writing a PHP script to automatically download a CSV file.

When I manually visit the URL from the web-browser(safari) and download the file and open it in MS Excel, columns A and D (date and time respectively) are formatted as I would expect, e.g. '12-Sep-16' and '3:00:30 pm'. However, when I run the below PHP script:

$url_="";

$filename="File.csv";

file_put_contents('/home/file/path' . $filename, file_get_contents($url));

The file downloads, and the content is fine apart from columns A & D, which are formatted as follows 'Mon Sep 12 00:00:00 EDT 2016' and 'Mon Sep 13 10:00:30 EDT 2016'.

My server timezone is set to 'Europe/London'.

Clearly either file_get_contents(); file_put_contents(); is changing something. Would anyone happen to know how I can resolve this?

Russell
  • 63
  • 1
  • 7
  • 1
    That's not a PHP problem. You're fetching from a URL, and f_g_c/f_p_c are merely CONDUITS for what that website is outputting. If you're getting the wrong dates/times in that output, then you're not passing something via that url to inform the remote server what timezone it SHOULD be using. – Marc B Sep 13 '16 at 20:08
  • 1
    How are you actually checking this file? In MS Excel? Or in a text editor? The MS Excel importer for CSV files won't necessarily leave dates in the format that you'd expect! – Mark Baker Sep 13 '16 at 20:08
  • 1
    Excel will automatically convert the date to their own format. Try opening the file in a text editor and you will see the same format that php sees. Should be simple enough to convert from that. – Jonathan Kuhn Sep 13 '16 at 20:10
  • I'm checking the file via Microsoft Excel, although I experience the same behaviour via TextEdit (i.e. the file downloaded from Safari has the Format as I would expect, but the file downloaded by PHP has the strange formatting) – Russell Sep 13 '16 at 20:18
  • `wget 'https://www.theice.com/marketdata/reports/icebenchmarkadmin/GoldPriceAuctionReport.shtml?excelExport=&criteria.reportDate=09.%2f09.%2f2016&criteria.sessionNumber=1500' -O tmp.csv'` gets me a with report dates of Fri Sep 09 00:00:00 EDT 2016. Checked with plain old `cat`. This is most certainly Excel trying to be helpful. – Robert Sep 14 '16 at 03:19

2 Answers2

1

Neither file_get_contents nor file_put_contents modify the content in any way.

This is just a different display format in Excel. Excel reads the timestamp in any format, interprets it, and displays it in some default format, or in the format you specify.


When I download and save the file with Firefox, the dates are indeed different (13-Sep-2016, 3:00:31 PM).

After that, I downloaded the file with wget, and again with curl. Each time I get the same date/time format (Tue Sep 13 00:00:00 EDT 2016, Tue Sep 13 10:00:31 EDT 2016) as with PHP, except when I download with Firefox.

So I would rather suspect, that Firefox is the culprit. Maybe Firefox looks at the mime type and decides to interpret the contents somehow.

Olaf Dietsche
  • 72,253
  • 8
  • 102
  • 198
  • How come when I download the file via the web-browser it formats as I would expect? Same URL via PHP, and the format is different? – Russell Sep 13 '16 at 20:14
  • Thank you for testing Olaf. Firefox appears to be behaving the same as Safari. Is there a way that you know of to use PHP to pass this information through? – Russell Sep 13 '16 at 21:52
  • You can see the mime type "application/vnd.ms-excel" when you download with wget or when you open the console in Firefox. Maybe there is a way in PHP to use this information, besides doing the hard work manually, but I don't know. Maybe ask another question. – Olaf Dietsche Sep 13 '16 at 22:05
  • Thank you very much, this helped me resolve my issue. I'll post the solution below. – Russell Sep 14 '16 at 21:26
1

I was able to resolve this by including header details in file_get_contents(), as follows:

$url = "";

$options = array(
  'http'=>array(
    'method'=>"GET",
    'header'=>"Accept-language: en\r\n" .
              "Cookie: foo=bar\r\n" .  // check function.stream-context-create on php.net
              "User-Agent: Mozilla/5.0 (iPad; U; CPU OS 3_2 like Mac OS X; en-us) AppleWebKit/531.21.10 (KHTML, like Gecko) Version/4.0.4 Mobile/7B334b Safari/531.21.102011-10-16 20:23:10\r\n" // i.e. An iPad 
  )
);

$context = stream_context_create($options);
$file = file_get_contents($url, false, $context);
Russell
  • 63
  • 1
  • 7
  • Congratulation, so it's neither Firefox nor Safari, but the website itself, which delivers content depending on some headers. Again, I tested this with wget, and it seems the header `Accept-language` triggers this behaviour. With `wget --header='Accept-Language: xy' ...`, the CSV file has the 13-Sep-2016/3:00:31 PM timestamp. – Olaf Dietsche Sep 14 '16 at 21:40