
For some reason, I want to serve my robots.txt via a PHP script. I have set up Apache so that the robots.txt request (in fact, all file requests) goes to a single PHP script.

The code I am using to render robots.txt is:

echo "User-agent: wget\n";
echo "Disallow: /\n";

However, the newlines are not showing up. How do I serve robots.txt correctly, so that search engines (or any client) see it properly? Do I have to send some special headers for .txt files?

EDIT 1:

Now I have the following code:

header("Content-Type: text/plain");
echo "User-agent: wget\n";
echo "Disallow: /\n";

which still does not display newlines (see http://sarcastic-quotes.com/robots.txt ).

EDIT 2:

Some people mentioned that the output is just fine and the newlines are simply not displayed in the browser. I was just curious how this one displays correctly: http://en.wikipedia.org/robots.txt

EDIT 3:

I downloaded both mine and wikipedia's through wget, and see this:

$ file en.wikipedia.org/robots.txt
en.wikipedia.org/robots.txt: UTF-8 Unicode English text

$ file sarcastic-quotes.com/robots.txt
sarcastic-quotes.com/robots.txt: ASCII text

FINAL SUMMARY:

The main issue was that I was not setting the header. However, there is another internal bug that causes the Content-Type to be sent as HTML (this is because my request is actually served through an internal proxy, but that's another issue).

Some comments saying that browsers don't display newlines were only half-correct: modern browsers display newlines correctly if the Content-Type is text/plain. I am selecting the answer that most closely matched the real problem and was free of the slightly misleading misconception above :). Thanks everyone for your help and your time!

thanks

JP

6 Answers

29

Yes, you forgot to set the Content Type of your output to text/plain:

header("Content-Type: text/plain");

Your output is probably being sent as HTML, where a newline is collapsed into a space; to actually display a line break in HTML, you would need the <br /> tag.
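
A minimal sketch of the fix described above, assuming the script produces no other output before this code runs. The helper name `build_robots_txt` is hypothetical and only used for illustration; the rules are the ones from the question.

```php
<?php
// Hypothetical helper: builds one robots.txt group as a plain string.
// "\n" inside double quotes is a real newline character.
function build_robots_txt(string $userAgent, array $disallowedPaths): string
{
    $lines = ["User-agent: {$userAgent}"];
    foreach ($disallowedPaths as $path) {
        $lines[] = "Disallow: {$path}";
    }
    return implode("\n", $lines) . "\n";
}

// The header must be sent before any body output.
header('Content-Type: text/plain');
echo build_robots_txt('wget', ['/']);
```

With the response labelled as text/plain, a browser has no reason to collapse the newlines; if the page still shows up as text/html, something between PHP and the client is probably overriding the header.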

RabidFire
  • Thanks. I set the header via the line you mentioned. Still no newlines. It's coming out like this: http://sarcastic-quotes.com/robots.txt –  Dec 22 '10 at 06:24
  • I checked that page, and I'm still receiving the response type as `text/html` – RabidFire Dec 22 '10 at 06:27
  • Oh, that's so strange. Let me debug further. –  Dec 22 '10 at 06:30
  • There might be a problem in the way you've set up Apache to "serve files through a single PHP script". If you could provide that code, we might be able to help. – RabidFire Dec 22 '10 at 06:32
5
  1. header('Content-Type: text/plain') is correct.
  2. You must call this function before anything is written to your output, including whitespace. Check for whitespace before your opening <?php.
  3. If your Content-Type header has been set to text/plain, no browser in its right mind would collapse whitespace. That behaviour is exclusive to HTML and similar formats.
  4. I'm sure you have your reasons, but as a rule, serving static content through PHP uses unnecessary server resources. Every hit to PHP is typically a new process spawn and a few megs of memory. You can use Apache config directives to point to different robots files based on headers like User-Agent - I'd be looking into that.
  5. It's likely that search engines ignore the Content-Type header, so this shouldn't be an issue anyway.
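
Point 2 can be checked from inside the script. The following is only a sketch, assuming the standard `headers_sent()` function is available; the helper name `describe_header_state` is made up for this example.

```php
<?php
// Hypothetical helper (not from the original answer): reports whether
// header() can still take effect, or where output already started.
function describe_header_state(): string
{
    $file = '';
    $line = 0;
    if (headers_sent($file, $line)) {
        return "Headers already sent; output started at {$file}:{$line}";
    }
    return 'No output yet; header() can still be called';
}

if (!headers_sent()) {
    header('Content-Type: text/plain');
} else {
    // Log where the stray output came from instead of failing silently.
    error_log(describe_header_state());
}

echo "User-agent: wget\n";
echo "Disallow: /\n";
```

If the log points at line 1 of an included file, stray whitespace or a byte order mark before the opening <?php is a likely culprit.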

Hope this helps.

-n

Neil
1

I was having a similar issue, and neither "\n" nor PHP_EOL worked. I finally used:

header('Content-Disposition: attachment; filename="plaintext.txt"');
header("Content-Type: text/plain");
echo "some data";
echo chr(13).chr(10);

The echo of BOTH characters did the trick. Hope it helps someone.
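
For reference, `chr(13).chr(10)` is the same two-byte sequence as `"\r\n"` (CRLF). Below is a small sketch, assuming the downloaded file is opened in a viewer that expects CRLF line endings (that is a guess about this setup); `to_crlf` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: normalizes any mix of line endings to CRLF.
function to_crlf(string $text): string
{
    // Match "\r\n" first so existing CRLF pairs are not doubled.
    return preg_replace('/\r\n|\r|\n/', "\r\n", $text) ?? $text;
}

// chr(13) is carriage return, chr(10) is line feed.
$crlf = chr(13) . chr(10);
$sameBytes = ($crlf === "\r\n");

$body = to_crlf("some data\nmore data\n");
```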

Bye anankin

anakin
1
<?php header("Content-Type: text/plain"); ?>
User-agent: wget
Disallow: /

BTW, the newlines are there just fine. They're just not displayed in a browser. Browsers collapse all whitespace, including newlines, to a single space.

deceze$ curl http://sarcastic-quotes.com/robots.txt
User-agent: wget
Disallow: /
deceze
  • Thanks. How come wikipedia's robots.txt displays correctly in browser? See en.wikipedia.org/robots.txt –  Dec 22 '10 at 06:48
0

You must set the content type of the document you are serving. In the case of a .txt text file:

header("Content-Type: text/plain");

The IANA has information about some of the more popular MIME (content) types.

Matthew Scharley
-2

If you are using echo, then use <br> for new lines. The printf function is what uses \n.

In your case, use printf because you are not using HTML. I believe this is the proper way to do this, along with setting the MIME type to text.

Thomas Havlik
  • Sorry, that's utter nonsense. `\n` within double quoted strings is always a newline, it has nothing to do with `echo` or `printf`. `<br>` is only useful in the context of HTML. – deceze Dec 22 '10 at 06:25
  • `print` is a synonym for `echo`. `printf` is a wrapper around `print` that will substitute different variables into a format string. How you print the content is irrelevant to how it is displayed. – Matthew Scharley Dec 22 '10 at 06:25
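
The point made in these comments can be checked with a short sketch that captures the output of `echo` and `printf` via output buffering and compares the bytes. The helper name `capture` is hypothetical.

```php
<?php
// Hypothetical helper: runs a callable and returns whatever it printed.
function capture(callable $emit): string
{
    ob_start();
    $emit();
    return (string) ob_get_clean();
}

// Double-quoted "\n" is a newline regardless of the output function.
$viaEcho = capture(function () { echo "Disallow: /\n"; });
$viaPrintf = capture(function () { printf("Disallow: /\n"); });

// Single quotes keep the backslash and the letter n as two literal characters.
$singleQuoted = capture(function () { echo 'Disallow: /\n'; });

$identical = ($viaEcho === $viaPrintf);
```

What decides the newline is the quoting style of the string, not the function that prints it.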