83

I'm currently working on an application built with Express (Node.js), and I want to know the smartest way to serve a different robots.txt for each environment (development, production).

This is what I have right now, but I'm not convinced by this solution; I think it's dirty:

app.get '/robots.txt', (req, res) ->
  res.set 'Content-Type', 'text/plain'
  if app.settings.env == 'production'
    res.send 'User-agent: *\nDisallow: /signin\nDisallow: /signup\nDisallow: /signout\nSitemap: /sitemap.xml'
  else
    res.send 'User-agent: *\nDisallow: /'

(NB: it is CoffeeScript)

There should be a better way. How would you do it?

Thank you.

Vinch

9 Answers

125

Use a middleware function. As long as it is registered first, robots.txt will be handled before any session, cookieParser, etc.:

app.use('/robots.txt', function (req, res, next) {
    res.type('text/plain')
    res.send("User-agent: *\nDisallow: /");
});

With Express 4, app.get routes are handled in the order they are registered (interleaved with middleware), so you can just use that:

app.get('/robots.txt', function (req, res) {
    res.type('text/plain');
    res.send("User-agent: *\nDisallow: /");
});
SystemParadox
    Surely it makes sense to do `app.use('/robots.txt', function (req, res, next) { ... });` and lose the `req.url` check. – c24w Feb 02 '15 at 16:49
    @c24w with express 4 yes it would. `app.get` would work as well. I will update. Thanks – SystemParadox Feb 03 '15 at 08:59
  • Ah, I thought it might be a new API feature (I should have checked). `app.get` is even better! :) – c24w Feb 03 '15 at 10:16
    The second approach here is better, as it is the one the OP was already using with Express < 4. Adding a middleware means ALL requests go through this robots.txt check. – flaviodesousa Oct 26 '16 at 18:46
  • `res.type('text/plain');` Why is this line necessary? Express automatically changes the content type to the correct value because it detects that the file ends with a .txt extension, right? – Fabian Jun 20 '20 at 12:35
    @Fabian, only `res.sendFile()` and the `static` middleware use file extensions. For `res.send()` it will automatically set the content-type if it's not set already, but for a string it will set it to text/html. – SystemParadox Jun 22 '20 at 07:52
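Building on the app.get version, the original question (different content per environment) can be handled by keeping one robots.txt body per environment in a lookup table and registering a single route. This is a minimal sketch, not a drop-in solution: ROBOTS_BY_ENV, robotsFor and registerRobots are hypothetical names, the production rules are copied from the question, and the environment name is assumed to come from app.get('env'), which Express sets from NODE_ENV (defaulting to 'development').

```javascript
// Hypothetical lookup table: one robots.txt body per environment.
// The production rules are copied from the question (note that crawlers
// generally expect an absolute URL after "Sitemap:").
const ROBOTS_BY_ENV = {
  production: [
    'User-agent: *',
    'Disallow: /signin',
    'Disallow: /signup',
    'Disallow: /signout',
    'Sitemap: /sitemap.xml',
  ].join('\n'),
};

// Any environment not listed above (development, staging, ...) blocks all crawlers.
const BLOCK_ALL = 'User-agent: *\nDisallow: /';

function robotsFor(env) {
  // hasOwnProperty avoids picking up inherited keys such as "toString"
  return Object.prototype.hasOwnProperty.call(ROBOTS_BY_ENV, env)
    ? ROBOTS_BY_ENV[env]
    : BLOCK_ALL;
}

// Registers GET /robots.txt on an Express app or router.
// The body is chosen once, at registration time.
function registerRobots(app, env) {
  const body = robotsFor(env);
  app.get('/robots.txt', (req, res) => {
    res.type('text/plain');
    res.send(body);
  });
}
```

Usage: call registerRobots(app, app.get('env')) once, before any catch-all routes. Choosing the body at startup keeps the request handler trivial, since the environment does not change while the process runs.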
22

1. Create robots.txt with the following content:

User-agent: *
Disallow: # your rules here

2. Add it to the public/ directory.

3. If not already present in your code, add:

app.use(express.static('public'))

Your robots.txt will be available to any crawler at http://yoursite.com/robots.txt

Exadra37
atul
    Express doesn't automatically serve static files from `/public` does it? I think you need to configure it to do so. So the answer is incomplete. – Ian Walter Oct 17 '18 at 19:28
    If your Express application has a `public/` directory for serving static files, this answer works like a charm. – Leandro Lima Mar 12 '19 at 19:56
    This is the simplest and most obvious solution in an Express app. I still can't believe I didn't think of this. – peter Sep 08 '20 at 04:52
    This should be the checked answer. –  Jul 11 '21 at 12:56
2

Looks like an OK approach.

An alternative, if you'd like to edit robots.txt as a regular file and possibly have other files you only want in production or development mode, is to use two separate directories and activate one or the other at startup.

if (app.settings.env === 'production') {
  app.use(express.static(__dirname + '/production'));
} else {
  app.use(express.static(__dirname + '/development'));
}

Then add two directories, each with its own version of robots.txt:

PROJECT DIR
    development
        robots.txt  <-- dev version
    production
        robots.txt  <-- more permissive prod version

You can keep adding more files to either directory while keeping your code simple.

(Sorry, this is JavaScript, not CoffeeScript.)

Pascal Belloncle
  • That's interesting, I think I'll try something like that, it looks more graceful to me! Thank you! – Vinch Feb 27 '13 at 22:49
  • Just wanted to mention that things will change soon (Express 4.0). You'll then need the "native" env, process.env.NODE_ENV: http://scotch.io/bar-talk/expressjs-4-0-new-features-and-upgrading-from-3-0 – sebilasse Mar 20 '14 at 09:18
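The two-directory approach can keep the environment check in one place with a small helper that maps the environment name to a directory. A sketch only: staticDirFor is a hypothetical name, it treats every non-production environment as development (as the answer does), and path.join is used instead of string concatenation so path separators are handled per platform.

```javascript
const path = require('path');

// Hypothetical helper: picks the per-environment static directory
// (production/ or development/) inside the given base directory.
function staticDirFor(env, baseDir) {
  const dirName = env === 'production' ? 'production' : 'development';
  return path.join(baseDir, dirName);
}
```

Usage: app.use(express.static(staticDirFor(app.get('env'), __dirname))); registers exactly one of the two directories at startup.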
1

Here is what I use:

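const express = require('express');
const router = express.Router();

// Mounted with app.use('/', router), this answers GET /robots.txt
router.use('/robots.txt', function (req, res, next) {
  res.type('text/plain');
  res.send("User-agent: *\nDisallow: /admin");
});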
Anirudh
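If you prefer a multi-line template literal for the robots.txt body, note that the indentation of every continuation line becomes part of the string, so the response would contain a line like "     Disallow: /admin". Whether that whitespace is ignored depends on the crawler's parser, so it is safer to avoid it. A minimal sketch that builds the body from an array of lines instead; robotsBody is a hypothetical helper name.

```javascript
// Hypothetical helper: builds a robots.txt body from one entry per line,
// so source-code indentation never leaks into the response.
function robotsBody(lines) {
  return lines.join('\n') + '\n';
}

// The pitfall: the second line keeps its leading spaces.
const fromTemplate = `User-agent: *
     Disallow: /admin`;

// The array version produces exactly one directive per line.
const fromArray = robotsBody([
  'User-agent: *',
  'Disallow: /admin',
]);
```

Inside a route, res.send(robotsBody(['User-agent: *', 'Disallow: /admin'])) then sends clean, unindented directives.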
0

To choose robots.txt depending on the environment, using middleware:

var env = process.env.NODE_ENV || 'development';

if (env === 'development' || env === 'qa') {
  app.use(function (req, res, next) {
    if ('/robots.txt' === req.url) {
      res.type('text/plain');
      res.send('User-agent: *\nDisallow: /');
    } else {
      next();
    }
  });
}
fernandopasik
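One possible refinement of the req.url comparison: req.url includes the query string, so a request for /robots.txt?foo=bar would fall through to next(), while req.path (added by Express) contains only the path. A sketch of the same middleware as a factory; robotsGuard is a hypothetical name, and the blocked environments ('development', 'qa') are copied from the answer.

```javascript
// Environments in which crawlers should be blocked (copied from the answer).
const BLOCKED_ENVS = new Set(['development', 'qa']);

// Hypothetical factory: returns Express middleware that answers /robots.txt
// with "Disallow: /" in blocked environments and otherwise passes the
// request on (for example to express.static in production).
function robotsGuard(env) {
  return function (req, res, next) {
    // req.path excludes the query string, unlike req.url
    if (BLOCKED_ENVS.has(env) && req.path === '/robots.txt') {
      res.type('text/plain');
      res.send('User-agent: *\nDisallow: /');
    } else {
      next();
    }
  };
}
```

Usage: app.use(robotsGuard(process.env.NODE_ENV || 'development')); registered before any static middleware that might also serve a robots.txt.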
0

This is what I did in my index routes. You can simply add the following to your code:

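// Each file needs its own route path; two router.get('/') handlers
// would both match "/" and never serve /robots.txt
router.get('/sitemap.xml', (req, res) =>
    res.sendFile(__dirname + '/public/sitemap.xml')
)

router.get('/robots.txt', (req, res) => {
    res.sendFile(__dirname + '/public/robots.txt')
})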
Chen Lay
0

I serve robots.txt as a normal static file in production, and use a middleware for other environments.

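const path = require('path');

// Assumption: how isDev / isStaging are derived depends on your setup
const isDev = process.env.NODE_ENV === 'development';
const isStaging = process.env.NODE_ENV === 'staging';

if (isDev || isStaging) {
    app.use('/robots.txt', function (req, res) {
        res.type('text/plain');
        res.send("User-agent: *\nDisallow: /");
    });
}
app.use(express.static(path.join(__dirname, 'public')));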
Mahmoud
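The order of the calls is what makes this approach work: the override must be registered before express.static, because once the static middleware finds public/robots.txt it sends the file and never calls next(). A minimal sketch that keeps that order in one function; configureRobots and its isProduction / staticMiddleware options are hypothetical, with staticMiddleware standing in for something like express.static(path.join(__dirname, 'public')).

```javascript
const BLOCK_ALL = 'User-agent: *\nDisallow: /';

// Hypothetical helper: outside production, registers a /robots.txt override
// *before* the static middleware; in production only the static file is served.
function configureRobots(app, { isProduction, staticMiddleware }) {
  if (!isProduction) {
    app.use('/robots.txt', (req, res) => {
      res.type('text/plain');
      res.send(BLOCK_ALL);
    });
  }
  app.use(staticMiddleware);
}
```

Usage: configureRobots(app, { isProduction: process.env.NODE_ENV === 'production', staticMiddleware: express.static(path.join(__dirname, 'public')) }); Wrapping both registrations in one function makes it harder to reorder them by accident.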
0

Focusing on the most convenient and simple solution rather than the "best" or "smartest" one, I simply added the following to the server.ts file:

server.get('/robots.txt', function (req, res) {
  res.type('text/plain');
  res.send("User-agent: *\nAllow: /");
})

This generates the robots.txt content on the fly and sends it whenever /robots.txt is requested. For this to work, the route must be registered before the other server.get calls so that it takes priority. I'm using Express with Angular, and my full code ended up being:

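// Imports as in the Angular Universal server.ts template (paths may differ by version)
import { APP_BASE_HREF } from '@angular/common';
import { ngExpressEngine } from '@nguniversal/express-engine';
import * as express from 'express';
import { existsSync } from 'fs';
import { join } from 'path';
import { AppServerModule } from './src/main.server';

export function app(): express.Express {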
  const server = express();
  const distFolder = join(process.cwd(), 'dist/sophisticatedPrimate/browser');
  const indexHtml = existsSync(join(distFolder, 'index.original.html')) ? 'index.original.html' : 'index';

  // Our Universal express-engine (found @ https://github.com/angular/universal/tree/main/modules/express-engine)
  server.engine('html', ngExpressEngine({
    bootstrap: AppServerModule,
  }));

  server.set('view engine', 'html');
  server.set('views', distFolder);

  server.get('/robots.txt', function (req, res) {
    res.type('text/plain');
    res.send("User-agent: *\nAllow: /");
  })

  // Example Express Rest API endpoints
  // server.get('/api/**', (req, res) => { });
  // Serve static files from /browser
  server.get('*.*', express.static(distFolder, {
    maxAge: '1y'
  }));

  // All regular routes use the Universal engine
  server.get('*', (req, res) => {
    res.render(indexHtml, { req, providers: [{ provide: APP_BASE_HREF, useValue: req.baseUrl }] });
  });

  return server;
}
-1
app.use(express.static('public'))
app.use('/images', express.static('public/images'))
app.use('/videos', express.static('public/videos'))
