I just managed to do what you wanted, but did not find any answer providing a detailed step-by-step guide with Angular Universal and an Express server.
So I am posting my solution here; any ideas for improvement are welcome!
First, add this function to your server.ts:
function isBot(req: any): boolean {
  let botDetected = false;
  const userAgent = req.headers['user-agent'];
  if (userAgent) {
    // First, a simple substring check for the most common crawlers
    if (userAgent.includes("Googlebot") ||
        userAgent.includes("Bingbot") ||
        userAgent.includes("WhatsApp") ||
        userAgent.includes("facebook") ||
        userAgent.includes("Twitterbot")
    ) {
      console.log('bot detected with includes ' + userAgent);
      return true;
    }
    // Second, match the user-agent against the patterns from 'crawler-user-agents'
    const crawlers = require('crawler-user-agents');
    crawlers.every(entry => {
      if (RegExp(entry.pattern).test(userAgent)) {
        console.log('bot detected with crawler-user-agents ' + userAgent);
        botDetected = true;
        return false; // stop iterating once a pattern matches
      }
      return true;
    });
    if (!botDetected) { console.log('bot NOT detected ' + userAgent); }
    return botDetected;
  } else {
    // No user-agent header: assume the request comes from a bot
    console.log('No user agent in request');
    return true;
  }
}
This function uses two modes to detect crawlers (and assumes that a missing user-agent means the request comes from a bot). The first is a 'simple' manual check for known strings within the user-agent header; the second is a more advanced detection based on the package 'crawler-user-agents', which you can install in your Angular project like this:
npm install --save crawler-user-agents
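As a small optional variation (my own preference, not something the package requires), you can load the crawler list once when server.ts is loaded, instead of calling require() inside isBot() on every request:

// Optional: require the crawler list once at module load time
const crawlers = require('crawler-user-agents');
// ...then remove the require() call inside isBot() and use this constant instead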
Second, once this function is added to your server.ts, just use it in each

server.get(`/whatever`, (req: express.Request, res: express.Response) => {
});

of your Express server's export function for which the 'whatever' route should behave differently depending on bot detection.
Your 'server.get()' functions then become:
server.get(`/whatever`, (req: express.Request, res: express.Response) => {
  if (!isBot(req)) {
    // if no bot is detected, we just return index.html for client-side rendering (CSR)
    res.sendFile(join(distFolder, 'index.html'));
    return;
  }
  // otherwise we prerender the page on the server
  res.render(indexHtml, {
    req, providers: [
      { provide: REQUEST, useValue: req }
    ]
  });
});
To further reduce the server load when bots request pages for SEO, I also implemented 'node-cache', since in my case SEO bots do not need the very latest version of each page. For this, I found a good answer here:
#61939272
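For illustration, here is a minimal sketch of what such a cache could look like for one route, assuming a node-cache instance keyed by the request URL with a one-hour TTL (the key, the TTL and the error handling are my own choices, not taken from the linked answer):

const NodeCache = require('node-cache');
// Cache server-rendered pages for bots; TTL of one hour is an arbitrary choice
const ssrCache = new NodeCache({ stdTTL: 3600 });

server.get(`/whatever`, (req: express.Request, res: express.Response) => {
  if (!isBot(req)) {
    // humans still get index.html for client-side rendering, as before
    res.sendFile(join(distFolder, 'index.html'));
    return;
  }
  // serve the cached HTML if this URL was already rendered for a bot
  const cached = ssrCache.get(req.originalUrl);
  if (cached) {
    res.send(cached);
    return;
  }
  // otherwise render on the server, store the result, then send it
  res.render(indexHtml, {
    req, providers: [
      { provide: REQUEST, useValue: req }
    ]
  }, (err, html) => {
    if (err) {
      res.status(500).send(err.message);
      return;
    }
    ssrCache.set(req.originalUrl, html);
    res.send(html);
  });
});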