I just managed to do what you wanted, but did not find any answer providing a detailed step-by-step guide with Angular Universal and an Express server.
So I am posting my solution here; any ideas for improvement are welcome!
First, add this function to your server.ts:
function isBot(req: any): boolean {
  let botDetected = false;
  const userAgent = req.headers['user-agent'];
  if (userAgent) {
    // First, a simple substring check for the most common crawlers
    if (userAgent.includes("Googlebot") ||
        userAgent.includes("Bingbot") ||
        userAgent.includes("WhatsApp") ||
        userAgent.includes("facebook") ||
        userAgent.includes("Twitterbot")
    ) {
      console.log('bot detected with includes ' + userAgent);
      return true;
    }
    // Second, match the user-agent against the patterns from 'crawler-user-agents'
    const crawlers = require('crawler-user-agents');
    crawlers.every(entry => {
      if (RegExp(entry.pattern).test(userAgent)) {
        console.log('bot detected with crawler-user-agents ' + userAgent);
        botDetected = true;
        return false; // stop iterating once a pattern matches
      }
      return true;
    });
    if (!botDetected) { console.log('bot NOT detected ' + userAgent); }
    return botDetected;
  } else {
    // No user-agent header: assume the request comes from a bot
    console.log('No user agent in request');
    return true;
  }
}
This function uses two modes to detect crawlers (and assumes that a missing user-agent means the request comes from a bot). The first is a 'simple' manual check for known strings within the user-agent header; the second is a more advanced detection based on the package 'crawler-user-agents', which you can install in your Angular project like this:
npm install --save crawler-user-agents
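As a small optional variation (my own preference, not something the package requires), you can load the crawler list once when server.ts is loaded, instead of calling require() inside isBot() on every request:

// Optional: require the crawler list once at module load time
const crawlers = require('crawler-user-agents');
// ...then remove the require() call inside isBot() and use this constant instead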
Second, once this function is added to your server.ts, just use it in each

server.get(`/whatever`, (req: express.Request, res: express.Response) => {
});

of your Express server's export function for which the 'whatever' route should behave differently depending on bot detection.
Your 'server.get()' functions then become:
server.get(`/whatever`, (req: express.Request, res: express.Response) => {
  if (!isBot(req)) {
    // if no bot is detected, we just return index.html for client-side rendering (CSR)
    res.sendFile(join(distFolder, 'index.html'));
    return;
  }
  // otherwise we prerender the page on the server
  res.render(indexHtml, {
    req, providers: [
      { provide: REQUEST, useValue: req }
    ]
  });
});
To further reduce the server load when bots request pages for SEO, I also implemented 'node-cache', since in my case SEO bots do not need the very latest version of each page. For this, I found a good answer here:
#61939272
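For illustration, here is a minimal sketch of what such a cache could look like for one route, assuming a node-cache instance keyed by the request URL with a one-hour TTL (the key, the TTL and the error handling are my own choices, not taken from the linked answer):

const NodeCache = require('node-cache');
// Cache server-rendered pages for bots; TTL of one hour is an arbitrary choice
const ssrCache = new NodeCache({ stdTTL: 3600 });

server.get(`/whatever`, (req: express.Request, res: express.Response) => {
  if (!isBot(req)) {
    // humans still get index.html for client-side rendering, as before
    res.sendFile(join(distFolder, 'index.html'));
    return;
  }
  // serve the cached HTML if this URL was already rendered for a bot
  const cached = ssrCache.get(req.originalUrl);
  if (cached) {
    res.send(cached);
    return;
  }
  // otherwise render on the server, store the result, then send it
  res.render(indexHtml, {
    req, providers: [
      { provide: REQUEST, useValue: req }
    ]
  }, (err, html) => {
    if (err) {
      res.status(500).send(err.message);
      return;
    }
    ssrCache.set(req.originalUrl, html);
    res.send(html);
  });
});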