
I was reading Google's guidelines about SEO and I found this.

Help Google find your content

The first step to getting your site on Google is to be sure that Google can find it. The best way to do that is to submit a sitemap. A sitemap is a file on your site that tells search engines about new or changed pages on your site. Learn more about how to build and submit a sitemap.

Note: My web app is an e-commerce site plus a blog. It has a shop where I sell products and a blog section where I create and post content about those products.

So, each product has a product page, and each blog post has a blogPost page.

Then I went looking for some examples of Sitemaps from websites like mine that have good SEO ranking.

And I've found this good example:

robots.txt

User-Agent: *
Disallow: ...  # SOME ROUTES

Sitemap: https://www.website.com/sitemap.xml

In other words, the crawler apparently discovers the sitemap location through the `Sitemap:` line in robots.txt.

And I've also found out that they keep separate sitemap files for blogPost and product pages.

sitemap.xml

<sitemapindex xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/siteindex.xsd">
  <sitemap>
    <loc>https://www.website.com/blogPosts-sitemap.xml</loc> <!-- FOR POSTS -->
    <lastmod>2019-09-10T05:00:14+00:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://www.website.com/products-sitemap.xml</loc> <!-- FOR PRODUCTS -->
    <lastmod>2019-09-10T05:00:14+00:00</lastmod>
  </sitemap>
</sitemapindex>
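If you generate this index yourself instead of writing it by hand, a minimal sketch could look like the following. It assumes each child sitemap is described by a plain object `{ loc, lastmod? }` (a hypothetical shape, not from any library) and escapes the URL before placing it in the XML.

```javascript
// Minimal sketch: build a <sitemapindex> document from a list of child sitemaps.
// The entry shape { loc: string, lastmod?: Date } is an assumption for illustration.
const XML_ESCAPES = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&apos;' };
const escapeXml = (s) => String(s).replace(/[&<>"']/g, (c) => XML_ESCAPES[c]);

function buildSitemapIndex(sitemaps) {
  const entries = sitemaps.map(({ loc, lastmod }) => {
    const lines = [`    <loc>${escapeXml(loc)}</loc>`];
    if (lastmod) {
      // toISOString() yields a W3C Datetime value, e.g. 2019-09-10T05:00:14.000Z
      lines.push(`    <lastmod>${lastmod.toISOString()}</lastmod>`);
    }
    return `  <sitemap>\n${lines.join('\n')}\n  </sitemap>`;
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    '</sitemapindex>',
  ].join('\n');
}

// Example usage with the two child sitemaps from the question
console.log(buildSitemapIndex([
  { loc: 'https://www.website.com/blogPosts-sitemap.xml', lastmod: new Date('2019-09-10T05:00:14Z') },
  { loc: 'https://www.website.com/products-sitemap.xml', lastmod: new Date('2019-09-10T05:00:14Z') },
]));
```

Escaping matters because a URL containing `&` would otherwise produce invalid XML.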

blogPosts-sitemap.xml

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- HUGE LIST WITH A <url> FOR EACH BLOG POST URL -->
  <url>
    <loc>https://www.website.com/blog/some-blog-post-slug</loc>
    <lastmod>2019-09-03T18:11:56.873+00:00</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>

products-sitemap.xml

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- HUGE LIST WITH A <url> FOR EACH PRODUCT URL -->
  <url>
    <loc>https://www.website.com/gp/some-product-slug</loc>
    <lastmod>2019-09-08T07:00:16+00:00</lastmod>
    <changefreq>yearly</changefreq>
    <priority>0.3</priority>
  </url>
</urlset>
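Each of these per-type files is a `<urlset>` with one `<url>` element per page. A minimal sketch of a generator, assuming page records are plain objects `{ loc, lastmod?, changefreq?, priority? }` (a hypothetical shape chosen to mirror the tags above):

```javascript
// Minimal sketch: turn a list of page records into a <urlset> sitemap.
// The record shape { loc, lastmod?, changefreq?, priority? } is an assumption.
const XML_ESCAPES = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&apos;' };
const escapeXml = (s) => String(s).replace(/[&<>"']/g, (c) => XML_ESCAPES[c]);

function buildUrlset(pages) {
  const entries = pages.map((page) => {
    const lines = [`    <loc>${escapeXml(page.loc)}</loc>`];
    if (page.lastmod) lines.push(`    <lastmod>${page.lastmod.toISOString()}</lastmod>`);
    if (page.changefreq) lines.push(`    <changefreq>${escapeXml(page.changefreq)}</changefreq>`);
    if (page.priority !== undefined) lines.push(`    <priority>${page.priority}</priority>`);
    return `  <url>\n${lines.join('\n')}\n  </url>`;
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    '</urlset>',
  ].join('\n');
}

// Example: one product entry, mirroring products-sitemap.xml above
const xml = buildUrlset([
  {
    loc: 'https://www.website.com/gp/some-product-slug',
    lastmod: new Date('2019-09-08T07:00:16Z'),
    changefreq: 'yearly',
    priority: 0.3,
  },
]);
console.log(xml);
```

Optional tags are only emitted when the record has them, so the same function covers both the blog post and product files.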

QUESTION

How can I keep sitemap files like these up to date if my web app is a single-page app with client-side routing?

Since I'm using Firebase as my hosting, what I've thought about doing is:

OPTION #1 - Keep sitemap.xml in Firebase Hosting

From the question "Upload single file to firebase hosting via CLI or other without deleting existing ones?":

Frank van Puffelen says:

Update (December 2018): Firebase Hosting now has a REST API. While this still doesn't officially allow you to deploy a single file, you can use it creatively to get what you want. See my Gist here: https://gist.github.com/puf/e00c34dd82b35c56e91adbc3a9b1c412

I could use his Gist to update the sitemap.xml file and run the script once a day, or whenever I need to. That would work for my current project, but not for a project whose dynamic pages change more often, such as a news portal or a marketplace.

OPTION #2 - Keep sitemap.xml in Firebase Storage

Keep the sitemap files in my Storage bucket and update them as often as I need via an admin script or a scheduled Cloud Function.

Set a rewrite in my firebase.json that points to a function, which responds to requests by serving the sitemap files from the bucket.

firebase.json

"hosting": {
 // ...

 // Add the "rewrites" attribute within "hosting"
 "rewrites": [ {
   "source": "/sitemap.xml",
   "function": "serveSitemapFromStorageBucket"
 } ]
}
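For OPTION #2, the function named in the rewrite only has to read the requested file from the bucket and send it back. A minimal sketch follows. The bucket object is passed in so the handler can be tested on its own, and the whitelist of file names is an assumption based on the files above. In a real deployment you would presumably pass `admin.storage().bucket()` from firebase-admin, whose `File#download()` resolves to a one-element array holding a Buffer.

```javascript
// Minimal sketch for OPTION #2: serve a sitemap file stored in a Cloud Storage bucket.
// The bucket is injected; anything exposing file(name).download() -> Promise<[Buffer]>
// works. Hypothetical wiring (firebase-functions v1 style, name must match firebase.json):
//   export const serveSitemapFromStorageBucket =
//     functions.https.onRequest(makeServeSitemap(admin.storage().bucket()));
// Assumed whitelist, so arbitrary bucket objects cannot be requested through this route.
const SITEMAP_FILES = new Set(['sitemap.xml', 'blogPosts-sitemap.xml', 'products-sitemap.xml']);

function makeServeSitemap(bucket) {
  return async function serveSitemapFromStorageBucket(req, res) {
    // Hosting forwards the original path through the rewrite, e.g. "/sitemap.xml"
    const fileName = req.path.replace(/^\//, '');
    if (!SITEMAP_FILES.has(fileName)) {
      res.status(404).send('Not found');
      return;
    }
    try {
      const [contents] = await bucket.file(fileName).download();
      res.set('Content-Type', 'text/xml');
      res.status(200).send(contents);
    } catch (err) {
      // e.g. the file has not been generated yet
      res.status(500).send('Could not load sitemap');
    }
  };
}
```

Note that the rewrite above only routes /sitemap.xml to the function; blogPosts-sitemap.xml and products-sitemap.xml would need rewrites of their own (or a glob `source`) pointing to the same function.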

FINAL QUESTION

I'm leaning towards OPTION #2. Will it work for this specific purpose, or am I missing something?

cbdeveloper
  • Hi, I have the same issue as you. Does your solution work with Google Search Console? – Jimmy Lin Jul 17 '20 at 02:46
  • @JimmyLin I have a cloud function that generates the `sitemap.xml` on the fly. E.g. `https://www.mywebsite.com/sitemap.xml` is rewritten to an HTTP cloud function that builds the file and responds with it. This way, the sitemap "file" does not exist: it is generated on demand and is always up to date with the latest data. – cbdeveloper Jul 17 '20 at 10:19
  • @JimmyLin I've posted an answer. – cbdeveloper Jul 17 '20 at 10:23
  • We're going in the wrong direction when something this simple ends up being so complex. – WebDev-SysAdmin Sep 10 '21 at 22:47

3 Answers


I ended up creating a cloud function to build the sitemap file on-demand.

firebase.json

"hosting": {
  "rewrites": [
    {
      "source": "/sitemap.xml",
      "function": "buildSitemap"
    }
  ]
}

buildSitemap.js (this is a cloud function)

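import * as admin from 'firebase-admin';

async function buildSitemap(req, res) {
  // Use firebase-admin to gather the necessary data
  // (e.g. the URL of every product and blog post),
  // build the sitemap XML string from it,
  // and send it back.
  const SITEMAP_STRING =
    '<?xml version="1.0" encoding="UTF-8"?>' +
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' +
    /* one <url> entry per page goes here */
    '</urlset>';

  res.set('Content-Type', 'text/xml');
  res.status(200).send(SITEMAP_STRING);
}

export default buildSitemap;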
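A slightly more concrete sketch of the same on-demand idea follows. To keep it testable without Firebase, the data access is injected: `loadPages` is a hypothetical async function you would write yourself (for example, one that reads slugs with firebase-admin) and is assumed to resolve to `{ loc, lastmod? }` objects. The Cache-Control value is the one-day CDN cache discussed in the comments below; treat it as an example value.

```javascript
// Minimal sketch of an on-demand sitemap handler with injected data access.
// loadPages is hypothetical: async () => [{ loc: string, lastmod?: Date }, ...]
// Hypothetical wiring (export name must match "function" in firebase.json):
//   export const buildSitemap = functions.https.onRequest(makeBuildSitemap(loadPagesFromFirestore));
const XML_ESCAPES = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&apos;' };
const escapeXml = (s) => String(s).replace(/[&<>"']/g, (c) => XML_ESCAPES[c]);

function makeBuildSitemap(loadPages) {
  return async function buildSitemap(req, res) {
    try {
      const pages = await loadPages();
      const urls = pages.map(({ loc, lastmod }) => {
        const lastmodTag = lastmod ? `<lastmod>${lastmod.toISOString()}</lastmod>` : '';
        return `  <url><loc>${escapeXml(loc)}</loc>${lastmodTag}</url>`;
      });
      const sitemap = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
        ...urls,
        '</urlset>',
      ].join('\n');
      // s-maxage lets the Hosting CDN reuse the response for a day,
      // so the database is not read on every request.
      res.set('Cache-Control', 'public, max-age=86400, s-maxage=86400');
      res.set('Content-Type', 'text/xml');
      res.status(200).send(sitemap);
    } catch (err) {
      // No Cache-Control on failures, so an error is not cached for a day
      res.status(500).send('Could not build sitemap');
    }
  };
}
```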

cbdeveloper
  • Are you still using this method? I have a similar approach, but I see a few disadvantages: you can only store up to 50,000 URLs per sitemap, there is potential for lots of unnecessary reads (I fetch all post IDs from Firestore), and it takes a few seconds to build the sitemap from scratch each time. – LukyFoggy Feb 23 '21 at 10:42
  • I'm still using it, and so far it's working fine, though I only have around 100 URLs. I know that you can create a sitemap index and break it into multiple sitemap files, so you get 50k URLs in each one. You can also cache it for a day to avoid too many reads. – cbdeveloper Feb 23 '21 at 11:28
  • Thanks for the quick response. Yes, I'm currently trying the sitemap-index approach and splitting the URLs across several files. I'll let you know if I manage to implement it. One last thing: do you have a reference or a snippet showing how to cache the sitemap result for a day? – LukyFoggy Feb 23 '21 at 12:19
  • @LukyFoggy It depends on your implementation details. If your hosting provider has a CDN, you can cache it on the CDN by setting a `Cache-Control: s-maxage=SOME_VALUE_IN_SECONDS` header. Or if you are not behind a CDN, you can cache directly on your server. – cbdeveloper Feb 23 '21 at 12:33
  • I'm using Firebase Hosting, so adding `res.set('Cache-Control', 'public, max-age=86400, s-maxage=86400');` should do the trick, thank you. – LukyFoggy Feb 23 '21 at 13:16
  • That's it. You can either set it in the `firebase.json` config file or you can set it on your server / cloud function. Be aware that `firebase.json` will overwrite what you set on your server. See [this question](https://stackoverflow.com/questions/66107249/where-to-set-cache-control-when-using-firebase-hosting-cloud-run-express-serve). At least, those were the results I got after testing this behavior. – cbdeveloper Feb 23 '21 at 13:24
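For the sitemap-index approach discussed in these comments, the URL list has to be split into files that respect the protocol limits (at most 50,000 URLs and 50 MB uncompressed per sitemap). A minimal sketch of the splitting step follows; the `products-sitemap-N.xml` naming scheme is hypothetical, not something from the thread.

```javascript
// Minimal sketch: split a long URL list into sitemap-sized chunks and derive
// the child sitemap URLs to list in the index.
// The "<prefix>-sitemap-N.xml" naming scheme is an assumption for illustration.
const MAX_URLS_PER_SITEMAP = 50000;

function chunk(items, size = MAX_URLS_PER_SITEMAP) {
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

function childSitemapUrls(baseUrl, prefix, itemCount, size = MAX_URLS_PER_SITEMAP) {
  const count = Math.ceil(itemCount / size);
  return Array.from({ length: count }, (_, i) => `${baseUrl}/${prefix}-sitemap-${i + 1}.xml`);
}

// Example: 120,001 product URLs -> 3 child sitemaps
const productUrls = Array.from({ length: 120001 }, (_, i) => `https://www.website.com/gp/product-${i}`);
const parts = chunk(productUrls);
console.log(parts.map((p) => p.length)); // [ 50000, 50000, 20001 ]
console.log(childSitemapUrls('https://www.website.com', 'products', productUrls.length));
```

Each child file could then be served by a handler that reads the number from the request path and renders only that slice, with a rewrite (for example a glob such as "/products-sitemap-*.xml") routing those paths to it.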

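Remove src/sitemap.xml from the assets list in angular.json, so that it looks like this. This matters if you use the rewrite approach: Firebase Hosting serves a matching static file before it applies any rewrite.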

"assets": [
  "src/assets",
  "src/favicon.ico",
  "src/manifest.json",
  "src/robots.txt"
],

Put your sitemap.xml inside `public`, run `firebase deploy`, and that's it. It works.

If you're coming from a Nuxt.js question, I have a little bonus for you regarding this comment, this commit, and this commit. Remember to use `yarn generate`, not `yarn build`, before deployment; otherwise it won't work, as explained here.

Daniel Danielecki