Is it a good idea to use a hash (sha1) as id for a firestore document?

Question

My scenario is as follows:

I'm using the BING News API, which returns a list of objects like the following:

{
    "name": "Eterna Resenha contará com as participações de Neto e Vampeta",
    "url": "https://www.terra.com.br/esportes/lance/eterna-resenha-contara-com-as-participacoes-de-neto-e-vampeta,82e493e511734febfcdfda6fbd22c105xjafr9k2.html",
    "image": {
        "contentUrl": "http://p2.trrsf.com/image/fget/cf/800/450/middle/images.terra.com/2020/05/27/5ece8e302d1fb.jpeg",
        "thumbnail": {
            "contentUrl": "https://www.bing.com/th?id=ON.4E1CF6986982B70A3D6009F435822EF2&pid=News",
            "width": 700,
            "height": 393
        }
    },
    "description": "Durante a quarentena, as lives tomaram conta do país, tentando arrecadar doações para ajudar quem sofre com o coronavírus...",
    "provider": [
        {
            "_type": "Organization",
            "name": "Terra"
        }
    ],
    "datePublished": "2020-05-28T00:00:00.0000000Z",
    "category": "Entertainment"
}

Note that this object has no id field, so I improvised one: I convert the datePublished field to a Date, call its getTime method to get the timestamp in milliseconds, and prefix that with the news language:

const time = new Date(news.datePublished).getTime()
const id = `${language}${time}`

await database.collection(`news`).doc(`${id}`).set(news, { merge: true })

This approach breaks down when the BING API returns the same news item with an updated date: the timestamp, and therefore the ID, changes, so the item is duplicated in my Firestore database.

The solution I plan to use

Hash the news URL with the SHA-1 algorithm and use the hex digest as the document ID:

const CryptoJS = require("crypto-js");
const id = `${CryptoJS.SHA1(news.url)}`

await database.collection(`news`).doc(`${id}`).set(news, { merge: true })
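If the code runs on Node.js, the same kind of ID can be computed with the built-in `crypto` module instead of pulling in crypto-js. A minimal sketch, assuming Node.js; `newsIdFromUrl` is a hypothetical helper name. For the same UTF-8 input it should yield the same lowercase hex string as `CryptoJS.SHA1(url).toString()`.

```javascript
const crypto = require("crypto");

// Hypothetical helper: derive a deterministic document ID from the article URL.
// SHA-1 always yields 20 bytes, i.e. 40 lowercase hex characters.
function newsIdFromUrl(url) {
  return crypto.createHash("sha1").update(url, "utf8").digest("hex");
}

const url =
  "https://www.terra.com.br/esportes/lance/eterna-resenha-contara-com-as-participacoes-de-neto-e-vampeta,82e493e511734febfcdfda6fbd22c105xjafr9k2.html";
const id = newsIdFromUrl(url);
console.log(id, id.length); // the same URL always maps to the same 40-character ID

// Usage with Firestore (not executed here):
// await database.collection("news").doc(id).set(news, { merge: true });
```

Because the ID depends only on the URL, re-fetching the article with a different `datePublished` targets the same document.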

As far as I can tell, Firestore's best-practices guide for document IDs allows IDs in this format. My main concern is performance with a long ID (d40e5b8df6462e138fe617a84ddabae7f78360a6), since I will have thousands of news items in at least 5 languages.
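For scale: Firestore documents a maximum document ID size of 1,500 bytes, and a SHA-1 hex digest is 40 bytes. The sketch below checks a candidate ID against the documented ID rules as I understand them (at most 1,500 bytes, no forward slash, not `.` or `..`, not matching `__.*__`). `isValidDocId` is a hypothetical helper, not an SDK function, and it does not check UTF-8 validity.

```javascript
// Hypothetical helper: check a candidate ID against Firestore's documented
// document-ID rules (restated here as an assumption, not an SDK call).
function isValidDocId(id) {
  if (typeof id !== "string" || id.length === 0) return false;
  if (Buffer.byteLength(id, "utf8") > 1500) return false; // max 1,500 bytes
  if (id.includes("/")) return false;                      // "/" separates path segments
  if (id === "." || id === "..") return false;             // reserved names
  if (/^__.*__$/.test(id)) return false;                   // reserved __...__ pattern
  return true;
}

const sha1Id = "d40e5b8df6462e138fe617a84ddabae7f78360a6";
console.log(isValidDocId(sha1Id), Buffer.byteLength(sha1Id, "utf8")); // true 40
```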

Remember: I need traceable IDs (derived from some object property) because BING News can return the same news item with the same content but a different datePublished, and in that case I need to update the existing document instead of creating a new one.
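To illustrate the difference between the two ID schemes, here is a sketch that uses a plain `Map` as a stand-in for the `news` collection. This is an assumption for illustration: the hypothetical `upsert` only approximates `set(..., { merge: true })` at the top level, whereas Firestore also merges nested maps. The sample URL and dates are made up (with a simplified date format).

```javascript
const crypto = require("crypto");

// Date-based scheme from the question: language + publication timestamp.
const dateId = (news, language) => `${language}${new Date(news.datePublished).getTime()}`;
// URL-hash scheme: SHA-1 of the article URL as 40 hex characters.
const urlId = (news) => crypto.createHash("sha1").update(news.url).digest("hex");

// Hypothetical in-memory stand-in for a collection; shallow merge only.
function upsert(store, id, doc) {
  store.set(id, { ...(store.get(id) || {}), ...doc });
}

const first = { url: "https://example.com/news/1", datePublished: "2020-05-28T00:00:00Z" };
const again = { ...first, datePublished: "2020-05-28T06:00:00Z" }; // same item, new date

const byDate = new Map();
const byUrl = new Map();
for (const news of [first, again]) {
  upsert(byDate, dateId(news, "pt"), news);
  upsert(byUrl, urlId(news), news);
}

console.log(byDate.size); // 2: the article is duplicated
console.log(byUrl.size);  // 1: the existing document is updated
console.log([...byUrl.values()][0].datePublished); // 2020-05-28T06:00:00Z
```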

Are there any counterpoints that should make me choose another solution?

I don't think this goes against Firestore patterns. At least, I haven't seen anything in their documentation that warns developers against it. Firebase lets developers define their own document IDs in both Firestore and RTDB, so I believe it's safe to use SHA-1 (or another hash function). The only caveat they give about custom document IDs is to [avoid "hotspots"](https://cloud.google.com/firestore/docs/best-practices), and hash-based IDs won't create hotspots because their values are spread evenly across the key range. — Frenchcooc, Mar 22 '22 at 12:14
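Firestore's best-practices page warns against monotonically increasing IDs, which land on lexicographically close keys. One rough way to see the difference is to measure how long a prefix a batch of IDs shares under each scheme. A sketch with made-up URLs and timestamps (assumptions for illustration):

```javascript
const crypto = require("crypto");

// Length of the prefix shared by every string in the list.
function commonPrefixLength(ids) {
  if (ids.length === 0) return 0;
  let n = 0;
  while (ids.every((id) => id.length > n && id[n] === ids[0][n])) n++;
  return n;
}

// 100 hypothetical articles published one second apart.
const start = Date.UTC(2020, 4, 28); // 2020-05-28T00:00:00Z
const timeIds = [];
const hashIds = [];
for (let i = 0; i < 100; i++) {
  timeIds.push(`en${start + i * 1000}`);
  hashIds.push(crypto.createHash("sha1").update(`https://example.com/news/${i}`).digest("hex"));
}

console.log(commonPrefixLength(timeIds)); // 10: all IDs start with "en15906240"
console.log(commonPrefixLength(hashIds)); // small, most likely 0: keys are scattered
```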

score 1 · Answer 1 · answered May 28 '20 at 08:19

You can use Firestore's default ID generator function. I'm pretty sure a long ID won't cause a noticeable performance issue; after all, Google itself uses such a function to generate unique IDs in its databases.

Here's the function I've extracted and been using for my projects for a long while:

    const generateId = function () {
        // Alphanumeric characters
        const chars =
            'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
        let autoId = '';
        for (let i = 0; i < 20; i++) {
            autoId += chars.charAt(Math.floor(Math.random() * chars.length));
        }
        return autoId;
    };
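`Math.random()` is not a cryptographically secure random source. If IDs are generated on Node.js, a variant using `crypto.randomInt` (available since Node.js 14.10) keeps the same 20-character alphanumeric format. This is a swapped-in sketch, not the function the answer extracted; `generateSecureId` is a hypothetical name.

```javascript
const crypto = require("crypto");

const CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

// Same 62-symbol alphabet as generateId, but indices come from a CSPRNG.
// crypto.randomInt(max) returns a uniformly distributed integer in [0, max).
function generateSecureId(length = 20) {
  let id = "";
  for (let i = 0; i < length; i++) {
    id += CHARS[crypto.randomInt(CHARS.length)];
  }
  return id;
}

console.log(generateSecureId()); // a random 20-character alphanumeric string
```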

With this function, the probability of two documents getting the same ID is negligible (20 characters from a 62-character alphabet give 62^20 ≈ 7 × 10^35 possible IDs), but you can also append a timestamp to the result if it eases your mind.
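To put the collision risk into numbers: with k equally likely IDs and n documents, the birthday approximation gives p ≈ 1 − e^(−n(n−1)/(2k)). The sketch compares the 20-character alphanumeric space (k = 62^20) with SHA-1's 160-bit output (k = 2^160), treating the hash as uniformly random, which is an idealization. The document counts are hypothetical.

```javascript
// Birthday approximation: P(at least one collision) ≈ 1 - exp(-n(n-1) / (2k)).
// Math.expm1 keeps precision when the exponent is tiny.
function collisionProbability(n, k) {
  return -Math.expm1(-(n * (n - 1)) / (2 * k));
}

const autoIdSpace = 62 ** 20; // ≈ 7.0e35 possible 20-character IDs
const sha1Space = 2 ** 160;   // ≈ 1.5e48 possible SHA-1 digests

for (const n of [1e4, 1e6]) {
  console.log(n, collisionProbability(n, autoIdSpace), collisionProbability(n, sha1Space));
}
```

Even at a million documents both probabilities are far below 1e-20, so neither ID length nor collisions should drive the choice between the two schemes.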

Thanks Ogulcan, but I need traceable IDs (based on some object property) because BING News can return the same news item with the same content but a different `datePublished`. — Abner Escócio, May 28 '20 at 13:41
