
I'm using a library called sitemap to generate sitemap files from an array of objects constructed at runtime. My goal is to upload these generated sitemaps to an S3 bucket. So far, the function is hosted on AWS Lambda and uploads the generated files to the bucket correctly.

My problem is that the generated sitemaps are corrupted. When I run the function locally, they are generated correctly without any issues.

Here's my handler:

module.exports.handler = async () => {
  try {
    console.log("inside handler....");
    await clearGeneratedSitemapsFromTmpDir();
    const sms = new SitemapAndIndexStream({
      limit: 10000,
      getSitemapStream: (i) => {
        const sitemapStream = new SitemapStream({
          lastmodDateOnly: true,
        });

        const linkPath = `/sitemap-${i + 1}.xml`;
        const writePath = `/tmp/${linkPath}`;
        sitemapStream.pipe(createWriteStream(resolve(writePath)));
        return [new URL(linkPath, hostName).toString(), sitemapStream];
      },
    });

    const data = await generateSiteMap();
    sms.pipe(createWriteStream(resolve("/tmp/sitemap-index.xml")));
    // data.forEach((item) => sms.write(item));
    Readable.from(data).pipe(sms);
    sms.end();
    await uploadToS3();
    await clearGeneratedSitemapsFromTmpDir();
  } catch (error) {
    console.log(" ~ file: index.js ~ line 228 ~ exec ~ error", error);
    Sentry.captureException(error);
  }
};

The data variable holds an array of around 11k items, so according to the code above, two sitemap files should be generated (the first 10k items in one, the rest in the second), in addition to a sitemap index that lists the two generated sitemaps.
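For reference, generateSiteMap() isn't shown here; the items it returns are plain URL objects of the kind SitemapStream accepts. A simplified, made-up example of one item:

// Illustrative only -- the real objects come from generateSiteMap()
const exampleItem = {
  url: "/some/page-123",   // the only field SitemapStream strictly needs
  changefreq: "daily",
  priority: 0.8,
  lastmod: "2022-05-01",   // lastmodDateOnly: true trims this to the date part
};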

Here's my uploadToS3 function:

const uploadToS3 = async () => {
  try {
    console.log("uploading to s3....");
    const files = await getGeneratedXmlFilesNames();
    for (let i = 0; i < files.length; i += 1) {
      const file = files[i];
      const filePath = `/tmp/${file}`;
      // const stream = createReadStream(resolve(filePath));
      const fileRead = await readFileAsync(filePath, { encoding: "utf-8" });
      const params = {
        Body: fileRead,
        Key: `${file}`,
        ACL: "public-read",
        ContentType: "application/xml",
        ContentDisposition: "inline",
      };

      // const result = await s3Client.upload(params).promise();
      const result = await s3Client.putObject(params).promise();
      console.log(
        " ~ file: index.js ~ line 228 ~ uploadToS3 ~ result",
        result
      );
    }
  } catch (error) {
    console.log("uploadToS3 => error", error);
    // Sentry.captureException(error);
  }
};
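One thing I might add to narrow this down: a quick sanity check of the file sizes in /tmp before uploading, to see whether the files are already truncated on disk or only break during the upload. A minimal sketch (not something I've wired in yet), reusing the existing getGeneratedXmlFilesNames helper:

const { stat } = require("fs").promises;

// Debugging sketch: log the size of each generated file before uploading,
// to check whether the files are already incomplete in /tmp.
const logTmpFileSizes = async () => {
  const files = await getGeneratedXmlFilesNames();
  for (const file of files) {
    const { size } = await stat(`/tmp/${file}`);
    console.log(`~ ${file}: ${size} bytes`);
  }
};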

And here's the function that cleans up the generated files from Lambda's /tmp directory after the upload to S3:

const clearGeneratedSitemapsFromTmpDir = async () => {
  try {
    console.log("cleaning up....");
    const readLocalTempDirDir = await readDirAsync("/tmp");
    const xmlFiles = readLocalTempDirDir.filter((file) =>
      file.includes(".xml")
    );
    for (const file of xmlFiles) {
      await unlinkAsync(`/tmp/${file}`);
      console.log("deleting file....");
    }
  } catch (error) {
    console.log(
      " ~ file: index.js ~ line 207 ~ clearGeneratedSitemapsFromTmpDir ~ error",
      error
    );
  }
};

My hunch is that the issue is related to streams, as I haven't fully understood them yet. Any help here is highly appreciated.

Side note: I tried sleeping for 10 seconds before uploading, but that didn't help either.
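To make the streams hunch concrete: I suspect the upload may be running before the write streams have flushed to /tmp. Something like this untested sketch is what I imagine the fix would look like (keeping a handle on the index write stream so it can be awaited):

const { finished } = require("stream");
const { promisify } = require("util");
const streamFinished = promisify(finished);

// Untested sketch: create the index write stream up front so it can be awaited.
const indexWriteStream = createWriteStream(resolve("/tmp/sitemap-index.xml"));
sms.pipe(indexWriteStream);

// pipe() ends `sms` by itself once the readable is exhausted, so I think the
// manual sms.end() call can be dropped here.
Readable.from(data).pipe(sms);

// Wait until the index file is fully written to /tmp before uploading.
// (The per-sitemap write streams created in getSitemapStream would probably
// need the same treatment.)
await streamFinished(indexWriteStream);
await uploadToS3();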

Omar Dulaimi

1 Answer


As a workaround, I did this:

const data = await generateSiteMap();
const logger = createWriteStream(resolve("/tmp/all-urls.json.txt"), {
  flags: "a",
});
data.forEach((el) => {
  logger.write(JSON.stringify(el));
  logger.write("\n");
});
logger.end();

const stream = lineSeparatedURLsToSitemapOptions(
  createReadStream(resolve("/tmp/all-urls.json.txt"))
)
  .pipe(sms)
  .pipe(createWriteStream(resolve("/tmp/sitemap-index.xml")));

await new Promise((fulfill) => stream.on("finish", fulfill));
await uploadToS3();
await clearGeneratedSitemapsFromTmpDir();

I'll keep the question open in case somebody answers it correctly. My guess is that what makes the difference here is waiting for the final write stream's finish event before uploading.

Omar Dulaimi