
My web extension fails to start a file download when the filename contains a pair of emojis; it reports an invalid filename error. This seems to be a Unicode surrogate pair issue that appears when multiple emojis are used. Here is an example of an offending filename:

<a href="https://www.example.com/filestream.xyz"
   download="The New World Order Presentation ‍.pdf"
   target="_blank">Download File</a>

As the Chrome DevTools Elements screenshot below shows, the farmer emoji (https://emojipedia.org/man-farmer/) seems to be a combination of multiple code points, and that appears to be what makes the filename invalid. When the markup is pasted here as above, the emojis are rendered correctly as a farmer and a flag, but in the DevTools DOM view they look different. Inspecting the filename shared above in DevTools shows the issue.

The Farmer Emoji

The code that triggers the download:

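function notifyExtension(e) {
  var elem = e.currentTarget;
  // The suggested filename comes from the link's download attribute.
  var fileSaveName = elem.getAttribute("download");
  // Cancel the browser's default click handling (legacy and standard ways).
  e.returnValue = false;

  if (e.preventDefault) {
    e.preventDefault();
  }
  // Only links that carry a "loop" attribute are forwarded to the background code.
  var loop = elem.getAttribute("loop");
  if (loop) {
    chrome.runtime.sendMessage({
      url: elem.getAttribute("href"),
      filename: fileSaveName,
    });
  }
  return false;
}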

The background code that starts the download using the browser API:

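chrome.runtime.onMessage.addListener(function (message) {
  // Sanitise the requested filename before handing it to the downloads API.
  let fname = message.filename
    .trim()
    // Replace punctuation and symbol characters with "-".
    .replace(/[`~!@#$%^&*()_|+\-=?;:'",<>{}[\]\\/]/gi, "-")
    // Replace characters that are reserved in filenames with "_".
    .replace(/[\\/:*?"<>|]/g, "_")
    // Cap the length at 240 UTF-16 code units.
    .substring(0, 240)
    // Collapse runs of whitespace into a single space.
    .replace(/\s+/g, " ");
  chrome.downloads.download({
    url: message.url,
    filename: fname,
    conflictAction: "uniquify",
    saveAs: true,
  });
});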

The error we get in browser console:

Unchecked lastError value: Error: filename must not contain illegal characters

Error in browser console

How can I sanitise the string in JavaScript so that it is always a valid filename in situations like this? Single emojis do not seem to be a problem, but multiple emojis are.

WannabeCoder
  • Where and how is the error reported? – Pointy Jul 19 '22 at 12:24
  • The invalid filename error is reported on the Chrome extension's Errors page: chrome://extensions/?errors=ExtensionUUID_HERE – WannabeCoder Jul 20 '22 at 02:54
  • @Pointy Did you get my last comment? – WannabeCoder Jul 22 '22 at 03:54
  • Please talk about that in your post, and show the actual error(s) - [in text, not images](/help/how-to-ask). Also please explain what you consider sanitizing in this context. Stripping emoji entirely? Just removing zero-width characters? – Mike 'Pomax' Kamermans Jul 23 '22 at 03:34
  • Why aren't you using the [`download`](https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/downloads/download) API? – bogdanoff Jul 23 '22 at 03:39
  • @Mike'Pomax'Kamermans @bogdanoff I am using `download` api. See edited question. – WannabeCoder Jul 24 '22 at 04:27
  • That still leaves the question of what you consider "sanitizing" to mean, because changing the string is trivial: just update `fileSaveName` as needed. You can, of course, just strip out all emoji using a regex (by removing any code point at 0x1F600 or higher, as well as the zero-width joiner 0x200D). – Mike 'Pomax' Kamermans Jul 24 '22 at 14:35
  • @Mike'Pomax'Kamermans We need to strip emojis only if the name fails as a filename, i.e. in a situation like the one above with complex emojis. We would not want to filter out emojis in the 99% of cases that save fine as a filename. – WannabeCoder Jul 25 '22 at 03:01
  • In that case you probably want to just check for the zero-width joiner, and if there is one, remove "one or more" of "emoji or zwj" in a row. – Mike 'Pomax' Kamermans Jul 25 '22 at 14:34

1 Answer


You can use Unicode property escapes to find emojis in a string.

The syntax is \p{...}, and the regex needs the u flag.

Example:

console.log("‍aa".replace(/\p{So}/gu, ""))

There are more options for the \p{...} class; you can find them in the docs.

If single emojis do not cause a failure but the man farmer emoji does, the cause of the problem is the zero-width joiner (U+200D), which is an invalid character in filenames in Chrome. Search the filename for U+200D.
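
To see why the man farmer emoji behaves differently from a single emoji, it can help to list its code points. This is a minimal sketch; the variable names are illustrative and not from the question, and the emoji is written with escape sequences so the invisible joiner is explicit:

```javascript
// Man farmer is a ZWJ sequence: MAN + ZERO WIDTH JOINER + EAR OF RICE.
const farmer = "\u{1F468}\u{200D}\u{1F33E}";

// Iterating a string with Array.from walks code points, not UTF-16 code units.
const codePoints = Array.from(
  farmer,
  (ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase()
);

console.log(codePoints);    // ["U+1F468", "U+200D", "U+1F33E"]
console.log(farmer.length); // 5 (two surrogate pairs plus the joiner)
```

The surrogate pairs themselves are also present in every single emoji above U+FFFF, so they are unlikely to be the culprit on their own; the joiner in the middle is what sets this sequence apart.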

Resulting regex

/\p{So}\u{200D}\p{So}/gu
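
One way this regex could be folded into the filename clean-up is sketched below. It assumes, as suggested above, that U+200D is the character Chrome rejects; `stripZwjSequences` is a hypothetical helper name, and the pattern is widened slightly so that sequences with more than one joiner are also caught:

```javascript
// Hypothetical helper (not from the question): removes emoji sequences that
// are glued together by U+200D and leaves standalone emojis untouched.
function stripZwjSequences(name) {
  return name
    // An emoji followed by one or more "joiner + emoji" groups.
    .replace(/\p{So}(?:\u{200D}\p{So})+/gu, "")
    // Any joiner left over, e.g. from sequences that include modifiers.
    .replace(/\u{200D}/gu, "")
    // Tidy up whitespace left behind by the removals.
    .replace(/\s+/g, " ")
    .trim();
}

const raw = "Report \u{1F468}\u{200D}\u{1F33E} \u{1F600}.pdf";
const cleaned = stripZwjSequences(raw);
console.log(cleaned === "Report \u{1F600}.pdf"); // true: the single emoji survives
```

Sequences that contain skin-tone modifiers or variation selectors are not matched by \p{So}, so in those cases this sketch only removes the joiner and keeps the individual emojis.
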
Aziz Hakberdiev