12

I'm interested in knowing where font fallback fits in the font shaping/rendering stack. In other words, at what point are missing glyphs detected and how are they substituted?

I see in this document that the FontConfig tool does font fallback "based on glyph coverage transparently."

So the questions are:

  1. How exactly does this algorithm work?
  2. Is this the standard algorithm used by most browsers: WebKit, Gecko (probably not IE)?
  3. How does font fallback based on missing glyphs within a font that does exist relate to CSS font fallback (which specifies which fonts to use in turn, when a font is entirely missing)?

Edit: I found this document which explains the "what" of FontConfig, but not the "how." Question 1 is about the "how."

To summarize - this post really has to do with one thing only - how does font fallback work when glyphs are missing in a font.

bright

2 Answers

15

Font fallback in browsers (as opposed to, say, in an OS) is based on two things:

  1. The CSS specification, which gives the fonts that are to be used for fallback, and
  2. The text engine, which does text shaping.

The CSS spec is fairly trivial in this respect: it simply gives the list of fonts by their system names, plus several possible "catch all" generic fonts that are in no way guaranteed to be the same from computer to computer (there is no reason to assume that serif maps to Times or Times New Roman, for instance).

The fallback algorithm used by text engines is entirely up to the engine, but it usually kicks in during the glyph lookup step: the text engine sees a string of code points and tries to use a font to shape that string. For each code point in the sequence, it checks whether the font has a matching glyph (by consulting the CMAP table and subtables), or a GSUB rule that tells the engine there may be a glyph to use only if more code points follow. For instance, a font might have no glyphs for the individual letters e, t and c, but have a glyph for & and a GSUB rule that says the sequence e+t+c should be replaced in-text with the single glyph &. When the engine has finished accumulating this kind of "unit of points", it shapes the text and hands it back to whatever asked it to shape text.
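As a rough illustration of that lookup step, here is a minimal Python sketch. The toy font data (`CMAP`, `LIGATURES`) and the function name `lookup_glyphs` are made up for this example. It also simplifies things: in a real OpenType font, GSUB rules operate on glyph IDs after the cmap mapping, whereas this sketch matches rules directly against code points to mirror the description above.

```python
from typing import Dict, List, Tuple

# Hypothetical toy font: a cmap (code point -> glyph name) plus
# ligature-style rules (sequence of code points -> single glyph name).
CMAP: Dict[int, str] = {ord("a"): "a", ord("&"): "ampersand"}
LIGATURES: Dict[Tuple[int, ...], str] = {
    (ord("e"), ord("t"), ord("c")): "ampersand",
}


def lookup_glyphs(text: str) -> List[str]:
    """Map text to glyph names; '.notdef' marks code points with no glyph."""
    points = [ord(ch) for ch in text]
    glyphs: List[str] = []
    i = 0
    while i < len(points):
        # Prefer the longest multi-code-point rule that matches here.
        best = None
        for seq, glyph in LIGATURES.items():
            if tuple(points[i:i + len(seq)]) == seq:
                if best is None or len(seq) > len(best[0]):
                    best = (seq, glyph)
        if best is not None:
            glyphs.append(best[1])
            i += len(best[0])
        elif points[i] in CMAP:
            # Plain one-to-one mapping from the cmap.
            glyphs.append(CMAP[points[i]])
            i += 1
        else:
            # Nothing in the font covers this code point.
            glyphs.append(".notdef")
            i += 1
    return glyphs
```

With this toy font, `lookup_glyphs("etc")` yields a single `ampersand` glyph, while a lone `e` has no glyph at all and is the kind of code point for which fallback would be considered.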

If, during glyph lookup, it turns out the font doesn't contain anything that lets the engine shape a particular code point (i.e. running through the CMAP data as well as the GSUB rules still shows "there is no glyph") then the text engine can do two things:

  1. Give up. There is no glyph, so the engine uses the .notdef outline defined as glyph id 0 instead, which generally gives you text with lovely empty boxes (lovingly called "tofu" by font folks) or question marks.
  2. Attempt font fallback, where it tries another font to find a glyph for the unsupported code point.

When using fallback, an engine can go down a list of alternative fonts until either (a) a glyph is found, or (b) the list is exhausted, at which point the engine has to give up and use the .notdef glyph. Whether the engine grabs the .notdef glyph from the original font or from the last font in the list is entirely up to the engine (although usually it'll go with the first font, for legibility).
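The loop described above could be sketched roughly like this. It is not any particular engine's implementation: `ToyFont` and `shape_with_fallback` are hypothetical names, fonts are reduced to a bare code point to glyph name mapping, and taking .notdef from the first font is just one of the choices an engine may make.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

NOTDEF = ".notdef"  # the glyph stored at glyph id 0


@dataclass
class ToyFont:
    """Hypothetical stand-in for a font: a name and a cmap-like mapping."""
    name: str
    cmap: Dict[int, str]


def shape_with_fallback(text: str, fonts: List[ToyFont]) -> List[Tuple[str, str]]:
    """Return one (font name, glyph name) pair per code point."""
    if not fonts:
        raise ValueError("at least one font is required")
    result: List[Tuple[str, str]] = []
    for ch in text:
        code_point = ord(ch)
        for font in fonts:
            glyph = font.cmap.get(code_point)
            if glyph is not None:
                # (a) a glyph is found: stop walking the list.
                result.append((font.name, glyph))
                break
        else:
            # (b) the list is exhausted: give up and use .notdef,
            # here taken from the first font (an engine-specific choice).
            result.append((fonts[0].name, NOTDEF))
    return result
```

For example, with a Latin-only font listed first and a CJK font second, a Latin letter resolves in the first font, a CJK character resolves in the second, and a code point neither covers comes back as .notdef attributed to the first font.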

There is no "standard" algorithm for this defined anywhere; font fallback is basically a convenience mechanism offered by text engine authors, like how browsers come with bookmark managers (handy, and not part of any spec). As far as OpenType is concerned, there are no requirements on whether an engine should just serve up .notdef when a glyph is not found, or whether it should serve up the part it could shape, then find the missing glyph somewhere else, and render text that way. CSS implies that your text engine should have at least some form of font fallback, but it doesn't specify how it should work, or when it should kick in.

Mike 'Pomax' Kamermans
  • Thanks - this is really informative. I'd like more detail on how the font fallback actually happens in any browser, to get some insight into this process. It seems far more critical than "a convenience" - much web content depends on it. I'm assuming the fallback isn't simply to the css font fallback list (or is it?) I've removed the css tag - stackoverflow insists on placing that first, giving it a misleading emphasis. I'm not so interested in CSS per se - as you point out the spec is trivial relative to font fallback. – bright Mar 24 '15 at 21:51
  • Edit: So I was mistaken above. I thought the css spec only dealt with font fallback in the situation where a font is entirely absent. But on reading the spec a little more closely it looks like it addresses the very case of missing glyphs. So I'm marking your answer as accepted. – bright Mar 24 '15 at 22:05
  • it's not the browser so much as "the text engine", so Firefox and Chrome, for instance, use [harfbuzz](http://www.freedesktop.org/wiki/Software/HarfBuzz/), IE I believe relies on [Uniscribe](https://msdn.microsoft.com/en-us/library/windows/desktop/dd317713%28v=vs.85%29.aspx). And yeah, it's definitely CSS related, fonts are used on a per-glyph basis (thankfully =D) – Mike 'Pomax' Kamermans Mar 25 '15 at 16:37
  • Thanks for this answer and for [this post of yours](https://tug.org/pipermail/xetex/2009-November/014853.html) which helped me recently. Do you know if there's a hope of getting such fallback in XeTeX? Right now a major problem with automated processing using *TeX (e.g. when Pandoc turns Markdown into PDF) is that characters from unexpected scripts just go missing in output. The [ucharclasses package](https://ctan.org/pkg/ucharclasses) (by you!) helps, but as XeTeX already uses Harfbuzz it would be nice if [fallback](https://tex.stackexchange.com/q/323575/48) could work. – ShreevatsaR Jul 17 '17 at 01:49
  • I'm not sure that question makes a lot of sense, mostly because XeTeX is a typesetting markup language. Unlike webpages, it's a fully controlled publishing chain so you *explicitly* control which fonts get used. For XeTeX you *want* things to fail, and hard, when a glyph needs to be typeset that is unavailable in the font you're using, so you can update your source code to make sure an appropriate alternative is explicitly used? – Mike 'Pomax' Kamermans Jul 19 '17 at 05:42
  • @ShreevatsaR perhaps you were looking for something like https://tex.stackexchange.com/questions/224584/define-fallback-font-for-specific-unicode-characters-in-lualatex ? but even then: that solution relies on you being a proper editor for your content, and noting which characters you're using that are not in your font of choice. With the guiding principle of course being "if it's not in the typeface you chose, choose a different, more complete typeface" to preserve type cohesion. – Mike 'Pomax' Kamermans Jul 19 '17 at 05:47
5

On Windows:

Firefox font fallback

Firefox uses different algorithms for CJK glyphs and non-CJK glyphs:

non-CJK

The non-CJK algorithm is very simple: try all the fonts configured for the given HTML language. These include both the config font.name.{generic}.{language} and the list in the config font.name-list.{generic}.{language}.

CJK

CJK is by nature complicated due to the sheer number of glyphs, encodings and language variations. Firefox uses a dynamic search algorithm to resolve the glyphs.

  1. Use the configured fonts for the given html language.
  2. Use the configured Japanese (ja) fonts.
  3. Use the configured Korean (ko) fonts.
  4. Use the configured Simplified Chinese (zh-CN) fonts.
  5. Use the configured Traditional Chinese (Hong Kong) (zh-HK) fonts.
  6. Use the configured Traditional Chinese (Taiwan) (zh-TW) fonts.

The algorithm is currently implemented in GetLangPrefs(). In both the CJK and non-CJK cases, at most 32 fonts are searched. The script search order is hard-coded and thus cannot be user-configured at the moment.
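To make the search order concrete, here is a rough Python sketch of how such a font list could be assembled. It is not Firefox's code: `configured_fonts` and `search_list` are hypothetical names, the prefs are modelled as a plain dictionary, and treating font.name-list values as comma-separated and skipping duplicates are assumptions made for this example.

```python
from typing import Dict, List

MAX_FONTS = 32  # the search limit mentioned above
CJK_ORDER = ["ja", "ko", "zh-CN", "zh-HK", "zh-TW"]  # hard-coded order


def configured_fonts(prefs: Dict[str, str], generic: str, lang: str) -> List[str]:
    """Collect font.name.* and font.name-list.* entries for one language."""
    fonts: List[str] = []
    single = prefs.get(f"font.name.{generic}.{lang}")
    if single:
        fonts.append(single)
    # Assumption: the name-list pref is a comma-separated string.
    listing = prefs.get(f"font.name-list.{generic}.{lang}", "")
    fonts.extend(name.strip() for name in listing.split(",") if name.strip())
    return fonts


def search_list(prefs: Dict[str, str], generic: str,
                page_lang: str, is_cjk: bool) -> List[str]:
    """Build the ordered list of fonts to try for a missing glyph."""
    langs = [page_lang]
    if is_cjk:
        # After the page language, walk the fixed CJK order.
        langs += [lang for lang in CJK_ORDER if lang != page_lang]
    seen = set()
    ordered: List[str] = []
    for lang in langs:
        for name in configured_fonts(prefs, generic, lang):
            if name not in seen:  # assumption: duplicates are skipped
                seen.add(name)
                ordered.append(name)
    return ordered[:MAX_FONTS]
```

In this sketch, a Korean page that needs a CJK glyph tries the fonts configured for ko first, then those for ja, zh-CN, zh-HK and zh-TW, which is why an untagged page ends up preferring Japanese fonts.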

The advantage of Firefox's fallback algorithm is that, thanks to its dynamic nature, more fonts are searched, which minimizes the chance of the user encountering missing glyphs. Additionally, by understanding the search order, users can adjust the configuration to choose the desired fonts for missing glyphs.

The disadvantage is inconsistency: because the search order is hard-coded, fonts from certain languages are prioritized on all webpages. For instance, fonts optimized for Japanese might be used on Korean webpages that lack a language tag. Also, since more fonts are tried, performance might deteriorate.

Chromium font fallback

Unlike Firefox, Chromium takes a more static approach to searching fonts. Instead of special-casing CJK and walking through a font list, Chromium hard-codes several "core" fonts for each script. It assumes these fonts are always available and therefore searches only them. The mapping of script to font can be found in InitializeScriptFontMap(). This mapping cannot be user-configured at the moment.
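A static script-to-font table of this kind might look roughly like the following Python sketch. The entries are illustrative only and are not copied from InitializeScriptFontMap(); the script detection by code point range and the names `script_of` and `fallback_font_for_char` are also assumptions made for the example.

```python
from typing import Dict, Optional

# Illustrative table only; the real mapping lives in Chromium's source.
SCRIPT_FONT_MAP: Dict[str, str] = {
    "Hang": "Malgun Gothic",
    "Hira": "Yu Gothic",
    "Thai": "Tahoma",
}

# A few Unicode block ranges, enough to classify the sample scripts.
SCRIPT_RANGES = [
    (0xAC00, 0xD7A3, "Hang"),  # Hangul Syllables
    (0x3040, 0x309F, "Hira"),  # Hiragana
    (0x0E00, 0x0E7F, "Thai"),  # Thai
]


def script_of(ch: str) -> Optional[str]:
    """Return a script code for the character, or None if unclassified."""
    code_point = ord(ch)
    for start, end, script in SCRIPT_RANGES:
        if start <= code_point <= end:
            return script
    return None


def fallback_font_for_char(ch: str) -> Optional[str]:
    """Look up the single hard-coded fallback font for the character's script."""
    script = script_of(ch)
    if script is None:
        return None
    return SCRIPT_FONT_MAP.get(script)
```

The lookup is a single table access per character, with no list to walk and no user preference consulted.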

The advantage of this algorithm is simplicity, consistency and performance, at the cost of flexibility and configurability.

The implementation may change in the future. More detail in https://gist.github.com/CrendKing/c162f5a16507d2163d58ee0cf542e695.

Reci