How to use combining characters in UFO 3?

Question

I'm creating a font by writing XML files in the Unified Font Object 3 format.

It's a bit unclear how one would create combining characters in the format, so I was hoping someone could point me in the right direction with a quick example?

In my case, I'm using the private use area of Unicode (U+E000-U+F8FF). For example, I would like U+E000 and U+E001 to display on top of each other if typed one after the other.

Can you elaborate on what you actually want to know? Are you having problems defining a glyph for the specific unicode codepoint of your combining character of choice? Or are you having a problem with defining the positioning in combining pairs? (e.g. Opentype's GPOS, or deprecated kern, information?) — Mike 'Pomax' Kamermans, Apr 30 '18 at 14:39
@Mike'Pomax'Kamermans I don't have any problems creating a `.glif` file for any particular Unicode point, I just don't know how to define the character as something that would combine with another character if that makes sense? In my case, I'm using the private use area (U+E000-U+F8FF). And I want a case where say U+E000 + U+E001 creates a combined character with the 2 characters on top of each other. — 鈴木雪, Apr 30 '18 at 19:33

Mike 'Pomax' Kamermans · Accepted Answer · 2018-04-30T20:26:31.813


This sounds like a bit of term confusion: Unicode combining characters are a well-defined thing with a precise meaning, and are not related to PUA codepoints (which are basically "unregulated, but codepoint-addressable" glyphs). PUA in modern fonts is best avoided; instead, rely on GSUB rules to resolve human-typable code sequences to internal glyph ids, and then have GPOS rules perform the necessary repositioning based on (pairs of) internal ids.

So it sounds a little like you're trying to figure out how to define the rules that, in an OpenType font, end up as GPOS data for custom positioning of code pairs. If so, that's something you define in a feature definition, in which (for this particular case) you set up GPOS rules to effect the repositioning you need.

Unfortunately, there are no "quick" examples here: GSUB/GPOS features are anything but quick and easy, and you usually don't write them by hand (things like FontForge, Fontlab, FontCreator, etc. all come with UI for automating parts of, or all of, writing OpenType script/feature/lookup definitions). If you absolutely have to, you're probably going to have to read through the feature documentation several times to understand the precise syntax and which GPOS lookup type you'll need to use.
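That said, to give a rough idea of what such a definition looks like, here is a minimal sketch (not a complete recipe) of a GPOS mark-to-base rule in feature file syntax, generated from Python so the pieces are easy to swap out. The glyph names `uniE000` and `uniE001`, the anchor coordinates, and the class name `@TOP_MARKS` are all assumptions for illustration; they have to match whatever your own `.glif` files actually define.

```python
def mark_feature(base, mark, base_anchor, mark_anchor, mark_class="@TOP_MARKS"):
    """Return feature-file text that attaches `mark` to `base` (GPOS mark-to-base)."""
    bx, by = base_anchor
    mx, my = mark_anchor
    return "\n".join([
        # The mark glyph joins a mark class and declares its own attachment point.
        f"markClass {mark} <anchor {mx} {my}> {mark_class};",
        "",
        "feature mark {",
        # The base glyph declares where marks of that class should attach.
        f"    pos base {base} <anchor {bx} {by}> mark {mark_class};",
        "} mark;",
        "",
    ])


# Hypothetical values: attach the bottom of uniE001 to the top of uniE000.
fea = mark_feature("uniE000", "uniE001", (300, 700), (300, 0))
print(fea)
```

In a UFO 3 package the resulting text would live in `features.fea` at the top level of the `.ufo` directory. The mark glyph usually also gets a zero advance width, and depending on the compiler it may need to be classified as a mark in GDEF, so treat this as a starting point rather than something guaranteed to shape correctly everywhere.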

edited Apr 30 '18 at 20:26

answered Apr 30 '18 at 20:21


For further clarification, I'm using PUA for the script of an artificial language, in which the script functions as an abugida; where vowels are diacritics to consonants. For example, `ć` might be pronounced as `/ca/` and `ĉ` as `/ce/`. From what I understand, Unicode offers both combining characters such as `U+0302` (◌̂) and precomposed characters such as `U+0109` (ĉ). So I can't simply declare a PUA codepoint as a combining character, correct? Then my options would be either precomposed characters, or OpenType GPOS features? Would ligatures be more appropriate? – 鈴木雪 May 01 '18 at 03:57
Indeed, PUA is not "anything" in Unicode other than addressable glyph space, so if you have an artificial language with compositional rules, you don't want to use PUA so much as a combination of GSUB and GPOS, with non-addressable internal glyphs for the real outlines. For instance, the user types 'ca', and the font has a GSUB ligature rule for `c a -> internal_ca_glyph`; the user types 'ce', and GSUB substitutes `internal_ce_glyph`; then GPOS kicks in with an `internal_ca_glyph internal_ce_glyph -> reposition in some way` rule for more complex positioning work. – Mike 'Pomax' Kamermans May 01 '18 at 08:26
How exactly do internal glyphs work? As in, they have no assigned codepoint? `ca` is just rendered as the glyph in the artificial script? I'm not sure if that would work for me, as I would still want English text to display properly in this font. I was thinking an IME would be needed for a user to type in this language. I was reading the Wikipedia article on ligatures, and it says _The Brahmic abugidas make frequent use of ligatures in consonant clusters._ As my script is an abugida, would ligatures be more appropriate to use? Can those be used with PUA codepoints? – 鈴木雪 May 01 '18 at 14:37
Correct: technically "unicode" and "glyphs" have nothing to do with each other. There is a character map that says "if the context in which this font is used is unicode, then input codepoint X maps to glyph id Y", but the important thing is that the font operates on those glyph ids, *not* on those codepoints; the codepoints are only used to resolve byte strings to shaped text. So you can have thousands of glyphs, all of which have ids, only some of which are "real" things (like the letter 'a'), and some of which don't map to any known byte sequence in any character set. – Mike 'Pomax' Kamermans May 01 '18 at 14:59
If you want the same font to do both English and your artificial script, then you can do that with GSUB ligatures just fine, as long as the text engine used (word, latex, browser, whatever) can be told to turn ligatures on or off for stretches of text. There are quite a few different established "kinds" of ligature features, so just pick the most suitable one (with `liga` being the most generic feature). – Mike 'Pomax' Kamermans May 01 '18 at 15:02
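
To make the "internal glyph" idea from the comments above a bit more concrete, here is a rough sketch, assuming GLIF format 2 as used by UFO 3: an encoded glyph carries a `<unicode hex="..."/>` element, while an internal glyph simply leaves it out and is only reachable through a substitution rule. The glyph names, advance widths, and empty outlines are placeholders, and `make_glif` and `liga_feature` are hypothetical helpers, not part of any UFO library.

```python
import xml.etree.ElementTree as ET


def make_glif(name, width, codepoint=None):
    """Build minimal GLIF (format 2) XML; omit <unicode> for internal glyphs."""
    glyph = ET.Element("glyph", {"name": name, "format": "2"})
    ET.SubElement(glyph, "advance", {"width": str(width)})
    if codepoint is not None:
        # Only encoded glyphs are addressable from typed text.
        ET.SubElement(glyph, "unicode", {"hex": f"{codepoint:04X}"})
    ET.SubElement(glyph, "outline")  # contours left out of this sketch
    return ET.tostring(glyph, encoding="unicode")


def liga_feature(components, ligature):
    """Return feature-file text for one GSUB ligature substitution."""
    return "\n".join([
        "feature liga {",
        f"    sub {' '.join(components)} by {ligature};",
        "} liga;",
        "",
    ])


encoded = make_glif("c", 500, codepoint=0x63)    # reachable by typing 'c'
internal = make_glif("internal_ca_glyph", 600)   # no codepoint at all
fea = liga_feature(["c", "a"], "internal_ca_glyph")
print(encoded, internal, fea, sep="\n")
```

Each glyph would still need an entry in the layer's `contents.plist`, and the feature text would go into `features.fea`. Whether `liga` or a more specific feature tag is the right home for the rule depends on how the text engine lets users switch features on and off.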
