
If I have glyph IDs like the ones below, how can I get the Unicode from them? I am working in Python. Also, as I understand it, the second value is the glyph ID, but what are the first and third values called?

 (582, 'uni0246', 'LATIN CAPITAL LETTER E WITH STROKE'), (583, 'uni0247', 'LATIN SMALL LETTER E WITH STROKE'), (584, 'uni0248', 'LATIN CAPITAL LETTER J WITH STROKE'), (585, 'uni0249', 'LATIN SMALL LETTER J WITH STROKE')

Kindly reply.

Actually, I am trying to get the Unicode code points from a given TTF file in Python. Here is the code:

 import sys
 from fontTools.ttLib import TTFont
 from fontTools.unicode import Unicode
 from ttfquery import ttfgroups
 from fontTools.ttLib.tables import _c_m_a_p
 from itertools import chain

 ttfgroups.buildTable()
 ttf = TTFont(sys.argv[1], 0, verbose=0, allowVID=0,
             ignoreDecompileErrors=True,
             fontNumber=-1)

 # Each cmap subtable item is (code point, glyph name);
 # Unicode[code point] appends the character's Unicode name.
 chars = chain.from_iterable([y + (Unicode[y[0]],) for y in x.cmap.items()] for x in ttf["cmap"].tables)
 print(list(chars))

I got this code from Stack Overflow, but it gives the output above and not what I need. Could anybody tell me how to fetch the Unicode code points from the TTF file? Or is it fine to convert the glyph ID to Unicode, and will that yield the actual Unicode value?

YepMe
  • What does that mean, "the Unicode"? The actual character in a string? The Unicode character code (U+....)? – deceze Mar 19 '15 at 07:12
  • 246 with the base 16 (`0x246` in Python) is the same value as 582. I haven't checked this particular value, but I guess that this is the so-called codepoint of a character described as "Latin capital letter E with stroke". Is that what you are asking? – Ulrich Eckhardt Mar 19 '15 at 09:28
  • What I am trying to ask: 1) Is the second value actually the glyph ID? If not, then what are the three values that are printed? 2) If yes, is it possible to convert it to the corresponding Unicode value, and if so, how? 3) If it's not possible to fetch the Unicode value from a given glyph ID, how can we get the Unicode values from a given TTF file? – YepMe Mar 19 '15 at 09:35
  • What is "the unicode value" of a glyph? Do you mean the Unicode codepoint? Getting clear what you mean is important here! You add to the confusion because not only is there a Unicode standard, but there is also a Python builtin type `unicode` and you imported an object `Unicode` from a module `fontTools.unicode`. That said, the same glyph can have multiple Unicode codepoints and you can convert from the numeric Unicode codepoint to a Python `unicode` string using `unichr()`, as shown below. Lastly, you say "above output [is] not what I require" but you fail to mention which output you want. – Ulrich Eckhardt Mar 19 '15 at 09:56
  • Sorry Ulrich if I have created any confusion. What I want is to get the Unicode code point of each character present in a TTF file using Python, but my code above prints (582, 'uni0246', 'LATIN CAPITAL LETTER E WITH STROKE'), none of which is the Unicode code point. – YepMe Mar 19 '15 at 10:06
  • The first number is the Unicode code point, in decimal. – Mark Tolonen Mar 19 '15 at 11:49
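Putting the comments together: the first field is the code point as a decimal integer (582 == 0x246), the second is the glyph *name* that the cmap subtable maps that code point to (not the numeric glyph ID, which is the glyph's index in the font), and the third is the character's Unicode name, which the question's code looks up through `fontTools.unicode.Unicode`. A minimal sketch that labels each field using only the standard library; the helper `describe` is hypothetical, not part of fontTools:

```python
import unicodedata


def describe(row):
    """Label the three fields of one tuple printed by the question's code."""
    code, glyph_name, char_name = row
    return {
        "codepoint": f"U+{code:04X}",   # field 1: code point, printed in decimal by the question's code
        "char": chr(code),               # the character itself (use unichr on Python 2)
        "glyph_name": glyph_name,        # field 2: glyph name, not the numeric glyph ID
        "unicode_name": char_name,       # field 3: Unicode character name
        # sanity check: field 3 should equal the name in Python's own Unicode database
        "name_matches": unicodedata.name(chr(code), "") == char_name,
    }


rows = [
    (582, 'uni0246', 'LATIN CAPITAL LETTER E WITH STROKE'),
    (583, 'uni0247', 'LATIN SMALL LETTER E WITH STROKE'),
]
for row in rows:
    print(describe(row))
```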

2 Answers


You can use the first field: unichr(x[0]). Equivalently, you can use the second field: strip the "uni" prefix ([3:]), parse the rest as a hexadecimal value, and convert that to a character. Of course, the first method is faster and simpler.

unichr(int(x[1][3:], 16))  # returns 'Ɇ' for the first item you showed, 'ɇ' for the second

If you use Python 3, use chr instead of unichr.
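For Python 3, both approaches can be sketched as below. The function names are hypothetical. The glyph-name route assumes the font names its glyphs with the common "uniXXXX" convention, which is not guaranteed, so this sketch returns None for names such as 'A' or 'uniform' that cannot be decoded instead of raising.

```python
import string


def char_from_code(code):
    """First field: the code point is already an int, so chr() is enough."""
    return chr(code)


def char_from_glyph_name(glyph_name):
    """Second field: decode a 'uniXXXX' glyph name, or return None if it doesn't follow that form."""
    suffix = glyph_name[3:]
    if (glyph_name.startswith("uni") and len(suffix) == 4
            and all(c in string.hexdigits for c in suffix)):
        return chr(int(suffix, 16))  # parse the four hex digits as the code point
    return None


row = (582, 'uni0246', 'LATIN CAPITAL LETTER E WITH STROKE')
print(char_from_code(row[0]), char_from_glyph_name(row[1]))
```

Reading the first field is still the simpler choice, since it does not depend on how the font's glyphs happen to be named.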

vermillon

Here is a simple way to find all Unicode characters in a TTF file.

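from fontTools.ttLib import TTFont

chars = set()
with TTFont('/path/to/ttf', 0, ignoreDecompileErrors=True) as ttf:
    # each cmap subtable maps code point -> glyph name; subtables usually
    # overlap, so a set keeps one entry per code point
    for x in ttf["cmap"].tables:
        for (code, _) in x.cmap.items():
            chars.add(chr(code))
# now chars holds one str per code point mapped by the font
print(sorted(chars))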
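The loop above also reads non-Unicode subtables (for example legacy Macintosh ones), whose codes are not necessarily Unicode code points. fontTools' `TTFont.getBestCmap()` instead returns a `{code point: glyph name}` dict from the preferred Unicode subtable (or None if the font has none). A minimal sketch that turns such a dict into U+XXXX rows; the helper `codepoint_report` is hypothetical, and it runs here on a hand-written mapping standing in for a real font's cmap:

```python
import unicodedata


def codepoint_report(cmap):
    """Turn a {code point: glyph name} dict into sorted report rows.

    `cmap` is assumed to have the shape TTFont.getBestCmap() returns:
    integer code points mapped to glyph-name strings.
    """
    rows = []
    for code in sorted(cmap):
        char = chr(code)
        rows.append((
            f"U+{code:04X}",                      # conventional code point notation
            char,                                 # the character itself
            cmap[code],                           # glyph name in this font
            unicodedata.name(char, "<no name>"),  # Unicode character name, if any
        ))
    return rows


if __name__ == "__main__":
    # With a real font (requires fontTools):
    #     from fontTools.ttLib import TTFont
    #     with TTFont("/path/to/ttf") as ttf:
    #         rows = codepoint_report(ttf.getBestCmap() or {})
    sample = {0x0247: "uni0247", 0x0246: "uni0246", 0x41: "A"}
    for row in codepoint_report(sample):
        print(row)
```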
alijandro