Related to #914
U+200D is the zero-width joiner character used for multi-codepoint
Emojis. When faced with a multi-codepoint grapheme, the font shaper must
find a font that provides _all codepoints_ consistently. However, U+200D
isn't meant to be provided by any font. As a result, the font shaper
search ends up iterating over every font looking for a match.
Fixes#845
Quick background: Emoji codepoints are either default text or default
graphical ("Emoji") presentation. An example of a default text emoji
is ❤. You have to add VS16 to this emoji to get: ❤️. Some font are
default graphical and require VS15 to force text.
A font face can only advertise text vs emoji presentation for the entire
font face. Some font faces (i.e. Cozette) include both text glyphs and
emoji glyphs, but since they can only advertise as one, advertise as
"text".
As a result, if a user types an emoji such as 👽, it will fallback to
another font to try to find a font that satisfies the "graphical"
presentation requirement. But Cozette supports 👽, its just advertised
as "text"!
Normally, this behavior is what you want. However, if a user explicitly
requests their font-family to be a font that contains a mix of test and
emoji, they _probably_ want those emoji to be used regardless of default
presentation. This is similar to a rich text editor (like TextEdit on
Mac): if you explicitly select "Cozette" as your font, the alien emoji
shows up using the text-based Cozette glyph.
This commit changes our presentation handling behavior to do the
following:
* If no explicit variation selector (VS15/VS16) is specified,
any matching codepoint in an explicitly loaded font (i.e. via
`font-family`) will be used.
* If an explicit variation selector is specified or our explicitly
loaded fonts don't contain the codepoint, fallback fonts will be
searched but require an exact match on presentation.
* If no fallback is found with an exact match, any font with any
presentation can match the codepoint.
This commit should generally not change the behavior of Emoji or VS15/16
handling for almost all users. The only users impacted by this commit
are specifically users who are using fonts with a mix of emoji and text.
When font shaping grapheme clusters, we erroneously used the font index
of a font that only matches the first codepoint in the cell. This led to the
combining characters being [usually] unknown and rendering as boxes.
For a grapheme, we must find a font face that has a glyph for _all codepoints_
in the grapheme.
This also fixes an issue where we now properly render the unicode replacement
character if we can't find a font satisfying a codepoint.