9 Commits

Author SHA1 Message Date
Ryan Liptak
2d3db866e6 unigen: Remove libc dependency, use ArenaAllocator
Not linking libc avoids potential problems when compiling from/for certain targets (see https://github.com/ghostty-org/ghostty/discussions/3218), and using an ArenaAllocator makes unigen run just as fast (in both release and debug modes) while also taking less memory.

Benchmark 1 (3 runs): ./zig-out/bin/unigen-release-c
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          1.75s  ± 15.8ms    1.73s  … 1.76s           0 ( 0%)        0%
  peak_rss           2.23MB ±    0      2.23MB … 2.23MB          0 ( 0%)        0%
  cpu_cycles         7.22G  ± 62.8M     7.16G  … 7.29G           0 ( 0%)        0%
  instructions       11.5G  ± 16.0      11.5G  … 11.5G           0 ( 0%)        0%
  cache_references    436M  ± 6.54M      430M  …  443M           0 ( 0%)        0%
  cache_misses        310K  ±  203K      134K  …  532K           0 ( 0%)        0%
  branch_misses      1.03M  ± 29.9K      997K  … 1.06M           0 ( 0%)        0%
Benchmark 2 (3 runs): ./zig-out/bin/unigen-release-arena
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          1.73s  ± 6.40ms    1.72s  … 1.73s           0 ( 0%)          -  1.0% ±  1.6%
  peak_rss           1.27MB ± 75.7KB    1.18MB … 1.31MB          0 ( 0%)        - 43.1% ±  5.4%
  cpu_cycles         7.16G  ± 26.5M     7.13G  … 7.18G           0 ( 0%)          -  0.9% ±  1.5%
  instructions       11.4G  ± 28.2      11.4G  … 11.4G           0 ( 0%)          -  0.8% ±  0.0%
  cache_references    441M  ± 2.89M      439M  …  444M           0 ( 0%)          +  1.2% ±  2.6%
  cache_misses        152K  ±  102K     35.2K  …  220K           0 ( 0%)          - 50.8% ± 117.8%
  branch_misses      1.05M  ± 13.4K     1.04M  … 1.06M           0 ( 0%)          +  2.0% ±  5.1%

Benchmark 1 (3 runs): ./zig-out/bin/unigen-debug-c
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          1.75s  ± 32.4ms    1.71s  … 1.77s           0 ( 0%)        0%
  peak_rss           2.23MB ±    0      2.23MB … 2.23MB          0 ( 0%)        0%
  cpu_cycles         7.23G  ±  136M     7.08G  … 7.34G           0 ( 0%)        0%
  instructions       11.5G  ± 37.9      11.5G  … 11.5G           0 ( 0%)        0%
  cache_references    448M  ± 1.03M      447M  …  449M           0 ( 0%)        0%
  cache_misses        148K  ± 42.6K     99.3K  …  180K           0 ( 0%)        0%
  branch_misses       987K  ± 5.27K      983K  …  993K           0 ( 0%)        0%
Benchmark 2 (3 runs): ./zig-out/bin/unigen-debug-arena
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          1.76s  ± 4.12ms    1.76s  … 1.76s           0 ( 0%)          +  0.4% ±  3.0%
  peak_rss           1.22MB ± 75.7KB    1.18MB … 1.31MB          0 ( 0%)        - 45.1% ±  5.4%
  cpu_cycles         7.27G  ± 17.1M     7.26G  … 7.29G           0 ( 0%)          +  0.6% ±  3.0%
  instructions       11.4G  ± 3.79      11.4G  … 11.4G           0 ( 0%)          -  0.8% ±  0.0%
  cache_references    440M  ± 4.52M      435M  …  444M           0 ( 0%)          -  1.7% ±  1.7%
  cache_misses       43.6K  ± 19.2K     26.5K  … 64.3K           0 ( 0%)        - 70.5% ± 50.8%
  branch_misses      1.04M  ± 2.25K     1.04M  … 1.05M           0 ( 0%)        💩+  5.8% ±  0.9%
2025-01-20 18:30:22 -08:00
Mitchell Hashimoto
fd1201323e unicode: emoji modifier requires emoji modifier base preceding to not break
Fixes #2941

This fixes the rendering of the text below. For those that can't see it,
it is the following in UTF-32: `0x22 0x1F3FF 0x22`.

```
"🏿"
```

`0x1F3FF` is the Fitzpatrick modifier for dark skin tone. It has the
Unicode property `Emoji_Modifier`. Emoji modifiers are defined in UTS
#51 and are only valid based on ED-13:

```
emoji_modifier_sequence := emoji_modifier_base emoji_modifier
emoji_modifier_base := \p{Emoji_Modifier_Base}
emoji_modifier := \p{Emoji_Modifier}
```

Additional quote from UTS #51:

> To have an effect on an emoji, an emoji modifier must immediately follow
> that base emoji character. Emoji presentation selectors are neither needed
> nor recommended for emoji characters when they are followed by emoji
> modifiers, and should not be used in newly generated emoji modifier
> sequences; the emoji modifier automatically implies the emoji presentation
> style.

Our precomputed grapheme break table was mistakingly not following this
rule. This commit fixes that by adding a check for that every
`Emoji_Modifier` character must be preceded by an `Emoji_Modifier_Base`.
This only has a cost during compilation (table generation). The runtime
cost is identical; the table size didn't increase since we had leftover
bits we could use.
2024-12-12 12:53:08 -08:00
Mitchell Hashimoto
004405ccf9 terminal: only apply VS15/16 to emoji
Fixes #1482
2024-02-10 17:26:45 -08:00
Mitchell Hashimoto
5275d44e7d unicode: precompute grapheme break data 2024-02-09 20:50:13 -08:00
Mitchell Hashimoto
132fbb3a46 unicode: use packed struct for break state 2024-02-09 20:29:36 -08:00
Mitchell Hashimoto
c47ad97f62 unicode: remove unused 2024-02-09 20:23:29 -08:00
Mitchell Hashimoto
5f3574a4bf unicode: direct port of ziglyph to start 2024-02-09 19:44:57 -08:00
Mitchell Hashimoto
0632410857 unicode: get grapheme boundary class 2024-02-09 12:22:23 -08:00
Mitchell Hashimoto
9755d0696e unicode: generate our own lookup tables 2024-02-08 21:01:11 -08:00