This uses an LRU cache to store text shaping results per row. A row is marked "dirty" and its cache entry is invalidated whenever any cell on that row changes. Rows are identified by an ID in the screen circular buffer rather than by y-value, so the row cache remains valid even as the screen scrolls.
Benchmarks via tracing:
* Heavy output (`cat <big>.txt`) - ~2% slower due to overhead of low hit rate on cache
* Mostly static screen with only text - ~10% faster, high cache hit rate
* Mostly static screen with a mix of ligatures, emoji, text - ~40% faster, high cache hit rate
Terminals are "mostly static". Usually the prompt just sits there blinking and we re-render history that already exists. We might scroll back a bit to see some recent results. Ultimately, `cat <big>.txt` is not that common, so I think trading a ~2% slowdown on that for a 10-40% speedup on the common case is worth it.
These percentages can eventually be thought of as direct FPS improvements (40% more FPS) once I split tty stream processing and rendering into separate threads and remove the FPS limiter that exists today.
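To illustrate the caching scheme described above, here is a minimal sketch of an LRU cache keyed by a stable row ID. It is not the actual implementation: `RowShapeCache`, `mark_dirty`, and `fake_shape` are hypothetical names, and `fake_shape` is only a stand-in for real text shaping.

```python
from collections import OrderedDict


class RowShapeCache:
    """LRU cache of shaping results keyed by a stable row ID (hypothetical)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # row_id -> shaped result
        self.hits = 0
        self.misses = 0

    def get(self, row_id, cells, shape):
        """Return the cached result for row_id, shaping the row on a miss."""
        if row_id in self.entries:
            self.entries.move_to_end(row_id)  # mark as most recently used
            self.hits += 1
            return self.entries[row_id]
        self.misses += 1
        result = shape(cells)
        self.entries[row_id] = result
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return result

    def mark_dirty(self, row_id):
        """Drop the entry for a row whose cells changed."""
        self.entries.pop(row_id, None)


def fake_shape(cells):
    # Stand-in for real text shaping (assumption for illustration only).
    return "".join(cells).upper()
```

Because the key is the row ID rather than the y-value, scrolling does not change any key, so entries stay valid until a cell on that row changes or the entry is evicted. A workload that constantly produces new rows mostly pays for misses plus bookkeeping, which is consistent with the small slowdown reported for heavy output.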
This introduces a naive first pass at integrating ligatures. The basic
ligatures (such as "==" in some fonts) work great. Skin-toned emoji are
struggling a bit.
This isn't the most performant way to do this, either, and I plan on
improving that.
This rewrites the "screen" implementation. The immediate user-impacting highlights:
* `time cat <large file>.txt` is ~25% faster
* Fully loaded (full scrollback) memory usage is 25% smaller
* Scrollback is now dynamically allocated, which lowers initial memory usage
* Resize without reflow (common with full screen apps like vim) is now ~10% faster
* Resize with text reflow is ~20% faster when the column count doesn't change
* All resize operations now maintain the cursor position correctly in more cases
Besides those impacts, the goal of this rewrite was to lay the foundation for the screen state to performantly handle text shaping (ligatures) and grapheme clusters (Unicode zero-width joiners, combining marks, etc.). This new structure will eventually (not in this PR) let us track dirty state on a row-by-row basis for better caching of GPU resources, which is particularly important for text shaping (ligatures).
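As a rough illustration of the structure described above, the sketch below models a screen as a circular buffer of rows whose storage grows on demand, with a per-row dirty flag. It is a simplified model under stated assumptions, not the code in this PR: `Row`, `Screen`, `max_scrollback`, and the stable per-row `id` are hypothetical names chosen for the example.

```python
class Row:
    def __init__(self, row_id, cols):
        self.id = row_id          # stable identity, independent of y
        self.cells = [" "] * cols
        self.dirty = True         # new rows need to be rendered once


class Screen:
    def __init__(self, rows, cols, max_scrollback):
        self.cols = cols
        self.visible = rows
        self.max_rows = rows + max_scrollback
        self.storage = []         # grows on demand up to max_rows
        self.head = 0             # index of the oldest row once full
        self.next_id = 0
        for _ in range(rows):
            self._push_row()

    def _push_row(self):
        row = Row(self.next_id, self.cols)
        self.next_id += 1
        if len(self.storage) < self.max_rows:
            self.storage.append(row)       # allocate scrollback lazily
        else:
            self.storage[self.head] = row  # overwrite the oldest row
            self.head = (self.head + 1) % self.max_rows
        return row

    def row(self, y):
        """Return the visible row at y, where 0 is the top of the viewport."""
        n = len(self.storage)
        logical = n - self.visible + y
        return self.storage[(self.head + logical) % n]

    def write(self, y, x, ch):
        row = self.row(y)
        row.cells[x] = ch
        row.dirty = True          # only this row needs re-shaping

    def scroll(self):
        self._push_row()
```

In this model, scrolling changes which y a row appears at but leaves its `id` and `dirty` flag untouched, so a renderer could skip re-shaping rows that merely moved.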