* This completes our multi-threaded architecture, started in #21, by moving pty IO to a dedicated thread.
* A bounded, blocking queue implementation is introduced for inter-thread communication.
* The Window thread (which previously also did IO) no longer uses libuv, and is purely a windowing event loop.
## Performance
On IO-heavy workloads such as `cat big-file.txt`, throughput has increased by more than 400%. In general, I'm noticing more consistent frame rates across all workloads, with the most dramatic differences in IO-heavy workloads.
## Architectural Notes
There are now three threads per window as shown below:
```
                    ┌─────────────────────────────────┐
                    │                                 │
                    │             Window              │
                    │                                 │
                    └─────────────────────────────────┘
                                     │
                 ┌───────────────────┴──────────────────┐
                 │                                      │
                 ▼                                      ▼
┌─────────────────────────────────┐    ┌─────────────────────────────────┐
│                                 │    │                                 │
│            Renderer             │◀───│               IO                │
│                                 │    │                                 │
└─────────────────────────────────┘    └─────────────────────────────────┘
```
Notes:
* The window thread is responsible purely for windowing events: focus, mouse movement, keyboard input, etc.
* The IO thread is responsible for pty IO, such as reading from and writing to the pty fd. This thread also owns the terminal state.
* The renderer is responsible for turning terminal state into screen data, and auxiliary visual functions such as cursor blink.
The arrows in the diagram above show which threads can send messages to which. Communication is done through one-way, bounded MPSC (multi-producer, single-consumer) queues. The MPSC queue implementation is _not optimal_ and can be improved, but our workload is not message-heavy.
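To illustrate the queue semantics described above, here is a minimal Python sketch of a bounded, blocking queue with several producers and one consumer. It is not the implementation from this PR; the class name, method names, and message tuples are hypothetical and chosen only for illustration.

```python
import threading
from collections import deque


class BoundedBlockingQueue:
    """Fixed-capacity FIFO: push blocks while full, pop blocks while empty.

    Hypothetical illustration only; not the queue used in this PR.
    """

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._items = deque()
        # Both conditions share one lock so the queue state stays consistent.
        lock = threading.Lock()
        self._not_full = threading.Condition(lock)
        self._not_empty = threading.Condition(lock)

    def push(self, item, timeout=None):
        """Wait for a free slot; return False if the timeout expires first."""
        with self._not_full:
            has_room = self._not_full.wait_for(
                lambda: len(self._items) < self._capacity, timeout
            )
            if not has_room:
                return False
            self._items.append(item)
            self._not_empty.notify()
            return True

    def pop(self, timeout=None):
        """Wait for an item; return None if the timeout expires first."""
        with self._not_empty:
            has_item = self._not_empty.wait_for(
                lambda: len(self._items) > 0, timeout
            )
            if not has_item:
                return None
            item = self._items.popleft()
            self._not_full.notify()
            return item


def demo():
    """Two producers (stand-ins for window and IO) feed one consumer."""
    mailbox = BoundedBlockingQueue(capacity=4)

    def producer(name, count):
        for i in range(count):
            mailbox.push((name, i))  # blocks whenever the mailbox is full

    threads = [
        threading.Thread(target=producer, args=(name, 10))
        for name in ("window", "io")
    ]
    for t in threads:
        t.start()
    received = [mailbox.pop() for _ in range(20)]  # the single consumer
    for t in threads:
        t.join()
    return received
```

Because the queue is bounded, a slow consumer applies backpressure to its producers instead of letting the backlog grow without limit. Messages from any single producer arrive in the order they were sent, while the interleaving between producers is unspecified.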
Threads _also use shared memory_, as noted in #21. For large data structures such as terminal state, mutexes are utilized to avoid large copies.
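As a rough sketch of the shared-memory side, the Python example below guards a stand-in terminal state with a mutex. The IO side mutates it in place, and the renderer holds the lock only long enough to copy the rows it needs. All names here (`TerminalState`, `SharedTerminal`, `io_write`, `render_frame`) are hypothetical and not taken from the codebase.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class TerminalState:
    """Hypothetical stand-in for the (much larger) real terminal state."""
    lines: list = field(default_factory=list)
    cursor_row: int = 0


class SharedTerminal:
    """Terminal state plus the mutex that every thread must hold to touch it."""

    def __init__(self):
        self.mutex = threading.Lock()
        self.state = TerminalState()


def io_write(shared, text):
    """IO-thread side: mutate the state in place while holding the mutex."""
    with shared.mutex:
        shared.state.lines.append(text)
        shared.state.cursor_row = len(shared.state.lines) - 1


def render_frame(shared, viewport_rows):
    """Renderer side: copy only the visible rows under the lock, then
    do the (potentially slow) formatting after releasing it."""
    with shared.mutex:
        lines = shared.state.lines
        first = max(0, len(lines) - viewport_rows)
        visible = lines[first:]  # slice copies just the visible rows
        cursor = shared.state.cursor_row
    return [
        ("> " if first + i == cursor else "  ") + text
        for i, text in enumerate(visible)
    ]
```

Keeping the critical section short matters because the IO thread cannot update the state while the renderer holds the mutex, and vice versa.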
Ghostty now loads the config file at `$XDG_CONFIG_HOME/ghostty/config` on startup, if it exists. This follows the XDG Base Directory Specification, so if `$XDG_CONFIG_HOME` is not set, we default to `$HOME/.config/ghostty/config`.
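The lookup rule above can be sketched in Python as follows. This is an illustration of the path resolution only, not the project's code; `config_path` and `load_config_text` are hypothetical names. Treating an empty `$XDG_CONFIG_HOME` the same as an unset one follows the XDG specification, and falling back to `Path.home()` when `$HOME` is missing is an assumption of this sketch.

```python
import os
from pathlib import Path


def config_path(env=None):
    """Resolve the config file location per the XDG base directory rules."""
    env = os.environ if env is None else env
    xdg = env.get("XDG_CONFIG_HOME")
    if xdg:
        base = Path(xdg)
    else:
        # Unset or empty: fall back to $HOME/.config.
        home = env.get("HOME") or str(Path.home())  # assumption if HOME is missing
        base = Path(home) / ".config"
    return base / "ghostty" / "config"


def load_config_text(env=None):
    """Return the config file contents, or None if the file does not exist."""
    path = config_path(env)
    return path.read_text() if path.is_file() else None
```

Passing the environment as a parameter keeps the resolution logic easy to test without modifying the real process environment.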
This implements a pure Metal renderer for macOS targets.
Performance:
- Average frame time: 0.7ms (Metal) vs. 1.5ms (OpenGL)
- Average fps while `cat`-ing a 1GB file (vsync disabled): 100 (Metal) vs. 70 (OpenGL)
* Note: while the frame time is 2x faster in Metal, the FPS is not 2x higher, which I assume is due to lock contention on terminal state.
Why?
- OpenGL has been deprecated on macOS since 2018.
- All OpenGL has to go through a Metal translation layer anyway, which has a non-zero cost.
- There is a bug on Mac where rendering OpenGL on a separate thread from the windowing thread can cause crashes, so most OpenGL software just doesn't render on multiple threads on Mac.
- Metal is more explicit about resource management compared to OpenGL, so we gain performance.
- Metal is much more multi-thread friendly, so our multi-threaded renderer works great! (with resizes!)