This introduces a new package `src/synthetic` for generating synthetic
data, currently primarily for benchmarking but other use cases can
emerge.
The synthetic package exports a runtime-dispatched type `Generator` that
can generate data of various types. To start, we have a bytes, utf8,
and OSC generator. The goal of each generator is to expose knobs to tune the
probabilities of various outcomes. For example, the UTF-8 generator has
a knob to tune the probability of generating 1, 2, 3, or 4-byte UTF-8
sequences.
Ultimately, the goal is to be able to collect probability data
empirically that we can then use for benchmarks so we can optimize
various parts of the codebase on real-world data shape distributions.
This commit adds a few new mode flags to the `bench-stream` program
to generator synthetic OSC sequences. The new modes are `gen-osc`,
`gen-osc-valid`, and `gen-osc-invalid`. The `gen-osc` mode generates
equal parts valid and invalid OSC sequences, while the suffixed variants
are for generating only valid or invalid sequences, respectively.
This commit also fixes our build system to actually be able to build the
benchmarks. It turns out we were just rebuilding the main Ghostty binary
for `-Demit-bench`. And, our benchmarks didn't run under Zig 0.14, which
is now fixed.
An important new design I'm working towards in this commit is to split
out synthetic data generation to a dedicated package in
`src/bench/synth` although I'm tempted to move it to `src/synth` since
it may be useful outside of benchmarks.
The synth package is a work-in-progress, but it contains a hint of
what's to come. I ultimately want to able to generate all kinds of
synthetic data with a lot of knobs to control dimensionality (e.g. in
the case of OSC sequences: valid/invalid, length, operation types,
etc.).
by adding a `--size` flag we make the size of buffers not comptime known, which prevents certain unrolling optimizations from happening.
secondly, `noinline` creates a more reproducable env for benchmarking by isolating the contents of the functions