erxi or how I learned to love the fast testing suite
erxi is available on GitHub.
Table of Contents
- The Mission and the Goalpost
- Testing, testing, testing
- The Flames of Never-Ending Optimization
- So, what gives?
- Messaging in bandwidth-limited environments
- Key message
- EXI4JSON
- What now?
- Project Timeline
The Mission and the Goalpost
One or two months ago I read this wonderful blog post about the lost art of XML, and it took hold of me (at least for a bit). I work professionally around XML, and I was always convinced that XML was the better option because of the potential for formal verification of messages, so I felt elated that someone had made that case far better than I ever could. I was also drawn to the EXI Specification, a way of transforming ordinary XML documents into byte-efficient representations without losing any of the advantages that XML has. It was an idea I had never been confronted with, and it made immediate sense as soon as I grasped it. One of those “of course, how obvious” moments that are harder to come by the older you get.
Some time later, I wanted to try to apply the EXI spec, or XML more generally, to the collaborative creativity apps I am currently developing. They did (and still do) use the Protobuf format for the exchanged messages because of its byte-efficiency. However, it turns out the best available reference implementation is EXIficient, a Java implementation by the “EXI man” Daniel Peintner (PBUH), and I was not gonna force a user to load the JVM on their device just so I could flex with my choice of communication protocol. So the idea arose to write my own Rust implementation of the spec. Of course, Claude made fun of the idea: Start a whole software project just for the messages? When Protobuf was working perfectly?
But the idea had a certain appeal. First and foremost, I hoped that this was a project where close supervision of the LLM was not as necessary as in my other projects (and boy, was I wrong), for three reasons: 1) there exists a specification document laying out the formal steps to generate EXI files from XML files and vice versa, 2) there exists a reference implementation whose code is freely available, and 3) there exists a full suite of interoperability tests to see if you fucked up. Besides all this, there were also obvious advantages to a Rust implementation: the ability to use it in WASM for web development without any hassle, the ability to use it in memory-capped environments like embedded devices without loading the full JVM, and the inherent coolness that befalls everybody who achieves a thing like that in the programming language the cool kids code in nowadays. Besides, there was a certain thrill in seeing whether some script kiddie armed with an LLM could outdo the multi-year project of Siemens. So, yes Claude, I WILL embark on a whole software project just for some messages!
So, I did what I always do when I start a project with Claude: set up my repository and let Claude draw up an implementation plan broken into issue-sized chunks, so that I can just feed it the consecutive issues with my magic words (hocus pocus for a first implementation with planning that is verified with Codex via hex hex, fidibus for internal/Codex review, and simsalabim for simplification). Claude happily chugged along and gave me a list of 45 issues to be worked through consecutively to arrive at a correct and blazingly fast EXI implementation. Where I should have raised my first eyebrow was when the first formal verification of interoperability with EXI happened in Issue 28, already more than halfway through the project. Turns out, that’s way too late, because your LLM WILL fuck up before that point, and a bug gets much harder to fix the later you find it. But I was young and innocent then, so I followed the infinite wisdom of our robot overlords.
It took some three days of mainly typing my magic words and making some decisions here and there (mainly telling Claude to be less lazy) until we arrived at Issue 28, and all hell broke loose. Those were the first real interoperability tests against the reference spec, and they were brutal. Turns out, even if you have a well-defined spec lying right there, an electrons-on-memory-cells copy of the literal truth ready to be inspected at any convenient moment, LLMs will just interpret it however they like. Bugs abounded, and it took me a full day of constant LLM chugging to fix them all. But okay, in the end it worked! The first steps were successful! Let’s fucking do this!
Testing, testing, testing
At this point, I want to emphasize the need for constant, true tests from the very beginning. And I don’t mean telling the LLM to achieve 100% test coverage; I mean real-world, non-cheatable tests that cover what you really want to achieve. Coding with LLMs can ease the burden of the HOW, but you constantly need to remind them of the WHAT, and automatic tests, ideally with traces attached, are your easiest tool for that. And the thing is, if you start a project, you know what you want to achieve. So pour that into a suite of tests, and make them run fast. When I finally implemented the W3C test suite for EXI, I made the mistake of loading the whole damn JVM for every goddamn fixture, making the suite run for more than 10 minutes. If it takes you 10 minutes to check for regressions, you push the check to the end of the whole workflow, and by then you have made so many changes that it is very hard to pinpoint where the regressions come from. Time poured into a useful test suite WITH as much debug information as possible attached to every run is NEVER wasted if you’re coding with an LLM. Bonus points if there is already a readily available reference implementation you can also shove tracing code into…
Well, let me tell you, afterwards it was smooth sailing again! We rushed through the rest of the issues until we hit the full interop tests with schema-informed mode. By this point, I was sure that this phase would take a long time, maybe even two days, given my experience with the first interop phase. In the end, it took 5 days, almost longer than the whole project before that. It didn’t help that this phase coincided with the training period for Claude 4.6, during which the performance of the other models degraded. I also noticed the LLM being very lazy and starting to fine-tune the code to the tests, but with the help of Codex and a lot of oversight I managed to land at functional parity with EXIficient.

A few commits earlier, I had started to wonder how I could test the performance of my new baby against the corporate overlord, so I devised a benchmark: downloading Wiki dumps, generating some sensor data files, and letting some well-known compression algorithms and EXIficient loose on them while measuring wall time, peak memory usage, and throughput. The first runs (I have included the data here) showed that EXIficient did very, very well on the large Wiki dumps. So naturally, my curiosity about the first runs of erxi was insatiable, as I was pretty sure that Rust code could easily beat Java on performance. And in that I was... disappointed. The first runs went pretty, pretty terribly. Important life lesson learned: choice of language may influence performance, but performance is much more influenced by the sheer stupidity of your code. And never, ever depend on the LLM “just working”. The only thing that counts is to get a real-world testing suite in place, and let the LLM run against it and come out smarter.
The “Before” (Commit 8c23728)
The initial implementation was a memory-hungry monster. For a simple 10 MiB dataset, it behaved like this:
| Metric | Value |
|---|---|
| Wall Time | 14.8s |
| Peak Heap | 3.60 GB |
| RSS | 3.64 GB |
| Allocations | 129 Million |
In short: it was slow, it was bloated, and it was embarrassing.
The Flames of Never-Ending Optimization
So, now the new goalpost was clear: Show that Rust actually could beat Java. For this, a lot of optimization was needed.
It took me a lot of time to land on my perfect setup for optimization with Claude, but the main takeaways are: generate a small dataset for optimization (ideally 2-3 different sets with different characteristics) that you can run in under 2 minutes, and always generate LLM-readable output with the runs (SVGs of flamegraphs, for example, are ideal). This way your coding slave can generate its own suggestions for improvements and always verify them. I’ll include a history of the changes as well as a summary of the main improvements here. And by “it took me a lot of time” I of course mean that my stupidity knew no bounds! My first idea was to let erxi run against EXIficient on a 30 GB (!) Wiki dump, to show its superiority. You can see the results. Small realistic testing suite, fast runs, as much machine-readable information as you can get per test run. Get it into your head, and be smarter than me.
The “After” (Phase 7b, Commit 5af9827)
After 11 days of never-ending optimizations, the numbers changed drastically:
| Metric | Before (8c23728) | After (5af9827) | Improvement |
|---|---|---|---|
| Wall Time (Encode) | 14.8s | 1.10s | 13.5x |
| Peak Heap | 3.60 GB | 20 KB | 184,000x |
| RSS | 3641 MB | 30 MB | 121x |
| Allocations | 129M | 1029 | 125,000x |
Key Optimizations
To get there, I had to rethink my whole life, and also the way in which XML is turned into EXI. Here are the changes that brought the most significant gainz (>20%):
1. Bitstream u64-Accumulator
Instead of writing bits directly to a Vec<u8> (which involved constant bounds checking and byte-level manipulation), I implemented a u64 accumulator: it buffers bits in a local register and only flushes whole bytes to the vector.
- Impact: ~73% speed improvement in the encoder.
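To make the idea concrete, here is a minimal sketch of such an accumulator, assuming an MSB-first bitstream. The names (`BitWriter`, `write_bits`) are illustrative, not erxi’s actual API:

```rust
/// Minimal sketch of a u64 bit accumulator (illustrative, not erxi's real type).
struct BitWriter {
    out: Vec<u8>,
    acc: u64,  // the low `used` bits are pending output, oldest bits highest
    used: u32, // number of valid bits currently buffered in `acc`
}

impl BitWriter {
    fn new() -> Self {
        Self { out: Vec::new(), acc: 0, used: 0 }
    }

    /// Append the low `n` bits of `value`, MSB-first (1 <= n <= 32).
    fn write_bits(&mut self, value: u32, n: u32) {
        debug_assert!((1..=32).contains(&n));
        let mask = if n == 32 { u32::MAX } else { (1 << n) - 1 };
        self.acc = (self.acc << n) | u64::from(value & mask);
        self.used += n;
        // Flush whole bytes only; at most 39 bits are ever buffered,
        // so the register never overflows.
        while self.used >= 8 {
            self.used -= 8;
            self.out.push((self.acc >> self.used) as u8);
        }
    }

    /// Zero-pad the final partial byte and hand back the stream.
    fn finish(mut self) -> Vec<u8> {
        if self.used > 0 {
            self.out.push((self.acc << (8 - self.used)) as u8);
        }
        self.out
    }
}
```

The win comes from touching the `Vec` once per byte instead of once per bit, and from keeping the hot state in a single register.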
2. Zero-Alloc Tier 2 Grammars
The EXI spec uses “Tier 2” grammars for schema-informed encoding. Initially, these were materialized as a fresh Vec<Terminal> for every element. I tried caching them with Arc, but the overhead of atomic reference counting actually made things slower. The breakthrough was moving to a zero-allocation approach: calculating offsets directly into the Tier 1 grammar without ever creating a new collection.
- Impact: 1.25x faster decoding, 42% fewer allocations.
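The shape of that change looks roughly like this; `Production` and the offset rule are simplified stand-ins for the real grammar machinery:

```rust
// Simplified illustration of the before/after; not erxi's actual types.
#[derive(Clone, Copy)]
struct Production {
    event: u16,
    next_state: u16,
}

// Before: every element start materialized the Tier 2 view as a fresh Vec.
fn tier2_materialized(tier1: &[Production], fallbacks: &[Production]) -> Vec<Production> {
    let mut v = Vec::with_capacity(tier1.len() + fallbacks.len()); // per-element allocation
    v.extend_from_slice(tier1);
    v.extend_from_slice(fallbacks);
    v
}

// After: resolve the n-th Tier 2 production by index arithmetic over the
// borrowed Tier 1 slice plus a static fallback table; nothing is allocated.
fn tier2_production(tier1: &[Production], fallbacks: &[Production], n: usize) -> Production {
    if n < tier1.len() {
        tier1[n]
    } else {
        fallbacks[n - tier1.len()]
    }
}
```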
3. QName Interning & Fast-Paths
I introduced a fast path for ASCII strings and an internal cache for frequently used QNames (Qualified Names). This avoids repeated string allocations and expensive UTF-8 validation in the hot path of event processing.
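A minimal sketch of the interning side, with the ASCII fast path made explicit; the real cache is keyed on namespace plus local name, and the names here are made up:

```rust
use std::collections::HashMap;

/// Simplified QName interner; illustrative only, not erxi's real string table.
#[derive(Default)]
struct QNameInterner {
    ids: HashMap<Box<str>, u32>,
    names: Vec<Box<str>>,
}

impl QNameInterner {
    /// Return a stable id for `raw`, allocating only the first time a name is seen.
    fn intern(&mut self, raw: &[u8]) -> u32 {
        // Fast path: pure-ASCII input skips the general UTF-8 validation.
        let s: &str = if raw.is_ascii() {
            // SAFETY: ASCII bytes are always valid UTF-8.
            unsafe { std::str::from_utf8_unchecked(raw) }
        } else {
            std::str::from_utf8(raw).expect("QName must be valid UTF-8")
        };
        if let Some(&id) = self.ids.get(s) {
            return id; // hot path: repeated names cost one hash lookup, zero allocations
        }
        let id = self.names.len() as u32;
        let owned: Box<str> = s.into();
        self.names.push(owned.clone());
        self.ids.insert(owned, id);
        id
    }
}
```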
4. Input Streaming
By moving from a “load-everything-into-a-string” approach to a buffered reader (quick-xml over a BufRead source, with my own periodic buffer flushes), I enabled erxi to handle files that are much larger than the available RAM.
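The skeleton of that loop looks roughly like this, sketched against a recent quick-xml (0.30-ish); the actual hookup into the EXI encoder is elided:

```rust
use std::io::BufRead;

use quick_xml::events::Event;
use quick_xml::Reader;

/// Stream XML events from any buffered source, reusing one scratch buffer,
/// so memory use stays flat no matter how large the input is.
fn stream_events<R: BufRead>(input: R) -> quick_xml::Result<()> {
    let mut reader = Reader::from_reader(input);
    let mut buf = Vec::new();
    loop {
        match reader.read_event_into(&mut buf)? {
            Event::Eof => break,
            event => {
                // feed `event` into the EXI encoder here; nothing outlives
                // this loop iteration, so the input never has to fit in RAM
                let _ = event;
            }
        }
        buf.clear(); // reuse the same allocation for the next event
    }
    Ok(())
}
```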
After 6 days of continuous improvements I arrived at the goal: an EXI implementation in Rust with 2-3x better runtimes and far lower RAM consumption than EXIficient. Two weeks to beat the corporate giant. Now, what can this be used for?
erxi vs EXIficient
Here is the raw data comparing erxi (Rust) and EXIficient (Java) across various scales; the 10 MB runs report wall time, the larger runs report throughput. Note the performance dominance of Rust in the schemaless modes and the memory stability at large scales.
| Input Size | Mode | Implementation | Encode | Decode | Peak RAM (RSS) |
|---|---|---|---|---|---|
| 10 MB | Compression | erxi | 0.51s | 0.13s | 57 MB |
| 10 MB | Compression | EXIficient | 2.08s | 0.99s | 222 MB |
| 10 MB | Bitpacked | erxi | 0.10s | 0.09s | 16 MB |
| 10 MB | Bitpacked | EXIficient | 0.41s | 0.30s | 126 MB |
| 10 MB | Schema-Bitpacked | erxi | 1.47s | 1.28s | 74 MB |
| 10 MB | Schema-Bitpacked | EXIficient | 0.46s | 0.37s | 197 MB |
| 1 GB | Compression | erxi | 48.0 MB/s | 125.9 MB/s | 3.1 GB |
| 1 GB | Compression | EXIficient | 5.9 MB/s | 11.9 MB/s | 3.4 GB |
| 1 GB | Bitpacked | erxi | 93.0 MB/s | 186.6 MB/s | 1.1 GB |
| 1 GB | Bitpacked | EXIficient | 52.9 MB/s | 69.0 MB/s | 2.8 GB |
| 10 GB | Compression | erxi | 16.2 MB/s | 68.3 MB/s | 15.0 GB |
| 10 GB | Compression | EXIficient | OOM | OOM | OOM |
* All benchmarks performed on AMD Ryzen 7 3700X. “OOM” signifies a crash or stall due to memory exhaustion.
So, what gives?
Now, to be honest, if you want to compress stuff without thinking too much about it, I think xz or zstd are probably the better choices. In most settings they achieve compression ratios similar to EXI, with similar or better runtimes and less RAM used. If you want to compress stuff fast, LZ4 or pigz are much better choices. So was it all for naught? I mean, I had a lot of fun and learned a lot about development and optimization, so of course not. But in terms of real-world applications, I don’t think there is too much out there. For me, there are two main fields where EXI can really shine:
Long-Term Archiving of Structured, Repetitive Data
The first is long-term (I mean disk, not RAM) storage of huge amounts of structured, repetitive data (think banks, robotics, government offices, that kind of stuff). In this regard, no other algorithm can beat the compression ratios of EXI, because it offers schema-aware compression, which the other algorithms don’t.
Benchmark: Realistic Sensor Data (1 GB)
In this run, I compared standard multi-threaded (MT) compressors against erxi pre-compression combined with xz on simulated, realistic sensor data.
| Tool | Format | File Size | Ratio | Encode Time |
|---|---|---|---|---|
| xz-6-mt | Generic | 78 MB | 13.24:1 | 37.5s |
| zstd-19-mt | Generic | 79 MB | 13.12:1 | 83.2s |
| pbzip2-9 | Generic | 69 MB | 14.91:1 | 8.78s |
| erxi-precomp + xz-6 | Hybrid | 46 MB | 22.38:1 | 45.5s |
| erxi-precomp-schema + xz-6 | Hybrid | 45 MB | 23.02:1 | 51.9s |
Findings
As you can see, pbzip2 blows xz and zstd out of the water on this one, taking far less time (almost a tenth of zstd’s) while achieving better compression. This was not the case on the Wiki dumps, so I think pbzip2 particularly shines on structured, repetitive data. What is also apparent, however, is that nothing can beat EXI pre-compression plus xz on these kinds of datasets, which reaches compression ratios of up to 23:1.
Another interesting takeaway: while schema-informed mode provides the absolute best compression (23.02:1), its advantage over schemaless pre-compression (22.38:1) is minimal (less than 3%). The performance penalty, however, is real: encoding takes about 15% longer, and decoding is significantly slower. I think this happens because EXI learns the schema of these datasets pretty fast even if you don’t supply it explicitly, since the structure is not overly complicated, and the extra bytes spent on learning it are insignificant at these sizes. So, for massive datasets, you should carefully consider whether that last 2-3% of disk space is worth the extra CPU cycles. Often, sticking to schemaless pre-compression is the sweet spot.
Messaging in bandwidth-limited environments
The other is messaging between bandwidth-limited IoT devices, where you get all the advantages of XML (controlled input, easier parsing) without the overhead.
Benchmark: Startup & Small Messages
For IoT devices, the “warm-up” time of a JVM is a dealbreaker. erxi provides instant-on performance with a tiny memory footprint.
| Feature | erxi (Rust) | EXIficient (Java) |
|---|---|---|
| Startup Time | ~1-2ms | ~200-300ms (JVM) |
| RSS (Small Msg) | ~16 MB | ~120 MB |
| Binary Size | ~4 MB | ~100 MB+ (JRE) |
Ironically, both areas are exactly where EXIficient is at its worst: it doesn’t work on large datasets because it is not optimized for them, and for small messages you always need to load the whole JVM into memory with no JIT advantages to show for it.
Key message
I feel I repeat myself too often, but: TESTING. No choice of language or LLM guarantees good performance or feature completeness. The only source of truth is a set of concrete, real-world tests whose output your LLM can read and use as a basis for optimization and bug hunting. This is the only way you will make progress. The LLM (no matter which) can, and abso-fucking-lutely WILL, do the most mind-bogglingly stupid shit you have ever seen and take the laziest shortcuts known to machine. The only way to get usable output from it (besides coding yourself, and let’s face it, it’s 2026, no one is coding anymore) is to let it actually SEE what it did wrong, so it can get it right on the tenth attempt. So having a test suite is unavoidable, but the upside is: your LLM buddy can help you with it! Just tell it what you want to test and how, and it will do most of the actual work.
EXI4JSON
While I was wrapping this up, I added an EXI4JSON implementation so JSON workflows can be bridged into EXI without first authoring XML. That makes it easier to adopt EXI in projects that are already JSON-first, while still keeping the compression benefits of the EXI stack (schema-based validation still only applies in XML schema-informed mode, not in EXI4JSON).
What now?
I plan to use it in my web apps for messaging, as it also compiles to WASM. As for the development of erxi itself: with EXI4JSON in place so JSON messages can be compressed via EXI, I next want to build a visualizer that shows the translation of XML (and JSON) messages into EXI streams and back. In the more distant future, I also want to turn my eyes towards a whole XML stack in Rust, as I had so much fun with this one.
If you want to give me a lot of money for optimizing your slow-as-hell software or for using erxi in any of the aforementioned fields, hit me up. erxi is available on GitHub. If you want to use it commercially, you have to pay.
Project Timeline
| Date | Milestone |
|---|---|
| Feb 19 | EXI4JSON implementation added |
| Feb 18 | Benchmarks finalized |
| Feb 17 | Optimization phase ended |
| Feb 10 | Interoperability achieved |
| Feb 05 | W3C Interop tests start |
| Feb 04 | Early interop failed |
| Feb 04 | Early interop started |
| Feb 02 | Project start |