replace-custom-serialization
Replace Bespoke Trace Serialization with External Library
Status: in-progress
Date: 2022-10-16
Deciders: Matthew
Issue
To enable reverse debugging, we must capture every change to architectural state. The current trace format contains all relevant architectural state, as well as metadata to traverse the log in either direction.
Decision
We will replace our existing trace serialization code with zpp::bits.
Assumptions
The binary trace format does not need to be portable to other machines. An offline translation step to a portable format is allowed.
In-memory trace format does not need to be stable between builds.
We are already targeting C++20 for concepts, so projects depending on this standard are allowed.
The speed at which we write traces is more important than the speed at which we read them. I estimate that 99.9% of all traces will never be read or used by a user.
Constraints
The trace buffer must be iterable from beginning-to-end, allowing replay of a recorded run.
The trace buffer must be iterable from end-to-beginning, allowing instructions to be undone. In practice, this means that the trace buffer needs to support random access to read the last entry.
Accessing traces for undo must be in O(1), as undo will be the most common user operation on a trace buffer.
An individual trace must not use dynamic memory allocations. A practical worst case could generate ~100M traces per second, in which case memory allocation would comprise a significant overhead.
Positions
Continue using our bespoke serialization
While our internal serialization format "works", it suffers from several drawbacks:
We will need our own memory allocator if we want something like circular buffers of traces.
Decoding a trace packet requires a fair amount of bit manipulation, which is very prone to undefined behavior.
We must maintain our own trace buffer, whose locking scheme is somewhat complex
Use Protocol Buffers
It has built-in support in new versions of Qt.
It is a widely used and well-understood library.
Requires data specification in a DSL.
The resulting protobufs can be read on any system.
Of general-purpose serialization frameworks, it has the worst performance.
The resulting library code from protocol buffers can be large.
Use FlatBuffers
Removes some of the encoding complexity of protobufs.
It is a widely used and well-understood library.
Codebase is very large.
Resulting flatbufs can be read on any system.
Requires data specification in a DSL.
Usage in our project would require compiling the DSL compiler as a 3rd-party dependency.
Access to underlying buffers appears non-trivial
Use Cap'n Proto
Removes some of the encoding complexity of protobufs.
It is a widely used and well-understood library, but less so than FlatBuffers.
Codebase is very large.
Resulting messages can be read on any system.
Requires data specification in a DSL.
Usage in our project would require compiling the DSL compiler as a 3rd-party dependency.
Inheritance is difficult to describe
Use zpp::bits
Header-only library using C++20
Serialization code is ~5k LoC
Modification is rather easy; I have already submitted a PR.
Community is small
Library provides access to underlying data storage, and you can manually move the position within those buffers.
Argument
In order to iterate backwards in O(1) time, we need access to the underlying buffer, with the ability to decode bytes at arbitrary positions. I was not able to implement this in limited time in anything but zpp::bits and our current bespoke serializer.
However, let's throw the O(1) assumption out the window. When we start writing a trace for an instruction, we will not know until the entire cycle is finished which traces, or how many, will be created. If we can only write once, we will have to collect traces from the entire cycle and serialize them together with some library-dependent metadata to help find the previous trace. Without deep control of the serialization internals, we are likely limited to writing once. While we may be able to use some static byte array somewhere as a backing store for temporary objects, that is no less ugly than what I have today.
Additionally, encoding data in an architecture-agnostic way is not going to be a zero-cost abstraction for all systems. For protobuf/flatbuf/capnproto, we are paying for an abstraction we do not need. We also gain nothing from the forwards/backwards compatibility provided by these libraries; we explicitly assume that the binary representation could change over time in breaking ways.
zpp::bits handles all of the ugly memory-allocation code that I was about to write for our bespoke solution. It also handles most normal C++ classes with ease, and provides the ability to change the behavior of serialization in complex cases. While zpp::bits requires that I understand the serialization format (unlike the other listed libraries), it provides arbitrary position control in the buffer. I could overcome any limitations of zpp::bits with position manipulation and custom serialization code.
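The value of position control can be sketched without any particular library. In the sketch below, `Cursor` and `CycleWriter` are hypothetical stand-ins for a serializer that exposes its write position; they are not the zpp::bits API. The writer reserves a slot for the per-cycle packet count, streams packets as they are produced, and then seeks back to patch the count once the cycle ends, so nothing has to be collected up front.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical cursor over caller-owned storage. It stands in for a serializer
// that exposes its read/write position; it is not the zpp::bits API.
class Cursor {
public:
    Cursor(std::uint8_t* data, std::size_t size) : data_(data), size_(size) {}

    std::size_t position() const { return pos_; }
    void seek(std::size_t pos) {
        assert(pos <= size_);
        pos_ = pos;
    }

    template <typename T>
    bool write(const T& value) {
        static_assert(std::is_trivially_copyable_v<T>);
        if (sizeof(T) > size_ - pos_) return false;
        std::memcpy(data_ + pos_, &value, sizeof(T));
        pos_ += sizeof(T);
        return true;
    }

    template <typename T>
    bool read(T& value) {
        static_assert(std::is_trivially_copyable_v<T>);
        if (sizeof(T) > size_ - pos_) return false;
        std::memcpy(&value, data_ + pos_, sizeof(T));
        pos_ += sizeof(T);
        return true;
    }

private:
    std::uint8_t* data_;
    std::size_t size_;
    std::size_t pos_ = 0;
};

// Writes one cycle as [u16 packet count][packets...]. The count is unknown
// until the cycle ends, so a placeholder is written first and patched later.
class CycleWriter {
public:
    explicit CycleWriter(Cursor& cursor)
        : cursor_(cursor),
          count_pos_(cursor.position()),
          ok_(cursor.write(std::uint16_t{0})) {}  // placeholder count

    bool add(std::uint32_t packet) {
        ok_ = ok_ && cursor_.write(packet);
        if (ok_) ++count_;
        return ok_;
    }

    // Seeks back, patches the real count, then restores the write position.
    bool finish() {
        if (!ok_) return false;
        const std::size_t end = cursor_.position();
        cursor_.seek(count_pos_);
        cursor_.write(count_);
        cursor_.seek(end);
        return true;
    }

private:
    Cursor& cursor_;
    std::size_t count_pos_;
    bool ok_;
    std::uint16_t count_ = 0;
};
```

A write-once serializer would instead force the caller to stage every packet of the cycle somewhere else first, which is the situation the argument above describes.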
zpp::bits has the smallest codebase of any listed option, meaning I am most likely to be able to maintain that library in the face of abandonment. Lastly, zpp::bits was the fastest serialization library in the benchmark listed in the Notes.
Implications
No library will allow us to use our current trace format as-is, so these elements of the API will need to be rewritten. In the process of updating the trace API, we can make some improvements regarding allocation failures, which will be covered in another ADR. Switching to a 3rd-party library introduces a risk that we may need to fork the code and maintain it ourselves.
Notes
Benchmarks
See GitHub
GCC 11 (Ubuntu 20.04 x64)
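| library | test case | bin size | data size | ser time | des time |
| --- | --- | ---: | ---: | ---: | ---: |
| bitsery | general | 70904B | 6913B | 1470ms | 1524ms |
| boost | general | 279024B | 11037B | 15126ms | 12724ms |
| cereal | general | 70560B | 10413B | 10777ms | 9088ms |
| flatbuffers | general | 70640B | 14924B | 8757ms | 3361ms |
| msgpack | general | 89144B | 8857B | 2770ms | 14033ms |
| protobuf | general | 2077864B | 10018B | 19929ms | 20592ms |
| zpp_bits | general | 52192B | 8413B | 733ms | 693ms |
| zpp_bits | fixed buffer | 48000B | 8413B | 620ms | 667ms |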
Clang 12.0.1 (Ubuntu 20.04 x64)
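| library | test case | bin size | data size | ser time | des time |
| --- | --- | ---: | ---: | ---: | ---: |
| bitsery | general | 53728B | 6913B | 2128ms | 1832ms |
| boost | general | 237008B | 11037B | 16011ms | 13017ms |
| cereal | general | 61480B | 10413B | 9977ms | 8565ms |
| flatbuffers | general | 62512B | 14924B | 9812ms | 3472ms |
| msgpack | general | 77384B | 8857B | 3563ms | 14705ms |
| protobuf | general | 2032712B | 10018B | 18125ms | 20211ms |
| zpp_bits | general | 47128B | 8413B | 790ms | 715ms |
| zpp_bits | fixed buffer | 43056B | 8413B | 605ms | 694ms |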
deserialization using brief_syntax, similar to cereal
forward/backward compatibility enabled for Monster
all components of Vec3 are compressed in [-1.0, 1.0] range with precision 0.01
use non-resizable buffer uint8_t[150000] for serialization
use stream input/output adapter, underlying type is std::stringstream
on deserialization do not check for errors
check buffer size on reading, but writing buffer is preallocated std::array<uint8_t, 1000000>
doesn't check for buffer size when reading, buffer: std::array<uint8_t, 1000000>
use std::stringstream's internal std::string
use arena allocator
use yas::mem_<io>stream as buffer
with yas::no_header and yas::compacted
using std::stringstream