UUID Version 7 Implementation and Conundrums

During otherwise uninteresting summer, without too much noise, we got new UUID version(s). While 6 and 8 are nice numbers, version 7 got me intrigued. It's essentially just a combination of Unix timestamp with some random bits mixed in. Exactly what a doctor might order if you want to use such UUID as a primary key in a database whose index you don't want to fragment to hell.

Format is easy enough as its ASCII description would suggest:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           unix_ts_ms                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          unix_ts_ms           |  ver  |       rand_a          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|var|                        rand_b                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            rand_b                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

It's essentially 48 bits of (Unix) timestamp, followed by 74 bits of randomness, with 6 remaining bits being version and variant information. While timestamp ensures that IDs generated close in time get sorted close to each other, randomness is there to ensure uniqueness for stuff that happens at the same millisecond. And, of course, I am simplifying things a bit, especially for rand_a field but you get the gist of it.

That all behind us, let me guide you through my implementation of the same.

Generating the first 48-bits is straightforward with the only really issue being endianness. Since BinaryPrimitives doesn't really deal with 48-bit integers, a temporary array was needed.

var msBytes = new byte[8];
BinaryPrimitives.WriteInt64BigEndian(msBytes, ms);
Buffer.BlockCopy(msBytes, 2, Bytes, 0, 6);

Generating randomness has two paths. Every time new millisecond is detected, it will generate 10 bytes of randomness. The lowest 10 bits of first 2 bytes will be used to initialize random starting value (as per Monotonic Random method in 6.2). After accounting for the 4-bit version field that shares the same space, we have 2 bits remaining. Those are simply set to 0 (as per Counter Rollover Guards in the same section). I decided not to implement "bit stealing" from rand_b field nor to implement any fancy rollover handling. If we're still in the same millisecond, 10-bit counter is just increased and it's lower bits are written.

if (LastMillisecond != ms) {
    LastMillisecond = ms;
    RandomNumberGenerator.Fill(Bytes.AsSpan(6));
    RandomA = (ushort)(((Bytes[6] & 0x03) << 8) | Bytes[7]);
} else {
    RandomA++;
    Bytes[7] = (byte)(RandomA & 0xFF);
    RandomNumberGenerator.Fill(Bytes.AsSpan(8));
}

Despite code looking a bit smelly when it comes to multithreading, it's actually thread safe. Why? Well, it comes to both RandomA and LastMillisecond field having a special ThreadStatic attribute.

[ThreadStatic] private static long LastMillisecond;
[ThreadStatic] private static ushort RandomA;

This actually makes each thread have a separate copy of these variables and thus no collisions will happen. Of course, since counters are determined for each thread separately, you don't get a sequential output between threads. Not ideal, but a conscious choice to avoid a performance hit a proper locking would introduce.

The last part is to fixup bytes in order to add version bits (always a binary 0111) and variant bits (always a binary 10).

Bytes[6] = (byte)(0x70 | ((RandomA >> 8) & 0x0F));
Bytes[8] = (byte)(0x80 | (Bytes[8] & 0x3F));

Add a couple of overrides and that's it. You can even convert it to Guid. However...

Microsoft's implementation of UUID know to all as System.Guid is slightly broken. Or isn't. I guess it depends from where you look at it from. If you look at it as how RFC4122 specifies components, you can see them as data types. And that's how the original developer thought of it. Not as a binary blob but as a structure containing (little-endian on x86) numbers even though the specification clearly says all numbers are big endian.

Had it stopped at just internal storage, it would be fine. But Microsoft went as far as to convert endianness when converting 128-bit value to the string. And that is fine if you work only with Microsoft's implementation but it causes issues when you try to deal with almost any other variant.

This also causes one peculiar problem when it comes to converting my version 7 UUID to Microsoft's Guid. While their binary representations are the same, converting them to string format yields a different value. You can have either binary or string compatibility between those two. But never both. In my case I decided that binary compatibility is more important since you should really be using UUID in its binary form and not space-wasting hexadecimal format.

As always, the full code is available on GitHub.

[2023-01-12: Code has been adjusted a bit in order to follow the current RFC draft. Main changes are introduction of longer monotonic counter and altenate text conversion methods (Base35 and Base58).]

UUID Version 7 Implementation and Conundrums

One thought to “UUID Version 7 Implementation and Conundrums”

Leave a Reply Cancel reply