Making CSS URL Unique

I already wrote about how I switched to 11ty. For the most part, I am still in the honeymoon phase. Essentially the only major issue (still) remaining is the missing public comment system. Whether I will solve this or not remains to be seen.

But the most annoying issue for me was not really 11ty’s fault. At least not completely. It was more an interaction between 11ty, the CloudFlare caching layer, and my caching policies.

You see, for a site’s performance, it’s quite beneficial to serve CSS files from cache whenever possible. If you can cache it, you don’t need to transfer it. That means the site is that much faster. And that is good.

However, due to multiple reasons that include preloading and the long life I give to CSS files, this also means that my CSS changes don’t always propagate immediately. And thus, my CSS changes would occasionally work just fine locally, only to be invisible on the public internet. At least until the cache expires or all layers agree that loading the new version is in order.

My workaround was simple - every time I significantly changed a CSS file, I would also change its name, thus forcing the update. But I wanted something that could be more easily automated. So, I decided to use a query string.

My site doesn’t really use any query strings. But caching layers don’t know that. If they see a file with a new query string, they will treat it as a completely new cache entry. So, I added a step to my build process that adds a random query string to each CSS link. Something like this:

CSS_UNIQ=$(date +%s | md5sum | cut -d' ' -f1)
find ./_site -type f -name "*.html" -exec \
  sed -Ei 's|(link rel="stylesheet" href="/\S+\.css)"|\1?id='"$CSS_UNIQ"'"|g' {} +

This step comes after my 11ty site has already been built. My unique ID is just the current time. In order to make it slightly obscure, I hash it. For caching purposes, this hashing is completely unnecessary. But, to me, seeing a hash instead of an integer just looks nicer so I use it. You can use whatever you want - be it a git commit, a hash of the CSS file itself (not a bad idea, actually), or any other reasonably unique source. Remember, it doesn’t really need to be cryptographically secure - just different from run to run.
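For example, if you would rather tie the ID to the stylesheet content itself, a one-liner along these lines would do (the ./_site/css/main.css path is just an assumption here - adjust it to wherever your stylesheet ends up):

CSS_UNIQ=$(md5sum ./_site/css/main.css | cut -d' ' -f1)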

With the ID in hand, we use find to go over each .html file and update its CSS links. This is where sed comes in - essentially any CSS file referenced from a link element will simply have ?id= and the hash appended to it.

This code can be improved. One example already mentioned is tying the ID to a hash of the CSS file. Another might be not updating the ID at all if the CSS files haven’t changed. And there are probably many other optimizations that would help. But this code is a good starting point that can be adjusted to fit your site.

Lock Object

The lock statement has existed in C# from the very beginning. I still remember the first example.

lock (typeof(ClassName)) {
    // do something
}

Those who use C# will immediately yell how perilous locking on typeof is. But hey, I am just posting the official Microsoft advice of the time here.

Of course, Microsoft did correct their example (albeit it took them a while) to the now-common (and correct) pattern.

private object SyncRoot = new object();

lock (SyncRoot) {
    // do something
}

One curiosity of C# as a language is that you get to lock on any object. And here we just, as a convention, use the simplest object there is.

And yes, you can improve a bit on this if you use later .NET versions.

private readonly object SyncRoot = new();

lock (SyncRoot) {
    // do something
}

However, if you are using .NET 9 (C# 13) or later, you can do one better.

private readonly Lock SyncRoot = new();

lock (SyncRoot) {
    // do something
}

What’s better there? Well, for starters we now have a dedicated lock type. Combine that with code analysis and the compiler can now give you a warning if you make a typo and lock onto something else by accident. And also, …, wait …, wait …, yep, that’s it. Performance in all of these cases (yes, I am excluding the typeof one) is literally the same.
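For the curious, when the field is a Lock, the lock statement essentially boils down to the new EnterScope() API. Here is a minimal sketch of that equivalent form (my illustration, not decompiled output):

using System.Threading;

internal sealed class Worker {
    private readonly Lock SyncRoot = new();

    public void DoSomething() {
        // EnterScope() returns a disposable scope; disposing it releases the lock,
        // which is roughly what `lock (SyncRoot)` expands to for a Lock instance.
        using (SyncRoot.EnterScope()) {
            // do something
        }
    }
}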

As features go, this one is small and can be easily overlooked. It’s essentially just syntactic sugar. And I can never refuse something that sweet.

Modulo or Bitwise

I had an interesting thing said to me: “Did you know that modulo is much less efficient than a bitwise comparison?” As someone who once painstakingly went through all the E-series resistor values to find those that would make my voltage divider a power of 2, I have definitely seen that in action. But that got me thinking. While an 8-bit PIC microcontroller doesn’t have a hardware divider and thus any modulo is torture, what about modern computers? How much slower do they get?

A quick search brought me a few hits and one conclusive StackOverflow answer. Searching a bit more brought me to another answer where they even did measurements. And the difference was six-fold. But I was left with a bit of a nagging feeling as both of these were 10+ years old. What difference might you expect on a modern CPU? And, more importantly for me, what is the difference in C#?

Well, I quickly ran some benchmarks and the results are below.

Test               Parallel     Mean        StDev
(i % 4) == 0       No           202.3 us    0.24 us
(i & 0b11) == 0    No           201.9 us    0.12 us
(i % 4) == 0       CpuCount     206.4 us    7.78 us
(i & 0b11) == 0    CpuCount     196.5 us    5.63 us
(i % 4) == 0       CpuCount*2   563.9 us    7.90 us
(i & 0b11) == 0    CpuCount*2   573.9 us    6.52 us

Not only were my expectations wrong, but the results were slightly confusing too.

As you can see from the table above, I did three tests: single-threaded, a default parallel for loop, and then a parallel for loop with CPU overcommitment. The single-threaded test is where I saw what I expected, but not in the amount expected. Bitwise was quite consistently winning, but by ridiculously small margins. Unless I was doing something VERY specific, there is no chance I would care about the difference.
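For reference, the loops being compared look roughly like this (a simplified sketch, not the exact benchmark code - the actual files are linked at the end; Count and the method names are mine):

using System.Threading;
using System.Threading.Tasks;

internal static class ModuloVsBitwise {
    private const int Count = 1_000_000;  // arbitrary iteration count for the sketch

    // Single-threaded modulo variant; returning the count keeps the loop from being optimized away.
    public static int CountModulo() {
        var hits = 0;
        for (var i = 0; i < Count; i++) {
            if ((i % 4) == 0) { hits++; }
        }
        return hits;
    }

    // Single-threaded bitwise variant.
    public static int CountBitwise() {
        var hits = 0;
        for (var i = 0; i < Count; i++) {
            if ((i & 0b11) == 0) { hits++; }
        }
        return hits;
    }

    // Parallel modulo variant; pass Environment.ProcessorCount for CpuCount,
    // or twice that to overcommit (CpuCount*2).
    public static int CountModuloParallel(int degreeOfParallelism) {
        var hits = 0;
        Parallel.For(0, Count,
            new ParallelOptions { MaxDegreeOfParallelism = degreeOfParallelism },
            () => 0,                                            // per-thread counter
            (i, _, local) => (i % 4) == 0 ? local + 1 : local,  // the check being measured
            local => Interlocked.Add(ref hits, local));         // merge per-thread counts
        return hits;
    }
}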

If we run the test in Parallel.For, the difference becomes slightly more obvious. And had I stopped at just those two, I would have said that the assumption holds for modern CPUs too.

However, once I overcommitted CPU resources, suddenly modulo was actually better. And that is something that’s hard to explain if we assume that modulo simply compiles down to a divide.

So, I decided to sneak a slightly bigger peek - into the .NET CLR. And I discovered that the bitwise operation was omitted entirely while the modulo operation was still there. However, the runtime then smartly decided to remove both. Thus, I was testing nothing vs. almost nothing.

Ok, after I placed a few strategic extra instructions to prevent that optimization, I got the results below.

Test               Parallel     Mean          StDev
(i % 4) == 0       No           203.1 us      0.16 us
(i & 0b11) == 0    No           202.9 us      0.06 us
(i % 4) == 0       CpuCount     1,848.6 us    13.13 us
(i & 0b11) == 0    CpuCount     1,843.9 us    6.76 us
(i % 4) == 0       CpuCount*2   1,202.7 us    7.32 us
(i & 0b11) == 0    CpuCount*2   1,201.6 us    6.75 us

And yes, bitwise is indeed faster than modulo, but by a really small margin. The only thing the new test “fixed” was that discrepancy in speed when you have too many threads.

Just to make extra sure that the compiler wasn’t doing “funny stuff”, I decompiled both to IL.

ldloc.1
ldc.i4.4
rem
ldc.i4.0
ceq

ldloc.1
ldc.i4.3
and
ldc.i4.0
ceq

Pretty much exactly the same, the only difference being the use of and for the bitwise check while rem was used for modulo. On modern CPUs these two instructions seem pretty much equivalent. And when I say modern, I use that loosely since I saw the same going back a few generations.

Interestingly, just in case the runtime converted those to the same code, I also checked modulo 10 to confirm. That one was actually faster than modulo 4. That leads me to believe there are some nice optimizations happening here. But I still didn’t know if this was the .NET runtime or really something the CPU does.

As a last resort, I went down to C and compiled it with -O0 -S. Unfortunately, even with -O0, if you use % 4, it will be converted to a bitwise and. Thus, I checked it against % 5.

The bitwise check compiled down to just 3 instructions (or just one if we exclude the load and the check).

movl	-28(%rbp), %eax
andl	$3, %eax
testl	%eax, %eax

But modulo went the crazy route.

movl	-28(%rbp), %ecx
movslq	%ecx, %rax
imulq	$1717986919, %rax, %rax
shrq	$32, %rax
movl	%eax, %edx
sarl	%edx
movl	%ecx, %eax
sarl	$31, %eax
subl	%eax, %edx
movl	%edx, %eax
sall	$2, %eax
addl	%edx, %eax
subl	%eax, %ecx
movl	%ecx, %edx
testl	%edx, %edx

It converted the division into a multiplication and gets to the remainder that way. All in all, quite an impressive optimization. And yes, this occupies more memory so there are other consequences to performance (e.g., it uses more of the instruction cache).
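If you are curious what that multiply-and-shift dance actually computes, here is my paraphrase of it in C# (valid for a non-negative 32-bit value; the generated code has a few extra steps to handle negative numbers):

// 0x66666667 is ceil(2^33 / 5); multiplying and shifting right by 33 gives
// floor(n / 5) for any non-negative 32-bit n, and the remainder falls out from it.
static int Mod5(int n) {
    var quotient = (int)((n * 0x66666667L) >> 33);
    return n - (quotient * 5);
}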

So, if you are really persistent with testing, a difference does exist. It’s not six-fold but it can be noticeable.

At the end, do I care? Not really. Unless I am working on microcontrollers, I won’t stop using modulo where it makes sense. It makes the intent much clearer and that, to me, is worth it. Even better, compilers will often just take care of this for you.

So, while modulo is less efficient, stories of its slowness have been exaggerated a bit.

PS: If you want to run my tests on your system, files are available.

Never Gonna BOM You Up

.NET has supported Unicode from its very beginning. Pretty much anything you might need for Unicode manipulation is there. Yes, as early adopters, they made a bet on UTF-16 that didn’t pay off since the rest of the world has moved toward UTF-8 as an (almost) exclusive encoding. However, if we ignore the slightly higher memory footprint, C# strings made Unicode as easy as it gets.

And, while UTF-8 is not the native encoding for its strings, C# is no slouch and has a convenient Encoding.UTF8 static property allowing for easy conversion. However, if you use that instance to write a file or a stream (for example, via StreamWriter or File.WriteAllText), you will get a bit extra.

That something extra is the byte order mark. Its intention is noble - to help detect endianness. However, its usage for UTF-8 is of dubious help since an 8-bit encoding doesn’t really have issues with endianness to start with. The Unicode specification itself does allow for one but doesn’t recommend it. It merely acknowledges it might appear as a side effect of data conversion from other Unicode encodings that do have endianness.

So, in theory, UTF-8 with a BOM should be perfectly acceptable. In practice, only Microsoft really embraced the UTF-8 BOM. Pretty much everybody else decided on UTF-8 without a BOM as that allowed for full compatibility with 7-bit ASCII.

With time, .NET/C# stopped being Windows-only and has by now become a really good multiplatform solution. And now, the helper property that ought to simplify things is actually producing output that will annoy many command-line tools that don’t expect it. If you read the documentation, a solution exists - just create your own UTF-8 encoding instance.

private static readonly Encoding Utf8 = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false);

Now you can use Utf8 instead and you will get the expected result on all platforms, including Windows - no BOM, no problems.
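For example, something along these lines shows the difference (the file names are made up; EF BB BF are the bytes of the UTF-8 encoded BOM):

using System.IO;
using System.Text;

var text = "Hello, world!";

// Encoding.UTF8 writes its preamble, so this file starts with EF BB BF
File.WriteAllText("with-bom.txt", text, Encoding.UTF8);

// a BOM-less instance writes just the text bytes
File.WriteAllText("no-bom.txt", text, new UTF8Encoding(encoderShouldEmitUTF8Identifier: false));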

So, one could argue that the Encoding.UTF8 default should be changed to the more appropriate value. I mean, .NET is multiplatform and the current default doesn’t work well everywhere. One could argue, but this default is not changing, ever.

When any project starts, decisions must be made. And you won’t know for a while if those decisions were good. On the other hand, people will start depending on whatever behavior you selected.

In the case of the BOM, it might be that a developer got so used to having those three extra bytes that, instead of checking the file content, they simply use <=3 as a signal that the file is empty. Or they have a script that takes the output of some C# application and just blindly strips the first three bytes before moving it to a non-BOM-friendly input. Or any other decision somebody made in a project years ago. It doesn’t really matter how bad someone’s code is. What matters is that the code is currently working and a new C# release shouldn’t silently break it.

So, I am reasonably sure that Microsoft won’t ever change this default. And, begrudgingly, I agree with that. Some bad choices are simply meant to stay around.


PS: And don’t get me started on GUIDs and their binary format…

CoreCompile into the Ages

For one project of mine I started having a curious issue. After adding a few, admittedly somewhat complicated, classes, my compile times under Linux shot to eternity. But that was only when running with the dotnet command-line tools. In Visual Studio under Windows, all worked just fine.

Under dotnet I would just see the CoreCompile step counting seconds, and then minutes. I tried increasing the log level - nothing. I tried not cleaning stuff, i.e. using cached files - nothing. So, I tried cleaning up my .csproj file - hm… things improved, albeit just a bit.

A bit of triage later and I was reasonably sure that the .NET code analyzers were the culprit. The reason why changes to .csproj reduced the time was that I had AnalysisMode set quite high. The default AnalysisMode simply checks less.

While disabling the .NET analyzers altogether was out of the question, I was quite OK with not running them all the time. So, until analysis under Linux gets a bit more performant, I simply included EnableNETAnalyzers=false in my build scripts.

  -p:EnableNETAnalyzers=false
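For example, a build invocation then looks something like this (the Release configuration is just an example):

dotnet build -c Release -p:EnableNETAnalyzers=false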

Another problem solved.