Incorporating 3D Gaussian Splats into the graphics pipeline

3D Gaussian splatting is an emerging rendering technique that is overtaking NeRFs. Since it is centered around point primitives, it is more compatible with traditional graphics pipelines, which already support point rendering.

Gaussian splats essentially enhance the concept of point rendering by converting the point primitive into a 3D ellipsoid, which is then projected into 2D during the rendering process. This concept was initially described in 2002 [3], but the technique of extending Structure from Motion scans in this way was only detailed more recently [1].

In this post, I explore how to integrate Gaussian splats into the traditional graphics pipeline. This allows them to be used alongside triangle-based primitives and interact with them through the depth buffer for occlusion (see header image). This approach also simplifies deployment by eliminating the need for CUDA.

Storage

The original implementation uses .ply files as its checkpoint format. This keeps the training-relevant data structures intact at the expense of storage efficiency, which inflates file sizes.

For example, it stores the covariance as a scaling vector and a rotation quaternion, so the matrix has to be reconstructed during rendering. A more efficient approach is to exploit the symmetry of the covariance matrix and store only its diagonal and upper-triangular entries (6 values), which eliminates the reconstruction and reduces storage requirements.
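To make the reconstruction step concrete, here is a minimal sketch of how the covariance could be computed once at conversion time. It assumes the usual factorization of the covariance as R S Sᵀ Rᵀ and a (w, x, y, z) quaternion order; the function name and the output ordering (xx, xy, xz, yy, yz, zz) are my own choices for illustration.

```python
def covariance_upper(scale, quat):
    """Return the 6 unique entries (xx, xy, xz, yy, yz, zz) of R S S^T R^T."""
    w, x, y, z = quat
    # Normalize, so that slightly denormalized quaternions still give a rotation.
    n = (w * w + x * x + y * y + z * z) ** 0.5
    w, x, y, z = w / n, x / n, y / n, z / n
    # Rotation matrix of the unit quaternion (w, x, y, z).
    r = [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]
    # M = R * diag(scale); the covariance is M * M^T and therefore symmetric.
    m = [[r[i][j] * scale[j] for j in range(3)] for i in range(3)]
    cov = [[sum(m[i][k] * m[j][k] for k in range(3)) for j in range(3)]
           for i in range(3)]
    return (cov[0][0], cov[0][1], cov[0][2], cov[1][1], cov[1][2], cov[2][2])
```

With an identity rotation, `covariance_upper((1, 2, 3), (1, 0, 0, 0))` is simply the squared scales on the diagonal.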

Further analysis of the storage usage for each attribute shows that the spherical harmonics of orders 1-3 are the main contributors to the file size. However, according to the ablation study in the original publication [1], these harmonics only lead to a modest PSNR improvement of about 0.5 dB.

Therefore, the most straightforward way to decrease storage is to discard the higher-order spherical harmonics. Additionally, the order-0 spherical harmonics can be converted into a diffuse color and merged with opacity to form a single RGBA value. These simple yet effective methods were implemented in one of the early WebGL implementations, resulting in the .splat format. As an added benefit, this format can be easily interpreted by viewers unaware of Gaussian splats as a simple colored point cloud:

Results using a non Gaussian-splat aware renderer
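The conversion to a single RGBA value can be sketched as follows. This assumes, as I understand the reference implementation, that the order-0 coefficient maps to a color via 0.5 + C0 * f_dc and that the checkpoint stores opacity as a logit that goes through a sigmoid; the function name and the clamping to 8 bit are illustrative.

```python
import math

# Constant of the order-0 real spherical harmonics basis function.
SH_C0 = 0.28209479177387814

def to_rgba8(f_dc, opacity_logit):
    """Convert order-0 SH coefficients and a raw opacity to an 8-bit RGBA tuple."""
    def to_byte(v):
        # Scale [0, 1] to [0, 255] and clamp out-of-range values.
        return max(0, min(255, int(round(v * 255))))
    rgb = [to_byte(0.5 + SH_C0 * c) for c in f_dc]
    alpha = to_byte(1.0 / (1.0 + math.exp(-opacity_logit)))
    return (*rgb, alpha)
```

A splat with all-zero coefficients and a zero logit ends up as mid-gray at half opacity.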

By storing the covariance directly, as described above, we can also reduce the precision from float32 to float16, thereby halving the storage needed for that data. Furthermore, since most splats have limited spatial extents, we can also utilize float16 for position data, yielding additional storage savings.

With these changes, we achieve a storage requirement of 22 bytes per splat, in contrast to the 44 bytes needed by the .splat format and 236 bytes in the original implementation. Thus, we have attained a 10x reduction in storage compared to the original implementation simply by using more suitable data types.
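One layout that adds up to the 22 bytes is 3 float16 for the position, 6 float16 for the covariance and 4 bytes of RGBA. The field order below is an assumption for illustration, not necessarily the order used on disk.

```python
import struct

# Assumed layout: 3x float16 position, 6x float16 covariance, 4x uint8 RGBA.
# "<" selects little-endian without padding, "e" is IEEE 754 half precision.
SPLAT = struct.Struct("<3e6e4B")

def pack_splat(position, covariance, rgba):
    """Serialize one splat into SPLAT.size bytes."""
    return SPLAT.pack(*position, *covariance, *rgba)

def unpack_splat(data):
    """Inverse of pack_splat; returns (position, covariance, rgba) tuples."""
    v = SPLAT.unpack(data)
    return v[0:3], v[3:9], v[9:13]
```

Here `SPLAT.size` is 6 + 12 + 4 = 22 bytes, and 236 / 22 gives the roughly 10x reduction mentioned above.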

Blending

The image formation model presented in the original paper [1] is similar to NeRF rendering, which the paper uses as its point of comparison. It involves casting a ray and accumulating the splats it intersects, which leads to front-to-back blending. This is precisely the approach taken by the provided CUDA implementation.

Blending remains a component of the fixed-function unit within the graphics pipeline, which can be set up for front-to-back blending [2] by using the factors (one_minus_dest_alpha, one) and by multiplying color and alpha in the shader as color.rgb * color.a. This results in the following equation:

\begin{aligned}C_{dst} &= (1 - \alpha_{dst}) \cdot \alpha_{src} C_{src} &+ C_{dst}\\ \alpha_{dst} &= (1 - \alpha_{dst})\cdot\alpha_{src} &+ \alpha_{dst}\end{aligned}

However, this method requires the framebuffer alpha value to be zero before rendering the splats, which is not typically the case as any previous render pass could have written an arbitrary alpha value.

A simple solution is to switch to back-to-front sorting and use the standard alpha blending factors (src_alpha, one_minus_src_alpha) for the following blending equation:

C_{dst} = \alpha_{src} \cdot C_{src} + (1 - \alpha_{src}) \cdot C_{dst}

This allows us to regard Gaussian splats as a special type of particles that can be rendered together with other transparent elements within a scene.
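To check that both orderings yield the same image, the two blending equations can be emulated on the CPU. This is only a sketch of the math for a single pixel; the function names are made up, and the front-to-back variant assumes a framebuffer cleared to zero color and alpha.

```python
def blend_front_to_back(splats):
    """splats: list of ((r, g, b), alpha), nearest first; starts from a cleared buffer."""
    color, alpha = [0.0, 0.0, 0.0], 0.0
    for rgb, a in splats:
        # Factors (one_minus_dest_alpha, one) with color premultiplied by alpha.
        w = (1.0 - alpha) * a
        color = [c + w * s for c, s in zip(color, rgb)]
        alpha += w
    return color, alpha

def blend_back_to_front(splats, background=(0.0, 0.0, 0.0)):
    """Same input order (nearest first), but blended farthest first."""
    color = list(background)
    for rgb, a in reversed(splats):
        # Factors (src_alpha, one_minus_src_alpha).
        color = [a * s + (1.0 - a) * c for c, s in zip(color, rgb)]
    return color
```

Over a black background both functions return the same color. The back-to-front variant never reads the destination alpha, which is why it does not care what previous passes wrote there.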

References

  1. Kerbl, Bernhard, et al. “3d gaussian splatting for real-time radiance field rendering.” ACM Transactions on Graphics 42.4 (2023): 1-14.
  2. Green, Simon. “Volumetric particle shadows.” NVIDIA Developer Zone (2008).
  3. Zwicker, Matthias, et al. “EWA splatting.” IEEE Transactions on Visualization and Computer Graphics 8.3 (2002): 223-238.

stb_image_resize2.h – performance

Recently there was a large rework of the STB single-file image_resize library (STBIR), bumping it to 2.0. While v1 was really slow and merely usable if you needed to quickly get some code running, the 2.0 rewrite claims to be more considerate of performance by using SIMD. So let's put it to the test.

As references, I chose the moderately optimized C-only implementation of Ogre3D and the highly optimized SIMD implementation in OpenCV.

Below you find the time to scale a 1024x1024px byte image to 512x512px. All libraries were set to linear interpolation. The time is the accumulated time for 200 runs.

Library           RGB       RGBA
Ogre3D 14.1.2     660 ms    668 ms
STBIR 2.01        632 ms    690 ms
OpenCV 4.8        245 ms    254 ms

For the RGBA test, STBIR was set to the STBIR_4CHANNEL pixel layout. All libraries were compiled with -O2 -msse. Additionally, OpenCV could dispatch AVX2 code. Enabling AVX2 with STBIR actually decreased performance.

Note that while STBIR has no performance advantage over a C-only implementation for the simple resizing case, it offers some neat features if you want to handle sRGB data or non-premultiplied alpha.

Do not fall for the Synology Hardware SCAM

I recently needed a NAS and went with the “Synology RS1221+” barebone system. The system is competitively priced when compared to the similar “QNAP TS-873AeU-4G”.

Synology HDD

For storage, the sweet spot between price and capacity was at 18TB. Let's look at some options:

Toshiba MG09ACA 18TB      270€
Seagate Exos X X18        280€
Synology HAT5310-18T      700€

Depending on the benchmark, sometimes the Toshiba comes out on top and sometimes the Seagate. Both are similarly priced, so that's fine.
However, speaking of price, the Synology HDD stands out by asking a 150% premium.
You might now wonder whether you also get better performance or other features in return. Well.. guess which is the only 18TB HDD that is verified by Synology for the RS1221+?

The scammy part here however is that the HAT5300 series are just rebranded Toshiba Drives with a different firmware. So the HAT5310 likely is just the MG09ACA and the main difference is the profit margin.
Note that different firmware does not result in any noticeable difference in performance.

I went with the unverified Seagate drives and – as one might expect – there are zero issues with doing so.

Synology RAM

At this point you might say, well Synology just did not get to test more 18TB drives.
Well.. I found the 4GB RAM rather tight and wanted to upgrade to 32GB as RAM is currently quite cheap anyway.

The options here are:

Kingston KSM26SED8/16HD        50€
Synology D4ECSO-2666-16G      350€

I think there appears to be a pattern here. Again, both options have the same specs i.e. DDR4 2666, ECC SO-DIMM. Maybe Synology even rebranded the Kingston modules too, but I did not verify this.

While the DiskManager did not complain about the Seagate HDD, it does show a warning when going with the Kingston RAM. I guess this is because it matters even less.

To conclude, I first want to emphasize that both the Synology NAS hardware and their DiskManager software work great with non-Synology hardware – just as one would expect of a standard x86 platform.

It is just a pity that they try to FUD you into buying their overpriced HDD and RAM.
Basically this is the same game as with printer vendors predicting ravages and annihilation when using 3rd party ink.

Logitech M720 Triathlon mouse – long-term review

In this post I want to take a look at the Logitech M720 mouse after having used it for 2.5 years.

Specs and durability

The specs are pretty common for a mouse you get today, so let's start with the special features:

  • There are side buttons, which I find pretty handy for navigating forward/back in the browser or a file manager
  • It can be paired with up to 3 devices at the same time, which makes it easy to use with your PC, Laptop and Tablet
  • It supports both Bluetooth LE and the Logitech Wireless Receiver
  • It is powered by a single, replaceable AA battery

Especially the last two points make this seem to be a future-proof product that you can use for a long time.

Logitech is currently replacing their Wireless Receiver dongles with Logitech Bolt, so in the near future the Wireless Receivers will go away. But thanks to the Bluetooth support you will still be able to use the mouse without having to occupy a USB port just for it.

Then, using standard AA batteries means that you can just use some nice rechargeable ones. This means that you will never have to wait for the mouse to charge and that the mouse can outlive the battery. As you are probably aware from using your phone, built-in rechargeable batteries wear out over time until the device cannot be properly used any more.

So we finally got a mouse for the years to come? Well..

Built-in obsolescence

Unfortunately, Logitech made some design decisions that drastically shorten the life-span of the device, even though they must have known better.

Rubber coating

The most obvious one is likely the rubber coating of the mouse.

Note how the plastic buttons look still perfectly fine in comparison

I took the images for this post after cleaning the mouse. So the dirt you see there is not the skin from my greasy hands, but rather said rubber coating disintegrating.
This is caused by your sweat, which is slightly acidic and thus attacks the rubber.
There is a reason that gamepads do not have such a coating, even though having good grip is even more important there.
Also, the way the coating is used here, all it does is make the mouse look greasy after some time.

Bad switches

The less obvious issue is the switches used, i.e. the things that perform the clicks.
Did you ever notice that after some time your mouse does incorrect double clicks or releases the click on its own while dragging and dropping? Well, that means the switch is starting to wear out.

The mouse uses OMRON D2FC-F-7N micro-switches in a cheap variant that is only rated for 10 million clicks (10M). While this sounds like a lot, it works out to about 6850 clicks per day over 4 years, which is not all that much if you think about playing a shooter or using Photoshop.
The crazy part is that going for the 20M rated variant (2x the durability) only costs 50 ct more (pack of 5 on Amazon). This would make the mouse merely 1€ more expensive – probably way less even, as Logitech can negotiate bulk discounts on these things.
Given that the mouse is priced at 50€, I do not think we can pass this off as cost optimization.
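The click budget is simple arithmetic. A quick sketch, assuming 365 days per year and ignoring that the load is split unevenly between the buttons (the helper name is made up):

```python
def clicks_per_day(rated_clicks, years):
    """Average daily click budget if the switch should last `years` years."""
    return rated_clicks / (years * 365)

budget_10m = clicks_per_day(10_000_000, 4)  # the 10M rated switch
budget_20m = clicks_per_day(20_000_000, 4)  # the 20M rated variant
```

The 10M switch comes out at just under 6850 clicks per day, and the 20M variant doubles that budget.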

Note that even more expensive Logitech mice, like the MX Master, have the rubber coating issue and use the same cheap 10M rated switches.

Introducing ODRS Browser

GNOME Open Desktop Ratings is the service that enables user ratings in various Linux app stores like the Snap-Store, GNOME Software and KDE Discover.

While it nowadays works for users by providing a mostly useful star rating, from an application developer's perspective the story is very grim.

Basically one only gets the user's view, which provides an average rating and some reviews in the current locale.
This means you might see something like “2 Stars from 80 Reviews” – but the 3 reviews in your current locale are all 4-5 Star.
To see something else you have to change the locale and restart the app store – which is inconvenient and confusing.
As a developer, seeing the negative reviews is crucial, as people often just post bug reports there and this is the only way to find out why the app did not work for them.

Therefore I quickly hacked together a web-based browser for the ODRS service, skillfully named ODRS Browser.

This allows accessing the ODRS service from the web and shows the reviews from multiple locales at once. The idea here is that people often write reviews in English – regardless of their current locale. Currently, ODRS has no logic to detect that.

Also, if your app is packaged in different formats like snap, flatpak and deb, you can see the reviews of all variants in the overview.

Unfortunately, ODRS currently does not set the CORS header, which prevents browsers from accessing it directly. The data that you see right now was scraped with a Python script. But once this issue is fixed, the ODRS Browser will be able to use live data.

Debugging Python with GDB on Ubuntu

Let's say you want to debug a Python process that is either already running or crashing in native code. Python's PDB is of no help here and you will have to use the low-level GDB debugger. Fortunately, it comes with support for debugging high-level Python scripts.

However, while the actual python-gdb commands are nicely described here, that page lacks important details on how to get python-gdb in the first place. We are merely told that a python-gdb.py is needed.

On Ubuntu/ Debian, this file is included in the python3-dbg package:

sudo apt install python3.10-dbg

Installing that is sufficient, if you use the matching python3 package. You can go ahead and connect to some running python process via:

gdb -p <PID>
# verify that the script is loaded
(gdb) info auto-load
# get a python backtrace
(gdb) py-bt
Traceback (most recent call first):
  File "/usr/lib/python3.10/selectors.py", line 416, in select
    fd_event_list = self._selector.poll(timeout)
  File "/usr/lib/python3.10/socketserver.py", line 232, in serve_forever
...

In case Ubuntu is merely a host and you use conda, you can still use the host python-gdb.py – even if the Python versions don't match. You will have to load the script manually though, like:

(gdb) source /usr/share/gdb/auto-load/usr/bin/python3.10-gdb.py
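If you switch between Python versions, the path to source can be derived from the version number. The following sketch assumes that other python3.X-dbg packages follow the same layout as the 3.10 path above; the helper names are made up.

```python
import sys
from pathlib import PurePosixPath

# Directory taken from the python3.10-dbg path shown above.
AUTO_LOAD_DIR = PurePosixPath("/usr/share/gdb/auto-load/usr/bin")

def gdb_helper_path(major=sys.version_info.major, minor=sys.version_info.minor):
    """Expected location of the python-gdb helper for a given Python version."""
    return str(AUTO_LOAD_DIR / f"python{major}.{minor}-gdb.py")

def gdb_source_command(major, minor):
    """The line to type at the (gdb) prompt to load the helper manually."""
    return f"source {gdb_helper_path(major, minor)}"
```

For example, `gdb_source_command(3, 10)` reproduces the command shown above.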

Fix Steam Deck Input in Desktop Mode

While older SteamOS releases used to map the right trigger to the left mouse button by default, in current SteamOS you can only click by using the touchpad. However, due to the way you hold the device, this is really fiddly – especially if you try to drag and drop something.

Fortunately, there is a way to fix this via a setting in Steam. For this you need to launch Steam while in Desktop Mode. There, switch to Big Picture mode and go to

Settings > Base configuration > Desktop Configuration

In this view you can configure the inputs to your liking.

I suggest going with the following setup:

  • Right trigger for left click (sounds counter-intuitive, but works well)
  • Left trigger for right click
  • Left touchpad for moving the mouse (doh)
  • Right touchpad for scroll wheel

With this configuration you can use the desktop mostly pain-free.

Using Docker with SLURM

The SLURM documentation provides you with the basic information that you can use Docker within SLURM – as long as you use rootless Docker. However, some crucial pieces are missing.

The issue that you will immediately run into is that the SLURM resource allocation is not propagated to docker at all. E.g. if you start your job with srun --gpus 1 docker ... all GPUs will be available to docker nevertheless.

The issue here is that Docker uses a manager daemon that the docker CLI communicates with. And that daemon does not know anything about SLURM or any resources it allocated for the job.

The solution is to start a daemon per job (instead of per user) as one user might want to run different jobs with different allocations on the same machine. The docker documentation gives you an idea on how to do that.

You will need to set at least the following parameters to make the daemon fully job-specific:

# dockerd-rootless.sh requires XDG_RUNTIME_DIR in its environment
export XDG_RUNTIME_DIR=/somewhere/including/$SLURM_JOB_ID
# the directory must exist and be writable
mkdir -p "$XDG_RUNTIME_DIR"
# export, so the docker client sees it later on
export DOCKER_HOST=unix://$XDG_RUNTIME_DIR/docker.sock
dockerd-rootless.sh --host=$DOCKER_HOST --data-root=... --exec-root=...

Here, exporting DOCKER_HOST makes the docker CLI use the correct daemon.
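If your job is launched from a Python wrapper instead of a shell script, the same two variables can be derived there. This is only a sketch: the `docker-<job id>` directory name and the function name are hypothetical, and the data-root and exec-root options are left out because they depend on your cluster.

```python
import os

def job_docker_env(job_id, base_dir):
    """Per-job environment for dockerd-rootless.sh and the docker CLI."""
    # Hypothetical naming scheme: one runtime directory per SLURM job.
    runtime_dir = os.path.join(base_dir, f"docker-{job_id}")
    return {
        "XDG_RUNTIME_DIR": runtime_dir,
        "DOCKER_HOST": f"unix://{runtime_dir}/docker.sock",
    }
```

Inside a job you would call it with `os.environ["SLURM_JOB_ID"]` and merge the result into the environment of both the daemon and the client processes.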

The drawback of this method is that each job needs to pull the container again due to the separate data-root paths. Switching to podman might solve that.

Steam Deck SSD Upgrade

If you, like me, went with the entry level Steam Deck option with only 64 GB of internal storage, you likely realized quite soon that some games won't fit on it.

One option is to use the microSD expansion card slot. For current-gen games the throughput of only about 150 MB/s does not seem to degrade loading performance compared to an NVMe SSD.
However, given that the internal storage is upgradable, the only logical choice for keeping your PC master race status is to cram the fastest NVMe SSD inside that thing.

Specifically, you will need a single-sided SSD in the M.2 2230 form factor so it fits the space inside the Steam Deck.
I went with the KIOXIA Client-SSD BG5 512GB. Kioxia is the Toshiba spin-off for SSD drives, if you wonder about the brand. Although it is a PCIe 4.0 drive, its peak read throughput of 3.5 GB/s is within the practical limits of PCIe 3.0 of the Steam Deck.
Also, the active power consumption of 4.1W is quite close to the 3.8W drawn by the custom PHISON PS5013 E13 SSD that Valve uses.
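For reference, the theoretical link bandwidth can be estimated from the per-lane transfer rate and the 128b/130b line encoding used by PCIe 3.0 and 4.0. The lane count of 4 below is an assumption on my side, and protocol overhead reduces the practical limit further.

```python
def pcie_gbytes_per_s(gt_per_s, lanes):
    """Theoretical payload bandwidth in GB/s for PCIe 3.0/4.0 (128b/130b encoding)."""
    return gt_per_s * lanes * (128 / 130) / 8

pcie3_x4 = pcie_gbytes_per_s(8, 4)   # PCIe 3.0: 8 GT/s per lane, assumed x4 link
pcie4_x4 = pcie_gbytes_per_s(16, 4)  # PCIe 4.0: 16 GT/s per lane, assumed x4 link
```

Under that assumption a PCIe 3.0 link tops out slightly below 4 GB/s, so the 3.5 GB/s of the drive sits close to the ceiling.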

You can follow the iFixit Guide for the steps to actually swap the SSD. Make sure to transfer the ESD shielding wrap to the new SSD.

To get SteamOS on the new drive, follow the official recovery instructions and select the “Re-image Steam Deck” script.
This will install SteamOS on the blank SSD – similar to how you would install Ubuntu from a live USB.

Benchmarking results

Next, I wanted to actually compare the speed of the upgraded NVMe SSD with the one of the stock eMMC memory. To this end I used KDiskMark – an open-source alternative to CrystalDiskMark that runs on Linux natively.

The tests were performed on SteamOS 3.3.1 using KDiskMark 2.3.0.


In short, the NVMe SSD offers roughly one order of magnitude more throughput than the eMMC.
Whether you feel this in-game highly depends on the given game. For older titles, even the eMMC is so fast that you cannot read the hints on the loading screen. However, for something like Flight Simulator 2020 that shuffles huge assets around, it will surely be noticeable.

Finally, the peak read performance of 3.5 GB/s is not reached. This might be due to the PCIe 3.0 bottleneck – I did not bother putting the drive in a PCIe 4.0 device. Still, there is a significant advantage in write performance over the older Kioxia BG4 series, which only does 1.4 GB/s.

Drifting with WLtoys 284131 Mini RC-Car

In this post I will discuss how to convert the WLtoys 284131 (new K989) into a drift car.

Overview

This car is the latest iteration of the K989 (rally car) platform, of which there is also a variant specifically for drifting, namely the K969 Porsche.
However, you should still go with the more recent 284131, as it comes with an upgraded radio that, unlike the previous one, has no dead-zone. This will give you better control of the car.
Additionally, the 284131 now has metal ball-heads on the shocks and on the servo horn, which allow those parts to move more smoothly.
Then, it comes with a preinstalled light-kit that helps with guessing the direction of the car from far away.
Also, some adjustments were made compared to the K989 to cope with the heat generated by the motor: the transmitter module was rotated by 90° to move it away from the motor, and the motor pinion is now all brass, making it more heat resistant.

The only downside is really the ugly hoonitruck chassis, but at least this will be authentic after we do the drift-conversion.

The included battery lasts for about 20min and can be fully charged in about 25min, if your USB charger can deliver 2.5W. If you use a USB port older than 3.0, charging will take much longer.
Note that even if you get a kit with multiple batteries, you should take a break of about 10min between runs to allow the motor to cool down. Otherwise it will break much faster.

Drift conversion

Out of the box, the 284131 is tuned for fast acceleration and handling at high speed:

  • The differentials are so stiff, that you can consider them locked. This gives you best acceleration
  • The turning-radius is limited which prevents flipping over at high speed
  • The stiff shocks reduce body-lean, additionally lowering the risk of flipping
  • Traction is mainly achieved by the grippy rubber-tires

For drifting however, we generally run at lower speed and need precise handling there. This basically means undoing all of the choices listed above.

Drifting with the changes suggested in this post

Some of the changes are easy to do, others are more involved:

  • Replace the rubber tires with some hard-plastic ones. We must get rid of some grip to be able to slide sideways. I suggest just going with the K969 tires, which only cost about 5€.
  • Remove the spacers from both front and back suspension to make it soft. This will increase forward grip while drifting.
  • Use the upper hole on the servo-horn to get a tighter turning-radius. Unfortunately the stock ball-head does not fit in the upper hole and you cannot get the old servo-horn any more. I suggest using the “MINI-Q 3.5mm ball-head” instead. It will set you back by about 3€.
  • Most crucially, we need a locked differential in the back and an open differential in the front. The locked differential will cause the back to lose traction and drift. In contrast, the open differential will keep traction and allow us to control the drift.
    This is a difference to the K969, where both differentials are locked and the car merely slides (like on ice) instead of drifting.

The good news is that the stock diffs are so stiff that you can just keep them in the back and they will behave as if they were locked.

Making differentials work

The bad news is that getting actually working (i.e. open) differentials is not that easy. The cheapest option is to disassemble the stock one and loosen it up. For this, I recommend using a (3mm) drill to widen the diff housing. If you try to use sandpaper on the diff arms, you will probably not make it uniform enough to run smoothly.


One thing to watch out for when adjusting the diff is that both diff-arms have the same resistance. You can hold down the center and rotate each arm to test this by hand. Also, when assembled, the car should accelerate in a straight line.

If you don't want to go through the hassle, you can also just buy the Mini-Z MD005 diff (15€) and a pair of extended 11mm swing-shafts (10€) to compensate for the shorter diff arms.

If you want the best diff possible, you can go for the Mini-Z MDW018 ball diff or the MDW017 one way diff. Especially the latter gives you even better controls for drifting. However each of those costs as much as the whole 284131 RTR kit.

LiPo tester for storing the batteries

When ordering stuff anyway, make sure to also get a LiPo tester. Those cost about 2€ and allow monitoring the charge of the battery. This is useful when you want to take a break for a few days. In this case the battery should be at 3.8V per cell. Otherwise you risk permanently damaging the battery. To get there, you can keep the tester connected to the white plug while driving and set the beeper to that voltage. If the beeper is too loud, you can dampen it by putting some cotton wool into the housing.

Bad upgrades

There are also some bad upgrades you can buy. Those are either unnecessary or actually worse than the stock parts. Particularly, this concerns the metal replacement parts. Metal parts are harder to manufacture at high precision, so you might actually degrade the performance by installing them. Also, they make the car heavier and thus decrease acceleration.

Generally, I would say that you do not need any of them for drifting. However, if you do touring and any of the plastic parts break, you might consider replacing those with a metal equivalent.

All metal ball differentials

Stock diff, good pinion – Metal diff, eaten pinion

You can get an all-metal ball diff on Aliexpress for about 8€. After some run-in those work very well and are smoother than what you get by fixing the stock ones.
Unfortunately those all-metal cogs (which are also shorter than stock) will eat up the plastic center-shaft pinion in no time, as we have high traction on the front wheels when drifting.