Bitpacking

Bitpacking

Using bit magic to compress data!

ยท

4 min read

I came across this fun little way to pack together a few integers into one and extract data from that integer. This way, you save a few bytes! It may not seem so significant to someone like a backend developer because saving a few bytes may not matter much. Bitpacking is commonly used by people who develop programs for embedded systems. You need to make sure you use the least amount of memory possible since resources are limited. It is also used in data compression algorithms. You can also find it being used in implementing protocols such as the Internet Protocol. In this case, you need to make sure that the fields in the packet's headers are a stream of packed integers. This basically means that the fields in headers should be stored next to each other in memory.

Okay, let's get to the fun part. I will explain this using an example. Let's say you need to store details of products of an e-commerce site. You will need to hold the product weight in grams, the type of product and cost. Of course, we also need to have the product name, seller details, etc. But for the sake of simplicity, let's consider only those three.

weight = 1000
type = 2 # Lets say type 1 - edibles, type 2 - electronics, type 3 - others
cost = 3500

We have 3 integers. A typical integer is 4 bytes, so these variables combined take a total of 12 bytes.

Before packing these together, we need to decide on the bit structure. For example, let's say the binary for the packed integer looks like this. 2021-05-06-235223_1368x768_scrot.png The W's hold the binary form of the weight, the T's hold the binary form of the type and the C's hold the binary form of the cost.

Since we have 10 W's, we restrict the maximum weight to 1023 (the binary form of 1023 is ten 1's).

Similarly, maximum type is 3 (the binary form of 3 is two 1's), and maximum cost is 4095 (the binary form of 4095 is twelve 1's). Now let's pack it to a single integer using simple bit manipulation.

product = weight
product = product << 2
product = product | type
product = product << 12
product = product | cost

Hmm, what just happened here? Let's break this down.

First, we assigned weight to product. product has the value 1000, binary form of product now is 0b1111101000

Then product is left-shifted by two and is assigned to itself. Left shift basically adds zeros at the end in the binary form of the number. So the binary form of product is now 0b111110100000 (Just added two zeros at the end of bin(1000)).

Next, we applied OR to product and type and assigned it to product. What does that do?

2021-05-07-002159_1368x768_scrot.png After we left shifted two times to add the 2 zeros at the end, using OR we added type to the product variable. In the above image, the first 10 binary digits converts to 1000, which is the weight. The next two binary digits, 10 is basically the type, which has the value 2.

Similarly, we append the cost in the same way to product. cost can hold a maximum of 4095, which is 12 binary digits, so we left shift 12 times. Then we use OR to put the cost in product. The binary form of product now looks like this.

2021-05-07-001751_1368x768_scrot.png

The value of product is now 16395692. Which is just one single integer of 4 bytes. We just saved 8 bytes from the initial 12 bytes used to store 3 integers. 66.667% compression ๐Ÿ˜Š

But wait, how do I get those values of weight, type and cost back? Let's see how

weight = product >> 14
type = (product >> 12) & 0b11
cost = product & 0b111111111111

Right shift removes the least significant bit (rightmost). By removing the 14 such bits, we remove the cost and type bits and are left with only the weight bits. Then to get type, we first remove 12 bits (cost) and apply a mask to get the type. Since the type is just 2 bits, we use 0b11 as a mask that filters out the type bits alone.

2021-05-07-004442_1368x768_scrot.png We do the same to get cost with 0b111111111111 as a mask since cost is 12 bits.

Voila! We have successfully compressed and also unpacked stuff using simple bit manipulation. Implementing this in C++ is easy.

struct product {
    unsigned int weight: 10;
    unsigned int type: 2;
    unsigned int cost: 12;
} prod;

prod.weight = 1000;
prod.type = 2;
prod.cost = 3500;

cout<<prod.cost;

unsigned int weight: 10 is called a bitfield. It is a 10-bit long int variable. We can pack custom bit length variables such as these as shown above.

This looks more readable than the python method, which I explained above. But if you want to strictly use python but still want the readability and speed of C++ you can use the ctypes module. More on that in another article!

Cover photo credits: Alexander Sinn

Did you find this article valuable?

Support Sidharth Shambu by becoming a sponsor. Any amount is appreciated!

ย