Kolmogorov Complexity Explained (with Sample Exercises)

The Kolmogorov Complexity is also known as Solomonoff–Kolmogorov–Chaitin complexity, program-size complexity, descriptive complexity, algorithmic complexity or even algorithmic entropy.

Index

What is Randomness?

A more intuitive approach to randomness.

Actually Computing the Kolmogorov Complexity

Showing how it is computed with concrete examples.

Formal Definition of Randomness

Defining randomness using Kolmogorov Complexity.

Subsection: How long is Bin(n)?

This is not as obvious as you might think.

Typical Kolmogorov Exercise

A detailed guide to solving a common Kolmogorov Complexity exercise, plus some common mistakes and pitfalls.

In this post I’ll attempt to explain the theoretical construct of Kolmogorov Complexity. My main sources are the book “Theoretische Informatik: Formale Sprachen, Berechenbarkeit, Komplexitätstheorie, Algorithmik, Kommunikation und Kryptographie” by Juraj Hromkovic and the Theoretical Computer Science (theoretische Informatik) lectures at ETH. Sooo without further ado let’s get to it!

Kolmogorov Complexity allows us to (at least attempt to) answer the following questions:

How much information does a particular string of numbers contain?
What exactly makes a number truly random?
Is there information you cannot compress?

What is Randomness?

To really understand Kolmogorov Complexity, we need to look at the concept of randomness first. What does it mean for a sequence to be random? Let’s look at these 3 sequences¹:

0000000000000000

0101010101010101

0001011011100010

Statistically, they are all equally likely to appear if we just pick 16 random digits. However, from an intuitive standpoint, doesn’t the last one seem much more random than the first?

The first sequence is just 0 repeated 16 times. We can simply write this as 0¹⁶. Similarly we can write the second sequence as (01)⁸. But how would we write the third sequence? I can’t see any pattern, so It’s easiest to just write down the sequence 0001011011100010 itself.

This leads us to a “new” definition of randomness: how hard it is to detect a pattern (the harder, the more random). You can also look at it this way: if someone gives you n digits of a sequence of numbers, can you predict what the (n+1)th digit will be?

Kolmogorov Complexity Defined

This is where Kolmogorov comes in. The easier it is to find a pattern in a sequence, the less random it is and the lower its Kolmogorov Complexity. Conversely the harder it is to find a pattern, the “more random” it is and the higher its Kolmogorov Complexity. For truly random numbers it should be impossible to find any pattern at all! (We’ll come back to a formal definition of random further down).

The idea behind the Kolmogorov Complexity is that if you can find a pattern of a sequence, it’s very easy for you to generate the same sequence yourself. Put differently, the Kolmogorov Complexity asks “How easy is it to generate this sequence?”

Def: Kolmogorov Complexity
The Kolmogorov Complexity of a number is the length of the shortest possible Pascal program² that generates that number.

We denote the Kolmogorov Complexity of a binary string x as K(x). We can also define the Kolmogorov Complexity of a natural number n. For this we simply use the binary encoding of n, which we’ll call Bin(n). Thus, K(n) = K(Bin(n)).

Def. Length: We denote the length of a number x as |x|. So |0ⁿ| = n and |(111)ⁿ| = 3n etc.

Actually Computing the Kolmogorov Complexity

It’s important to mention that it’s impossible to actually compute the Kolmogorov Complexity of a specific sequence. We can only operate in upper bounds.

Say we want to write the number 0ⁿ (so 0 n times). Intuitively, the complexity of this number should be very low, since it’s just 0 repeated again and again. It doesn’t seem random at all.

How would we write a program that generates 0ⁿ? First, let’s do it the “stupid” way:

begin 
    write(0...0) //0 repeated n times
end

Now we count how long the program is (in bits):
The letters (everything that isn’t blue) are encoded in binary somehow, for instance using ASCII. No matter what n is, this part of the program doesn’t change. It stays equally long for all choices of n. Therefore we can represent the length of the entire program without 0ⁿ with a constant d. We don’t care what exactly this constant is though.

The length of 0ⁿ is simply n. Thus we get:

K(0^n) \leq n + d

We can definitely do better than this, though! Since the computer is just writing “0” over and over again, we can simplify the program by using loops.

begin
    k := n
    for(i=0 to k) 
    begin
        write(0)
    end
end

Length of new program:
Again, everything but n stays the same length, regardless of what n is. Therefore we again represent it with the constant d'. The length of n is just the binary encoding of n, we’ll call it Bin(n). Thus:

K(0^n) \leq d' + |Bin(n)|

By defining |Bin(n)| the way I explain further down we get:

K(0^n) \leq d' + \lceil log(n+1) \rceil - 1

As you can see the length of our first program grows linearly as n grows, while the second program only grows logarithmically. The bigger n gets the bigger the difference between the two programs.

From all of this we can deduce the following:

Lemma (Upper Bound of the Complexity):

\forall x: K(x) \leq |x| + d

This is because for any string x we could simply create a program that just… writes x, without any fancy loops or anything (as seen in the 1st code example).

Formal Definition of Randomness

Hopefully you have the intuition to fully understand this definition now! A binary sequence x is random iff:

K(x) \geq |x|

Similarly, a natural number n is random iff:

K(n) \geq |Bin(n)|

It’s important to realize the link between randomness, compressibility and information content of a sequence. If you can’t create a program to generate a sequence that is shorter than that very sequence (i.e. the sequence is Kolmogorov random), that sequence is not compressible (by definition). If it’s impossible to compress a sequence into a smaller sequence, you might say that its information content is very high. In this way, all these concepts are intertwined.

We now know randomness can be used as a synonym for non-compressibility. With the help of Kolmogorov Complexity we can now show that there is at least 1 non-compressible sequence for all sequences of length n (excluding 0). This is just one of many ways Kolmogorov Complexity is used in proofs.

Proof Outline:

How many sequences are there of length n? Since we work in binary, it’s simply 2ⁿ.

How many possible programs are there that are shorter than n but generate a sequence of length n? As programs are also just binary sequences of 0s and 1s, we know there are at most 2^k functioning programs of length k. Thus:

\sum_{i=1}^{n-1} 2^i = 2^n-2 \leq 2^n

Even if we assumed that every string of 0s and 1s was a working program, there still wouldn’t be enough unique programs to generate all the sequences we need. QED

Subsection: How long is Bin(n)?

You might think this is obvious but hold on! There is a caveat.

Using the standard binary encoding (so 1₁₀ = 1₂, 3₁₀ = 11₂, 4₁₀ = 100₂ etc.) we would get something like this:

|Bin(n)| = \lfloor log(n) \rfloor + 1, n \neq 0

We can however use a more efficient binary encoding! The encoding mostly stays the same, bar one thing: Every encoding that starts with a 1 and is followed by several 0s (so 4₁₀ = 100₂, 8₁₀ = 1000₂ and any other power of 2 that isn’t 1 or 2), can be encoded using only the 0s. So 4₁₀ = 00₂, 8₁₀ = 000₂ etc. This works because we treat the sequences like strings, not numbers. If we see several 0s beside each other we know there has to be an “invisible” 1 in the front (otherwise the encoding would just have been 0). Thus we can use the following (for n greater than 2):

|Bin(n)| = \lceil log(n+1) \rceil - 1

In the end this usually doesn’t matter though. When working with Kolmogorov Complexities, you usually have some constants that “swallow” any +1 or -1 you could have, and the rest of the difference is usually negligible and probably not important for whatever you’re trying to prove.

TLDR: I’ll use the 2nd equation as the length of Bin(n) (this is also what Hromkovic uses in his book).

Typical Kolmogorov Exercise

Find a good upper bound for the Kolmogorov Complexity of x (expressed in terms of |x|).

x = (010)^{3^{2n^{3}}}

First we define n via |x|:

|x| = 3\cdot3^{2n^{3}} = 3^{2n^{3}+1} \Leftrightarrow log_3(|x|) = 2n^3+1

\Leftrightarrow n = (\frac{log_3(|x|)-1}{2})^{\frac{1}{3}}

Now for the code:

begin
    j = n
    j = 2*j*j*j
    k = 1
    for (i=0 to j) 
    begin
        k = 3*k
    end

    for (i=0 to k)
    begin
        write(010)
    end
end

We denote the length of everything but n with the constant c. By looking at the program, we see:

K(x) \leq \lceil log_2(n+1)\rceil -1 + c \leq log_2(n+1) + c'

We define c' = c - 1 (this is usually done implicitly). Using |x| we get:

K(x) \leq log_2( (\frac{log_3(|x|)-1}{2})^{\frac{1}{3}} + 1) + c'

You might be able to simplify/make the result a bit more elegant, but as long as you get the point I’m content.

Common Mistakes/Points of Confusion

It’s important to realize that the Kolmogorov Complexity is about the length of the written program (in binary). You don’t look at how much storage space a variable needs, there is no loop unrolling etc. etc. No, it’s about actually counting the 0s and 1s a program needs to generate your sequence (though you can never actually count them, since Kolmogorov Complexity is purely theoretical).
NEVER write K(x) = ……. Only ever use <=. Intuition: It’s very hard to prove that the program you have defined is ACTUALLY the shortest program that generates the sequence x. There could always be even shorter programs. But you have definitely defined an upper bound. Formally: It has actually been proven to be impossible to calculate the Kolmogorov Complexity of any specific number (you can find the proof in the book). It still is a very helpful theoretical construct.

Summary

The Kolmogorov Complexity is one way of defining the complexity of a sequence. It also defines whether a sequence is random and how much information the sequence contains. It is a purely theoretical concept: you cannot calculate the Kolmogorov Complexity of a specific sequence and you only work with upper and lower bounds respectively. This might seem pointless, but it is still very useful for a lot of proofs.

Footnotes

¹ We’ll only be looking at binary sequences in this post. Since you can easily represent any string of numbers or letters as binary sequences, that’s all we really need. Back

²It actually doesn’t matter which programming language you use, as the length of programs in different languages only differ by a constant factor (you can find the proof in the book). Therefore I’ll stick to “Pascal-esque” pseudocode in this post. Back