Enough! Why rand() cant compress!!!

4.9
(74)

Unless you are born an information theory genius, at some point during your programming learning you might have considered the following hypothesis:

A file is a sequence of bytes. If I could find the appropriate seed for srand(), surely I could generate the file, just by successively calling rand() % 256? Therefore I could represent the file with just the seed and the expected file length.

Now most of you may laugh, but I bet some of you may be thinking about it, or have considered such a thing in the past. I know for a fact some of you have tried this on the Discord server, building brute force algorithms to determine appropriate seeds.

Well it’s not possible. And not because rand() is considered a poor random number generator. Consider the following…

Assume you specify an 8-bit seed. This will give you 256 possible sequences of bytes. This means for a 1-byte seed, there are only 256 possible files you can represent. The likelihood of one of those sequences matching your file is very small.

Let’s assume the file you are generating is only 10 bytes long, and that rand() is a completely fair generator (it’s not). The probability of the sequence generated matching the sequence you want is:

(1/256)^10 = 0.000000000000000000000000827%

i.e. Ain’t gonna happen. Well it might, you do have 256 goes at it, since you have 256 possible sequences available to choose from. So let’s multiply that by 256:

0.00000000000000000000021%

Hmmm, chances are better, but still very, very, very remote. How many goes would we need to guarantee your chances of finding the 10-byte sequence you need? Well lets take our original chance and determine the reciprocal:

(1/0.000000000000000000000000827) = 1,209,189,842,805,320,435,308,343

Note, some rounding is coming into play here, but as you can see you’re going to need quite a few possible sequences generated, to guarantee the sequence you want is one of them. If we had to represent that number in binary, we can determine the number of bits by taking the 2nd log:

log2(1,209,189,842,805,320,435,308,343) ~= 80 bits, or, sigh, 10 bytes…

So your seed value needs to be at least 10 bytes long to guarantee you can generate a predetermined 10 byte sequence. You may as well just store the file…

Information is funny like that.

Rate this post!

Average rating 4.9 / 5. Vote count: 74

No votes so far! Be the first to rate this post.