Testing for randomness

The important thing to remember when running randomness tests on numbers is
this: You can't test too many numbers! All these test come a lot closer to
the ideal result as our datapool increases to infinity. If you only test a
small amount of numbers, your data could be anything. When we test numbers,
we test thousands at a time. We don't care about
instances, but the limit of those instances as you take an infinite amount
of them.
There is test data available for download, which
are text files of random integers between 0 and 255. Also available are
c++ sources for all the programs we use to test
randomness. The test results for the numbers
NetRand produced are also here, you can see how they stack up against the
UNIX random number generator by the c function srand().
There are 5 ways we have been checking our numbers so far:
1. Entropy -The number of
bytes it takes to represent your data when your information is optimally
compressed. Compression removes patterns from your data.
from your data. If you compress a 10 byte (80 bits) string into
53 bits, you
are using 5.3 bits to represent each character instead of 8. For truly random
numbers, it should take 8 bits to represent a byte. If you can represent
a string of bytes with less, then you have some
patterns in your numbers,
which indicate non-randomness. Look into hamming codes for more information.
2. Chi Square Method
This is the most common test for randomness.
Chi-square distribution calculated as an absolute number, then a
percentage of times true random data would exceed that value. This
percentage should be around 50 percent.
This test is extremely sensitive to
shifts in distribution, and if your numbers are not random, you can
expect to see percentages that fall under 1% or higher than 99%.
3. Arithmetic Mean
This is the first test one would think of, the most obvious of them all.
If you have 8 random bits, you get signed magnitude numbers between 0
and 255. The ideal average around 127.5. If the mean departs too far from
this value, the results are consistently too high or too low.
4. Monte Carlo Value if PI
each successive 4 bytes are used as 16
bit x and y coordinates in a square. If the distance to the point is less
than the radius of a circle inscribed in the square, it's a hit. The
percentage of hits can be used to calculate Pi. The closer your value is to
pi, the more random your data is.
5. Serial Correlation Coefficient
(SCC)
Measures the extent towhich each byte of the data relies on the previous
byte. SCC's of random sequences should approach zero, while very predictable
data such as uncompressed bitmaps yield SCC's close to 1.
Send mail to netrand
Return to main page