Software CRC Computation


[simple_series title=”CRC Tutorial”]

/* Created with crc4tab[i] = crc(i); */
char crc4tab = {
/* 0:*/ 0,  3,  6,  5,
/* 4:*/12, 15, 10,  9,
/* 8:*/11,  8, 13, 14,
/*12:*/ 7,  4,  1,  2};

int nibble, sum = 0; /* getNextNibble() also needs to return a placeholder for the CRC when sending */
while (0 >= (nibble = getNextNibble())) {
  sum = nibble ^ crc4tab[sum];
}
return sum;

Lookup table and sample code for our four-bit CRC

In software, you could do the same process as done in hardware: Shift each bit in and then conditionally XOR with the generator (in the C code, this is represented by the ^ operator). Unfortunately, this is often very slow, as software typically accesses data in byte (or even larger) chunks. Extracting the bits from the bytes and then processing them individually looks counterintuitive. Therefore, we need a way to process software-sized chunks.

To make the examples small, the example to the left operates on four bits (aka nibble) at a time, which matches the generator size. In your application, you may chose the chunk size independently of the generator size [1]Generators smaller than the chunk size require splitting of the chunk into two parts, the more significant part consisting of sb bits, which are XORed with the previous sum (shifted left by s-2b bits) and then used as an index into the table. The rest of the chunk is then applied as shown in the example code. (an example with 8-bit processing size and a 32-bit generator is shown in IETF RFC 1952: GZIP file format specification 4.3 (Deutsch 1996)).

The process is sped up by using a lookup table returning the combined result of applying several single-bit operations, instead of using bit-by-bit processing. This is comparable to the fast hardware approach described above, but using the preferred mechanism available in software (lookup tables) instead of those suitable to hardware (combinatorial logic).

First, a lookup table is used, with a size corresponding to the chunk size, s: For s bits, 2s entries are needed. For s=4, we thus need 16 entries in our table. This table is calculated using the bit-by-bit method, and can be precomputed (i.e., the table is already included in the code and has been computed at or before compile time) or at run-time, when the program first starts up. Each table entry consists to the CRC corresponding to each index.

Second, we need code, that for each chunk translates the previous intermediate “sum” to what it would look like after shifting in s more bits. Then, the current data chunk is XORed to that value and gives the next intermediate sum. When creating the CRC, also the appropriate number of zero-filled chunks needs to be added to obtain the CRC to be placed in that zero-filled area. For verification, you do not zero-fill, but send the message and the transmitted CRC through the function, which returns 0 on a
match. [2]Of course, you may also pass the received message (without CRC) and again zero-filled through the CRC function and then compare the computed CRC with the one received.

Step sum in crc4tab[sum] nibble sum out
1 00002=0 00002=0 (0)0112=3 00112=3
2 00112=3 01012=5 01112=7 00102=2
3 00102=2 01102=6 00002=0 01102=6

CRC of (0)011 01112 is 01102

An example using our well-known generator and test message can be seen to the left. The message has been expanded to be a multiple of the chunk size. For this, a zero has been added at the MSB. [3]Prepending zeros has no impact on the CRC, as shifting in zeros into a register that is zero has no impact, as it can never set the MSB of the register to one, which could make the contents non-zero. (Such modifications are prevented by using pre-whitening.) In the first step, sum has been initialized with 0. The first nibble is
then XORed in (there is not enough data yet to trigger an XOR with the generator).

In step two, the previous data has been shifted left, and needs to go through zero to four XORs with the generator (reduction), depending on the input sum. These XORs are performed by looking up the previous sum in the lookup table. As the last operation in step two, the new nibble is XORed in.

These first two steps are (because the XOR operation is commutative) identical to having a register 2b bits wide, which starts out zeroed. Then, four bits of data are shifted in from the lower end, similar to the hardware approach. The value is guaranteed to still fit into b bits, so no reduction is necessary. Next, b more bits are then shifted in. This will not cause the register to overflow, as the top half was guaranteed to be zero before. Then, shifted versions of the generator are applied in turn, starting with the one shifted all the way left (by b-1 bits). After that, the upper half is again guaranteed to be zero.

For a longer message, of course, the steps would be repeated, terminating with the passing of the zero-fill data into the function (in our case, this is step 3).

Step sum in crc4tab[sum] nibble sum out
1 00002=0 00002=0 (0)0112=3 00112=3
2 00112=3 01012=5 01112=7 00102=2
3 00102=2 01102=6 01102=6 00002=0

Output of (0)011 0111 01102 is 00002: Correct

When verifying data, instead of zeroing the field after the message and then comparing to the CRC, the entire message including CRC can be passed through, as shown in the figure to the left. By looking at step 3, we now can see why this happens: The output of the last crc4tab[sum] step is the CRC, when the nibble is zero (crc4tab[sum] XOR 0 = crc). When the nibble has been filled with the CRC, we get 0 as the output (crc4tab[sum] XOR crc = 0). The two parenthesized statements are mathematically equivalent, as 0 is the neutral element of the XOR operation (i.e., XORing with 0 does not change the result), and XOR is its own inverse operation (XORing in a value can be undone by another XOR with the same value).

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.