Check Digits


[simple_series title=”CRC Tutorial”]

While reordering of words may be uncommon in data communication networks, it is quite common for humans to swap digits in large numbers consisting of seemingly random digits, such as those used in the ISBN system or UPC/EAN product codes. Therefore, a checksum is not suitable for these numbers. A common algorithm is to multiply each digit with a different number before it is added to the sum. In addition to digit substitutions, this helps detect many swapped digits, as the position now becomes significant as well.

Using ISBN as an example, we find the following definition: The number consists of 9 decimal digits, followed by the check digit which is decimal or X, a digit representing 10. Each of these digits is weighted: The first has weight 10, the second, 9, and so on until the last data digit with weight 2 and the check digit with weight 1. The check digit is chosen such that the weighted sum is zero (modulo 11).

ISBN calculation
ISBN Check Digit Rule

What is the check digit of 3-423-10735-_ (the dashes in the sequence are for grouping only)? We have to ensure that 3*10 + 4*9 + 2*8 + 3*7 + 1*6 + 0*5 + 7*4 + 3*3 + 5*2 + c*1 = 0 (mod 11). Simplified, this is 156 + c = 0 (mod 11). As 156 mod 11 = 2, we get c = 9 and thus an ISBN of 3-423-10735-9.

This behavior is much closer to CRCs than you probably would expect. If you understand this, you are a good step into understanding CRCs.

Error detection capability

ISBN after Modification
Adding 1 to the digit weighted 7 adds ( (1*7) modulo 11)=7 to the checksum

As 11 is relatively prime to each of the weights used, the check digit will protect against any single-digit modification. A single check digit is unable to detect all two-digit modifications, the simplest example being the adjustment of the check digit after a modification of a data digit. [1]Most modifications to one data digit can be compensated with a modification to another data digit, it is not necessary to revert to updating the check digit. (It is only “most”, not “all”, as only the check digit allows for all necessary 11 values, the data digits are limited to 0…9, lacking the possibility for the special value X.)

 

ISBN digit swap
Swapping adjacent digits differing by 2 adds or subtracts 2 to the checksum, depending on which digit was larger

It is obvious that the check digit protects against at least some digit swaps, but does it protect against all? The answer lies in the solution to e*df*d = 0 (mod 11), where e and f are the distinct weights of the positions being swapped and d being the (non-zero) difference between the digits. When changing the formula to (ef)*d = 0 (mod 11), it becomes apparent that with ef and d both being nonzero and relatively prime to 11, we cannot find a solution fulfilling the formula, therefore any swapping of digits will be detected.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.