Brute-forcing CRC parameters

February 10, 2012 at 11:20 AM | categories: tools, reverse engineering

Recently, I had to discover the parameters for a cycling redundancy check (CRC) calculation. Because it was the second time within six months that there was a need for determining a CRC polynomial, I implemented a brute-forcing tool. This tool reveals the (truncated) polynomial, the initial value, input and output reflection and a finally applied XOR-mask. Obviously, brute-forcing fails for large polynomials like for a CRC-32, because you have to try different polynomials and initial values. But for smaller CRCs this approach works well. Usually, you do not have truncated polynomials larger than 16, especially not in simple telegrams with some bytes.

What is a truncated polynomial? In general, a polynomial has a grade. E.g., CRC-16-CCITT has a polynomial of grade 16. This is x^16 + x^12+x^5+1. To describe the polynomial by its coefficients the leading digit is often skipped, e.g., 1.0001.0000.0010.0001 becomes 1.0000.0010.0001. Thus, the term truncated polynomial is used to indicate this truncation.

Let’s see how this tool works. In a first step, we generate some test data. Therefore, the tool generate-test-data might be used. It generates some random bit strings and calculates a CRC, which is appended to the string.

$ ./generate-test-data --final-xor 0 --messages 10 | tee testmessage_1
# width                    : 10 bits
# CRC's offset             : 49
# calc CRC for bit offsets : 0 .. 49 (not included)
# final XOR                : 0
# reflect in               : true
# reflect out              : false
# 
# truncated polynom        : 189 (MSB not shown)
# initial value            : 149

0110110010000011101000011000110101111000000000100 1111111010
0001000000001100101100100110011011111100000110100 0101000101
1101011100111000110110110010111011110110101001001 0011100111
0100000011100011110000100111010001101100010010011 0110100011
0110110000010111010110011001010011100101110000101 1100000001
0110101000000111010111101101001000110010100111110 1011001001
1000101011110110010100001110001111101110000011000 1000010010
0110111010111000011100100111110101110101111011010 1001111001
1001101001111000010110001000011101110000111001001 0101110000
1101110110101001101001110010011101010001110000110 0000110111

Here, the bit string starts at offset 0 and ends at offset 49. Please note, that these are not the common eight bits per byte. I avoided this byte-centric view, because in low-level reverse engineering messages might be odd-formed, especially when analyzing radio transmission.

In a second step, these test messages are reformatted to fit the brute-forcing tool’s input format. Therefore, I wrote a script in Perl. It reads bit strings (“101001…”), hex strings (“0xdeadbeef…”) and comma-separated lists of hex values (“0x23, 0x42, 0x5”) and reformats them to either a hex or a bit string.

$ perl ../rewrite-as.pl bits testmessage_1 | tee testmessage_1.bits
01101100100000111010000110001101011110000000001001111111010
00010000000011001011001001100110111111000001101000101000101
11010111001110001101101100101110111101101010010010011100111
01000000111000111100001001110100011011000100100110110100011
01101100000101110101100110010100111001011100001011100000001
01101010000001110101111011010010001100101001111101011001001
10001010111101100101000011100011111011100000110001000010010
01101110101110000111001001111101011101011110110101001111001
10011010011110000101100010000111011100001110010010101110000
11011101101010011010011100100111010100011100001100000110111

Finally, we launch the CRC brute-forcing tool, which requires specifying several offsets: The tool needs to know, where the message’s CRC relevant portion starts and stops and where the checksum is placed. When reverse engineering a telegram format, one might determine the checksum part without problems. This is caused by the CRC algorithm’s property that it is a Galois type shift register, which produces pseudo-random numbers. But it loops not on its own. Instead the algorithm is fed with message bits. Thus, the checksum looks more random than the rest of the message.

$ ./bruteforce-crc --width 10 --start 0 --end 49 --offs-crc 49 \
       --file testmessage_1.bits --probe-reflections true 
number of threads        : 4
width                    : 10 bits
CRC's offset             : 49
calc CRC for bit offsets : 0 .. 49 (not included)
final XOR                : 0
reflect in               : false
reflect out              : false

truncated polynom        : from 0 to 1023 (MSB not shown)
initial value            : from 0 to 1023
probe reflections        : true
probe final xor          : false

extracted crc 03fa
extracted crc 0145
extracted crc 00e7
extracted crc 01a3
extracted crc 0301
extracted crc 02c9
extracted crc 0212
extracted crc 0279
extracted crc 0170
extracted crc 0037
----------------------[ MATCH ]--------------------------------
Found a model for the CRC calculation:
Truncated polynom : 0xbd (189)
Initial value     : 0x95 (149)
Final XOR         : 0x0 (0)
Reflected input   : false
Reflected output  : false
Message offset    : from bit 0 .. 49 (end not included)

The brute-forcing tool extracts the checksum field at the specified offset and compares the calculated CRCs with the extracted CRCs. If a CRC matches the extracted one, the tool continues with testing the other messages. And finally, if a CRC parameter set satisfies all messages a CRC model is found.

If you have to reverse engineer an undocumented bus-protocol or some radio messages and don’t want to reconstruct the CRC parameters by putting on one's thinking cap, you might need such a tool, too. Therefore, I published the code on github.com.

Further readings: