# Pearson hashing


Pearson hashing is a hash function designed for fast execution on processors with 8-bit registers. Given an input consisting of any number of bytes, it produces as output a single byte that is strongly dependent on every byte of the input. In particular, two inputs that differ in exactly one byte never collide; e.g., applying the algorithm to the strings ABC and AEC will never produce the same value.

One of its drawbacks compared with other hashing algorithms designed for 8-bit processors is the suggested 256-byte lookup table, which can be prohibitively large for a small microcontroller whose program memory is on the order of hundreds of bytes. A workaround is to use a simple permutation function instead of a table stored in program memory. However, an overly simple function such as `T[i] = 255-i` partly defeats its usefulness as a hash function, because anagrams then produce the same hash value (since `255 - i` equals `i xor 255`, the whole hash reduces to the XOR of the input bytes, which ignores their order). An overly complex function, on the other hand, slows the hash down. Using a function rather than a table also makes it possible to extend the block size. Such functions must, naturally, be bijective, like the Pearson tables they replace.
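The trade-off can be illustrated with a minimal sketch that swaps the table for a permutation function. `perm_naive` is the `255 - i` function from the text; `perm_affine` is a hypothetical alternative (multiply by an odd constant, add an offset, both mod 256) whose constants 167 and 13 are arbitrary illustrative choices, not taken from any reference. The sketch only shows that both are bijective and that the affine one separates one particular anagram pair; it makes no claim about overall hash quality or speed on a real microcontroller.

```python
def perm_naive(i: int) -> int:
    # The "too simple" permutation from the text: T[i] = 255 - i
    return 255 - i

def perm_affine(i: int) -> int:
    # Hypothetical alternative: 167 * i + 13 (mod 256). The constants are
    # arbitrary; any odd multiplier makes this a bijection on 0..255.
    return (167 * i + 13) & 0xFF

def pearson8(message: bytes, perm) -> int:
    # Pearson's loop, calling a permutation function instead of indexing a table
    h = 0
    for c in message:
        h = perm(h ^ c)
    return h

# Both functions must be bijective, exactly like a Pearson table
assert sorted(perm_naive(i) for i in range(256)) == list(range(256))
assert sorted(perm_affine(i) for i in range(256)) == list(range(256))

# With 255 - i, the anagrams "ab" and "ba" collide ...
print(pearson8(b"ab", perm_naive) == pearson8(b"ba", perm_naive))    # True
# ... while the affine function separates this particular pair
print(pearson8(b"ab", perm_affine) == pearson8(b"ba", perm_affine))  # False
```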

The algorithm can be described by the following pseudocode, which computes the hash of message C using the permutation table T:

```
h := 0
for each c in C loop
    h := T[ h xor c ]
end loop
return h
```
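The pseudocode translates almost line for line into Python. The sketch below assumes the message is given as `bytes` and builds `T` from a seeded shuffle (the seed 2024 is arbitrary; any permutation of 0 to 255 works). It also checks the one-byte-difference property by replacing each byte of `ABC` with every other possible value: since `T[h xor c]` is a bijection in `h` for fixed `c` and in `c` for fixed `h`, such inputs can never collide.

```python
import random

def pearson_hash(message: bytes, T: list[int]) -> int:
    # h starts at 0; each byte c replaces h with T[h xor c]
    h = 0
    for c in message:
        h = T[h ^ c]
    return h

# Any permutation of 0..255 can serve as T; this one comes from a fixed seed
T = list(range(256))
random.Random(2024).shuffle(T)

# Changing exactly one byte of the input always changes the hash
base = b"ABC"
for pos in range(len(base)):
    for b in range(256):
        if b != base[pos]:
            variant = base[:pos] + bytes([b]) + base[pos + 1:]
            assert pearson_hash(variant, T) != pearson_hash(base, T)

print(pearson_hash(b"ABC", T) != pearson_hash(b"AEC", T))  # True
```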

## Python implementation to generate a (pseudo) 8-bit output

The `table` parameter requires a pseudo-randomly shuffled list of the integers 0 through 255. It can be generated with Python's built-in `range` function, converted to a list, and shuffled in place with `random.shuffle`:

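```python
from random import shuffle

# range() is not a list in Python 3, so convert it before shuffling in place
example_table = list(range(256))
shuffle(example_table)

def hash8(message: str, table: list[int]) -> int:
    data = message.encode("utf-8")  # hash the bytes, so every index stays in 0..255
    h = len(data) % 256             # seed with the length instead of 0
    for byte in data:
        h = table[h ^ byte]         # Pearson step: h := T[h xor c]
    return h
```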
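Because the example above shuffles without a seed, each run of the program builds a different table and therefore produces different hash values for the same input. If hashes must be stored or compared across runs, one option is to shuffle with a private, seeded `random.Random` instance. `make_table` is a hypothetical helper name, and the sketch assumes a table derived from a seed is acceptable; for results that are guaranteed identical across Python versions, a fixed table literal like the one in the C example below is the safer choice.

```python
import random

def make_table(seed: int) -> list[int]:
    # A private Random instance with a fixed seed yields the same permutation
    # on every run (for a given Python version)
    table = list(range(256))
    random.Random(seed).shuffle(table)
    return table

def hash8(message: str, table: list[int]) -> int:
    # Seed h with the input length, then apply h = T[h xor byte] per byte
    data = message.encode("utf-8")
    h = len(data) % 256
    for byte in data:
        h = table[h ^ byte]
    return h

t1 = make_table(42)
t2 = make_table(42)
assert t1 == t2  # identical tables from identical seeds
print(hash8("hello", t1) == hash8("hello", t2))  # True
```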

## C implementation to generate 64-bit (16 hex chars) hash

```c
#include <stddef.h>
#include <stdio.h>

/* Writes a 64-bit Pearson hash of x[0..len-1] into hex as 16 hex digits.
 * x must hold at least one byte (len >= 1), and hexlen must be at least 17
 * to leave room for the terminating NUL. */
void Pearson16(const unsigned char *x, size_t len,
               char *hex, size_t hexlen)
{
    size_t i;
    size_t j;
    unsigned char h;
    unsigned char hh[8];
    static const unsigned char T[256] = {
        // 0-255 shuffled in any (random) order suffices
        98, 6, 85,150, 36, 23,112,164,135,207,169, 5, 26, 64,165,219, // 1
        61, 20, 68, 89,130, 63, 52,102, 24,229,132,245, 80,216,195,115, // 2
        90,168,156,203,177,120, 2,190,188, 7,100,185,174,243,162, 10, // 3
        237, 18,253,225, 8,208,172,244,255,126,101, 79,145,235,228,121, // 4
        123,251, 67,250,161, 0,107, 97,241,111,181, 82,249, 33, 69, 55, // 5
        59,153, 29, 9,213,167, 84, 93, 30, 46, 94, 75,151,114, 73,222, // 6
        197, 96,210, 45, 16,227,248,202, 51,152,252,125, 81,206,215,186, // 7
        39,158,178,187,131,136, 1, 49, 50, 17,141, 91, 47,129, 60, 99, // 8
        154, 35, 86,171,105, 34, 38,200,147, 58, 77,118,173,246, 76,254, // 9
        133,232,196,144,198,124, 53, 4,108, 74,223,234,134,230,157,139, // 10
        189,205,199,128,176, 19,211,236,127,192,231, 70,233, 88,146, 44, // 11
        183,201, 22, 83, 13,214,116,109,159, 32, 95,226,140,220, 57, 12, // 12
        221, 31,209,182,143, 92,149,184,148, 62,113, 65, 37, 27,106,166, // 13
        3, 14,204, 72, 21, 41, 56, 66, 28,193, 40,217, 25, 54,179,117, // 14
        238, 87,240,155,180,170,242,212,191,163, 78,218,137,194,175,110, // 15
        43,119,224, 71,122,142, 42,160,104, 48,247,103, 15, 11,138,239 // 16
    };

    for (j = 0; j < 8; ++j) {
        /* Pass j: hash the input as if its first byte were x[0] + j */
        h = T[(x[0] + j) % 256];
        for (i = 1; i < len; ++i) {
            h = T[h ^ x[i]];
        }
        hh[j] = h;
    }

    /* Concatenate the eight 8-bit hashes as 16 uppercase hex digits */
    snprintf(hex, hexlen, "%02X%02X%02X%02X%02X%02X%02X%02X",
             hh[0], hh[1], hh[2], hh[3],
             hh[4], hh[5], hh[6], hh[7]);
}
```

For a given string or chunk of data, Pearson's original algorithm produces only an 8-bit byte or integer, 0-255. However, the algorithm makes it easy to generate a hash of whatever length is desired, and the scheme used above is a straightforward way to do so. As Pearson noted, a change to any bit in the string causes his algorithm to produce a different hash (0-255). The code above exploits this: on each of its eight passes it hashes the data as if the first byte had been incremented by one more than on the previous pass. It does this by computing `T[(x[0] + j) % 256]` for the first lookup, without modifying the input.

Each such change to the first byte of the data yields a different Pearson hash. `Pearson16` builds a 16-hex-character hash by concatenating eight 8-bit Pearson hashes, so instead of a value from 0 to 255 it produces a value from 0 to 18,446,744,073,709,551,615 (2^64 - 1).
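For experimenting without compiling the C code, the same eight-pass scheme can be sketched in Python. `pearson64_hex` is a hypothetical name, and the table here is a seeded shuffle rather than the C table, so the digests will not match the C output. Unlike the C version, which reads `x[0]` unconditionally, this sketch rejects empty input explicitly.

```python
import random

def pearson64_hex(data: bytes, T: list[int]) -> str:
    # Eight passes; pass j replaces the first lookup T[x[0]] with
    # T[(x[0] + j) % 256], then runs the usual h = T[h xor c] loop
    if not data:
        raise ValueError("data must contain at least one byte")
    out = []
    for j in range(8):
        h = T[(data[0] + j) % 256]
        for c in data[1:]:
            h = T[h ^ c]
        out.append(h)
    # Concatenate the eight bytes as 16 uppercase hex characters
    return bytes(out).hex().upper()

T = list(range(256))
random.Random(1).shuffle(T)

digest = pearson64_hex(b"hello", T)
print(len(digest))  # 16
value = int(digest, 16)
assert 0 <= value <= 2**64 - 1  # at most 18,446,744,073,709,551,615
```

Because each pass starts from a different table entry and the remaining steps apply the same bijection, the eight bytes of a digest are always distinct from one another.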

Pearson's algorithm can be made to generate hashes of any desired length, simply by adding 1 to the first byte of the string, re-computing h for the string, and concatenating the results. Thus the same core logic can be made to generate 32-bit or 128-bit hashes.
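The generalization described above can be sketched by making the number of passes a parameter. `pearson_wide` and its `nbytes` parameter are hypothetical names, and the table is again a seeded shuffle rather than any published table. The cap of 256 output bytes is a design choice of this sketch: because `(first byte + j) % 256` wraps around, pass `j` and pass `j + 256` would produce identical bytes.

```python
import random

def pearson_wide(data: bytes, T: list[int], nbytes: int) -> bytes:
    # nbytes passes; pass j uses (first byte + j) mod 256 in place of the first byte
    if not data:
        raise ValueError("data must contain at least one byte")
    if not 1 <= nbytes <= 256:
        raise ValueError("nbytes must be between 1 and 256")
    out = bytearray()
    for j in range(nbytes):
        h = T[(data[0] + j) % 256]
        for c in data[1:]:
            h = T[h ^ c]
        out.append(h)
    return bytes(out)

T = list(range(256))
random.Random(7).shuffle(T)

h32 = int.from_bytes(pearson_wide(b"hello", T, 4), "big")    # 0 .. 2**32 - 1
h128 = int.from_bytes(pearson_wide(b"hello", T, 16), "big")  # 0 .. 2**128 - 1
```

Since the passes are independent, a shorter hash is always a prefix of a longer one computed with the same table.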