struct { char *string; } buckets[10]; case 1: buckets[0] = "A"; buckets[1] = "B"; Assuming ASCII: n = key - 'A'; case 2: buckets[0] = "B"; buckets[1] = "A"; Assuming ASCII: n = 1 - (key - 'A'); case 3: buckets[0] = "ABC"; buckets[1] = "DEF"; 'A' + 'B' + 'C' = max Would this work: (('A' + 'B' + 'C') * 3) / numBuckets; 'D' + 'E' + 'F' = max An algorithm sort of evolves from these patterns. We could possibly have even greater performance by only doing keys that are sizeof (int) and aligned to sizeof (int). ABC 65 + 66 + 67 = 198 198 * 3 is 594 594 / numBuckets What if we do patterns likes: K+E+Y = max and the max becomes the last element in a bucket like so: (K+E+Y * numElements) / numBuckets DEF 68 + 69 + 70 = 207 So we have the values 198 and 207. Somehow we must map those to exact keys ALWAYS. Let's say that 207 becomes slot 1, and 198 becomes slot 0 The delta of 207 - 198 is 9 If we did key - 198 we could have a direct mapping of buckets[0] = "ABC" buckets[9] = "DEF" However this has a problem, and that is that keys like this will generate the same hash: CBA, BAC, etc. What if we multiply a character value by 2, 3, 4, .. ? (65 * 2) + (66 * 3) + (67 * 4) That appears to always be unique. One issue with this is that we have expanded the potential array size considerably. The min value (the key with the lowest number) would become 0, and any greater keys would be at high addresses. Would it be enough to do a multiply by 1,2,3, ...? That would probably improve the size. There is a way to produce an algorithm or find the proper algorithm, and the answer lies in mph and the other perfect hash code. Our simplistic hash wastes too much memory, but I wonder how the others fare. I had the idea of using 2 bytes and encoding the pointer value over a series. This could potentially save some space, but I'm not seeing all of the idea yet. We could scale the value produced by the KEY to produce a linear array of keyed values that are closer together. Provided that no collisions occur it should work. One problem would be handling remainders. This should be thought about. "event" % scan "event" %c%c%c%c%c 101 118 101 110 116 "window" % scan "window" %c%c%c%c%c%c 119 105 110 100 111 119 proc calc.key str { set by 1 set n 0 set list [list] foreach c [split $str ""] { lappend list [scan $c %c] } foreach num $list { set n [expr {$n + ($num * $by)}] incr by } return $n } % calc.key event 1660 % calc.key window 2328 We could try: 1660 / 1660 = 1 = n; 2328 / 1660 = 1 = n; if (2328 % 1660) then incr n That solves for 2 keys. Let's solve a more complex set such as this: event = 1660 window = 2328 x = 120 y = 121 width = 1613 height = 2249 Is there a posibility of key collisions with this calc.key algorithm? x is the min. If we subtract 120 from each we will still have a huge array. x / 120 = 1 = n; y / 120 = 1 = n; if (121 % 120) ++n; event / 120 = 13; if (event % 120) ++n; window / 120 = 19; if (event % 120) ++n; /* This conflicts with event! */ width / 120 = 13; if (1613 % 120) ++n; height / 120 = 18; So, we need a different divisor, or to handle the modulus differently. ----- July 26th 2005 appid desktop transients transient_for border_width path [for tclsh] interp alias {} % {} eval % for {set i 0} {$i <= 255} {incr i} {set x($i) $i} % set k 7 % for {set j 0} {$j <= 3} {incr j} { for {set i 0} {$i <= 255} {incr i} { set s $x($i) set k [expr {($k + $s) % 256}] set x($i) $x($k) set x($k) $s } } I think we can do better. proc from.ascii s { set r [list] foreach c [split $s ""] { lappend r [scan $c %c] } return $r } % from.ascii appid 97 112 112 105 100 % from.ascii transient_for 116 114 97 110 115 105 101 110 116 95 102 111 114 proc magic key { set shifter 1 set hash 0 foreach c [split $key ""] { incr hash [expr {([scan $c %c] << $shifter) + $hash}] incr shifter 1 } return $hash } What if we treated it like a tree? We could use 0 and 1 bits for left and right to produce the hash. Or more like a trie with reduce precision. We could have 32-bit for the left->right->left->right->right-ish patterns. set keys [list appid desktop transients transient_for border_width path] foreach key $keys { set tmp($key) [magic $key] } tmp(appid) = 16832 tmp(border_width) = 5230592 tmp(desktop) = 97536 tmp(path) = 6864 tmp(transient_for) = 11517952 tmp(transients) = 1125376 At this point we want to know the lower limit which is 6864 We need to map 6864 >= 0 We could map it to 0 with: % 6864 >> 13 0 % 6864 >> 12 1 So, let's say we chose 13. % foreach {key value} [array get tmp] { set tmp2($key) [expr {$value >> 13}] } % parray tmp2 tmp2(appid) = 2 tmp2(border_width) = 638 tmp2(desktop) = 11 tmp2(path) = 0 tmp2(transient_for) = 1406 tmp2(transients) = 137 That's wasteful for some keys. At least we don't have any collisions. Could we say if {$hash > somevalue} { set hash [expr {$hash - whatever}] } We would probably be more likely to cause a collision that way :( What if we first try using the lower 4 bits, and then 5 bits, and then 6 bits ... ---- study mph This is apparently a Pearson hash algorithm: hash = 0; for (i=0; i