Peter Bowen - Thousand Oaks Perl Mongers

October 21, 2015

- Mark Jason Dominus
- Higher Order Perl?
- Hash History Talk

Totally Ripped off from LA.pm in early October 2015

- We need a good way to quickly look up a value based on a key.

```
my @array = ( ... );
my $index = 100;
my $value = $array[$index];
```

- Arrays only support integer indexes.
- I want to be able to use any data, Strings, objects, whatever!

```
contains(assoc_list, key) {
while (assoc_list && assoc_list.key != key)
assoc_list = assoc_list.next
return assoc_list;
}
get(assoc_list, key) {
node = contains(assoc_list, key)
if (node) return node.value
else return Null
}
store(&assoc_list, key, value) {
node = contains(assoc_list, key)
if (node) node.value = value
else {
newnode = new Node(key, value);
newnode.next = assoc_list
assoc_list = newnode
}
}
```

- Scale...
- All operations run in
*O(n)* - Insering
*n*items takes*O(n*^{2}) - Not practical for very large lists
- Everyone knows how to fix this...

- Faster!
- Typically
*O(log n)*search time. - Inserts are faster at
*O(n log n)*time. - But wait...
- Worst case is
*O(n)* - Surprisingly Common
- Sort the keys before storing
- Stringy trees are slower to search than linked lists
- Just use and algorythm to fix it!

- So Complicated

- The solution should be:
- Fast
- Simple to understand and troubleshoot
- Flexible

- Algorythms are MUCH easier to understand
- Better Average Behavior
- Worst Case is still the same
- Rare - Extremely Rare

- Almost always strings
- Non-strings are often converted to strings - hopefully unique
- Not really useful without unique keys

- Create an array of reasonable size
- Hash the key creating an integer (Magic)
- Put it into the right bucket

- How do you create the hashed value?
- What happens when more than one key hashes?
- What happens when you run out of buckets?
- Is it performant?

- Find a good pseudo random number generator
- Seed it with the key
- Generate a random number
- Divide it by the number of buckets
- The remainder is the index

```
my %hash;
#Give Perl a hint if you know how big it is...
# keys(%hash) = 1_000_000;
foreach my $number ( 1 .. 1_000 ) {
$hash{$number}++;
print $number, " : ", scalar %hash, "\n";
}
```

- Specific Data Types
- Generally not worth your time

- When Storing:
- Use a list
- Append the new item to the front of the list
- When Retreiving:
- Find the right bucket
- Traverse the list looking for the key match

- Double the size of the table
- Redistribute the data

- Perl stores the hashed value with the key
- Doubling the table decreases the frequency

- Store and Retrieve are at
*O(log n)* - Worst Case is
*O(n)*- Although extemely rare - Rebuilding takes
*O(n)*

I have a value that I need to compare to a large set of values.

- Ummmm... have to traverse the array

- Ummm... Memory Usage?

Can I quickly eliminate a whole class of values as not being in the list before I have to go and search for them?

Space/Time Trade-offs in Hash Coding with Allowable Errors

- Rapidly eliminate a value from a set
- Burton H. Bloom - 1970

- Building the Filter
- Create a large binary list
- Get two hashing algorythms
- Apply both values to the binary list

- Apply both algorythms to the value
- If they're not both there, then false
- If they are there, then use the longer check