Review hash functions

I've observed that the test case from #456 (which has about 15,000 variables, IIRC) shows pretty bad collision rates, even after switching from k&r hash to perl hash.

I suspect two things are in play:

1. We use 32 bits hash functions. Per the Birthday Paradox, that means a 50% chance of collision after ~25,000 elements, and growing rapidly<sup>1</sup>. A 64 bits hash function reaches that point only after 1.6 billion elements.

2. We use multiplicative hashes and those tend to have poor entropy in their low bits. Perl hash tries to mitigate that with a final shuffle but I have a hunch that a 64 bits big prime multiplicative hash performs better.

Counterpoint _contra_ (2): we could get better distribution out of our 32 bits multiplicative hash functions with `h >> (32-N)`, where `N` is a power of two, but only if we used power-ot-two hash tables. But we don't because those are more memory hungry.

<hr>

<sup>1</sup> ninja edit: which we then bucket - truncating the result, basically - compounding the effect

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Review hash functions #859

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Review hash functions #859

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions