GetHashCode

GetHashCode returns a number that serves as a quick, approximate check for equality, and other classes end up relying on it (the most popular one is Hashtable).

The .NET documentation gives us these rules:

  • Equal objects should return the same hash code.
  • The same object should return the same hash code every time, until the object is changed.
  • The distribution of hash codes should be random.

Equal objects should return the same hash code

Checking for equality can be expensive, and comparing hash codes first may be several times faster. Different hash codes prove that two objects are not equal; equal hash codes do not prove that they are, but the quick check removes most objects from consideration before the expensive Equals call. Hashtable does this through its bucket mechanism. If you insert an object that doesn't follow this rule, Hashtable may be unable to return the correct result (e.g. from its ContainsKey method).
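To make the failure concrete, here is a minimal C# sketch (hypothetical BrokenKey and BrokenKeyDemo types, not from any library) of a class that overrides Equals but keeps the GetHashCode inherited from Object. The compiler itself warns about this combination (warning CS0659), which the sketch suppresses on purpose.

```csharp
using System;
using System.Collections;

// Hypothetical key type that overrides Equals but NOT GetHashCode,
// which breaks the first rule. CS0659 is the compiler's warning for this.
#pragma warning disable CS0659
public sealed class BrokenKey
{
    public int Id { get; }

    public BrokenKey(int id) { Id = id; }

    // Two BrokenKey objects are equal when their Ids match...
    public override bool Equals(object obj)
    {
        return obj is BrokenKey other && other.Id == Id;
    }

    // ...but GetHashCode is inherited from Object and ignores Id.
}
#pragma warning restore CS0659

public static class BrokenKeyDemo
{
    // Stores one key, then asks about an equal but separately created instance.
    public static bool LookupWithEqualInstance()
    {
        var table = new Hashtable();
        table.Add(new BrokenKey(42), "answer");

        // The two keys are Equal, but their hash codes (almost certainly) differ,
        // so Hashtable searches the wrong bucket.
        return table.ContainsKey(new BrokenKey(42));
    }
}
```

Two BrokenKey(42) instances compare as equal, yet LookupWithEqualInstance will almost always return false, because Hashtable only calls Equals on entries whose hash code matches.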

Note that this doesn't mean that different objects must return different hash codes.

The same object should return the same hash code every time until the object is changed

This one seems logical, since it follows the same reasoning as the first rule. Whenever a property that can make your Equals or comparison methods return a different result changes its value, you should recalculate your hash code. Small changes to the object (anything that doesn't affect these methods) should not generate a new hash code (they may, but there is no reason for it).
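One consequence is worth seeing in code: if such a property changes while the object is stored as a key in a Hashtable, the entry stays filed under the old hash code. A minimal sketch, assuming a hypothetical MutableKey type whose Equals and GetHashCode both use its Id property:

```csharp
using System;
using System.Collections;

// Hypothetical key whose Equals and GetHashCode both depend on the mutable Id.
public sealed class MutableKey
{
    public int Id { get; set; }

    public MutableKey(int id) { Id = id; }

    public override bool Equals(object obj) => obj is MutableKey other && other.Id == Id;

    // Follows rule two: the hash code changes only when Id (used by Equals) changes.
    public override int GetHashCode() => Id.GetHashCode();
}

public static class MutableKeyDemo
{
    public static bool FoundAfterMutation()
    {
        var key = new MutableKey(1);
        var table = new Hashtable { { key, "value" } };

        // Changes the hash code while the key is stored in the table.
        key.Id = 2;

        // Hashtable searches with the new hash code, but the entry was filed
        // under the old one, so the lookup misses the very same object.
        return table.ContainsKey(key);
    }
}
```

FoundAfterMutation returns false, so objects used as keys are best left unchanged while they are stored in a collection.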

Distribution of hash codes should be random

This rule is there to help classes that use buckets for divide-and-conquer (Hashtable is our example again). A random distribution fills every bucket to approximately the same level, so a search only needs to check the few objects in one bucket instead of every object.

The worst case is every object returning the same hash code value (e.g. return 0). While this follows rules one and two, performance-wise it is awful: every object lands in the same bucket, so every check has to take all objects into consideration.
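The effect of distribution can be sketched with a simplified bucket model (an assumption for illustration; Hashtable's real probing is more involved): each key goes to bucket (hash & 0x7FFFFFFF) % bucketCount, and the size of the fullest bucket bounds how many entries a single lookup has to scan. BucketDemo is a hypothetical helper.

```csharp
using System;
using System.Linq;

public static class BucketDemo
{
    // Simplified model of a bucketed hash table (not Hashtable's exact internals):
    // key number k is placed in bucket (hash(k) & 0x7FFFFFFF) % bucketCount.
    public static int LargestBucket(Func<int, int> hash, int keyCount, int bucketCount)
    {
        var buckets = new int[bucketCount];
        for (int key = 0; key < keyCount; key++)
        {
            // Masking the sign bit keeps the index non-negative.
            int index = (hash(key) & 0x7FFFFFFF) % bucketCount;
            buckets[index]++;
        }

        // A lookup has to scan at most this many entries.
        return buckets.Max();
    }
}
```

With 1000 int keys and 101 buckets, LargestBucket(k => k.GetHashCode(), 1000, 101) returns 10, while the constant hash k => 0 returns 1000: every key sits in one bucket.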

This is an important rule, but you should not go to too much effort for it. Since the GetHashCode method may get called a lot during a search through a collection, you should try to make it as fast as possible. A fast GetHashCode with a less than optimal distribution will often outperform elaborate and slow code.

What happens if I don't override GetHashCode?

You will get the default hash code inherited from Object, which is based on the object's reference (its identity), not on the data it holds (the exact implementation may change in future versions). While this works well enough, it fails to recognize two objects as being the same if they are created separately (e.g. the same data row returned two times). In most cases it will work, but it is generally a bad idea to rely on it with collections (most of them use buckets), and it can lead to bugs that are difficult to find.
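A minimal sketch of the data row case, using a hypothetical PlainRow class that overrides neither Equals nor GetHashCode:

```csharp
using System;
using System.Collections;

// Hypothetical row type with no Equals/GetHashCode overrides:
// both are inherited from Object and compare references, not data.
public sealed class PlainRow
{
    public int Id;
    public string Name;

    public PlainRow(int id, string name) { Id = id; Name = name; }
}

public static class PlainRowDemo
{
    // Simulates fetching the same data row twice, e.g. from two queries.
    public static int CountDistinctKeys()
    {
        var table = new Hashtable();
        var first = new PlainRow(1, "Alice");
        var second = new PlainRow(1, "Alice"); // same data, different object

        table[first] = "first fetch";
        table[second] = "second fetch"; // not recognized as the same key

        return table.Count;
    }
}
```

CountDistinctKeys returns 2: both fetches end up as separate keys, because the defaults look at the references, not at the Id and Name values.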

How to generate it then?

If you intend to use a class with a collection, you have probably already overridden the Equals method (or implemented one of the comparison interfaces, e.g. IComparable). Whatever you use there to check for equality, use it in GetHashCode as well. E.g. if your Equals method compares a property named Key, write:

public override int GetHashCode() {
    // Key is the same property that Equals compares (assumed non-null here).
    return this.Key.GetHashCode();
}

This is both simple and fast (if Key is one of the built-in .NET types, such as int or string) while following all the rules.
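Put together, a complete class might look like the sketch below. Customer, its DisplayName property and CustomerDemo are hypothetical; the point is that Equals and GetHashCode use the same Key property, and a null Key is mapped to 0 instead of throwing.

```csharp
using System;
using System.Collections;

// Hypothetical class whose equality is defined only by its Key property.
public sealed class Customer
{
    public string Key { get; }
    public string DisplayName { get; set; } // not part of equality

    public Customer(string key, string displayName)
    {
        Key = key;
        DisplayName = displayName;
    }

    public override bool Equals(object obj)
    {
        return obj is Customer other && string.Equals(Key, other.Key);
    }

    // Uses the same property as Equals, so equal objects share a hash code.
    public override int GetHashCode()
    {
        return Key != null ? Key.GetHashCode() : 0;
    }
}

public static class CustomerDemo
{
    public static bool LookupByKey()
    {
        var table = new Hashtable();
        table.Add(new Customer("C-001", "Alice"), "order data");

        // A separately created object with the same Key is found,
        // even though its DisplayName differs.
        return table.ContainsKey(new Customer("C-001", "Alice Smith"));
    }
}
```

LookupByKey returns true, because DisplayName takes no part in either Equals or GetHashCode.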

A slightly more complicated situation is when Equals checks more than one property. One path you could take is to base GetHashCode on the property whose value varies the most between objects. This will cause some collisions (different objects will have the same hash code), but it will not cause bugs, because Equals still makes the final decision. Depending on how many properties you have, it may not even have a big hit on performance.
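A sketch of this approach, assuming a hypothetical Person class where Equals compares both FirstName and LastName but GetHashCode uses only LastName (assumed here to be the property that varies more):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type: Equals compares FirstName AND LastName,
// while the hash code is based on LastName alone.
public sealed class Person
{
    public string FirstName { get; }
    public string LastName { get; }

    public Person(string firstName, string lastName)
    {
        FirstName = firstName;
        LastName = lastName;
    }

    public override bool Equals(object obj)
    {
        return obj is Person other
            && string.Equals(FirstName, other.FirstName)
            && string.Equals(LastName, other.LastName);
    }

    // Still follows rule one: equal objects always share a LastName, so they
    // always get the same hash code. People with the same LastName collide.
    public override int GetHashCode()
    {
        return LastName != null ? LastName.GetHashCode() : 0;
    }
}
```

Adding Ann Smith and Bob Smith to a HashSet<Person> keeps both (their hash codes collide, but Equals tells them apart), and adding Ann Smith a second time is rejected as a duplicate.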

Another approach is to combine the two hash codes into one, e.g.:

public override int GetHashCode() {
    // Combine the hash codes of both properties that Equals compares (XOR is cheap).
    return this.Key.GetHashCode() ^ this.Key2.GetHashCode();
}
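XOR is cheap, but it is symmetric, which is one source of collisions. A sketch with a hypothetical PairKey class holding two int values (Int32.GetHashCode returns the value itself):

```csharp
using System;

// Hypothetical key made of two int parts, combined with XOR as in the text.
public sealed class PairKey
{
    public int Key { get; }
    public int Key2 { get; }

    public PairKey(int key, int key2)
    {
        Key = key;
        Key2 = key2;
    }

    public override bool Equals(object obj)
    {
        return obj is PairKey other && other.Key == Key && other.Key2 == Key2;
    }

    // XOR keeps the result in int range, but it is symmetric:
    // (1, 2) and (2, 1) get the same hash code, and (x, x) always hashes to 0.
    public override int GetHashCode()
    {
        return Key.GetHashCode() ^ Key2.GetHashCode();
    }
}
```

Here new PairKey(1, 2) and new PairKey(2, 1) both hash to 3, and any PairKey(x, x) hashes to 0. Such collisions stay correct, since Equals still separates the objects; if your data often contains swapped pairs, an order-sensitive combination such as unchecked(Key * 31 + Key2) is a common alternative (a different technique from the XOR shown above).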

If you go that way, always measure the speed. More than once you will find that a GetHashCode method that takes all properties into consideration is slower than one that has collisions. It all depends on the objects you use. From my experience, I would recommend avoiding calculating the hash code from more than two properties.
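One way to measure, sketched under the assumption that a rough timing is good enough: plug each candidate hash function into a Dictionary through a hypothetical DelegateComparer (an IEqualityComparer<T> that keeps default equality) and time inserting and looking up the same items. Timing the whole lookup, not just the hash call, captures both the cost of the hash function and the cost of its collisions. HashBenchmark and TimeLookups are also hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Lets a Dictionary use an arbitrary hash function while keeping default equality.
public sealed class DelegateComparer<T> : IEqualityComparer<T>
{
    private readonly Func<T, int> _hash;

    public DelegateComparer(Func<T, int> hash) { _hash = hash; }

    public bool Equals(T x, T y) => EqualityComparer<T>.Default.Equals(x, y);

    public int GetHashCode(T obj) => _hash(obj);
}

public static class HashBenchmark
{
    // Rough sketch, not a rigorous benchmark: fills a Dictionary using the given
    // hash function, looks every item up once, and returns elapsed milliseconds.
    public static long TimeLookups<T>(IReadOnlyList<T> items, Func<T, int> hash)
    {
        var watch = Stopwatch.StartNew();
        var dictionary = new Dictionary<T, bool>(new DelegateComparer<T>(hash));

        foreach (var item in items)
        {
            dictionary[item] = true;
        }

        foreach (var item in items)
        {
            // A hash function that breaks rule one would make lookups fail here.
            if (!dictionary.ContainsKey(item))
            {
                throw new InvalidOperationException("Lookup failed: hash function breaks rule one.");
            }
        }

        watch.Stop();
        return watch.ElapsedMilliseconds;
    }
}
```

For example, fill a list with (Key, Key2) tuples and compare TimeLookups(items, p => p.Item1.GetHashCode()) with TimeLookups(items, p => p.Item1.GetHashCode() ^ p.Item2.GetHashCode()). Results vary by machine and by data, so run each a few times before drawing conclusions.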

Caching?

While caching may sound like a good idea, there is no need for it if you use GetHashCode of the .NET Framework's own classes (as we did in the examples above). Those classes either already have caching in place or use an operation fast enough that caching is not needed.

Consider caching results only if you have your own hashing mechanism. Do not forget to update the hash code when the object changes.
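A sketch of caching with invalidation, assuming a hypothetical Signature class whose own hashing mechanism walks every element of an int array (the multiply-by-31 combination is just one common choice). The cached value is marked stale whenever an element changes:

```csharp
using System;

// Hypothetical type with its own hashing mechanism over every array element,
// expensive enough that caching the result can pay off.
public sealed class Signature
{
    private readonly int[] _values;
    private int _cachedHash;
    private bool _hashIsValid; // reset to false whenever _values changes

    public Signature(params int[] values)
    {
        _values = (int[])values.Clone();
    }

    public int this[int index]
    {
        get => _values[index];
        set
        {
            _values[index] = value;
            _hashIsValid = false; // object changed: the cached hash code is stale
        }
    }

    public override bool Equals(object obj)
    {
        if (!(obj is Signature other) || other._values.Length != _values.Length) return false;
        for (int i = 0; i < _values.Length; i++)
        {
            if (_values[i] != other._values[i]) return false;
        }
        return true;
    }

    public override int GetHashCode()
    {
        if (!_hashIsValid)
        {
            // Order-sensitive combination of all elements; overflow wraps around.
            int hash = 17;
            foreach (int v in _values)
            {
                hash = unchecked(hash * 31 + v);
            }
            _cachedHash = hash;
            _hashIsValid = true;
        }
        return _cachedHash;
    }
}
```

Two Signature objects with the same values share a hash code; after a[2] = 4 the next GetHashCode call recomputes the value instead of returning the stale one.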

Is it worth it?

If you are using something from the System.Collections namespace, the answer is yes. Almost everything there either already uses GetHashCode or may use it in the future. Even the simplest of hash codes will help performance.
