
How are Trove collections more efficient than the standard Java collections?

Posted by: admin December 28, 2021

Questions:

In a recent interview, I was asked how HashMap works in Java. I was able to explain it well, including that in the worst case a HashMap may degenerate into a linked list due to chaining. I was then asked to figure out a way to improve this performance, but I was unable to do so during the interview. The interviewer asked me to look up “Trove”.
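To make the worst case concrete, here is a purely illustrative example: a key type whose hashCode() always returns the same value forces every entry into a single bucket, so each lookup has to walk the whole chain. (On Java 8+, HashMap mitigates very long chains by converting them to balanced trees, but the degenerate-list picture is the classic one.)

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a key whose hashCode() is constant sends every
// entry to the same bucket, so lookups walk a chain of colliding keys.
class BadKey {
    private final int id;
    BadKey(int id) { this.id = id; }

    @Override public int hashCode() { return 42; } // every key collides
    @Override public boolean equals(Object o) {
        return o instanceof BadKey && ((BadKey) o).id == id;
    }
}

public class WorstCaseDemo {
    public static void main(String[] args) {
        Map<BadKey, String> map = new HashMap<>();
        for (int i = 0; i < 10_000; i++) {
            map.put(new BadKey(i), "value" + i); // all land in one bucket
        }
        // Each get() must scan the colliding entries one by one.
        System.out.println(map.get(new BadKey(9_999)));
    }
}
```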

I believe he was pointing to this page. I have read the description provided on that page but still can’t figure out how it overcomes the limitations of java.util.HashMap.

Even a hint would be appreciated. Thanks!!

Answers:

The key phrase there is open addressing. Instead of hashing to an array of buckets, all the entries go into one big array. When you add an element, if its slot is already in use you just move along the array until you find a free one.

As long as the array is kept sufficiently larger than the number of entries and the hash function is well distributed, average lookup times stay small. And by keeping everything in one array you can get better performance: it’s more cache friendly.

However, it still has worst-case linear behaviour if (say) every key hashes to the same value, so it doesn’t avoid that issue.
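Here’s a minimal sketch of the technique (illustrative only, not Trove’s actual code; the class name ProbingIntMap is made up): an open-addressed int-to-int map that keeps keys and values in flat parallel arrays and resolves collisions by linear probing.

```java
// A toy open-addressed int -> int map using linear probing.
// Not Trove's implementation: just a sketch of the technique it uses.
// Limitations for brevity: the table never grows (a real implementation
// resizes well before it fills up), and the FREE sentinel means the key
// Integer.MIN_VALUE cannot be stored.
public class ProbingIntMap {
    private static final int FREE = Integer.MIN_VALUE; // marks an empty slot
    private final int[] keys;
    private final int[] values;

    public ProbingIntMap(int capacity) {
        keys = new int[capacity];
        values = new int[capacity];
        java.util.Arrays.fill(keys, FREE);
    }

    // Walk forward from the hashed slot until we hit the key or a free slot.
    private int indexOf(int key) {
        int i = (key & 0x7fffffff) % keys.length;
        while (keys[i] != FREE && keys[i] != key) {
            i = (i + 1) % keys.length; // linear probe: try the next slot
        }
        return i;
    }

    public void put(int key, int value) {
        int i = indexOf(key);
        keys[i] = key;
        values[i] = value;
    }

    public int get(int key) {
        int i = indexOf(key);
        return keys[i] == key ? values[i] : FREE; // FREE doubles as "not found"
    }
}
```

Because the probe sequence walks adjacent array slots, a collision usually stays within one or two cache lines, which is where the cache friendliness comes from.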

###

It seems to me from the Trove page that there are two main differences that improve performance.

The first is the use of open addressing (http://en.wikipedia.org/wiki/Hash_table#Open_addressing). This doesn’t avoid the collision issue, but it does mean that there’s no need to create an “Entry” object for every item that goes into the map.

The second important difference is the ability to provide your own hash function, separate from the one defined by the class of the keys. So you could supply a much faster hash function if it made sense to do so.
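Trove exposes this via a hashing-strategy interface; the exact class and package names vary between Trove versions, so the interface below is a stand-in that just shows the pattern. The map consults a strategy object instead of the key’s own hashCode()/equals().

```java
// Illustrative stand-in for Trove's hashing-strategy idea; the real
// interface ships with the Trove library (names vary by version).
interface HashingStrategy<T> {
    int computeHashCode(T object);
    boolean equals(T o1, T o2);
}

// Example: hash strings by length only. This is much cheaper than
// String.hashCode() and could make sense if you know your keys
// mostly have distinct lengths.
class LengthStrategy implements HashingStrategy<String> {
    @Override public int computeHashCode(String s) { return s.length(); }
    @Override public boolean equals(String a, String b) { return a.equals(b); }
}
```

A custom-hash map is then constructed with such a strategy, and every internal hash and equality check goes through it rather than through the key class.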

###

One advantage of Trove is that it avoids object creation, especially for primitives.
For big hash tables on an embedded Java device this can be advantageous due to lower memory consumption.
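For a concrete comparison (assuming Trove 3, where the primitive map class is gnu.trove.map.hash.TIntIntHashMap):

```java
import java.util.HashMap;
import java.util.Map;

import gnu.trove.map.hash.TIntIntHashMap; // Trove 3 package; Trove 2 used gnu.trove.TIntIntHashMap

public class BoxingDemo {
    public static void main(String[] args) {
        // java.util.HashMap: each put() boxes the key and value into
        // Integer objects and allocates an internal Entry node.
        Map<Integer, Integer> boxed = new HashMap<>();
        boxed.put(1, 100);

        // Trove: keys and values are stored directly in int[] arrays,
        // so there is no boxing and no per-mapping node object.
        TIntIntHashMap primitive = new TIntIntHashMap();
        primitive.put(1, 100);
        System.out.println(primitive.get(1)); // prints 100
    }
}
```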

The other advantage I saw is the use of custom hash codes/functions without the need to override hashCode(). For a specific data set, in the hands of an expert at writing hash functions, this can be an advantage.