Ruby performance: Hash's #has_key? vs. #[] (square brackets)
February 15, 2018
During my time last year developing performance improvements for Rambling Trie, I stumbled into something quite interesting that happens with Ruby’s Hash class.
A commonly used method from the Hash interface is #has_key? which tells you if the Hash in question contains a particular key. Rambling Trie’s underlying data structure is an n-ary tree backed by a Hash where each key is the letter corresponding to a child and each value is the Node that corresponds to that letter. As you might imagine, #has_key? is a common operation called throughout the gem implementation.
To my surprise, while running some Ruby benchmarks I noticed that accessing the key with #[] and verifying if it was nil instead of calling #has_key? lowered the time it took to complete certain operations. This was suspicious to say the least.
I loaded up benchmark-ips and wrote this quick script:
require 'benchmark/ips'
Benchmark.ips do |bm|
hash = { 'thing' => 'gniht' }
bm.report 'has_key?' do
hash.has_key? 'thing'
end
bm.report '[]' do
!!hash['thing']
end
bm.compare!
end
Which gave me this:
...
Comparison:
[]: 6461337.5 i/s
has_key?: 5140776.3 i/s - same-ish: difference falls within error
I know, I know. It says same-ish: difference falls within error, but notice how #[] had approximately 1.3 million more iterations per second. The relatively small performance hit becomes noticeable when you’re building a trie from a large dictionary.
I’ll take it. From now on, I’ll use #[] instead of #has_key? for relatively low level operations where every millisecond counts.
I ran this benchmark earlier today with Ruby 2.5.0 and the gap between these two operations has gotten bigger!! It is now literally about 1.5 times slower to use #has_key?:
...
Comparison:
[]: 6056391.2 i/s
has_key?: 3956594.2 i/s - 1.53x slower
Notes