| From: | Kenneth Marshall <ktm(at)rice(dot)edu> | 
|---|---|
| To: | pgsql-hackers(at)postgreSQL(dot)org | 
| Subject: | Re: Hash index todo list item | 
| Date: | 2007-09-05 20:07:03 | 
| Message-ID: | [email protected] | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers | 
On Sun, Sep 02, 2007 at 01:04:04PM -0500, Kenneth Marshall wrote:
> Dear PostgreSQL Hackers:
> 
> After following the hackers mailing list for quite a while,
> I am going to start investigating what will need to be done
> to improve hash index performance. Below are the pieces of
> this project that I am currently considering:
> 
> 1. Characterize the current hash index implementation against
>    the BTree index, with a focus on space utilization and
>    lookup performance against a collection of test data. This
>    will give a baseline performance test to evaluate the impact
>    of changes. I initially do not plan to bench the hash creation
>    process since my initial focus will be on lookup performance.
> 
Here are very basic results for a table with 1.6m entries:
postgres=# CREATE TABLE dict (word varchar(100));
CREATE TABLE
postgres=# COPY dict FROM '/tmp/words';
COPY 1648379
postgres=# select count(*) from dict;
  count  
---------
 1648379
(1 row)
Time: 11187.418 ms
postgres=# select count(*) from dict;
  count  
---------
 1648379
(1 row)
Time: 6040.912 ms
postgres=# CREATE INDEX wordhash ON dict USING hash (word);
CREATE INDEX
Time: 11108707.160 ms
postgres=# select * from dict where word = 'avatar';
  word  
--------
 avatar
(1 row)
Time: 79.823 ms
postgres=# select * from dict where word = 'zebra';
 word  
-------
 zebra
(1 row)
Time: 9.864 ms
postgres=# select * from dict where word = 'turkey'; 
  word  
--------
 turkey
(1 row)
Time: 18.418 ms
Time: 1.045 ms
Time: 1.257 ms
Time: 1.080 ms
postgres=# CREATE INDEX wordbtree ON dict USING btree (word);
CREATE INDEX
Time: 25438.884 ms
postgres=# select * from dict where word = 'avatar';
  word  
--------
 avatar
(1 row)
Time: 13.400 ms
postgres=# select * from dict where word = 'zebra';
 word  
-------
 zebra
(1 row)
Time: 1.173 ms
postgres=# select * from dict where word = 'turkey';
  word  
--------
 turkey
(1 row)
Time: 1.186 ms
Time: 1.103 ms
Time: 1.099 ms
Time: 1.108 ms
------------------------------
Size of table =       87556096
Size of hash index = 268451840
Size of btree index = 53510144
From my very small sample on an unloaded machine, a hash index lookup
took the least amount of time. It had a much larger initial time which
could be attributable to cache population effects. The size is 5X that
of the Btree index. I will continue to improve the test suite as more
granularity is needed. If anyone has a good data generator, please let
me know. Otherwise I will just roll my own.
Regards,
Ken
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Kevin Grittner | 2007-09-05 20:22:08 | Re: Final background writer cleanup for 8.3 | 
| Previous Message | Simon Riggs | 2007-09-05 20:06:31 | SET TRANSACTION not compliant with SQL:2003 |