Reimplement the hashtable class using quadratic probing
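
Quadratic probing resolves a collision by trying slots at quadratically growing offsets from the key's home slot. The following is a minimal sketch of the textbook form, not code taken from hashtable.py: `quadratic_probe` and its parameters are hypothetical names, and the sequence is `(base_index + i * i) % table_len`, whereas the code in this commit accumulates the squares onto the running index.

```python
from itertools import islice


def quadratic_probe(base_index, table_len):
    """Yield the slots visited by textbook quadratic probing.

    Hypothetical helper for illustration only; it is not part of hashtable.py.
    The i-th probe (i = 0, 1, 2, ...) lands at (base_index + i*i) % table_len.
    """
    for i in range(table_len):
        yield (base_index + i * i) % table_len


# First five probes for a key whose home slot is 3 in a table of 13 slots.
first_probes = list(islice(quadratic_probe(3, 13), 5))
print(first_probes)  # [3, 4, 7, 12, 6]
```

Whichever form is used, lookup and insertion have to step through the same sequence; otherwise a lookup can miss a key that insertion placed further along the probe path.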

PegasusWang · PegasusWang · commit 6e98745ae3d6 · 2018-04-22T15:09:00.000+08:00
diff --git a/docs/7_哈希表/hashtable.md b/docs/7_哈希表/hashtable.md
@@ -145,7 +145,7 @@ class HashTable(object):
     pass
 ```
 
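-The concrete implementation and coding are covered in the video.
+The concrete implementation and coding are covered in the video. This code is not easy to get right, and a small slip introduces bugs, so we still write unit tests to verify its correctness.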
 
 # Further reading
 - *Data Structures and Algorithms in Python*, Chapter 11: Hash Tables
diff --git a/docs/7_哈希表/hashtable.py b/docs/7_哈希表/hashtable.py
@@ -60,10 +60,6 @@ def _hash1(self, key):
         """ Compute the hash value of key. """
         return abs(hash(key)) % len(self._table)
 
-    def _hash2(self, key):
-        """ Compute the step to a new slot when keys collide. """
-        return 1 + abs(hash(key)) % (len(self._table) - 2)
-
     def _find_slot(self, key, for_insert=False):
         """_find_slot
 
@@ -72,21 +68,24 @@ def _find_slot(self, key, for_insert=False):
         :return:  slot index or None
         """
         index = self._hash1(key)
-        step = self._hash2(key)
+        base_index = index
+        hash_times = 1
         _len = len(self._table)
 
         if not for_insert:  # look up whether key exists
             while self._table[index] is not HashTable.UNUSED:
                 if self._table[index] is HashTable.EMPTY:
-                    index = (index + step) % _len
+                    index, hash_times = (index + hash_times * hash_times) % _len, hash_times + 1  # a simple quadratic probe; advance the counter as insertion does
                     continue
                 elif self._table[index].key == key:
                     return index
-                index = (index + step) % _len
+                index = (index + hash_times * hash_times) % _len
+                hash_times += 1
             return None
         else:
             while not self._slot_can_insert(index):  # loop until an insertable slot is found
-                index = (index + step) % _len
+                index = (index + hash_times * hash_times) % _len
+                hash_times += 1
             return index
 
     def _slot_can_insert(self, index):
@@ -159,6 +158,7 @@ def test_hash_table():
 
     assert sorted(list(h)) == ['b', 'c']
 
+    # 50 exceeds the HashTable's initial size, so this tests whether the rehash operation works correctly
     for i in range(50):
         h.add(i, i)
 
diff --git a/docs/8_字典/dict.md b/docs/8_字典/dict.md
@@ -0,0 +1,26 @@
+# Dictionary (dict)
+
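+The previous chapter introduced hash tables. Python's built-in dict is itself implemented with a hash table, so implementing a dict in this chapter is straightforward.
+CPython's version is of course written in C and is far more complex than ours (cpython/Objects/dictobject.c).
+In the previous chapter we wrote our own Array in Python to represent a fixed-length array and built HashTable on top of it. HashTable supports three basic methods: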
+
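+- add(key, value): update the value if key exists, otherwise insert it
+- get(key, default=None): return the value for key, or the default (None) if key does not exist
+- remove(key): delete a key; the slot is not actually freed but marked as Empty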
+
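+The most common use of a dict is key-value storage, often as a cache; its keys are unique.
+The standard library's collections.OrderedDict also preserves the insertion order of keys, and you could build your own OrderedDict with the linked list we implemented earlier.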
+
+# Implementing dict
+
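+The three basic HashTable methods above are also the three we use most often on a dict, so here we inherit from that class
+and implement more of the methods dict supports: items(), keys(), values(). Note that these methods return different things in Python 2 and Python 3:
+a notable improvement in Python 3 is that they no longer return memory-hungry lists but lightweight view objects that you iterate over, so you must call list() to get a list. Here we follow the Python 3 style and return iterators.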
+
+
+```py
+class DictADT(HashTable):
+    pass
+```
+
+The video demonstrates how to implement these methods and writes unit tests to verify their correctness.
diff --git a/docs/9_集合/set.md b/docs/9_集合/set.md
diff --git a/mkdocs.yml b/mkdocs.yml
@@ -17,3 +17,5 @@ pages:
   - Stack: '5_栈/stack.md'
   - Algorithm analysis: '6_算法分析/big_o.md'
   - Hash table: '7_哈希表/hashtable.md'
+  - Dictionary: '8_字典/dict.md'
+  - Set: '9_集合/set.md'
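
The `DictADT` stub added in dict.md leaves `keys()`, `values()` and `items()` to the video. One possible shape for them is sketched below, under loud assumptions: the `HashTable` here is a list-backed stand-in written only so the example runs on its own (it is not the chapter's open-addressing class), and it is assumed that iterating a `HashTable` yields its keys, as the existing test `sorted(list(h))` suggests.

```python
class HashTable(object):
    """Stand-in for the chapter's HashTable (an assumption for this sketch).

    It mimics the add/get/remove/__iter__ interface but stores pairs in a
    plain list instead of an open-addressing table.
    """

    def __init__(self):
        self._pairs = []  # list of [key, value] pairs

    def add(self, key, value):
        for pair in self._pairs:
            if pair[0] == key:
                pair[1] = value  # key already present: update in place
                return
        self._pairs.append([key, value])

    def get(self, key, default=None):
        for k, v in self._pairs:
            if k == key:
                return v
        return default

    def remove(self, key):
        # The stand-in silently ignores a missing key.
        self._pairs = [p for p in self._pairs if p[0] != key]

    def __iter__(self):
        for k, _ in self._pairs:
            yield k


class DictADT(HashTable):
    """Adds Python 3 style keys()/values()/items() that return iterators."""

    def keys(self):
        return iter(self)

    def values(self):
        return (self.get(key) for key in self)

    def items(self):
        return ((key, self.get(key)) for key in self)


d = DictADT()
d.add('a', 1)
d.add('b', 2)
d.add('a', 3)  # updates the existing key
print(list(d.items()))  # [('a', 3), ('b', 2)]
```

Because all three methods are built on `__iter__` and `get`, they should carry over to any `HashTable` that exposes the same interface; callers wrap the result in `list()` when they need a list.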