Python NLP (3) Trie
查找的資料結構及複雜度:
- list: O(n)
- tree: O(log(n))
- dictionary: O(1)
通常查找會採用 tree 資料結構在空間及時間複雜度上取得平衡。
1. Node:
tree 結構的基礎是節點 (node)。節點通常會有 children 和 value 兩個屬性,children 代表節點底下的子節點, value 代表該節點底下紀錄的值。
class Node:
def __init__(self, value) -> None:
self._children = {}
self._value = value
def _add_child(self, char, value, overwrite=False):
child = self._children.get(char)
if child is None:
child = Node(value) # 建立並新增節點
self._children[char] = child
elif overwrite:
child._value = value
return child
2. Trie:
建立 tree 並實現 __getitem__ 和 __setitem__ 方法。
__getitem__ 會在 tree 中以 key 的每個 char 向下搜尋返回最後一個節點所存放的值。
__setitem__ 會在 tree 中以 key 的每個 char 向下新增節點並設定最後一個節點的值。
class Trie(Node):
def __init__(self) -> None:
super().__init__(None)
def __contains__(self, key):
return self[key] is not None
def __getitem__(self, key):
state = self
for char in key:
state = state._children.get(char)
if state is None:
return None
return state._value
def __setitem__(self, key, value):
state = self
for i, char in enumerate(key):
if i < len(key) - 1:
state = state._add_child(char, None, False)
else:
state = state._add_child(char, value, True)
3. 建 tree 查找測試:
CRUD 測試:
if __name__ == "__main__":
trie = Trie()
# create
trie["節點"] = "node"
trie["節日"] = "holiday"
trie["節約用水"] = "save water"
trie["節慶"] = "festival"
trie["寵物"] = "pet"
assert "節日" in trie
# read
print("寵物: ", trie["寵物"])
assert trie["寵物"] == "pet"
# update
print("trie['節日']: ", trie['節日'])
trie["節日"] = "festival"
print("trie['節日']: ", trie['節日'])
assert trie["節日"] == "festival"
# delete
trie["節慶"] = None
assert "節慶" not in trie
Reference:
1. 自然語言處理入門
留言
張貼留言