Python NLP (3) Trie
查找的資料結構及複雜度:
- list: O(n)
- tree: O(log(n))
- dictionary: O(1)
通常查找會採用 tree 資料結構在空間及時間複雜度上取得平衡。
1. Node:
tree 結構的基礎是節點 (node)。節點通常會有 children 和 value 兩個屬性,children 代表節點底下的子節點, value 代表該節點底下紀錄的值。
class Node: def __init__(self, value) -> None: self._children = {} self._value = value def _add_child(self, char, value, overwrite=False): child = self._children.get(char) if child is None: child = Node(value) # 建立並新增節點 self._children[char] = child elif overwrite: child._value = value return child
2. Trie:
建立 tree 並實現 __getitem__ 和 __setitem__ 方法。
__getitem__ 會在 tree 中以 key 的每個 char 向下搜尋返回最後一個節點所存放的值。
__setitem__ 會在 tree 中以 key 的每個 char 向下新增節點並設定最後一個節點的值。
class Trie(Node): def __init__(self) -> None: super().__init__(None) def __contains__(self, key): return self[key] is not None def __getitem__(self, key): state = self for char in key: state = state._children.get(char) if state is None: return None return state._value def __setitem__(self, key, value): state = self for i, char in enumerate(key): if i < len(key) - 1: state = state._add_child(char, None, False) else: state = state._add_child(char, value, True)
3. 建 tree 查找測試:
CRUD 測試:
if __name__ == "__main__": trie = Trie() # create trie["節點"] = "node" trie["節日"] = "holiday" trie["節約用水"] = "save water" trie["節慶"] = "festival" trie["寵物"] = "pet" assert "節日" in trie # read print("寵物: ", trie["寵物"]) assert trie["寵物"] == "pet" # update print("trie['節日']: ", trie['節日']) trie["節日"] = "festival" print("trie['節日']: ", trie['節日']) assert trie["節日"] == "festival" # delete trie["節慶"] = None assert "節慶" not in trie
Reference:
1. 自然語言處理入門
留言
張貼留言