Python NLP (3) Trie
查找的資料結構及複雜度:
- list: O(n)
- tree: O(log(n))
- dictionary: O(1)
通常查找會採用 tree 資料結構在空間及時間複雜度上取得平衡。
1. Node:
tree 結構的基礎是節點 (node)。節點通常會有 children 和 value 兩個屬性,children 代表節點底下的子節點, value 代表該節點底下紀錄的值。
class Node:
    def __init__(self, value) -> None:
        self._children = {}
        self._value = value
    
    def _add_child(self, char, value, overwrite=False):
        child = self._children.get(char)
        if child is None:
            child = Node(value) # 建立並新增節點
            self._children[char] = child
        elif overwrite:
            child._value = value
        return child
2. Trie:
建立 tree 並實現 __getitem__ 和 __setitem__ 方法。
__getitem__ 會在 tree 中以 key 的每個 char 向下搜尋返回最後一個節點所存放的值。
__setitem__ 會在 tree 中以 key 的每個 char 向下新增節點並設定最後一個節點的值。
class Trie(Node):
    def __init__(self) -> None:
        super().__init__(None)
    def __contains__(self, key):
        return self[key] is not None
    def __getitem__(self, key):
        state = self
        for char in key:
            state = state._children.get(char)
            if state is None:
                return None
        return state._value
    def __setitem__(self, key, value):
        state = self
        for i, char in enumerate(key):
            if i < len(key) - 1:
                state = state._add_child(char, None, False)
            else:
                state = state._add_child(char, value, True)
3. 建 tree 查找測試:
CRUD 測試:
if __name__ == "__main__":
    trie = Trie()
   
    # create
    trie["節點"] = "node"
    trie["節日"] = "holiday"
    trie["節約用水"] = "save water"
    trie["節慶"] = "festival"
    trie["寵物"] = "pet"
    assert "節日" in trie
   
    # read
    print("寵物: ", trie["寵物"])
    assert trie["寵物"] == "pet"
    # update
    print("trie['節日']: ", trie['節日'])
    trie["節日"] = "festival"
    print("trie['節日']: ", trie['節日'])
    assert trie["節日"] == "festival"
    
    # delete
    trie["節慶"] = None
    assert "節慶" not in trie
Reference:
1. 自然語言處理入門
 
留言
張貼留言