Web Scraping (2) Navigating Trees

September 13, 2018

1. BeautifulSoup Objects:

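BeautifulSoup objects: the object returned by BeautifulSoup(); it represents the whole parsed document
Tag objects: what find() and findAll() return when called on a BeautifulSoup object (find() returns a single Tag, findAll() returns a list of them); they provide a get_text() method
NavigableString objects: the text inside a tag; these do not have a get_text() method
Comment objects: the text of an HTML comment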
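
To make the four kinds of objects concrete, here is a minimal sketch that inspects the type of each node in a tiny document. The names snippet, soup, p and kinds are made up for illustration, and the built-in "html.parser" is used only so the example runs without lxml installed.

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical one-line document: a <p> holding some text and a comment.
snippet = "<p>hello<!-- a note --></p>"

# Assumption: "html.parser" (standard library) instead of lxml.
soup = BeautifulSoup(snippet, "html.parser")
p = soup.find("p")

# Collect the class name of every direct child of <p>.
kinds = [type(node).__name__ for node in p.children]

print(type(soup).__name__)  # BeautifulSoup
print(type(p).__name__)     # Tag
print(kinds)                # ['NavigableString', 'Comment']
```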

2. Setup:

Define a simple HTML document:

from bs4 import BeautifulSoup

html = """
<html>
<head></head>
<body>
  <div id="content" class="my-body">
    <h3>heading 1</h3>
      <ol>
        <li>list 1</li>
        <li>list 2</li>
        <li>list 3</li>
      </ol>
    <h3>heading 2</h3>
  </div>
</body>
</html>
"""

bsObj = BeautifulSoup(html, features="lxml")

3. Children:

for child in bsObj.find("div").children:
    print(child)

Get the text content of each child tag:

for child in bsObj.find("ol").children:
    print(child.string)
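
Note that .children also yields the whitespace between tags as NavigableString objects, so the loops above print some blank entries. A minimal sketch of one way to keep only the tags, assuming a cut-down copy of the document and the built-in "html.parser" (both chosen here so the example is self-contained):

```python
from bs4 import BeautifulSoup, Tag

# Cut-down copy of the sample document (an assumption for this sketch).
html = """
<div id="content">
  <h3>heading 1</h3>
  <ol>
    <li>list 1</li>
    <li>list 2</li>
    <li>list 3</li>
  </ol>
  <h3>heading 2</h3>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Keep only Tag children; whitespace strings between tags are skipped.
div_tags = [c.name for c in soup.find("div").children if isinstance(c, Tag)]
li_texts = [c.get_text() for c in soup.find("ol").children if isinstance(c, Tag)]

print(div_tags)  # ['h3', 'ol', 'h3']
print(li_texts)  # ['list 1', 'list 2', 'list 3']
```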

4. Siblings:

Iterate forward over the siblings:

for sibling in bsObj.find("li").next_siblings:
    print(sibling)

Iterate backward over the siblings:

# string= is the current name for the older text= argument
for sibling in bsObj.find("li", string="list 3").previous_siblings:
    print(sibling)
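
As with children, next_siblings and previous_siblings include the whitespace strings between tags. If only the sibling tags are wanted, find_next_siblings() and find_previous_siblings() accept a tag name. A minimal sketch, assuming a reduced copy of the list and the built-in "html.parser":

```python
from bs4 import BeautifulSoup

# Reduced copy of the sample list (an assumption for this sketch).
html = """
<ol>
  <li>list 1</li>
  <li>list 2</li>
  <li>list 3</li>
</ol>
"""

soup = BeautifulSoup(html, "html.parser")

first_li = soup.find("li")
last_li = soup.find("li", string="list 3")

# Only <li> siblings are returned; whitespace strings are ignored.
after_first = [li.get_text() for li in first_li.find_next_siblings("li")]
# Previous siblings come back nearest first, i.e. in reverse document order.
before_last = [li.get_text() for li in last_li.find_previous_siblings("li")]

print(after_first)  # ['list 2', 'list 3']
print(before_last)  # ['list 2', 'list 1']
```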

5. Parent:

bsObj.find("h3").parent
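
The parent is itself a Tag, so its name and attributes can be read directly, and .parents walks all the way up to the document root. A minimal sketch, assuming a single-line version of the sample document and the built-in "html.parser":

```python
from bs4 import BeautifulSoup

# Single-line version of the sample document (an assumption for this sketch).
html = '<html><body><div id="content"><h3>heading 1</h3></div></body></html>'

soup = BeautifulSoup(html, "html.parser")
h3 = soup.find("h3")

parent = h3.parent
# .parents yields every ancestor, ending with the BeautifulSoup object itself.
ancestors = [p.name for p in h3.parents]

print(parent.name)   # div
print(parent["id"])  # content
print(ancestors)     # ['div', 'body', 'html', '[document]']
```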
