Web Scraping (2) Navigating Trees
1. BeautifulSoup Objects:
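- BeautifulSoup objects: the object returned by BeautifulSoup()
- Tag objects: what you get back when you call find() and findAll() on a BeautifulSoup object (findAll() returns a list of them); they have a get_text() method
- NavigableString objects: the text inside a tag; historically these had no get_text() method (newer Beautiful Soup releases may provide one)
- Comment object: the text of an HTML comment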
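To make the four kinds of objects concrete, here is a minimal sketch. The one-line document `sample` and the variable names are made up for this illustration, and it uses the built-in "html.parser" backend so it does not depend on lxml.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString, Tag

# Hypothetical one-line document, used only for this illustration.
sample = "<p>hello<!-- a note --></p>"
soup = BeautifulSoup(sample, "html.parser")

tag = soup.find("p")        # a Tag
text = tag.contents[0]      # the NavigableString "hello"
comment = tag.contents[1]   # the Comment " a note "

for obj in (soup, tag, text, comment):
    print(type(obj).__name__)
```

Comment is a subclass of NavigableString, so check with isinstance(obj, Comment) first when you need to tell ordinary text from comments.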
2. Setup:
Define a simple HTML document:
from bs4 import BeautifulSoup

html = """
<html>
<head></head>
<body>
<div id="content" class="my-body">
<h3>heading 1</h3>
<ol>
<li>list 1</li>
<li>list 2</li>
<li>list 3</li>
</ol>
<h3>heading 2</h3>
</div>
</body>
</html>
"""

# Parse the document with the lxml parser (requires the lxml package)
bsObj = BeautifulSoup(html, features="lxml")
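3. Children:
# .children yields only the direct children of the <div>,
# including the whitespace strings between tags
for child in bsObj.find("div").children:
    print(child)
Get the text content of the child tags:
# .string is the text of each <li>; whitespace children print as blank lines
for child in bsObj.find("ol").children:
    print(child.string)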
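Because .children also yields the whitespace between tags, the loops above print blank lines along with the tags. One way to keep only the tags is to filter on the Tag type. This is a minimal sketch: the `snippet` string repeats just the list from the setup, the variable names are illustrative, and "html.parser" is used so lxml is not required.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Only the <ol> part of the setup document, repeated so this runs on its own.
snippet = """
<ol>
  <li>list 1</li>
  <li>list 2</li>
  <li>list 3</li>
</ol>
"""
soup = BeautifulSoup(snippet, "html.parser")

ol = soup.find("ol")
all_children = list(ol.children)                                # tags and whitespace strings
tag_children = [c for c in all_children if isinstance(c, Tag)]  # tags only
item_texts = [li.get_text() for li in tag_children]

print(len(all_children), len(tag_children))
print(item_texts)
```

Here all_children has seven entries (three `<li>` tags plus four whitespace strings), while tag_children has only the three tags.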
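4. Siblings:
Iterate forward over siblings:
# Everything after the first <li> at the same level (the first <li> itself is not included)
for sibling in bsObj.find("li").next_siblings:
    print(sibling)
Iterate backward over siblings:
# Start from the <li> whose text is "list 3" and walk backward
# (string= is the current name of this argument; older releases spell it text=)
for sibling in bsObj.find("li", string="list 3").previous_siblings:
    print(sibling)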
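The sibling iterators also return the whitespace between tags, and previous_siblings walks from nearest to farthest. The sketch below collects only the tag siblings to make the order visible. The `snippet` string and variable names are illustrative, and "html.parser" is assumed as the parser.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# A cut-down copy of the list from the setup, so this runs on its own.
snippet = "<ol><li>list 1</li> <li>list 2</li> <li>list 3</li></ol>"
soup = BeautifulSoup(snippet, "html.parser")

first = soup.find("li")
last = soup.find("li", string="list 3")

# Keep only Tag siblings; the strings between the tags are skipped.
after_first = [s.get_text() for s in first.next_siblings if isinstance(s, Tag)]
before_last = [s.get_text() for s in last.previous_siblings if isinstance(s, Tag)]

# find_next_sibling() jumps straight to the next matching tag.
second = first.find_next_sibling("li")

print(after_first)
print(before_last)
print(second.get_text())
```

Note that before_last comes out as list 2 then list 1, because the backward walk starts at the nearest sibling.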
5. Parent:
# .parent is the tag that directly contains the first <h3>, here the <div>
bsObj.find("h3").parent
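As a quick check of what .parent returns, the sketch below inspects the parent's name and id, and uses .parents to walk further up the tree. The `snippet` string is a trimmed copy of the setup document, the variable names are illustrative, and "html.parser" is assumed as the parser.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the setup document, so this runs on its own.
snippet = (
    '<html><body><div id="content" class="my-body">'
    "<h3>heading 1</h3></div></body></html>"
)
soup = BeautifulSoup(snippet, "html.parser")

h3 = soup.find("h3")
parent = h3.parent

print(parent.name)   # tag name of the direct parent
print(parent["id"])  # attributes are read like dictionary keys

# .parents yields every ancestor, nearest first, up to the document itself.
ancestor_names = [p.name for p in h3.parents]
print(ancestor_names)
```

The ancestors start with div, body and html; the last entry is the BeautifulSoup object itself.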