Web Scraping (1) find and findAll
1. find and findAll:
find and findAll take the following parameters; the sections below cover tag, attributes, text, and limit in turn:

    findAll(tag, attributes, recursive, text, limit, keywords)
    find(tag, attributes, recursive, text, keywords)

Setup:

    from urllib.request import urlopen
    from bs4 import BeautifulSoup

    html = urlopen("https://en.wikipedia.org/wiki/Ward_Cunningham")
    bsObj = BeautifulSoup(html.read(), features="lxml")
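To experiment without hitting the network, the same kind of object can be built from an inline string. The sketch below is only an illustration: SAMPLE_HTML and the variable names are made up for this example, it uses the built-in html.parser instead of lxml, and it spells the method find_all (the current name for findAll). It shows the main difference between the two calls: find returns a single tag or None, while find_all returns a list-like collection of every match.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page, so no network access is needed.
SAMPLE_HTML = """
<html><body>
  <h1 id="firstHeading" class="firstHeading">Ward Cunningham</h1>
  <h2>Career</h2>
  <h2>Publications</h2>
</body></html>
"""

# html.parser ships with Python; the setup above uses lxml instead.
sample = BeautifulSoup(SAMPLE_HTML, "html.parser")

first_h2 = sample.find("h2")         # a single Tag: the first match
all_h2 = sample.find_all("h2")       # every match, in document order
missing = sample.find("h4")          # None when nothing matches
no_matches = sample.find_all("h4")   # an empty result when nothing matches

print(first_h2.get_text())
print([h.get_text() for h in all_h2])
print(missing, len(no_matches))
```

Because find can return None, it is usually worth checking the result before calling get_text() on it.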
1. tag:
tag is the name of the tag you want to find. For example, to find every "h1", "h2", and "h3" element:

    headings = bsObj.findAll({"h1", "h2", "h3"})
    for heading in headings:
        print(heading)
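As a small offline sketch of the same idea, the example below passes a list of tag names (the form shown in the Beautiful Soup documentation) and shows that matches come back in document order, not grouped by tag name. The SAMPLE_HTML string and variable names are assumptions made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page; not the real Wikipedia markup.
SAMPLE_HTML = """
<html><body>
  <h1>Ward Cunningham</h1>
  <p>Intro paragraph.</p>
  <h2>Career</h2>
  <h3>Early work</h3>
  <h2>Publications</h2>
</body></html>
"""
sample = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Any element whose name is in the list matches; the <p> is skipped.
headings = sample.find_all(["h1", "h2", "h3"])
heading_names = [h.name for h in headings]
heading_texts = [h.get_text() for h in headings]

print(heading_names)
print(heading_texts)
```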
2. attributes:
Find the h1 element whose id and class are both "firstHeading":

    bsObj.find("h1", {"id": "firstHeading", "class": "firstHeading"})

Output:

    <h1 class="firstHeading" id="firstHeading" lang="en"> Ward Cunningham </h1>

Get the text inside the tag:

    firstHeading = bsObj.find("h1", {"id": "firstHeading", "class": "firstHeading"})
    firstHeading.get_text()

Output:

    Ward Cunningham
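The keywords parameter from the signature offers another way to write the same filter: attributes are passed as keyword arguments instead of a dictionary. A minimal sketch follows, assuming a made-up SAMPLE_HTML in which the h1 carries a second, hypothetical class. Since class is a reserved word in Python, Beautiful Soup expects it to be written as class_.

```python
from bs4 import BeautifulSoup

# Hypothetical sample; the second class name is invented for illustration.
SAMPLE_HTML = """
<html><body>
  <h1 id="firstHeading" class="firstHeading extra-class">Ward Cunningham</h1>
  <h1 id="other">Another heading</h1>
</body></html>
"""
sample = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Attribute dictionary, as in the text above.
by_dict = sample.find("h1", {"id": "firstHeading", "class": "firstHeading"})

# The same filter written with keyword arguments.
by_keyword = sample.find("h1", id="firstHeading", class_="firstHeading")

# A class filter matches if it equals any one of the element's classes.
print(by_dict is by_keyword)
print(by_dict["class"])
print(by_keyword.get_text())
```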
3. text:
Search by the text content of a tag:
bsObj.find(text="Ward Cunningham")
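A plain string only matches when the whole text node is exactly equal to it, so a regular expression is handy for partial matches. The sketch below runs against a made-up SAMPLE_HTML and uses string=, which newer Beautiful Soup releases prefer over the older text= spelling. Note that a text search on its own returns the matching string, not the enclosing tag.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample page for illustration.
SAMPLE_HTML = """
<html><body>
  <h1>Ward Cunningham</h1>
  <p>Ward Cunningham developed the first wiki.</p>
</body></html>
"""
sample = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Exact match: only the <h1> text is exactly "Ward Cunningham".
exact = sample.find(string="Ward Cunningham")
print(exact.parent.name)

# Partial match with a regular expression: both text nodes qualify.
partial = sample.find_all(string=re.compile("Cunningham"))
print(len(partial))

# Combining a tag name with a string filter returns the tag itself.
heading = sample.find("h1", string="Ward Cunningham")
print(heading.name)
```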
4. limit:
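Limit the number of results returned (find_all is the current name for findAll; the two behave identically):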
bsObj.find_all("h1", limit=2)
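A short sketch of how limit behaves, again on a made-up SAMPLE_HTML: the search stops after the requested number of matches, and find acts like find_all with limit=1 except that it returns the tag itself. The last call also touches the recursive parameter from the signature, which restricts the search to direct children when set to False.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page for illustration.
SAMPLE_HTML = """
<html><body>
<h2>One</h2>
<div><h2>Nested</h2></div>
<h2>Two</h2>
<h2>Three</h2>
</body></html>
"""
sample = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Only the first two matches in document order are returned.
first_two = [h.get_text() for h in sample.find_all("h2", limit=2)]
print(first_two)

# find gives the same element as the single result of limit=1.
same_element = sample.find_all("h2", limit=1)[0] is sample.find("h2")
print(same_element)

# recursive=False looks only at direct children of <body>,
# so the <h2> nested inside the <div> is skipped.
direct_only = [h.get_text() for h in sample.body.find_all("h2", recursive=False)]
print(direct_only)
```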