Web Scraping (1) find and findAll
1. find and findAll:
The parameters of find and findAll are listed below; the sections that follow explain tag, attributes, text, and limit in turn:
findAll(tag, attributes, recursive, text, limit, keywords)
find(tag, attributes, recursive, text, keywords)
Setup:
from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("https://en.wikipedia.org/wiki/Ward_Cunningham")
bsObj = BeautifulSoup(html.read(), features="lxml")
1. tag:
tag is the name of the tag you want to find. For example, to find every "h1", "h2", and "h3":
headings = bsObj.findAll({"h1", "h2", "h3"})
for heading in headings:
    print(heading)
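The same lookup can be tried offline. The sketch below is only an illustration: `sample_html` is a made-up snippet standing in for the Wikipedia page, and it uses the built-in "html.parser" so lxml is not needed. It calls `find_all`, which is the snake_case spelling of `findAll`.

```python
from bs4 import BeautifulSoup

# Hypothetical inline HTML, used here instead of the live Wikipedia page
sample_html = """
<html><body>
<h1>Title</h1>
<p>Intro</p>
<h2>Section A</h2>
<h3>Subsection A.1</h3>
<h2>Section B</h2>
</body></html>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# A list of names matches any of the listed tags; results come back in document order
headings = soup.find_all(["h1", "h2", "h3"])
heading_names = [h.name for h in headings]
heading_texts = [h.get_text() for h in headings]

print(heading_names)
print(heading_texts)
```

The `<p>` element is skipped because its name is not in the list.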
2. attributes:
Find the h1 element whose id and class are both "firstHeading":
bsObj.find("h1", {"id": "firstHeading", "class": "firstHeading"})
<h1 class="firstHeading" id="firstHeading" lang="en">
Ward Cunningham
</h1>
Get the text inside the tag:
firstHeading = bsObj.find("h1",
                          {"id": "firstHeading",
                           "class": "firstHeading"})
firstHeading.get_text()
Ward Cunningham
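A minimal sketch of attribute matching on a made-up snippet (`sample_html` and the second heading are assumptions added for illustration). It shows that every key in the dictionary has to match, and that `find` returns None when nothing matches, so it is worth checking the result before calling `get_text()`.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML: two h1 elements share a class but have different ids
sample_html = """
<h1 id="firstHeading" class="firstHeading" lang="en">Ward Cunningham</h1>
<h1 id="other" class="firstHeading">Another heading</h1>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Both the id and the class must match for an element to be returned
match = soup.find("h1", {"id": "firstHeading", "class": "firstHeading"})
heading_text = match.get_text()

# find returns None when no element matches, so guard before using the result
missing = soup.find("h1", {"id": "noSuchId"})
missing_text = missing.get_text() if missing is not None else None

print(heading_text)
print(missing_text)
```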
3. text:
Search by the text content of a tag:
bsObj.find(text="Ward Cunningham")
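A small sketch of text search on a made-up snippet (`sample_html` is an assumption). Newer Beautiful Soup releases call this argument `string`, with `text` kept as the older name, so the sketch uses `string`. A plain string has to equal the whole text node, while a compiled regular expression can match part of it; the result is the text node itself, and `.parent` leads back to the enclosing tag.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical HTML: the name appears alone in the h1 and inside a longer sentence in the p
sample_html = """
<h1>Ward Cunningham</h1>
<p>Ward Cunningham developed the first wiki.</p>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# A plain string only matches a text node that is exactly equal to it
exact = soup.find(string="Ward Cunningham")
exact_parent = exact.parent.name

# A compiled pattern is applied with search(), so partial matches count too
partial = soup.find_all(string=re.compile("Ward Cunningham"))
partial_parents = [s.parent.name for s in partial]

print(exact_parent)
print(partial_parents)
```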
4. limit:
Limit the number of results returned:
bsObj.find_all("h1", limit=2)
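To see the effect of limit without a network call, here is a sketch on a made-up list (`sample_html` is an assumption). It also shows why find has no limit parameter: find behaves like find_all with limit=1, returning the first match itself rather than a list.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML with three list items
sample_html = "<ul><li>one</li><li>two</li><li>three</li></ul>"

soup = BeautifulSoup(sample_html, "html.parser")

# limit stops the search once that many matches have been collected
first_two = [li.get_text() for li in soup.find_all("li", limit=2)]

# Without limit, every match is returned
all_items = [li.get_text() for li in soup.find_all("li")]

# find returns only the first match, as a single element rather than a list
first_only = soup.find("li").get_text()

print(first_two)
print(all_items)
print(first_only)
```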