Web Scraping (1) find and findAll

1. find and findAll:

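The parameters of find and findAll are listed below; the sections that follow go through several of them. (In bs4, findAll is the older spelling of find_all, and both names appear in this post.)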

findAll(tag, attributes, recursive, text, limit, keywords)
find(tag, attributes, recursive, text, keywords)

Setup:

from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("https://en.wikipedia.org/wiki/Ward_Cunningham")
bsObj = BeautifulSoup(html.read(), features="lxml")

1. tag:

tag is the name of the tag you want to find. For example, to find every "h1", "h2", and "h3":

headings = bsObj.findAll({"h1", "h2", "h3"})
for heading in headings:
    print(heading)
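
To try the tag argument without fetching a live page, here is a minimal sketch that runs against a small HTML string. The markup and the names SAMPLE_HTML, heading_names and heading_texts are made up for illustration, and it assumes the built-in "html.parser" is an acceptable substitute for lxml:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used here instead of the Wikipedia page.
SAMPLE_HTML = """
<html><body>
<h1>Title</h1>
<p>Intro</p>
<h2>Section</h2>
<h3>Subsection</h3>
</body></html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A list of tag names matches any of them; results come back in document order.
headings = soup.find_all(["h1", "h2", "h3"])
heading_names = [h.name for h in headings]
heading_texts = [h.get_text() for h in headings]

print(heading_names)
print(heading_texts)
```

The p element is skipped because its name is not in the list.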

2. attributes:

Find the h1 element whose id and class are both "firstHeading":

bsObj.find("h1", {"id": "firstHeading", "class": "firstHeading"})

<h1 class="firstHeading" id="firstHeading" lang="en">
Ward Cunningham
</h1>

Get the text inside the tag:

firstHeading = bsObj.find("h1", 
                         {"id": "firstHeading", 
                          "class": "firstHeading"})
firstHeading.get_text()

Ward Cunningham
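
The attributes dictionary can also be tried offline. The sketch below uses invented markup with two h1 elements (the second one is hypothetical) to suggest that every key in the dictionary has to match, and that find returns None when nothing does:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two h1 elements share a class but differ in id.
SAMPLE_HTML = """
<h1 id="firstHeading" class="firstHeading" lang="en">Ward Cunningham</h1>
<h1 id="other" class="firstHeading">Someone Else</h1>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Both the id and the class must match, so only the first h1 qualifies.
match = soup.find("h1", {"id": "firstHeading", "class": "firstHeading"})
title = match.get_text()

# find returns None when no element satisfies the filter.
missing = soup.find("h1", {"id": "nope"})

print(title)
print(missing)
```

Because find can return None, it is usually worth checking the result before calling get_text() on it.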

3. text:

Search by the text content of a tag:

bsObj.find(text="Ward Cunningham")
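
A plain string here matches the whole text node exactly, and the result is the text node itself rather than the enclosing tag. The sketch below, on invented markup, shows this and uses a regular expression for partial matches. It assumes a bs4 version recent enough (4.4.0 or later) to accept string=, the newer name for the text= argument:

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup: the name appears alone and inside a longer sentence.
SAMPLE_HTML = "<h1>Ward Cunningham</h1><p>Ward Cunningham invented the wiki.</p>"

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Exact match: only the h1 text is exactly "Ward Cunningham".
exact = soup.find(string="Ward Cunningham")
parent_name = exact.parent.name  # walk up from the text node to its tag

# Partial match: a compiled pattern is searched within each text node.
partial = soup.find_all(string=re.compile("Ward Cunningham"))

print(parent_name)
print(len(partial))
```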

4. limit:

Limit the number of results returned:

bsObj.find_all("h1", limit=2)
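
As a rough illustration of limit, the sketch below uses an invented three-item list. It also suggests why find has no limit parameter: it behaves like find_all with limit=1, except that it returns the element itself instead of a one-item list.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with three list items.
SAMPLE_HTML = "<ul><li>a</li><li>b</li><li>c</li></ul>"

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Stop searching after the first two matches.
first_two = [li.get_text() for li in soup.find_all("li", limit=2)]

# find_all with limit=1 still returns a list; find returns the element directly.
only_one = soup.find_all("li", limit=1)
same_element = only_one[0] == soup.find("li")

print(first_two)
print(same_element)
```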
