330beautifulsoup-蒲公英云

330beautifulsoup

小灰灰, 2022-10-01 09:49

Example 1: getting blog post publication dates (BeautifulSoup's parsing feature)

import requests
from bs4 import BeautifulSoup  # import the modules

headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36'}

for i in range(1, 4):
    link = 'blog.csdn.net/weixin_4218…'  # URL of listing page i (truncated in the source)
    r = requests.get(link, headers=headers)  # request the page

    soup = BeautifulSoup(r.text, 'lxml')  # parse the page with lxml
    dates = soup.find_all('span', class_='date')  # every span tag with class "date", returned as a list
    for x in dates:
        date = x.text.strip()  # tag text as a string, with surrounding whitespace stripped
        print(date)
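The parsing step can be tried without any network access. The following is a minimal sketch, assuming a hypothetical HTML snippet that stands in for the blog listing page (it is not the real CSDN markup), and using Python's built-in html.parser in place of lxml so that no extra parser has to be installed:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a blog listing page (not the real CSDN HTML).
html = """
<div class="article">
  <h4>First post</h4>
  <span class="date"> 2018-03-01 10:00:00 </span>
  <span class="read">12</span>
</div>
<div class="article">
  <h4>Second post</h4>
  <span class="date"> 2018-03-05 21:30:00 </span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all returns a list-like ResultSet of every <span> whose class includes "date";
# the <span class="read"> tag is skipped.
dates = [span.text.strip() for span in soup.find_all("span", class_="date")]
print(dates)
```

Keeping the markup inline makes it easy to check the selector before pointing the same find_all call at a live page.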

Other BeautifulSoup features

HTML prettifying:

soup = BeautifulSoup(r.text, 'lxml')

print(soup.prettify())
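To see what prettify does without fetching a page, here is a small sketch that assumes a hypothetical one-line HTML string and uses the built-in html.parser rather than lxml:

```python
from bs4 import BeautifulSoup

# Hypothetical one-line markup, similar to what a minified page might deliver.
html = "<div><p>Hello <b>world</b></p></div>"

soup = BeautifulSoup(html, "html.parser")

# prettify() returns a str in which tags and text are placed on separate lines
# and indented according to their nesting depth.
pretty = soup.prettify()
print(pretty)
```

The result is meant for reading and debugging; the added whitespace means it is not byte-for-byte the same document as the input.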

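Traversing the document tree

Traverse the tree and get the span tag:

soup.header.span

All child nodes of the div tag, returned as a list:
    soup.header.div.contents
The children attribute iterates over all child nodes:
    soup.header.div.children

All descendant nodes, at any depth:

soup.header.div.descendants

Get the parent node:
    soup.header.div.a.parent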
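The navigation attributes above can be compared side by side on a small document. This is a sketch only: the markup is hypothetical, shaped to have the header, div, and a tags those expressions expect, and html.parser is used instead of lxml:

```python
from bs4 import BeautifulSoup

# Hypothetical markup with the header > div > a structure the expressions assume.
html = (
    '<header>'
    '<span>Site</span>'
    '<div><a href="/home">Home <b>now</b></a><a href="/about">About</a></div>'
    '</header>'
)
soup = BeautifulSoup(html, "html.parser")
div = soup.header.div

print(soup.header.span)     # the first <span> inside <header>
print(div.contents)         # direct children, as a list
print(list(div.children))   # the same nodes; .children is an iterator, not a list

# .descendants walks every nested node, including text strings,
# so the text nodes (whose .name is None) are filtered out here.
descendant_tags = [node.name for node in div.descendants if node.name is not None]
print(descendant_tags)

print(div.a.parent.name)    # the parent of the first <a> is the <div>
```

Note that .contents and .children stop at direct children, while .descendants also reaches the b tag nested inside the first link.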

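Searching the document tree

Get all tags whose names start with h. find_all accepts a regular expression, and ^ anchors the match at the start of the tag name:

import re

tags = soup.find_all(re.compile('^h'))

for tag in tags:
    print(tag)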
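One thing worth checking with this pattern is that it matches any tag name beginning with h, not only the heading tags. The sketch below assumes a hypothetical snippet and the built-in html.parser; the stricter second pattern is an illustration and is not from the original post:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup; html.parser does not add <html>/<head> wrapper tags.
html = "<header><h1>Title</h1><p>Intro</p><h2>Section</h2></header>"
soup = BeautifulSoup(html, "html.parser")

# '^h' matches every tag name that starts with "h", so <header> is included.
starts_with_h = [tag.name for tag in soup.find_all(re.compile("^h"))]
print(starts_with_h)

# A stricter pattern (an assumption about the intent) keeps only h1 to h6.
headings = [tag.name for tag in soup.find_all(re.compile(r"^h[1-6]$"))]
print(headings)
```

With the lxml parser on a full page, the looser pattern would also pick up tags such as html and head.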

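The select method

Find by tag name. This selects every a tag anywhere inside a div:

soup.select('div a')

Select only a tags that are direct children of a div:

soup.select('div > a')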
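The difference between the two selectors shows up as soon as a link is nested one level deeper. A minimal sketch, assuming a hypothetical snippet and the built-in html.parser:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one link directly under the div, one nested inside a <p>.
html = """
<div>
  <a href="/direct">direct child</a>
  <p><a href="/nested">nested link</a></p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# "div a" is the descendant combinator: any <a> at any depth inside a <div>.
all_links = [a["href"] for a in soup.select("div a")]

# "div > a" is the child combinator: only <a> tags whose parent is the <div>.
child_links = [a["href"] for a in soup.select("div > a")]

print(all_links)
print(child_links)
```

select always returns a list, so an empty result is an empty list rather than None.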

Reposted from //juejin.im/post/5c9f22a7f265da30bd3e4285
