2. Python Web Scraping: Parsing CSS Attributes and Two-Level Crawling
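import pathlib
import re
import uuid
from urllib.parse import urljoin

from requests import get
from scrapy.selector import Selector

urlHttp = "https://www.51moot.net"
save_dir = pathlib.Path("D:/save51Moot")
save_dir.mkdir(parents=True, exist_ok=True)  # create the output folder if it is missing
count = 0

# Level 1: walk the first 5 pages of the course listing
for index_page in range(0, 5):
    html = get(
        "https://www.51moot.net/main/course?search_id=0&is_free=-1&page_index={0}".format(index_page)
    ).content.decode("utf-8")
    sel = Selector(text=html)
    # ::attr(href) extracts the href attribute of every course link
    result = sel.css("div.course-details-cont-view ul li a::attr(href)").extract()
    # Level 2: follow each course link and parse its detail page
    for x in result:
        x = urljoin(urlHttp, x)  # turn the relative href into an absolute URL
        try:
            html = get(x).content.decode("utf-8")
            sel = Selector(text=html)
            # ::text extracts the text nodes of the matched elements
            title = sel.css("div.course-details-title-cont-text h2::text").extract()[0].strip()
            infos = sel.css(
                "div.course-details-title-cont-text li.course-details-title-cont-text-chapter span::text").extract()
            introduce = sel.css(
                "div.course-details-view-list div.course-details-view-list-introduce-cont p::text").extract()[0]
            strInfo = title + "\n"
            strInfo += "Instructor: {0}\tChapters: {1}\tDuration: {2} class hours\tLearners: {3}\n".format(
                infos[0], infos[1], infos[2], infos[3])
            strInfo += "Course introduction: {0}".format(introduce)
            # replace characters that Windows does not allow in file names
            safe_title = re.sub(r'[\\/:*?"<>|]', "_", title)
            path = save_dir / "{0}.txt".format(safe_title)
            if path.exists():
                # avoid overwriting an earlier course with the same title
                path = save_dir / "{0}{1}.txt".format(safe_title, uuid.uuid4())
            with open(path, "w", encoding="utf-8") as file:
                file.write(strInfo)
            count += 1
        except Exception as e:
            # a missing element, network error, or write error skips this link
            print("Failed link:", x, e)
print("Total files saved:", count)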
