Extracting the currently hottest Q&A content from Zhihu with Python

# -*- coding: utf-8 -*-
import re
import urllib.request


def yunpan_search():
    # Request the Zhihu "explore" page with browser-like headers.
    url = "https://www.zhihu.com/explore"
    req = urllib.request.Request(url, headers={
        'Connection': 'Keep-Alive',
        'Accept': 'text/html, application/xhtml+xml, */*',
        'Accept-Language': 'en-US,en;q=0.8,zh-Hans-CN;q=0.5,zh-Hans;q=0.3',
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; rv:11.0) like Gecko'
    })
    with urllib.request.urlopen(req) as response:
        html = response.read().decode('utf-8')

    # Lazily capture text that starts right after a newline and ends
    # just before each answer-date marker; re.S lets "." span lines.
    rex = '(?<=\n).*?(?=<span class="answer-date-link-wrap">)'
    m = re.findall(rex, html, re.S)

    # Save the raw matches, separated by blank lines.
    with open('/root/Desktop/zhihu.txt', 'w', encoding='utf-8') as f:
        for i in m:
            f.write(i)
            f.write('\n\n')
    print("Scrape succeeded!")

    # Reopen the file and clean it in place.
    with open('/root/Desktop/zhihu.txt', 'r+', encoding='utf-8') as file:
        fullfile = file.readlines()
        text = []
        # re.L (LOCALE) is rejected for str patterns in current Python 3;
        # re.ASCII limits \w to [a-zA-Z0-9_], so the Chinese text survives.
        p = re.compile(r'\w+', re.ASCII)
        # Remove the "&;" shells left once entity names have been stripped.
        pp = re.compile(r"(&;)+")
        for line in fullfile:
            lines = p.sub('', line)
            liness = pp.sub('', lines)
            text.append(liness)
        file.seek(0)
        file.truncate(0)
        file.writelines(text)
    print("Cleanup succeeded!")


if __name__ == '__main__':
    yunpan_search()
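
The cleanup pass in the script can be tried without touching the network or the disk. The following is a minimal sketch of that step only, assuming the intent is to strip ASCII letters, digits, and underscores (tag names, entity names) while keeping the Chinese text. The helper name `strip_markup_residue` and the sample string are made up for illustration and are not part of the original script.

```python
import re

# Hypothetical names, introduced here for illustration only.
ASCII_WORD = re.compile(r'\w+', re.ASCII)  # matches [a-zA-Z0-9_] runs only
ENTITY_SHELL = re.compile(r'(&;)+')        # what is left of "&quot;" etc.


def strip_markup_residue(line):
    """Drop ASCII word characters, then the leftover "&;" shells."""
    without_ascii = ASCII_WORD.sub('', line)
    return ENTITY_SHELL.sub('', without_ascii)


# Made-up sample resembling a scraped fragment.
sample = '<b>知乎</b>&quot;热门&quot;'
print(strip_markup_residue(sample))
```

For this sample the function returns `<>知乎</>热门`: punctuation such as `<`, `>`, and `/` is left in place, so this approach only thins out the markup rather than removing it. An HTML parser would be a cleaner option if fully plain text is needed.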
This article was originally published on php教程; please credit the source when reposting. Thank you for your respect!
企鹅博客
  • Published on August 21, 2020, 07:11:57
  • When reposting, please keep the link to this article: https://www.qieseo.com/339407.html
