Batch-scraping post IDs from the WordPress admin

1. This script is mainly for SEO link submission and SEO active push, and is aimed at people who run WordPress sites.

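2. I wrote this script for fun. There are many ways to get post IDs, and the most convenient is probably to read them from the database. The script touches on a few scraping techniques that were new to me, so treat it as a demo and a record that I can refer back to when writing later scripts.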
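For comparison, the database route mentioned in point 2 might look roughly like the sketch below. It assumes the default WordPress schema (a wp_posts table with ID, post_type and post_status columns; the wp_ prefix is configurable). To stay self-contained it runs against an in-memory SQLite table with made-up rows rather than a real MySQL database, and the names published_post_urls and demo_connection are hypothetical helpers, not part of the original script.

```python
import sqlite3

# Assumes the default WordPress schema: table wp_posts with
# ID, post_type and post_status columns ("wp_" prefix is configurable).
QUERY = (
    "SELECT ID FROM wp_posts "
    "WHERE post_type = 'post' AND post_status = 'publish' "
    "ORDER BY ID"
)


def published_post_urls(conn, base_url):
    """Return one ?p=<id> URL per published post."""
    rows = conn.execute(QUERY).fetchall()
    return [f"{base_url}/?p={row[0]}" for row in rows]


def demo_connection():
    """Build a tiny in-memory stand-in for wp_posts (made-up rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE wp_posts "
        "(ID INTEGER PRIMARY KEY, post_type TEXT, post_status TEXT)"
    )
    conn.executemany(
        "INSERT INTO wp_posts VALUES (?, ?, ?)",
        [
            (728, "post", "publish"),
            (729, "post", "draft"),    # skipped: not published
            (730, "page", "publish"),  # skipped: a page, not a post
            (735, "post", "publish"),
        ],
    )
    return conn


if __name__ == "__main__":
    for url in published_post_urls(demo_connection(), "http://www.chenhaifei.com"):
        print(url)
```

Against a real site the same SELECT would be run through a MySQL client library instead of sqlite3, typically via a cursor object rather than directly on the connection.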

The script is as follows:

# coding: utf-8
# author: http://www.chenhaifei.com/?p=728
import time  # pause between requests

import requests  # fetch the pages
from bs4 import BeautifulSoup as bs  # parse the HTML into a tree

headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'zh-CN,zh;q=0.9,ja;q=0.8,en;q=0.7',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
    # Logged-in session cookie copied from the browser; the admin list needs it
    'Cookie': 'wordpress_114b3dd3c91a3b1577f7d113e922e0cd=oovipooko%7C1598493389%7C1c4Mxjcw6azHDyzhQD1RWHRuTHIGsiyIoLHjAKDo5AF%7C77341cad27fb581b52db534d159a36db9268230259629f9eb832bd6f71b4b95f; wordpress_test_cookie=WP+Cookie+check; wordpress_logged_in_114b3dd3c91a3b1577f7d113e922e0cd=oovipooko%7C1598493389%7C1c4Mxjcw6azHDyzhQD1RWHRuTHIGsiyIoLHjAKDo5AF%7Cb56c4c6b78a2a12e7de7ef0c54cbabe47c88d1dfb8248aaf9eb658f8fd6baf27; wp-settings-1=editor%3Dtinymce%26hidetb%3D1%26libraryContent%3Dbrowse%26post_dfw%3Doff%26imgsize%3Dfull; wp-settings-time-1=1597283789',
    'Host': 'www.chenhaifei.com',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36',
}

for i in range(1, 7):  # admin list pages 1 through 6
    url = 'http://www.chenhaifei.com/wp-admin/edit.php?paged=' + str(i)  # build the page URL
    print(url)
    cont = requests.get(url, timeout=120, headers=headers).content  # fetch the page with GET
    soup = bs(cont, "html.parser")  # parse the HTML with BeautifulSoup
    table = soup.find('tbody', {'id': "the-list"})  # table body that lists the posts
    tr = table.find_all('tr')
    for t in tr:  # iterate over the rows
        get_id = t['id']  # value of the row's id attribute, "post-<number>"
        id_replace = get_id.replace("post-", "")  # strip the "post-" prefix
        lianjie = 'http://www.chenhaifei.com/?p=' + str(id_replace)  # build the post URL
        with open('lianjie.txt', 'a') as txt:  # append one URL per line
            txt.write(lianjie + '\n')
    time.sleep(3)  # wait 3 seconds before requesting the next page
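One fragile spot in the script: it assumes the tbody with id "the-list" is always present and that every row carries an id. If the session cookie has expired, the response is a login page, soup.find returns None, and the next call fails with an AttributeError. The sketch below shows one way to make the ID extraction tolerant of both cases. Note the swap: it uses the standard library's html.parser instead of BeautifulSoup so it runs without extra packages. PostRowParser, extract_post_ids and the SAMPLE markup are hypothetical names and made-up data, not part of the original script.

```python
from html.parser import HTMLParser


class PostRowParser(HTMLParser):
    """Collect numeric IDs from <tr id="post-N"> rows inside <tbody id="the-list">."""

    def __init__(self):
        super().__init__()
        self.in_list = False  # True while inside the target tbody
        self.post_ids = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "tbody" and attrs.get("id") == "the-list":
            self.in_list = True
        elif tag == "tr" and self.in_list:
            row_id = attrs.get("id") or ""  # rows without an id are skipped
            if row_id.startswith("post-") and row_id[5:].isdigit():
                self.post_ids.append(int(row_id[5:]))

    def handle_endtag(self, tag):
        if tag == "tbody":
            self.in_list = False


def extract_post_ids(html):
    """Return post IDs in page order; an empty list if the table is missing."""
    parser = PostRowParser()
    parser.feed(html)
    return parser.post_ids


# Made-up markup that imitates the shape the script relies on.
SAMPLE = """
<table>
  <tbody id="the-list">
    <tr id="post-735"><td>first</td></tr>
    <tr class="no-items"><td>no id here</td></tr>
    <tr id="post-728"><td>second</td></tr>
  </tbody>
</table>
<tr id="post-1"><td>outside the list</td></tr>
"""

if __name__ == "__main__":
    print(extract_post_ids(SAMPLE))
    print(extract_post_ids("<html><body>login form</body></html>"))
```

An empty result for a page is a reasonable signal to stop and check the cookie rather than keep requesting further pages.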

Screenshot of the script:

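P.S. If you copy and paste the script straight from the web page, check the quotes and indentation first: typographic quotes and lost indentation cause syntax errors and have to be fixed by hand. Because of bandwidth limits I am not offering the script file for download. If you need it, contact the author directly (note: this is a paid service).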

Do not repost without permission: 陈海飞博客 » Batch-scraping post IDs from the WordPress admin

Original link: https://www.chenhaifei.com/?p=735
