How to Use a Proxy Pool




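Every time I scrape something I worry about my IP getting banned, so I always end up setting a long delay between requests…
This time I'm building a simple proxy pool in Python: fetch proxy IPs, then check whether they actually work.
The results weren't what I expected, though: why does every high-anonymity proxy from Xici Proxy (西刺代理) work???
Aren't free proxies supposed to be useless? Cue the baffled question-mark face…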
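One way to check whether a scraped proxy works is to send a request through it and see if a normal response comes back. Below is a minimal sketch under that assumption. It uses only the standard library (`urllib.request.ProxyHandler`) rather than requests. `TEST_URL`, `make_proxy_url`, `is_proxy_alive`, `filter_alive` and the 5-second timeout are hypothetical choices, not the author's code. The `fetch` parameter is there so a fake can be swapped in for offline testing.

```python
import urllib.request

# Hypothetical test target: any stable page reachable through a proxy will do.
TEST_URL = "http://httpbin.org/ip"


def make_proxy_url(ip, port):
    """Build a proxy URL such as 'http://1.2.3.4:8080' from the scraped fields."""
    return "http://{}:{}".format(ip.strip(), port.strip())


def default_fetch(proxy_url, timeout):
    """Open TEST_URL through the given proxy and return the HTTP status code."""
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    )
    with opener.open(TEST_URL, timeout=timeout) as resp:
        return resp.status


def is_proxy_alive(ip, port, timeout=5, fetch=default_fetch):
    """Return True only if a request through the proxy answers with HTTP 200."""
    try:
        return fetch(make_proxy_url(ip, port), timeout) == 200
    except Exception:
        # Timeouts, refused connections, HTTP errors: treat the proxy as dead
        return False


def filter_alive(candidates, **kwargs):
    """Keep only the (ip, port) pairs that pass is_proxy_alive."""
    return [(ip, port) for ip, port in candidates if is_proxy_alive(ip, port, **kwargs)]
```

A fixed status-code check is the simplest criterion. A stricter pool might also compare the IP reported by the test page with your own, so that transparent proxies are rejected.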
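/ 01 / Fetching Proxies
01 Page Analysis
Open Xici Proxy's domestic high-anonymity proxy list (国内高匿代理) to get the page.
From each row we extract the IP address, port, anonymity level, type, and speed.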
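The sketch below makes that column layout concrete. It is a standard-library alternative that uses `html.parser` instead of BeautifulSoup, and it pulls those five fields out of a table row. The `SAMPLE` markup is hypothetical. It was built to match the column indices the scraper below relies on:

* IP in column 1
* port in column 2
* anonymity in column 4
* type in column 5
* speed in the `title` of a `div.bar` in column 6

The real page may differ. The speed format `"0.5秒"` is also an assumption.

```python
from html.parser import HTMLParser

# Hypothetical row laid out the way the scraper assumes (header row uses <th>).
SAMPLE = """
<table>
  <tr><th>Country</th><th>IP</th><th>Port</th><th>Location</th><th>Anonymity</th><th>Type</th><th>Speed</th></tr>
  <tr><td></td><td>1.2.3.4</td><td>8080</td><td>Somewhere</td><td>高匿</td><td>HTTP</td>
      <td><div class="bar" title="0.5秒"></div></td></tr>
</table>
"""


class ProxyTableParser(HTMLParser):
    """Collect the text of every <td> and the title of every div.bar, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []  # one (cells, bar_titles) tuple per <tr>
        self._cells = None
        self._bars = None
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "tr":
            self._cells, self._bars = [], []
        elif tag == "td" and self._cells is not None:
            self._cells.append("")
            self._in_td = True
        elif tag == "div" and attrs.get("class") == "bar" and self._bars is not None:
            self._bars.append(attrs.get("title", ""))

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False
        elif tag == "tr" and self._cells is not None:
            self.rows.append((self._cells, self._bars))
            self._cells = self._bars = None

    def handle_data(self, data):
        if self._in_td:
            self._cells[-1] += data.strip()


def extract_proxies(html):
    """Return (ip, port, anonymity, type, speed) tuples; rows with fewer than 7 cells are skipped."""
    parser = ProxyTableParser()
    parser.feed(html)
    parser.close()
    result = []
    for cells, bars in parser.rows:
        if len(cells) < 7:  # the header row has no <td> cells
            continue
        speed = bars[0] if bars else ""
        result.append((cells[1], cells[2], cells[4], cells[5], speed))
    return result
```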
02 Fetching Proxy Information
Be sure to pick the headers at random and sleep between requests. I didn't do either, and my IP got banned…
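Here is a minimal sketch of both precautions. It assumes a small hypothetical `USER_AGENTS` list and a randomized delay of 2 to 3.5 seconds; the base and jitter values are arbitrary. `random_headers` and `polite_sleep` are illustrative helpers, not part of any library. `sleep` is injectable, so the function can be tested without actually waiting.

```python
import random
import time

# Hypothetical pool; in practice a longer list like the one in get_user_agent() later in this post.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36",
    "Mozilla/5.0 (X11; U; Linux; en-US) AppleWebKit/527+ (KHTML, like Gecko, Safari/419.3) Arora/0.6",
]


def random_headers(user_agents=USER_AGENTS, rng=random):
    """Build a fresh headers dict with a randomly chosen User-Agent."""
    return {"User-Agent": rng.choice(user_agents)}


def polite_sleep(base=2.0, jitter=1.5, rng=random, sleep=time.sleep):
    """Sleep for base + uniform(0, jitter) seconds so request timing is not perfectly regular."""
    delay = base + rng.uniform(0, jitter)
    sleep(delay)
    return delay
```

Inside the page loop, call `polite_sleep()` and then `requests.get(url, headers=random_headers())`. Each page then gets a fresh User-Agent and an irregular gap.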
import time

import requests
from bs4 import BeautifulSoup

# A single fixed User-Agent: this version does not rotate headers yet
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'
}
for i in range(1, 36):  # list pages 1 to 35
    time.sleep(2)  # pause between pages to lower the risk of a ban
    print('Page ' + str(i))
    url = 'http://www.xicidaili.com/nn/' + str(i)
    response = requests.get(url=url, headers=headers)
    soup = BeautifulSoup(response.text, 'html.parser')
    all_trs = soup.find_all('tr')
    for tr in all_trs[1:]:  # skip the table header row
        all_tds = tr.find_all('td')
        if len(all_tds) < 7:
            continue  # not a proxy row
        ip = all_tds[1].get_text(strip=True)
        port = all_tds[2].get_text(strip=True)
        anonymous = all_tds[4].get_text(strip=True)
        proxy_type = all_tds[5].get_text(strip=True)  # avoid shadowing built-in type()
        speed = ''  # stays empty if the cell has no speed bar
        for j in all_tds[6].find_all('div', attrs={'class': 'bar'}):
            speed = j.get('title', '')  # the speed is stored in the bar's title attribute
        # Append one CSV line per proxy
        with open('ip.csv', 'a+', encoding='utf-8-sig') as f:
            f.write(ip + ',' + port + ',' + anonymous + ',' + proxy_type + ',' + speed + '\n')
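Joining fields with `','` breaks if a field ever contains a comma. The sketch below uses the standard `csv` module instead. It also adds a helper that turns the saved rows back into proxy URLs, in the shape requests expects for its `proxies=` argument. The function names are hypothetical, and the sketch assumes every proxy is reached over plain `http://`.

```python
import csv
import io


def to_csv_text(rows):
    """Serialize rows with the csv module, which quotes any field containing a comma."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


def load_proxy_urls(lines):
    """Turn CSV lines (ip, port, ...) into proxy URLs; `lines` may be an open file."""
    urls = []
    for row in csv.reader(lines):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        urls.append("http://{}:{}".format(row[0].strip(), row[1].strip()))
    return urls


def as_requests_proxies(url):
    """Build the mapping passed as requests' `proxies=` argument."""
    return {"http": url, "https": url}
```

With the scraper, open `ip.csv` in append mode with `newline=''` and write `to_csv_text(...)` for each row. Reading back is then `load_proxy_urls(f)` on the opened file.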
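Here is the code for picking a random User-Agent. The code above does not use this function yet; the result was a ban, though the IP worked again the next day.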
import random

def get_user_agent():
    '''
    Return a randomly chosen User-Agent string
    '''
    user_agents = [
        "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; AcooBrowser; .NET CLR 1.1.4322; .NET CLR 2.0.50727)",
        "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; Acoo Browser; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.04506)",
        "Mozilla/4.0 (compatible; MSIE 7.0; AOL 9.5; AOLBuild 4337.35; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)",
        "Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US)",
        "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET CLR 2.0.50727; Media Center PC 6.0)",
        "Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET CLR 1.0.3705; .NET CLR 1.1.4322)",
        "Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 5.2; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.2; .NET CLR 3.0.04506.30)",
        "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN) AppleWebKit/523.15 (KHTML, like Gecko, Safari/419.3) Arora/0.3 (Change: 287 c9dfb30)",
        "Mozilla/5.0 (X11; U; Linux; en-US) AppleWebKit/527+ (KHTML, like Gecko, Safari/419.3) Arora/0.6",
        "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.2pre) Gecko/20070215 K-Ninja/2.1.1",
        "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9) Gecko/20080705 Firefox/3.0 Kapiko/3.0",
        # ...more User-Agent strings can be added here
    ]
    # Pick a different UA on each call so requests look less uniform
    return random.choice(user_agents)
