2016-09-07

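A small Python program: Google without bypassing the firewall

Use Python to find the Google addresses that aren't blocked, and you'll never need to bypass the firewall to reach Google again!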

Let’s go

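As a programmer, I use search engines all the time. For everyday searches, a certain domestic search engine is generally good enough, but for programming and technical topics the results come mixed with all kinds of ads and training-course promotions, and it takes a long time to dig out what I actually need. Many times, putting the same problem to Google turned up a solution on the first page, which only strengthened my resolve to use Google. As a programmer, use Google whenever you can, and search in English if possible; it will greatly improve your efficiency. See "How should programmers use search engines?" (linked at the end of this post).
However, Google is blocked in China. If you don't want to bypass the firewall, what is the right way to reach Google? There is a "Complete list of Google URLs by country" (also linked at the end of this post), and with that many Google domains I refuse to believe every one of them is blocked. But how do you find out which ones aren't? Try them one by one? (No thanks, bye.)
Time to write a script and run it. It needs two packages, requests and lxml, the go-to tools for making HTTP requests and parsing web pages. lxml lets us query the HTML with XPath (the script uses findall, which accepts a subset of XPath). Without further ado, here's the code: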

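# -*- coding: utf-8 -*-
import requests
from lxml import etree

requests.packages.urllib3.disable_warnings()  # suppress the warnings shown when accessing https


def getGoogleFromNet():
    # Extract the Google URLs from the Douban page and write them to a file
    res = requests.get('https://www.douban.com/note/213070719/', timeout=10).text
    html = etree.HTML(res)
    allA = html.findall('.//div[@id="link-report"]/a')
    with open('google.txt', 'w') as fs:
        for a in allA:
            href = a.get('href')
            if href:  # skip anchors that have no href attribute
                fs.write(href + '\n')


def getGoogleFromFile():
    reachable = []
    with open('google.txt', 'r') as fs:
        for a in fs:
            a = a.strip()  # drop the newline at the end of each line
            if not a:
                continue
            try:
                # No response within 0.2 s counts as blocked; tune this to your connection speed
                requests.get(a, timeout=0.2)
                reachable.append(a)
            except requests.RequestException:
                pass  # timed out or connection failed: treat as blocked
    # Print the count first, then the URLs that are not blocked
    print('Get %d google' % len(reachable))
    for a in reachable:
        print(a)


if __name__ == '__main__':
    getGoogleFromNet()
    getGoogleFromFile()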
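
To see what the path expression in getGoogleFromNet selects without touching the network, here is a minimal sketch. It is not part of the original script. The markup in SAMPLE is a made-up, well-formed stand-in for the Douban note (the real page may look different), and extract_links is a hypothetical helper name. The sketch uses the standard library's xml.etree.ElementTree instead of lxml so that it runs with no extra packages; its findall accepts the same expression, but it only parses well-formed XML, so for real web pages lxml's etree.HTML is still the tool to use.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the Douban note's markup (the real page may differ).
SAMPLE = """
<html><body>
  <div id="link-report">
    <a href="https://www.google.je/">Jersey</a>
    <a href="https://www.google.ad/">Andorra</a>
    <a>anchor without href</a>
  </div>
  <div id="other"><a href="https://example.com/">unrelated</a></div>
</body></html>
"""


def extract_links(markup):
    """Return the href of each <a> directly under the div with id="link-report"."""
    root = ET.fromstring(markup)
    # Same path expression as in the script: any div with that id, then its <a> children
    anchors = root.findall('.//div[@id="link-report"]/a')
    # a.get('href') is None when the attribute is missing, so filter those out
    return [a.get('href') for a in anchors if a.get('href')]


links = extract_links(SAMPLE)
print(links)
```

The anchor without an href and the link in the other div are both left out, which is why the script only ever writes the listed Google URLs to google.txt.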

Running the full script prints output like the following; a longer timeout yields more results.

Get 22 google
https://www.google.je/
https://www.google.ad/
https://www.google.cz/
https://www.google.si/
https://www.google.me/
https://www.google.mk/
https://www.google.md/
https://www.google.la/
https://www.google.mv/
https://www.google.ml/
https://www.google.bf/
https://www.google.ne/
https://www.google.td/
https://www.google.cf/
https://www.google.ga/
https://www.google.it.ao/
https://www.google.co.tz/
https://www.google.co.mz/
https://www.google.mg/
https://www.google.co.zw/
https://www.google.tk/
https://www.google.gy/
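
A longer timeout makes a one-by-one check slow, because every blocked address costs the full wait. One possible way around that, sketched here as an assumption and not taken from the original script, is to probe the addresses in parallel with a thread pool. The names is_reachable, find_reachable and fake_probe, the default timeout of 1.0 s and the worker count of 20 are all hypothetical choices for illustration. The demo at the bottom uses a stub probe so it runs offline; its result says nothing about which domains are really reachable.

```python
from concurrent.futures import ThreadPoolExecutor


def is_reachable(url, timeout=1.0):
    """Default probe: True if the URL answers within `timeout` seconds."""
    import requests  # imported lazily so the sketch can be tried offline with a stub probe
    try:
        requests.get(url, timeout=timeout)
        return True
    except requests.RequestException:
        return False


def find_reachable(urls, probe=is_reachable, workers=20):
    """Probe all URLs in parallel and return the reachable ones in input order."""
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(probe, urls))  # map() yields results in input order
    return [u for u, ok in zip(urls, results) if ok]


# Offline demonstration with a stub probe (hypothetical: pretends only .je and .ad respond).
def fake_probe(url):
    return url.endswith(('.je/', '.ad/'))


demo = find_reachable(
    ['https://www.google.je/', 'https://www.google.com/', 'https://www.google.ad/'],
    probe=fake_probe,
)
print('Get %d google' % len(demo))
for u in demo:
    print(u)
```

For a real run, the list read from google.txt would be passed in with the default probe. To change the wait, functools.partial(is_reachable, timeout=2.0) can be supplied as the probe.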

Main links

requests download: https://pypi.python.org/pypi/requests
How should programmers use search engines?: https://www.zhihu.com/question/28017993
Complete list of Google URLs by country: https://www.douban.com/note/213070719/
lxml download: https://pypi.python.org/pypi/lxml
XPath tutorial: http://www.runoob.com/xpath/xpath-tutorial.html

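Title: A small Python program: Google without bypassing the firewall

Author: Jianwu Huang

Published: 2016-09-07, 09:12:43

Last updated: 2017-07-02, 22:14:13

Original link: https://nevershow.github.io/2016/09/07/google/

License: "Attribution-NonCommercial-ShareAlike 4.0". Please keep the original link and author when reposting.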