Automatically Submitting and Scraping Web Pages with Python - Python - 软晨网 (RuanChen.com)

Source: the Internet. Compiled by: 软晨网 (RuanChen.com). Published: 2009-09-11. Views: 507

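I have recently been looking into how to build an automatic forum poster. It is a fairly hard tool to build: CAPTCHAs are a major obstacle (I have not found a way around them yet, so I am setting that aside for now). The first problem to solve is how to fetch, parse, and submit pages. The code below is written in Python and uses lxml for HTML parsing. I read online that lxml is the fastest parser available, but I have not verified that myself. Here is the code.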

The code is as follows:

 
# Python 2 code: urllib2 and urlparse were reorganized in Python 3.
import urllib
import urllib2
import urlparse
import lxml.html

def url_with_query(url, values):
    # Replace the URL's query string with the encoded values and drop the fragment.
    parts = urlparse.urlparse(url)
    rest, (query, frag) = parts[:-2], parts[-2:]
    return urlparse.urlunparse(rest + (urllib.urlencode(values), None))

def make_open_http():
    # The cookie processor keeps session cookies across requests.
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor())
    # Clearing addheaders removes the default Python-urllib User-agent header.
    opener.addheaders = []  # pretend we're a human; don't do this
    def open_http(method, url, values={}):
        if method == "POST":
            return opener.open(url, urllib.urlencode(values))
        else:
            return opener.open(url_with_query(url, values))
    return open_http

open_http = make_open_http()
# Fetch the page and parse it into an lxml tree.
tree = lxml.html.fromstring(open_http("GET", "http://www.ruanchen.com").read())
form = tree.forms[0]
form.fields["q"] = "eplussoft"
form.action = "http://www.ruanchen.com/"
# Submit the first form through our cookie-aware opener, then view the result.
response = lxml.html.submit_form(form, open_http=open_http)
html = response.read()
doc = lxml.html.fromstring(html)
lxml.html.open_in_browser(doc)
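
The listing above targets Python 2. For readers on Python 3, the two helper functions might be ported roughly as follows. This is only a minimal sketch using the standard library (`urllib.parse`, `urllib.request`, `http.cookiejar`), not the original author's code; it leaves out the lxml part of the listing and, unlike the original, keeps the default User-agent header.

```python
from http.cookiejar import CookieJar
from urllib.parse import urlencode, urlparse, urlunparse
from urllib.request import HTTPCookieProcessor, build_opener


def url_with_query(url, values):
    """Return url with its query string replaced by values and its fragment dropped."""
    parts = urlparse(url)
    return urlunparse(parts._replace(query=urlencode(values), fragment=""))


def make_open_http():
    """Build an open_http(method, url, values) callable that keeps cookies between requests."""
    opener = build_opener(HTTPCookieProcessor(CookieJar()))

    def open_http(method, url, values=None):
        values = values or {}
        if method == "POST":
            # In Python 3 the POST body must be bytes, not str.
            return opener.open(url, urlencode(values).encode("ascii"))
        return opener.open(url_with_query(url, values))

    return open_http


# No network access is needed to try the URL helper on its own.
print(url_with_query("http://www.ruanchen.com/?old=1#top", {"q": "eplussoft"}))
# http://www.ruanchen.com/?q=eplussoft
```

The callable returned by `make_open_http()` takes the same `(method, url, values)` arguments as the Python 2 version, so it should be usable as the `open_http` argument of `lxml.html.submit_form` in the same way.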

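Yes, CAPTCHAs are a big problem. Today I also looked at how Baidu Tieba (百度贴吧) handles them, which spoiled my mood even more: its CAPTCHA image is loaded via Ajax, which makes things harder still. It seems that most forums and blogs now do the same. As a result, the page fetched by the first request does not contain the CAPTCHA image at all, so there is nothing to analyze yet. There are still many problems to solve.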
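
One way to see the Ajax problem concretely is to scan the HTML that was actually downloaded for image tags. The sketch below uses the standard library's `html.parser` in Python 3. The sample markup and the keyword list (`captcha`, `verify`, `vcode`) are made-up assumptions for illustration and are not taken from Baidu Tieba or any real site. When the image is injected later by a script, a scan of the static page finds nothing.

```python
from html.parser import HTMLParser

# Hypothetical substrings that might mark a CAPTCHA image URL; real sites vary.
CAPTCHA_HINTS = ("captcha", "verify", "vcode")


class ImageCollector(HTMLParser):
    """Collect the src attribute of every <img> tag in a document."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def find_captcha_images(html):
    """Return the image URLs in html whose path looks like a CAPTCHA image."""
    collector = ImageCollector()
    collector.feed(html)
    return [s for s in collector.sources
            if any(hint in s.lower() for hint in CAPTCHA_HINTS)]


# Made-up pages: one embeds the image directly, the other only has an
# empty placeholder that a script would fill in after the page loads.
static_page = '<form><img src="/captcha.png?id=42"><input name="code"></form>'
ajax_page = '<form><div id="captcha-box"></div><input name="code"></form>'

print(find_captcha_images(static_page))  # ['/captcha.png?id=42']
print(find_captcha_images(ajax_page))    # []
```

For a page like the second one, the scraper would have to find and repeat the script's own image request, which this sketch does not attempt.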
