PHP IIS log analyzer: a program that records search engine crawler visits - PHP



Source: the Internet  Compiled by: 软晨网 (RuanChen.com)  Published: 2009-09-13


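I have been busy lately, so the code is not very tidy and the interface has not been polished; please make do with it for now. New features will be published as soon as they are added! Usage notes:
  Set the absolute path of the IIS log directory in iis.php.
  For example: $folder="c:/windows/system32/logfiles/<site log directory>/"; // the trailing slash (/) is required.
  (On virtual hosting and not sure of your site's absolute path? Upload a PHP probe script to find out.
  To run it on the server: http://your-domain/iis.php
  To run it locally: download the logs to your computer and open http://127.0.0.1/iis.php )
  Notes:
  // The site log directory must be readable by the web site's user account!
  // If you download the logs to your computer, change the URL on line 143 to your own site's URL. This step is optional and does not affect the analysis results.
  // If you rename iis.php, the code must be updated to match: press Ctrl+H and replace every occurrence of iis.php with the new file name, otherwise the program will not work.
  // If an IIS log file is very large, the script may time out, so running it on such files is not recommended!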
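The notes above insist on the trailing slash because the script builds every path as $folder.$filename. A minimal sketch of normalizing the setting instead, assuming you are willing to add a small helper near the top of iis.php (log_dir_with_slash() is a hypothetical name, not part of the original program):

[code]
<?php
// Hypothetical helper: return $dir with exactly one trailing "/", so that
// $dir . $filename is always a valid path. A trailing "\" (Windows style)
// is stripped as well before the "/" is appended.
function log_dir_with_slash($dir)
{
    return rtrim($dir, "/\\") . "/";
}

$folder = log_dir_with_slash("C:/WINDOWS/system32/LogFiles/W3SVC1155699908");
echo $folder . "\n"; // C:/WINDOWS/system32/LogFiles/W3SVC1155699908/
[/code]

PHP on Windows accepts forward slashes in file paths, so appending "/" works even when the configured path uses backslashes.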
The PHP source code follows:
[code]
<?php
/*
Niuzai IIS Log Spider Crawl Analyzer V1.1 (PHP, GB2312 edition)
Author: 牛仔 (Niuzai)
QQ: 172379201
Email: 17gd$163.com (replace $ with @)
*/
//===================================================
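header("content-type:text/html; charset=gb2312");
//Site log directory: the web site's user account must have read permission on it!
$folder="C:/WINDOWS/system32/LogFiles/W3SVC1155699908/";//The trailing slash is required!
$pagesize = 25;//Number of records shown per page
//=========================
//isset() avoids undefined-index notices when a parameter is absent
$type = isset($_GET['type']) ? base64_decode($_GET['type']) : '';
//basename() drops any directory part, so showfile=../../x cannot escape $folder
$showfile = isset($_GET['showfile']) ? basename($_GET['showfile']) : '';
$page = isset($_GET['page']) ? max(1,(int)$_GET['page']) : 1;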
//============================
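//Open the log directory and build one table row per log file
if (!$type){
if (is_dir($folder) && ($fp=opendir($folder))!==false)
{
$arr_file=array();
$indexstr='';
//Compare with !== so that a file named "0" does not end the loop early
while(false!==($file=readdir($fp)))
{
if($file!='.' &&$file!='..')
{
$arr_file[]=$file;
}
}
closedir($fp);
//readdir() returns entries in no guaranteed order; IIS names W3C logs exYYMMDD.log,
//so sorting by name and walking backwards lists the newest log first
sort($arr_file);
for ($i=count($arr_file)-1;$i>=0;$i--)
{
$f=urlencode($arr_file[$i]);
//Spider names are quoted strings (bare words are undefined constants) and are
//urlencoded because base64 output may contain '+', '/' and '='
$indexstr.="<tr><td height=\"25\" width=\"25%\">".date("Y-m-d",filectime($folder.$arr_file[$i]))."</td>
<td height=\"25\" width=\"25%\" align=\"center\"><a href=\"iis.php?type=".urlencode(base64_encode('Baiduspider'))."&amp;showfile=".$f."\">Baidu</a></td>
<td height=\"25\" width=\"25%\" align=\"center\"><a href=\"iis.php?type=".urlencode(base64_encode('Googlebot'))."&amp;showfile=".$f."\">Google</a></td>
<td height=\"25\" width=\"25%\" align=\"center\"><a href=\"iis.php?type=".urlencode(base64_encode('yahoo'))."&amp;showfile=".$f."\">Yahoo</a></td></tr>";
}
$html = indexhtml();
$copy = mycopy();
$html = str_replace("[showlog]",$indexstr,$html);
$html = str_replace("[copy]",$copy,$html);
echo $html;
}else{
echo "The log directory does not exist or cannot be read. Please check your settings!";
exit();
}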
}elseif (in_array($type,array('Baiduspider','Googlebot','yahoo'),true)){
//Only the three supported spiders are shown
echo show($type,$folder,$showfile,$page,$pagesize);
}
function show($type,$folder,$showfile,$page,$pagesize)
{
if ($type=='Baiduspider')
{
$title='Baidu';
}elseif ($type=='Googlebot'){
$title='Google';
}elseif ($type=='yahoo'){
$title='Yahoo';
}
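if ($type&&$folder&&$showfile)
{
if(is_file($folder.$showfile))
{
$fp= fopen($folder.$showfile,"r");
}else{
echo "This log file does not exist. Please check your settings!";
exit;
}
$temp=array();//Log lines written by the selected spider
$j=0;//Total requests by this spider
$y=0;//Dead links (status 404)
$t=0;//Normal (status 200)
$h=0;//Cached (status 304)
//fgets() returns false at end of file, which ends the loop cleanly
while (($str = fgets($fp))!==false)
{
$str =iconv("UTF-8","GB2312//IGNORE",$str);
//strpos() returns 0 for a match at the start of the line, so compare with false explicitly
if(strpos($str,$type)!==false)
{
$j++;
$temp[]=$str;
//Fields are separated by spaces; index 11 is sc-status in the typical IIS 6 default
//field order (date time s-sitename s-ip cs-method cs-uri-stem cs-uri-query s-port
//cs-username c-ip cs(User-Agent) sc-status ...); other field settings break this
$tmpcount = explode(" ",$str);
$status = isset($tmpcount[11]) ? (int)$tmpcount[11] : 0;
if ($status==200)$t++;
if ($status==304)$h++;
if ($status==404)$y++;
}
}
fclose($fp);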
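$count = count($temp);
$pagecount = max(1,(int)ceil($count / $pagesize));
if ($page>$pagecount)$page=$pagecount;
//Newest records first: page 1 shows indexes $count-1 down to $count-$pagesize,
//page 2 the $pagesize records before those, and so on
$first = $count-($page-1)*$pagesize-1;
$last = max(0,$count-$page*$pagesize);//The last page stops at index 0, so the oldest records are not skipped
$show='';
for ($i=$first;$i>=$last;$i--)
{
$num = explode(" ",$temp[$i]);
//htmlspecialchars() escapes the logged URL for output; the URL stem starts with "/",
//so the link resolves against the current host (rawurlencode() would turn "/" into %2F)
$show.="
<tr>
<td height=\"20\">".$num[0]." ".$num[1]."</td>
<td>".$num[9]."</td>
<td><a href=\"".htmlspecialchars($num[5])."\" target=\"_blank\">".htmlspecialchars($num[5])."</a></td>
<td>".$num[11]."</td>
</tr>";
}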
unset($temp);
$link = "?type=".urlencode(base64_encode($type))."&amp;showfile=".urlencode($showfile);
$showpage = "<td colspan=\"4\" height=\"30\" align=\"center\">".$pagesize." per page, page ".$page."/".$pagecount;
$showpage.=" <a href=\"".$link."\">First</a>";
if ($page!=1)
{
$showpage.=" <a href=\"".$link."&amp;page=".($page-1)."\">Previous</a>";
}
$weei='';//"Last" link; stays empty when already on the last page
if ($page!=$pagecount)
{
$showpage.=" <a href=\"".$link."&amp;page=".($page+1)."\">Next</a>";
$weei = " <a href=\"".$link."&amp;page=".$pagecount."\">Last</a>";
}
$showpage.=$weei."</td>";
if ($show)
{
$html = pagehtml();
$copy = mycopy();
$htmltitle = "Niuzai IIS Log Spider Crawl Analyzer - ";//Please keep this credit, thanks!
$html = str_replace("[title]",$title,$html);
$html = str_replace("[htmltitle]",$htmltitle,$html);
$html = str_replace("[show]",$show,$html);
$html = str_replace("[count]",$j,$html);
//Placeholder name assumed: this line is garbled in the published listing, and the
//pagehtml() template that defines the placeholder is missing
$html = str_replace("[page]",$showpage,$html);
$html = str_replace("[y]",$y,$html);
$html = str_replace("[t]",$t,$html);
$html = str_replace("[h]",$h,$html);
$html = str_replace("[copy]",$copy,$html);
return $html;
}
}
}
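// ... indexhtml() and the beginning of pagehtml() are missing from the published listing ...
[/code]
Download the package: http://www.ruanchen.com/
The surviving end of the file (the tail of the pagehtml() template, then mycopy()):
[code]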

</tr>
</table>
[copy]
</body>
</html>';
}
function mycopy()
{
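 return '<table border="1" width="100%" id="table2" cellspacing="0" cellpadding="0" style="border-collapse: collapse" height="402">
 <tr>
 <td height="35" bgcolor="#C0C0C0" align="center">Analysis notes</td>
 </tr>
 <tr>
 <td height="170">Dead link: the page the spider requested does not exist or the link is broken; the crawl returned status 404.<br />
 Cached: the spider crawled this page before and the page has not changed since, so the spider already holds a copy and does not download the content again; the crawl returned status 304.<br />
 Normal: the spider accessed the page successfully and downloaded it; the crawl returned status 200.<br />
 Note: a crawled page will not necessarily show up in search results, because crawled data must first pass the filtering rules of the search engine. For details, see the indexing help of each search engine.
 </td>
 </tr>
 <tr>
 <td> Program name: <a target="_blank" href="http://www.niuzi.com/">Niuzai IIS Log Spider Crawl Analyzer V1.1</a> Author: 牛仔 (Niuzai)
 QQ: 172379201
 Email: 17gd$163.com (replace $ with @)
 Note: this program is for learning purposes only; do not use it commercially.</td>
 </tr>
</table>';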
}
?>
[/code]
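The program reads fields by fixed position ($num[5] for the URL, $num[9] for the client IP, $num[11] for the status code), which only holds while the site logs the default IIS field set in the default order. IIS W3C extended logs start with a #Fields: directive that names each column, so a sturdier approach is to map column names to positions from that line. The sketch below does this and tallies the 200/304/404 responses for one spider; tally_spider_statuses() and the sample lines are illustrative assumptions, not part of the original program.

[code]
<?php
// Sketch (hypothetical helper): count the 200/304/404 responses one spider
// received, finding columns by name via the "#Fields:" directive instead of
// fixed positions such as $num[11].
function tally_spider_statuses($lines, $spider)
{
    $fields = array();                          // column name => position
    $counts = array(200 => 0, 304 => 0, 404 => 0);
    foreach ($lines as $line) {
        $line = rtrim($line, "\r\n");
        if (strpos($line, '#Fields:') === 0) {
            // "#Fields:" is 8 characters; the rest names the columns in order
            $fields = array_flip(explode(' ', trim(substr($line, 8))));
            continue;
        }
        // Skip blank lines, other directives, and data seen before a #Fields line
        if ($line === '' || $line[0] === '#'
            || !isset($fields['cs(User-Agent)'], $fields['sc-status'])) {
            continue;
        }
        // IIS writes spaces inside the User-Agent as '+', so splitting on
        // spaces keeps every field in one piece
        $cols = explode(' ', $line);
        $ua = $fields['cs(User-Agent)'];
        $st = $fields['sc-status'];
        if (!isset($cols[$ua], $cols[$st])) {
            continue;                           // malformed or truncated line
        }
        // Match only the User-Agent column, case-insensitively
        if (stripos($cols[$ua], $spider) === false) {
            continue;
        }
        $status = (int)$cols[$st];
        if (isset($counts[$status])) {
            $counts[$status]++;
        }
    }
    return $counts;
}

// Made-up sample lines, reduced to a few columns for brevity
$sample = array(
    "#Software: Microsoft Internet Information Services 6.0",
    "#Fields: date time c-ip cs-uri-stem cs(User-Agent) sc-status",
    "2009-09-12 01:02:03 10.0.0.1 /index.html Baiduspider+(+http://www.baidu.com/search/spider.htm) 200",
    "2009-09-12 01:05:00 10.0.0.1 /old.html Baiduspider+(+http://www.baidu.com/search/spider.htm) 404",
    "2009-09-12 01:06:00 10.0.0.2 /index.html Googlebot/2.1+(+http://www.google.com/bot.html) 304",
);
$baidu = tally_spider_statuses($sample, 'Baiduspider');
echo "200: {$baidu[200]}, 304: {$baidu[304]}, 404: {$baidu[404]}\n"; // 200: 1, 304: 0, 404: 1
[/code]

Matching only the User-Agent column, rather than the whole line, avoids counting a request whose URL merely contains the spider name.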
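show() pages through the matching lines newest first: page 1 lists the last $pagesize entries, page 2 the ones before those, and the final page must stop at index 0 so the oldest entries are still shown. The same arithmetic as a standalone function makes the boundary cases easy to check; page_range() is a hypothetical name used only for this sketch.

[code]
<?php
// Hypothetical helper: for $count records stored oldest-first, return the
// first and last index to visit (counting down) on page $page, plus the
// number of pages. For an empty list $first is -1, so a descending loop
// "for ($i = $first; $i >= $last; $i--)" runs zero times.
function page_range($count, $page, $pagesize)
{
    $pagecount = max(1, (int)ceil($count / $pagesize));
    $page = min(max(1, (int)$page), $pagecount);     // clamp to 1..$pagecount
    $first = $count - ($page - 1) * $pagesize - 1;   // newest record on the page
    $last = max(0, $count - $page * $pagesize);      // oldest record, never below 0
    return array($first, $last, $pagecount);
}

// 60 matching lines, 25 per page: pages cover 59..35, 34..10 and 9..0
list($first, $last, $pages) = page_range(60, 3, 25);
echo "$first..$last of $pages pages\n"; // 9..0 of 3 pages
[/code]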
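The index page passes the spider name as a base64-encoded type parameter, and only Baiduspider, Googlebot and yahoo are handled. A sketch of that round trip with an explicit whitelist, assuming hypothetical helper names spider_param() and spider_from_query(): urlencode() matters because base64 output can contain +, / and =, and a raw + in a query string is decoded as a space.

[code]
<?php
// Hypothetical helpers, not part of the original program.
$known_spiders = array('Baiduspider', 'Googlebot', 'yahoo');

// Build the value of the type= parameter for a link
function spider_param($name)
{
    // base64 output may contain '+', '/' and '='; urlencode() protects them
    return urlencode(base64_encode($name));
}

// Turn the received parameter back into a spider name, or '' if unknown
function spider_from_query($raw, $known)
{
    // PHP has already URL-decoded $_GET; strict mode rejects non-base64 input
    $name = base64_decode($raw, true);
    return in_array($name, $known, true) ? $name : '';
}

echo "iis.php?type=" . spider_param('Baiduspider') . "\n"; // iis.php?type=QmFpZHVzcGlkZXI%3D
$type = spider_from_query(isset($_GET['type']) ? $_GET['type'] : '', $known_spiders);
[/code]

Validating against the list at decode time means every later comparison can trust $type.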
