How do I configure HttpWebRequest to correctly fetch the page "http://alexa.chinaz.com/Alexa_More.asp?Domain=csdn.net"?

benjaminli 2008-03-30 04:12:13
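Entering the URL in a browser displays the page correctly.
The code below, however, returns the site's default page.
I suspect the server has been configured to do this.
How can I set up HttpWebRequest so that it sends its request the way a browser does?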
------------------------------------------------
HttpWebResponse webrep;
StreamReader strrd;
string content;

HttpWebRequest webR = (HttpWebRequest)WebRequest.Create("http://alexa.chinaz.com/Alexa_More.asp?Domain=csdn.net");
webR.AllowAutoRedirect = false;
CookieContainer cc = new CookieContainer(); // a fresh, empty container: this request carries no cookies
CookieCollection cookies = new CookieCollection();
webR.CookieContainer = cc;
webR.UserAgent = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727)";
webR.Accept = "image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/x-shockwave-flash, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, application/x-silverlight, */*";
webR.Method = "GET";

if (webProxyURL != null)
{
    WebProxy proxyObject = new WebProxy(webProxyURL, true); // set the web proxy
    webR.Proxy = proxyObject;
}
webR.KeepAlive = true;
webrep = (HttpWebResponse)webR.GetResponse();
strrd = new StreamReader(webrep.GetResponseStream(), Encoding.Default);
content = strrd.ReadToEnd(); // read the fetched page content
strrd.Close();
webrep.Close();
17 replies
liu_binq63 2008-05-09
Oh, following this.
孟子E章 2008-03-31
If you clear your cookies and then open your address directly in the browser, you get the same result: the 114msn page.
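This observation suggests the server chooses which domain to show based on a cookie: without it, the default (114msn.com) is returned. The sketch below shows, offline, how a shared CookieContainer carries a cookie stored from one response into a later request on the same host. The cookie name SESSIONDEMO and its value are made up for illustration; the thread never shows which cookie the site actually set, and CookieReuseDemo is a hypothetical name.

```csharp
using System;
using System.Net;

// Hypothetical demo class: simulates what a CookieContainer does between two requests.
public static class CookieReuseDemo
{
    // Returns the Cookie header a request to nextUrl would carry after the
    // response to firstUrl set a (made-up) session cookie.
    public static string CookieHeaderAfterFirstVisit(string firstUrl, string nextUrl)
    {
        var jar = new CookieContainer();
        // Simulate the Set-Cookie header of the first response.
        jar.SetCookies(new Uri(firstUrl), "SESSIONDEMO=abc123; path=/");
        // Cookies whose domain and path match nextUrl are returned as a header value.
        return jar.GetCookieHeader(new Uri(nextUrl));
    }
}
```

A request to the same host picks the cookie up, while a request to another host gets an empty header. Creating a new, empty container for every request (as the question's code does) means nothing from an earlier response is ever sent back.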
ojekleen 2008-03-31
mark
youngerch 2008-03-31
Up
benjaminli 2008-03-31
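The code runs without errors,
but what it fetches is the default page.
I want the ranking for csdn.net, but the result is the ranking for the default site, 114msn.com: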
----------------------------------------------------------
请输入要查询的网址: HTTP://WWW. <input type="text" name="domain" size="30" value="csdn.net" class="input_form"> 

<input type="submit" value="查 询" class="submit_button">
-------------------------------------------------------------
<tr>
<td width="210" height="155" rowspan="7"><div id="view_pic"><a href="http://www.chinaz.com" target="_blank"><Img Src="/Images/No_Pic.gif" border="0" width="202" height="147"></a></div></td>
<td width="66">站点名称:</td>
<td width="100" align="left" title="114msn.com/"><eyfwfiv341608882>1</eyfwfiv341608882><qqtvobh22866>1</qqtvobh22866><vfq252>4</vfq252><swaliui50485>m</swaliui50485><uq7612260>s</uq7612260><bnnuwcug071385434>n</bnnuwcug071385434><hxkjxutr64>.</hxkjxutr64><hgltrkht4862>c</hgltrkht4862><xvbknatrn4087411>o</xvbknatrn4087411><uqebtldem32>m</uqebtldem32><rktwbcuy327405>/</rktwbcuy327405></td>
<td width="66">网站站长:</td>
<td width="100" align="left" title="不详">不详</td>
<td width="66">电子信箱:</td>
<td width="174" align="left" title="不详"><idx233441>不</idx233441><bocp4157453>详</bocp4157453></td>
</tr>
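Note how the returned cells split their text across randomly named tags such as `<eyfwfiv341608882>`, apparently to hinder scraping. Browsers ignore unknown tags and render their contents, so the visible text of the first cell is "114msn.com/", which the `title` attribute also carries in plain form. A minimal sketch for reading such a cell, assuming the tag layout shown in this sample (ObfuscatedCell and its methods are hypothetical names):

```csharp
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helpers for reading one obfuscated <td> cell like those above.
public static class ObfuscatedCell
{
    private static readonly Regex AnyTag = new Regex("<[^>]+>");
    private static readonly Regex TitleAttr = new Regex("title=\"([^\"]*)\"");

    // Removes every tag (including the random junk tags) and keeps the text between them.
    public static string VisibleText(string cellHtml)
    {
        string text = AnyTag.Replace(cellHtml, string.Empty);
        return WebUtility.HtmlDecode(text).Trim();
    }

    // Returns the first title="..." value, or an empty string when there is none.
    public static string TitleText(string cellHtml)
    {
        Match m = TitleAttr.Match(cellHtml);
        return m.Success ? WebUtility.HtmlDecode(m.Groups[1].Value) : string.Empty;
    }
}
```

A regex is enough for a single known cell; for whole pages an HTML parser is the more robust choice.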
benjaminli 2008-03-31
Finally solved! hackztx's answer is the right one.
Thanks for everyone's help, especially net_lover and hackztx!
hackztx 2008-03-31
static string contentType = "application/x-www-form-urlencoded";
static string userAgent = "Mozilla/5.0 (compatible; MSIE 7.0; Windows NT 5.1; SV1;.NET CLR 2.0.50727)";
hackztx 2008-03-31
It is indeed caused by cookies:

CookieContainer CC = Hello.Get.GetCookies("http://alexa.chinaz.com/Alexa_More.asp", null);
string html = Hello.Get.Html("http://alexa.chinaz.com/Alexa_More.asp?domain=csdn.net",null,CC, "GET");

This returns the correct HTML.


Here is the code:


public static CookieContainer GetCookies(string link, string encoding)
{
    // `encoding` is not used by this method; only the cookies are collected.
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(link);
    request.Method = "GET";
    request.UserAgent = userAgent;
    request.ContentType = contentType;
    // With a container attached, cookies from the response's Set-Cookie
    // headers are stored in CC automatically.
    CookieContainer CC = new CookieContainer();
    request.CookieContainer = CC;
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    {
        // The body is not needed; disposing the response releases the connection.
    }
    return CC;
}

public static string Html(string strLink, string encoding, CookieContainer CC, string method)
{
    if (encoding == null)
    {
        encoding = "gb2312";
    }
    if (method == null)
    {
        method = "GET";
    }
    try
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(strLink);
        request.Method = method;
        request.UserAgent = userAgent;
        // Send back the cookies collected by GetCookies(); any new cookies
        // in this response are added to CC automatically.
        request.CookieContainer = CC;
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (Stream sm = response.GetResponseStream())
        using (StreamReader sr = new StreamReader(sm, Encoding.GetEncoding(encoding)))
        {
            return sr.ReadToEnd();
        }
    }
    catch
    {
        // Any failure (network error, HTTP error status, unknown encoding) yields "".
        return "";
    }
}

It works on my end.
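Html() decodes the page with Encoding.GetEncoding("gb2312"), and the other snippets use Encoding.Default. On the 2008-era .NET Framework running on Chinese Windows, Encoding.Default is the system ANSI code page (936), so both work. On .NET Core and .NET 5+, however, Encoding.Default is UTF-8, and "gb2312" can only be resolved after registering CodePagesEncodingProvider. A minimal sketch, assuming a modern .NET runtime (GbText is a hypothetical name):

```csharp
using System.IO;
using System.Text;

// Hypothetical helper for decoding GB2312 pages on .NET Core / .NET 5+.
public static class GbText
{
    // The static constructor runs once, before any member below is first used.
    static GbText()
    {
        // Makes legacy code pages such as gb2312 resolvable by name.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    public static Encoding Gb2312 => Encoding.GetEncoding("gb2312");

    // Reads a whole stream (e.g. a response body) as GB2312 text.
    public static string ReadAll(Stream body)
    {
        using (var reader = new StreamReader(body, Gb2312))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Usage with a response would be `string html = GbText.ReadAll(response.GetResponseStream());`. On the .NET Framework this registration is unnecessary, because gb2312 is built in.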


yuelailiu 2008-03-30
Just passing by and learning...
孟子E章 2008-03-30
I tested this code and it works fine:

HttpWebResponse webrep;
StreamReader strrd;
string content;

HttpWebRequest webR = (HttpWebRequest)WebRequest.Create("http://alexa.chinaz.com/Alexa_More.asp?Domain=csdn.net");
webR.AllowAutoRedirect = false;
CookieContainer cc = new CookieContainer();
CookieCollection cookies = new CookieCollection();
webR.CookieContainer = cc;
webR.UserAgent = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727)";
webR.Accept = "image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/x-shockwave-flash, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, application/x-silverlight, */*";
webR.Method = "GET";
webR.Headers["Accept-Language"] = "zh-cn";
webR.Headers["Accept-Charset"] = "gb2312";
webR.KeepAlive = true;
webrep = (HttpWebResponse)webR.GetResponse();
strrd = new StreamReader(webrep.GetResponseStream(), Encoding.Default);
content = strrd.ReadToEnd(); // read the page content
this.textBox1.Text = content;
benjaminli 2008-03-30
Bumping my own thread.
songshp39 2008-03-30
mark,up
gs0038 2008-03-30
Following this question.
benjaminli 2008-03-30
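That doesn't work either. Also, when I inspect the traffic with HttpLook, the browser doesn't send that header when it opens the page.

Apart from the cookie, there is now only one difference between the request my program sends and the one the browser sends:
Browser: GET /Alexa_More.asp?Domain=csdn.net HTTP/1.1
Program: GET http://alexa.chinaz.com/Alexa_More.asp?Domain=csdn.net HTTP/1.1
I don't know whether this matters.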
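The two request lines differ only in the form of the request target. In HTTP/1.1 a client sends the full URL (absolute-form) when it talks to a proxy, and only the path and query (origin-form) when it talks to the origin server directly. So the program's line suggests its request went through a proxy, for example the one the question's code assigns when webProxyURL is set. RFC 7230 (§5.3.2) requires servers to accept the absolute form as well, so this difference by itself is unlikely to change the page; in this thread the fix turned out to be the cookie. The sketch below (RequestLine.ForGet is a hypothetical helper) just reproduces both lines from the same Uri:

```csharp
using System;

// Hypothetical helper that formats an HTTP/1.1 request line for a GET.
public static class RequestLine
{
    public static string ForGet(Uri uri, bool viaProxy)
    {
        // Via a proxy: absolute-form (full URL).
        // Direct to the origin server: origin-form (path + query).
        string target = viaProxy ? uri.AbsoluteUri : uri.PathAndQuery;
        return "GET " + target + " HTTP/1.1";
    }
}
```

To send the request directly instead of through a proxy, HttpWebRequest accepts `webR.Proxy = null;`.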
changjiangzhibin 2008-03-30
webR.Headers["Accept-Ranges"] = "bytes";
benjaminli 2008-03-30
Thanks for looking into it!
But it still doesn't work. I just tried it and still got the default page.
孟子E章 2008-03-30
webR.Headers["Accept-Language"] = "zh-cn";
webR.Headers["Accept-Charset"] = "gb2312";

That should do it.
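This suggestion adds Accept-Language and Accept-Charset to the question's header settings. The asker reported that headers alone did not change the result; what the server evidently checked was the cookie. The sketch below gathers the browser-like settings into one hypothetical helper (BrowserLikeRequest.Create is an illustrative name, not a framework API) that takes a caller-supplied CookieContainer, so it can be combined with a cookie-priming request such as hackztx's GetCookies. It only builds the request; nothing is sent until GetResponse() is called. On .NET 6+, WebRequest.Create is marked obsolete in favor of HttpClient but still works.

```csharp
using System.Net;

// Hypothetical helper: applies browser-like settings from this thread to a new request.
public static class BrowserLikeRequest
{
    public const string UserAgent =
        "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727)";

    public static HttpWebRequest Create(string url, CookieContainer cookies)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";
        // User-Agent and Accept are restricted headers: set them via properties.
        request.UserAgent = UserAgent;
        request.Accept = "*/*"; // simplified from the question's long Accept list
        // Ordinary headers can be set through the Headers collection.
        request.Headers["Accept-Language"] = "zh-cn";
        request.Headers["Accept-Charset"] = "gb2312";
        request.KeepAlive = true;
        // Reusing one container across calls lets cookies set by an earlier
        // response be sent back with this request.
        request.CookieContainer = cookies;
        return request;
    }
}
```

The design choice is to pass the CookieContainer in rather than create it inside, since a container created per request is always empty.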

Community: .NET Technology / C#