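Looking for a simple way to parse HTML
I have saved the HTML source code in a txt file and want to read that file with VB, find everything inside <a href="http://…….html"> (really, just the http://…….html part is enough), and then write the results to another txt file, one URL per line. Any help from the experts would be appreciated. Points offered: 40; replies: 9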
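Reply #1 by libralibra (食食物者为俊杰: U don't try, U don't know), 2005-02-03 12:12:42, points awarded: 0
Split the text on href=" (including the opening quote).
Discard the first array element, then for each remaining element read characters from the start into a variable.
Stop when you reach the closing quote, and you're done.
That's all I can think of, haha. Experts, please correct me.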
Reply #2 by trademark2004 (), 2005-02-03 12:23:16, points awarded: 0
Could an expert toss me some code? I'm a complete beginner... :(
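Reply #3 by liuxiaoyi666 (MSMVP 小猪妹荣誉马甲之八卦兔子), 2005-02-03 12:24:44, points awarded: 5
Are you using the WebBrowser control?
If so, there is a simple way: WebBrowser.Document.getElementsByTagName("A") makes this easy.
See MSDN for the details.
The other option is regular-expression matching. If you don't know regular expressions, see the regex post on my blog at http://blog.csdn.net/liuxiaoyi666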
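A minimal sketch of the regex route mentioned above, assuming the VBScript.RegExp object is available on the machine (it normally ships with Windows Script). The function name ExtractLinksRegex is made up for illustration, and the pattern only covers double-quoted href values that start with http:// or https://; single-quoted or unquoted attributes would need a looser pattern.

```vb
' Hypothetical helper: collects absolute http(s) URLs from double-quoted href attributes.
Public Function ExtractLinksRegex(ByVal html As String) As Collection
    Dim re As Object, m As Object
    Dim result As New Collection

    Set re = CreateObject("VBScript.RegExp")
    ' Actual pattern text: href\s*=\s*"(https?://[^"]*)"
    re.Pattern = "href\s*=\s*""(https?://[^""]*)"""
    re.Global = True        ' find every match, not just the first
    re.IgnoreCase = True    ' also match HREF, Href, ...

    For Each m In re.Execute(html)
        result.Add m.SubMatches(0)   ' the captured URL without the quotes
    Next

    Set ExtractLinksRegex = result
End Function
```

Each item of the returned Collection is one URL string, so the caller can loop over it with For Each and add the items to a list or write them to a file.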
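Reply #4 by trademark2004 (), 2005-02-03 12:28:43, points awarded: 0
I would prefer to read the txt directly and use things like InStr, Left, Right, Mid and so on... Is that possible?
Please give a bit more detail.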
Reply #5 by trademark2004 (), 2005-02-03 12:30:46, points awarded: 0
Also, I can't modify the txt that contains the source, because there may be several thousand URLs in it.
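Reply #6 by liuxiaoyi666 (MSMVP 小猪妹荣誉马甲之八卦兔子), 2005-02-03 13:31:33, points awarded: 0
You can use InStr to locate the position and Mid to pull out the string.
But HTML is structured, so why not take advantage of that?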
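The InStr/Mid idea could look roughly like this. It is only a sketch: the name ExtractLinksInStr and the MARKER constant are invented for the example, and it assumes every link is written as href="..." with double quotes and no spaces around the equals sign.

```vb
' Hypothetical helper: walks the text with InStr and cuts each URL out with Mid$.
Public Function ExtractLinksInStr(ByVal html As String) As Collection
    Const MARKER As String = "href="""   ' the text href=" including the opening quote
    Dim result As New Collection
    Dim startPos As Long, endPos As Long
    Dim link As String

    startPos = InStr(1, html, MARKER, vbTextCompare)
    Do While startPos > 0
        startPos = startPos + Len(MARKER)        ' first character of the attribute value
        endPos = InStr(startPos, html, """")     ' position of the closing quote
        If endPos = 0 Then Exit Do               ' no closing quote left: stop scanning
        link = Mid$(html, startPos, endPos - startPos)
        If LCase$(Left$(link, 4)) = "http" Then result.Add link
        startPos = InStr(endPos + 1, html, MARKER, vbTextCompare)
    Loop

    Set ExtractLinksInStr = result
End Function
```

Because the search always resumes after the previous closing quote, the source text is scanned once from start to end and is never modified.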
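Reply #7 by trademark2004 (), 2005-02-03 13:38:35, points awarded: 0
Because I don't know how to use it and have no idea where to start. If it really is simple to write, could you explain specifically how to use it?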
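One possible way to use the document structure without putting a WebBrowser control on the form is sketched below. It assumes the late-bound "htmlfile" COM object (the Internet Explorer HTML document) can be created on the machine. The name ExtractLinksDom is made up, and the second argument 2 to getAttribute is the IE-specific flag that asks for the attribute value as written in the source. Treat it as untested against very large pages.

```vb
' Hypothetical helper: lets the IE HTML parser find the <a> elements.
Public Function ExtractLinksDom(ByVal html As String) As Collection
    Dim doc As Object, anchor As Object
    Dim href As String
    Dim result As New Collection

    Set doc = CreateObject("htmlfile")   ' in-memory HTML document, no visible window
    doc.write html                       ' parse the source text
    doc.Close

    For Each anchor In doc.getElementsByTagName("a")
        ' & "" turns a missing (Null) attribute into an empty string
        href = anchor.getAttribute("href", 2) & ""
        If LCase$(Left$(href, 4)) = "http" Then result.Add href
    Next

    Set ExtractLinksDom = result
End Function
```

Compared with plain string searching, this route also copes with single-quoted or unquoted href values, since the parser does the tokenizing.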
Reply #8 by libralibra (食食物者为俊杰: U don't try, U don't know), 2005-02-03 13:46:57, points awarded: 35
One button (Command1), one ListBox (List1), and one CommonDialog (named cd)
Option Explicit

Private Sub Command1_Click()
    Const ForReading = 1, TristateUseDefault = -2
    Dim objFSO As Object, objText As Object
    Dim sTmp As String
    Dim url As Variant, tmpArr As Variant
    Dim i As Long

    ' Let the user pick the txt file that holds the HTML source.
    cd.Filter = "HTML source file|*.txt"
    cd.ShowOpen
    If Len(cd.FileName) = 0 Then Exit Sub

    ' Read the whole file into one string.
    Set objFSO = CreateObject("Scripting.FileSystemObject")
    Set objText = objFSO.OpenTextFile(cd.FileName, ForReading, False, TristateUseDefault)
    sTmp = objText.ReadAll()
    objText.Close

    ' Every piece after the first one begins right after an "href=" (any letter case).
    url = Split(sTmp, "href=", -1, vbTextCompare)
    If UBound(url) < 1 Then
        MsgBox "No http links found. Please choose another file."
        Exit Sub
    End If

    For i = 1 To UBound(url)
        ' Element 1 is the text between the first pair of double quotes.
        tmpArr = Split(url(i), """")
        ' Require at least two elements so tmpArr(1) is always valid.
        If UBound(tmpArr) >= 1 Then
            If UCase$(Left$(tmpArr(1), 4)) = "HTTP" Then List1.AddItem tmpArr(1)
        End If
    Next i
End Sub
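The code above fills the ListBox but does not yet write the second txt file the original question asked for. A small sketch of that last step, assuming it lives in the same form; the procedure name SaveListToFile and the output path in the usage line are placeholders, not part of the posted code.

```vb
' Hypothetical helper: writes every ListBox entry to a text file, one per line.
Private Sub SaveListToFile(lst As ListBox, ByVal outPath As String)
    Dim f As Integer, i As Long

    f = FreeFile                      ' next unused file number
    Open outPath For Output As #f     ' creates or overwrites the file
    For i = 0 To lst.ListCount - 1
        Print #f, lst.List(i)         ' Print # ends each entry with a line break
    Next i
    Close #f
End Sub
```

It could be called at the end of Command1_Click, for example as SaveListToFile List1, "C:\urls.txt" (the path is only an example).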
Reply #9 by libralibra (食食物者为俊杰: U don't try, U don't know), 2005-02-03 13:50:56, points awarded: 0
This is part of what it pulled from the sohu home page; there are lots and lots of them.
http://61.135.132.134/goto.php?aid=27&pid=228
http://61.135.132.134/goto.php?aid=27&pid=222
http://61.135.132.134/goto.php?aid=27&pid=223
http://www.wap.sohu.com/tuiguang/05newyear/?fr=w-zt-05newyear
http://61.135.132.134/goto.php?aid=27&pid=225
http://mms.sohu.com/pic/?ref=251004
http://mms.sohu.com/ani/?ref=251004
http://mms.sohu.com/hifi/?ref=251007
http://sms.sohu.com/ems/0/6/hits/index.html
http://61.135.132.134/goto.php?aid=103&pid=223
http://61.135.132.134/goto.php?aid=103&pid=198
http://61.135.132.134/goto.php?aid=103&pid=191
http://61.135.132.134/goto.php?aid=103&pid=210
http://www.wap.sohu.com/tuiguang/free/?fr=w-free
http://store.sohu.com
http://store.sohu.com/shudie.jsp
http://store.sohu.com/subject/result/whole-missworld3-good.jsp
http://store.sohu.com/subject/result/whole-qingrenjie-good.jsp
http://buy.sohu.com
http://store.sohu.com/CatalogYs.jsp?autoid=1
http://store.sohu.com/Catalog.jsp?autoid=2
http://images2.sohu.com/image/zhuanti/gift/kitty/Kitty.htm
http://store.sohu.com/Catalog.jsp?autoid=119
http://store.sohu.com/Catalog.jsp?autoid=504
http://store.sohu.com/subject/result/fushi-casio1-good.jsp
http://store.sohu.com/Catalog.jsp?autoid=598
http://store.sohu.com/Catalog.jsp?autoid=541
http://store.sohu.com/Catalog.jsp?autoid=1
........




