[Question] Removing HTML tags

yangjinan0729 2009-07-28 06:58:22

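I'm building a blog. The article list on the home page obviously needs a uniform look, but as everyone knows, the CSS priority isn't high enough to override the styling embedded in the articles.
So the question is how to strip the HTML tags out of an article (dropping the text's own styling so that the styles set in the CSS apply).
What method should I use?
Please be specific, or point me to a good reference.
Thanks.
One more question:
When I search articles for something like <p>, a plain SELECT will certainly return a lot of unrelated articles.
How can I filter those out?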

19 replies


wym3587 2011-05-21

Still learning...

ayan_smile 2010-11-10

Studying, studying, studying.

白云任去留 2009-08-28

// Requires: using System.Text.RegularExpressions;
public string checkStr(string html)
{
    const RegexOptions opts = RegexOptions.IgnoreCase;

    // Remove <script> blocks together with their contents. The quantifier is lazy
    // so that one match cannot run from the first <script> to the last </script>.
    html = Regex.Replace(html, @"<script[\s\S]+?</script *>", "", opts);
    // Remove href attributes that point at a script: URL (e.g. javascript:).
    html = Regex.Replace(html, @" href *= *[^>]*?script *:", "", opts);
    // Neutralize inline event handlers such as onclick=. Assumption: the posted
    // pattern read " no[\s\S]*=", which looks like a typo for " on...=".
    html = Regex.Replace(html, @" on\w+ *=", " _disibledevent=", opts);
    // Remove <iframe> and <frameset> blocks together with their contents.
    html = Regex.Replace(html, @"<iframe[\s\S]+?</iframe *>", "", opts);
    html = Regex.Replace(html, @"<frameset[\s\S]+?</frameset *>", "", opts);
    // Remove <img> tags, then <p> and </p>, then every remaining tag.
    html = Regex.Replace(html, @"<img[^>]+>", "", opts);
    html = Regex.Replace(html, @"</?p>", "", opts);
    html = Regex.Replace(html, @"<[^>]*>", "", opts);
    // Assumption: the posted literal was probably "&nbsp;" rendered as a space;
    // replacing a plain " " would delete every space in the text.
    html = html.Replace("&nbsp;", "");
    // These two are no-ops once the catch-all pattern above has run.
    html = html.Replace("</strong>", "");
    html = html.Replace("<strong>", "");
    return html;
}

jerry_zuo 2009-08-28

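[Quote=Reply #3 by winneraqun:]
I just built this. It can remove all JS scripts, CSS styles and HTML tags, leaving only the text inside the tags.
Contact me if you want it.
[/Quote]

Now that really is clever!! Haha.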

kaishiyouxi 2009-08-28

Learning.

kkai189 2009-07-30

Use a regular expression to match and remove the HTML markup.

slimboy123 2009-07-30

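/// <summary>
/// Removes the specified content from a string, ignoring case.
/// </summary>
/// <param name="src">The string to modify.</param>
/// <param name="pattern">The regular-expression pattern to remove.</param>
/// <returns>The string with the matching content removed.</returns>
public static string DropIgnoreCase(string src, string pattern)
{
    // ReplaceIgnoreCase is the poster's own helper and is not shown in this thread.
    return ReplaceIgnoreCase(src, pattern, "");
}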

cpp2017 2009-07-30

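string str = @"<p><span style=""background-color: #800000"">hahhhhdsadas</span></p>...</span>";
// Remove every tag except <p> and </p>.
str = System.Text.RegularExpressions.Regex.Replace(str, @"<(?!/?p(\s|>)+)[^>]*?>", "", System.Text.RegularExpressions.RegexOptions.IgnoreCase);
Response.Write(str); // writes <p>hahhhhdsadas</p>...

This is done with a regular expression.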

slimboy123 2009-07-30

/// <summary>
/// Removes the specified HTML tag from HTML content.
/// </summary>
/// <param name="content">The HTML content.</param>
/// <param name="tagName">The name of the HTML tag.</param>
/// <returns>The content with that tag removed.</returns>
public static string DropHtmlTag(string content, string tagName)
{
    // Remove <tagname ...> and </tagname>. The \b keeps "p" from also matching <pre> or <param>.
    return DropIgnoreCase(content, "</?" + tagName + "\\b[^>]*>");
}

/// <summary>
/// Removes all tags from HTML content.
/// </summary>
/// <param name="content">The HTML content.</param>
/// <returns>The content with all HTML tags removed.</returns>
public static string DropHtmlTag(string content)
{
    // Remove <*>
    return Drop(content, "<[^>]*>");
}

/// <summary>
/// Removes the specified content from a string.
/// </summary>
/// <param name="src">The string to modify.</param>
/// <param name="pattern">The regular-expression pattern to remove.</param>
/// <returns>The string with the matching content removed.</returns>
public static string Drop(string src, string pattern)
{
    // Replace is the poster's own helper and is not shown in this thread.
    return Replace(src, pattern, "");
}
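
The Drop and DropIgnoreCase methods call two helpers, Replace and ReplaceIgnoreCase, that never appear in the thread. A minimal sketch of what they might look like, assuming they are thin wrappers around Regex.Replace (the class name HtmlStripHelper, both method bodies and the sample string are hypothetical, not the poster's code):

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlStripHelper // hypothetical container class
{
    // Assumed behavior: plain, case-sensitive regex replacement.
    public static string Replace(string src, string pattern, string replacement)
    {
        return Regex.Replace(src, pattern, replacement);
    }

    // Assumed behavior: the same replacement, but ignoring case.
    public static string ReplaceIgnoreCase(string src, string pattern, string replacement)
    {
        return Regex.Replace(src, pattern, replacement, RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        string html = "<P><SPAN style=\"color:red\">hello</SPAN></P>";
        // Remove only the span tags, whatever their letter case.
        Console.WriteLine(ReplaceIgnoreCase(html, "</?span\\b[^>]*>", "")); // <P>hello</P>
        // Remove every tag.
        Console.WriteLine(Replace(html, "<[^>]*>", ""));                    // hello
    }
}
```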

fus53 2009-07-30

Strip HTML tags:
strResult = Regex.Replace(strOriginal, "<.+?>", "");

还想懒够 2009-07-30

Server.HtmlEncode(yourString)
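
Encoding is not the same as stripping: the tags stay in the string and are displayed as literal text instead of being rendered. A small sketch of the difference, assuming a console program and using System.Net.WebUtility.HtmlEncode in place of the page's Server object (the class name and sample string are made up for illustration):

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class EncodeVersusStrip // hypothetical demo class
{
    public static void Main()
    {
        string html = "<p><b>hello</b></p>";

        // Encoding escapes the angle brackets, so a browser shows the tags as text.
        string encoded = WebUtility.HtmlEncode(html);
        // Stripping deletes the tags and keeps only the text between them.
        string stripped = Regex.Replace(html, "<[^>]*>", "");

        Console.WriteLine(encoded);  // &lt;p&gt;&lt;b&gt;hello&lt;/b&gt;&lt;/p&gt;
        Console.WriteLine(stripped); // hello
    }
}
```

For the uniform article list asked about in the question, stripping is the variant that removes the article's own styling; encoding would show the raw markup to the reader.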

霸气小群哥 2009-07-30

// Remove scripts
Htmlstring = Regex.Replace(Htmlstring, @"<script(\s[^>]*?)?>[\s\S]*?</script>", "", RegexOptions.IgnoreCase);
// Remove styles
Htmlstring = Regex.Replace(Htmlstring, @"<style(\s[^>]*?)?>[\s\S]*?</style>", "", RegexOptions.IgnoreCase);
// Remove HTML tags
Htmlstring = Regex.Replace(Htmlstring, @"<(.[^>]*)>", "", RegexOptions.IgnoreCase);
Are you doing the replacement in C# code or in JS? The code above is for C#.
Regex.Replace performs regular-expression replacement in C#: the first argument is the input string to search, the second is the pattern to match, the third is the string that replaces each match, and the fourth makes the match case-insensitive.
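
The four arguments described above can be seen in a short console sketch. The class name and the sample string are made up for illustration; Regex.Replace and RegexOptions.IgnoreCase are the real .NET members:

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexReplaceDemo // hypothetical demo class
{
    public static void Main()
    {
        string input = "<B>bold</B> and <i>italic</i>";

        // Arguments: input string, pattern, replacement. Matching is case-sensitive
        // by default, so the upper-case <B> tags are left alone.
        string caseSensitive = Regex.Replace(input, "</?b>", "");

        // The fourth argument, RegexOptions.IgnoreCase, lets "b" match "B" too.
        string ignoreCase = Regex.Replace(input, "</?b>", "", RegexOptions.IgnoreCase);

        Console.WriteLine(caseSensitive); // <B>bold</B> and <i>italic</i>
        Console.WriteLine(ignoreCase);    // bold and <i>italic</i>
    }
}
```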

wuyq11 2009-07-28

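private string StripHtml(string strHtml)
{
    // Remove every tag, including tags that span several lines.
    Regex objRegExp = new Regex("<(.|\n)+?>");
    string strOutput = objRegExp.Replace(strHtml, "");
    // Escape any stray angle brackets that are left behind.
    strOutput = strOutput.Replace("<", "&lt;");
    strOutput = strOutput.Replace(">", "&gt;");
    return strOutput;
}
For reference.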

yangjinan0729 2009-07-28

[Quote=Reply #4 by winneraqun:]
Never mind, I'll just paste it for you.
// Remove scripts
Htmlstring = Regex.Replace(Htmlstring, @"<script(\s[^>]*?)?>[\s\S]*?</script>", "", RegexOptions.IgnoreCase);
// Remove styles
Htmlstring = Regex.Replace(Htmlstring, @"<style>[\s\S]*?</style>", "", RegexOptions.IgnoreCase);
// Remove HTML tags
Htmlstring = Regex.Replace(Htmlstring, @"<(.[^>]*)>", "", RegexOptions.IgnoreCase);
[/Quote]I can't follow this. What are Regex.Replace and RegexOptions.IgnoreCase? Could you explain in more detail, please?

yangjinan0729 2009-07-28

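[Quote=Reply #3 by winneraqun:]
I just built this. It can remove all JS scripts, CSS styles and HTML tags, leaving only the text inside the tags.
Contact me if you want it.
[/Quote]Haha, so you're trying to sell it to me?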

霸气小群哥 2009-07-28

Never mind, I'll just paste it for you.
// Remove scripts
Htmlstring = Regex.Replace(Htmlstring, @"<script(\s[^>]*?)?>[\s\S]*?</script>", "", RegexOptions.IgnoreCase);
// Remove styles
Htmlstring = Regex.Replace(Htmlstring, @"<style>[\s\S]*?</style>", "", RegexOptions.IgnoreCase);
// Remove HTML tags
Htmlstring = Regex.Replace(Htmlstring, @"<(.[^>]*)>", "", RegexOptions.IgnoreCase);

霸气小群哥 2009-07-28

I just built this. It can remove all JS scripts, CSS styles and HTML tags, leaving only the text inside the tags.
Contact me if you want it.

yangjinan0729 2009-07-28

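For example, take this snippet:

<p><span style="background-color: #800000">hahhhhdsadas</span></p>...</span>

After the style markup is stripped, only hahhhhdsadas... should remain.
Alternatively, keep the <p></p> tags and strip all the other tags.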

cpp2017 2009-07-28

Post some HTML source, then say exactly what you want removed.
