HtmlParser Examples and Comparison Notes

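This article describes the changes made to the original HtmlParser to create HtmlParserEx, which adds features such as setting Attributes, removing attributes, and finding and iterating over elements. Examples show how to use the new features: modifying element attributes, deleting elements, and finding and iterating over matches.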

delphi html parser

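The code is adapted from wr960204's original HtmlParser. I needed to modify HTML, but the original only supports reading, so I extended it and named the result HtmlParserEx.pas to distinguish it from the original.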

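Changes to IHtmlElement and THtmlElement:

1. The Attributes property now has a setter.

2. The TagName property now has a setter.

3. Added a Parent property.

4. Added a RemoveAttr method.

5. Added a Remove method.

6. Added a RemoveChild method.

7. Added a Find method, which is an alias for SimpleCSSSelector.

8. _GetHtml no longer appends the FOrignal value directly. It now calls GetSelfHtml to regenerate the markup of the modified element and updates FOrignal with the result.

9. Added a Text property.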

Changes to IHtmlElementList and THtmlElementList:

1. Added a RemoveAll method.

2. Added a Remove method.

3. Added an Each method.

4. Added a Text property.

Using the new features

IHtmlElement

EL.Attributes['class'] := 'xxxx';

EL.TagName := 'a';

EL.Remove; // remove this element itself

EL.RemoveChild(El2);

EL.Find('a');

IHtmlElementList

// Remove all selected elements
LHtml.Find('a').RemoveAll;

// Find and iterate
LHtml.Find('a').Each(
  procedure(AIndex: Integer; AEl: IHtmlElement)
  begin
    Writeln('Index=', AIndex, ', href=', AEl.Attributes['href']);
  end);

// Print directly; only the first selected element is used
Writeln(LHtml.Find('title').Text);
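
Because Attributes now has a setter, Each can also be used to change every matched element rather than only read it. The following is a minimal sketch built only from the calls shown above (Find, Each, and the Attributes setter); the unit name LinkTools and the procedure name OpenLinksInNewTab are hypothetical and chosen for illustration.

```pascal
unit LinkTools; // hypothetical unit name, for illustration only

interface

uses
  HtmlParserEx; // unit name taken from the post (HtmlParserEx.pas)

// Sets target="_blank" on every <a> element below ARoot.
procedure OpenLinksInNewTab(const ARoot: IHtmlElement);

implementation

procedure OpenLinksInNewTab(const ARoot: IHtmlElement);
begin
  if ARoot = nil then
    Exit;
  // The callback signature matches the Each example in the post.
  ARoot.Find('a').Each(
    procedure(AIndex: Integer; AEl: IHtmlElement)
    begin
      AEl.Attributes['target'] := '_blank';
    end);
end;

end.
```

Calling OpenLinksInNewTab(LHtml) on a parsed document should then update all links in place, assuming Each visits every element in the list.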

// Example: load from a file
// Requires System.Classes and System.SysUtils (TStringStream, TEncoding) plus HtmlParserEx.
procedure Test;
var
  LHtml, LEl: IHtmlElement;
  LList: IHtmlElementList;
  LStrStream: TStringStream;
begin
  LStrStream := TStringStream.Create('', TEncoding.UTF8);
  try
    LStrStream.LoadFromFile('view-source_https___github.com_ying32_htmlparser.html');
    LHtml := ParserHTML(LStrStream.DataString);
    if LHtml <> nil then
    begin
      LList := LHtml.SimpleCSSSelector('a');
      // Use a separate loop variable so the root element in LHtml is not overwritten
      for LEl in LList do
        Writeln('url:', LEl.Attributes['href']);
    end;
  finally
    LStrStream.Free;
  end;
end;
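
The file example above only reads the document. A modification round trip might look like the sketch below, which parses a string, removes elements, edits attributes, and prints the result. Two things here are assumptions not confirmed by the post: that RemoveAttr takes the attribute name as a string, and that the element exposes an OuterHtml property (as the original HtmlParser is believed to). The program name and the sample markup are made up for illustration.

```pascal
program HtmlParserExDemo; // hypothetical program name

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  HtmlParserEx; // unit name taken from the post (HtmlParserEx.pas)

var
  LRoot, LLink: IHtmlElement;
  LLinks: IHtmlElementList;
begin
  // Sample markup invented for this sketch.
  LRoot := ParserHTML('<div><a href="/a" class="old">A</a><script>x()</script></div>');
  if LRoot = nil then
    Exit;

  // Remove every <script> element from the tree.
  LRoot.Find('script').RemoveAll;

  LLinks := LRoot.Find('a');
  for LLink in LLinks do
  begin
    LLink.Attributes['class'] := 'new';
    // Assumption: RemoveAttr accepts the attribute name as a string.
    LLink.RemoveAttr('href');
  end;

  // Assumption: OuterHtml returns the regenerated markup of the element.
  Writeln(LRoot.OuterHtml);
end.
```

If the signatures differ in the actual unit, the declarations in HtmlParserEx.pas are the reference to follow.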

Source code download

https://github.com/ying32/htmlparser

The HTML pieces are:

CData Sections: CData Sections, found in XML, are used to escape blocks of text containing characters which would otherwise be recognized as markup. A CData section begins with <![CDATA[ and ends with ]]>.

Comments: The contents of comments are returned already stripped of the comment markers. A comment starts with <!-- and ends with -->.

Document Type Definitions: A Document Type Definition defines the syntax of markup constructs. It begins with <!DOCTYPE and ends with >.

HTML Processing Instructions: HTML Processing Instructions are a mechanism to capture platform-specific idioms. They start with <? and end with >.

HTML-Tags: HTML-Tags are parsed into Name, Attributes and Values. DIHtmlParser recognizes Start Tags, End Tags and Empty Element Tags. Example: <TagName Attribute="Value" />.

Scripts: DIHtmlParser returns the contents between the <SCRIPT> and </SCRIPT> tags as simple text. The surrounding HTML tags are reported separately.

Styles: DIHtmlParser returns the contents between the <STYLE> and </STYLE> tags as simple text. The surrounding HTML tags are reported separately.

Text: Text is everything which is not markup. If the NormalizeWhiteSpace option is enabled, DIHtmlParser reduces multiple white space to a single character. Preformatted text wrapped by <PRE> and </PRE> is never normalized.

Titles: DIHtmlParser returns the contents between the <TITLE> and </TITLE> tags as simple text. Titles are not normal text because they are parsed differently.

XML Processing Instructions: XML Processing Instructions are similar to the HTML Processing Instructions with a slightly different syntax: they begin with <?XML and end with ?>.

The Non-HTML pieces are:

Active Server Pages (ASP): Active Server Page markup is often used to enclose scripting macros. It begins with <% and runs up to %>.

Custom-Tags: Custom Tags are similar to HTML-Tags and to what Delphi's Help calls Transparent Tags. For DIHtmlParser, the name of a Custom-Tag must begin with a user-defined start character, such as # in <#Name Attribute="Value" />.

PHP: PHP is a powerful and popular scripting language. Its markup begins with <?PHP and ends with ?>.

Server Side Includes (SSI): SSI, an extension of the Apache Web Server, starts with <!--# and continues up to -->. It allows include files and other data to be inserted into HTML documents on the fly.

Parsing Efficiency