Boost xpressive usage, part 2

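This post uses two examples to show how to use the regular expressions in Boost.Xpressive to extract and mark specific patterns in text: race times and HTML tags.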

Picking up where the previous part left off, we now look at how to pull substrings out of match results.

Example 5: extracting specified marked sub-expressions from each match

#define _SCL_SECURE_NO_WARNINGS // suppress MSVC warnings about "unsafe" standard library calls

#include <iostream>
#include <string>
#include <boost/xpressive/xpressive.hpp>

using namespace boost::xpressive;

int main()
{
	std::string str( "Eric: 4:40, Karl: 3:35, Francesca: 2:32" );

	// find a race time
	sregex time = sregex::compile( "(\\d):(\\d\\d)" );

	// for each match, the token iterator should first take the value of
	// the first marked sub-expression followed by the value of the second
	// marked sub-expression
	int const subs[] = { 1, 2 };

	sregex_token_iterator cur( str.begin(), str.end(), time, subs );
	sregex_token_iterator end;

	for( ; cur != end; ++cur )
	{
		std::cout << *cur << '\n';
	}
	/*
	output of the sregex_token_iterator loop above:
	4
	40
	3
	35
	2
	32
	*/

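	// Alternative, similar to Example 4: iterate over whole matches with
	// sregex_iterator and index the marked sub-expressions directly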
	sregex_iterator curI( str.begin(), str.end(), time );
	sregex_iterator endI;

	for( ; curI != endI; ++curI )
	{
		std::cout << (*curI)[1] << ":" <<  (*curI)[2] << '\n';
	}

	return 0;
}
/*
output of the sregex_iterator loop (printed after the lines above):
4:40
3:35
2:32
*/

Example 6: special sub-match indices for sregex_token_iterator (splitting text on HTML tags)

#define _SCL_SECURE_NO_WARNINGS // suppress MSVC warnings about "unsafe" standard library calls

#include <iostream>
#include <string>
#include <boost/xpressive/xpressive.hpp>

using namespace boost::xpressive;

int main()
{
	std::string str( "Now <bold>is the time <i>for all good men</i> to come to the aid of their</bold> country." );

	// find an HTML tag
	// static-syntax alternative (it has no capture group, so it cannot serve the {1} case below):
	//sregex html = '<' >> optional('/') >> +_w >> '>';
	sregex html = sregex::compile("</?(\\w*)>");

	// -1 is a special index: the iterator yields the parts of the string that did NOT match,
	// i.e. the text between the tags
	sregex_token_iterator cur( str.begin(), str.end(), html, -1 );
	sregex_token_iterator end;

	for( ; cur != end; ++cur )
	{
		std::cout << '{' << *cur << '}';
	}
	std::cout << '\n';
	// result:{Now }{is the time }{for all good men}{ to come to the aid of their}{ country.}

	// 0 selects the whole match: the iterator yields every matched tag
	sregex_token_iterator curI( str.begin(), str.end(), html, 0);

	for( ; curI != end; ++curI )
	{
		std::cout << '{' << *curI << '}';
	}
	std::cout << '\n';
	// result: {<bold>}{<i>}{</i>}{</bold>}

	// an array {1} selects marked sub-expression 1 of each match, i.e. the tag name inside <...>
	const int sub[] = {1};
	sregex_token_iterator cur2( str.begin(), str.end(), html, sub);

	for( ; cur2 != end; ++cur2 )
	{
		std::cout << '{' << *cur2 << '}';
	}
	std::cout << '\n';
	// result:{bold}{i}{i}{bold}

	return 0;
}

Reference: Perl syntax and the corresponding static xpressive syntax

| Perl | Static xpressive | Meaning |
|---|---|---|
| `.` | `_` | any character (assuming Perl's /s modifier). |
| `ab` | `a >> b` | sequencing of `a` and `b` sub-expressions. |
| `a\|b` | `a \| b` | alternation of `a` and `b` sub-expressions. |
| `(a)` | `(s1= a)` | group and capture a back-reference. |
| `(?:a)` | `(a)` | group and do not capture a back-reference. |
| `\1` | `s1` | a previously captured back-reference. |
| `a*` | `*a` | zero or more times, greedy. |
| `a+` | `+a` | one or more times, greedy. |
| `a?` | `!a` | zero or one time, greedy. |
| `a{n,m}` | `repeat<n,m>(a)` | between `n` and `m` times, greedy. |
| `a*?` | `-*a` | zero or more times, non-greedy. |
| `a+?` | `-+a` | one or more times, non-greedy. |
| `a??` | `-!a` | zero or one time, non-greedy. |
| `a{n,m}?` | `-repeat<n,m>(a)` | between `n` and `m` times, non-greedy. |
| `^` | `bos` | beginning of sequence assertion. |
| `$` | `eos` | end of sequence assertion. |
| `\b` | `_b` | word boundary assertion. |
| `\B` | `~_b` | not word boundary assertion. |
| `\n` | `_n` | literal newline. |
| `.` | `~_n` | any character except a literal newline (without Perl's /s modifier). |
| `\r?\n\|\r` | `_ln` | logical newline. |
| `[^\r\n]` | `~_ln` | any single character not a logical newline. |
| `\w` | `_w` | a word character, equivalent to `set[alnum \| '_']`. |
| `\W` | `~_w` | not a word character, equivalent to `~set[alnum \| '_']`. |
| `\d` | `_d` | a digit character. |
| `\D` | `~_d` | not a digit character. |
| `\s` | `_s` | a space character. |
| `\S` | `~_s` | not a space character. |
| `[:alnum:]` | `alnum` | an alpha-numeric character. |
| `[:alpha:]` | `alpha` | an alphabetic character. |
| `[:blank:]` | `blank` | a horizontal white-space character. |
| `[:cntrl:]` | `cntrl` | a control character. |
| `[:digit:]` | `digit` | a digit character. |
| `[:graph:]` | `graph` | a graphable character. |
| `[:lower:]` | `lower` | a lower-case character. |
| `[:print:]` | `print` | a printing character. |
| `[:punct:]` | `punct` | a punctuation character. |
| `[:space:]` | `space` | a white-space character. |
| `[:upper:]` | `upper` | an upper-case character. |
| `[:xdigit:]` | `xdigit` | a hexadecimal digit character. |
| `[0-9]` | `range('0','9')` | characters in range `'0'` through `'9'`. |
| `[abc]` | `as_xpr('a') \| 'b' \| 'c'` | characters `'a'`, `'b'`, or `'c'`. |
| `[abc]` | `(set= 'a','b','c')` | same as above. |
| `[0-9abc]` | `set[ range('0','9') \| 'a' \| 'b' \| 'c' ]` | characters `'a'`, `'b'`, `'c'` or in range `'0'` through `'9'`. |
| `[0-9abc]` | `set[ range('0','9') \| (set= 'a','b','c') ]` | same as above. |
| `[^abc]` | `~(set= 'a','b','c')` | not characters `'a'`, `'b'`, or `'c'`. |
| `(?i:stuff)` | `icase(stuff)` | match stuff disregarding case. |
| `(?>stuff)` | `keep(stuff)` | independent sub-expression, match stuff and turn off backtracking. |
| `(?=stuff)` | `before(stuff)` | positive look-ahead assertion, match if before stuff but don't include stuff in the match. |
| `(?!stuff)` | `~before(stuff)` | negative look-ahead assertion, match if not before stuff. |
| `(?<=stuff)` | `after(stuff)` | positive look-behind assertion, match if after stuff but don't include stuff in the match. (stuff must be constant-width.) |
| `(?<!stuff)` | `~after(stuff)` | negative look-behind assertion, match if not after stuff. (stuff must be constant-width.) |
| `(?P<name>stuff)` | `mark_tag name(n);` ... `(name= stuff)` | Create a named capture. |
| `(?P=name)` | `mark_tag name(n);` ... `name` | Refer back to a previously created named capture. |
