Continuing from the previous installment, this part looks at how to pull individual substrings out of the match results.
Example 5: Extracting selected marked sub-expressions from each match
#define _SCL_SECURE_NO_WARNINGS // suppress Visual Studio compiler warnings
#include <iostream>
#include <string>
#include <boost/xpressive/xpressive.hpp>
using namespace boost::xpressive;
int main()
{
    std::string str( "Eric: 4:40, Karl: 3:35, Francesca: 2:32" );
    // find a race time
    sregex time = sregex::compile( "(\\d):(\\d\\d)" );
    // for each match, the token iterator should first take the value of
    // the first marked sub-expression followed by the value of the second
    // marked sub-expression
    int const subs[] = { 1, 2 };
    sregex_token_iterator cur( str.begin(), str.end(), time, subs );
    sregex_token_iterator end;
    for( ; cur != end; ++cur )
    {
        std::cout << *cur << '\n';
    }
    /*
    result:
    4
    40
    3
    35
    2
    32
    */
    // An alternative implementation, using sregex_iterator as in Example 4
    sregex_iterator curI( str.begin(), str.end(), time );
    sregex_iterator endI;
    for( ; curI != endI; ++curI )
    {
        std::cout << (*curI)[1] << ":" << (*curI)[2] << '\n';
    }
    /*
    result:
    4:40
    3:35
    2:32
    */
    return 0;
}
Example 6: Special uses of the token iterator
#define _SCL_SECURE_NO_WARNINGS // suppress Visual Studio compiler warnings
#include <iostream>
#include <string>
#include <boost/xpressive/xpressive.hpp>
using namespace boost::xpressive;
int main()
{
    std::string str( "Now <bold>is the time <i>for all good men</i> to come to the aid of their</bold> country." );
    // find an HTML tag
    //sregex html = '<' >> optional('/') >> +_w >> '>';
    sregex html = sregex::compile("</?(\\w*)>");
    // -1 is a special sub-match index: it selects the parts of the input
    // that the regex does NOT match
    sregex_token_iterator cur( str.begin(), str.end(), html, -1 );
    sregex_token_iterator end;
    for( ; cur != end; ++cur )
    {
        std::cout << '{' << *cur << '}';
    }
    std::cout << '\n';
    // result: {Now }{is the time }{for all good men}{ to come to the aid of their}{ country.}
    // 0 selects the whole of each match
    sregex_token_iterator curI( str.begin(), str.end(), html, 0 );
    for( ; curI != end; ++curI )
    {
        std::cout << '{' << *curI << '}';
    }
    std::cout << '\n';
    // result: {<bold>}{<i>}{</i>}{</bold>}
    // an array holding 1 selects the first marked sub-expression of each match
    const int sub[] = { 1 };
    sregex_token_iterator cur2( str.begin(), str.end(), html, sub );
    for( ; cur2 != end; ++cur2 )
    {
        std::cout << '{' << *cur2 << '}';
    }
    std::cout << '\n';
    // result: {bold}{i}{i}{bold}
    return 0;
}
The table below maps basic Perl regex syntax to the equivalent static xpressive syntax:
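| Perl | Static xpressive | Meaning |
|---|---|---|
| `.` | `_` | any character (assuming Perl's /s modifier). |
| `ab` | `a >> b` | sequencing of `a` and `b` sub-expressions. |
| `a\|b` | `a \| b` | alternation of `a` and `b` sub-expressions. |
| `(a)` | `(s1= a)` | group and capture a back-reference. |
| `(?:a)` | `(a)` | group and do not capture a back-reference. |
| `\1` | `s1` | a previously captured back-reference. |
| `a*` | `*a` | zero or more times, greedy. |
| `a+` | `+a` | one or more times, greedy. |
| `a?` | `!a` | zero or one time, greedy. |
| `a{n,m}` | `repeat<n,m>(a)` | between `n` and `m` times, greedy. |
| `a*?` | `-*a` | zero or more times, non-greedy. |
| `a+?` | `-+a` | one or more times, non-greedy. |
| `a??` | `-!a` | zero or one time, non-greedy. |
| `a{n,m}?` | `-repeat<n,m>(a)` | between `n` and `m` times, non-greedy. |
| `^` | `bos` | beginning of sequence assertion. |
| `$` | `eos` | end of sequence assertion. |
| `\b` | `_b` | word boundary assertion. |
| `\B` | `~_b` | not word boundary assertion. |
| `\n` | `_n` | literal newline. |
| `.` | `~_n` | any character except a literal newline (without Perl's /s modifier). |
| `(?:\r\n?\|\n)` | `_ln` | logical newline. |
| `[^\r\n]` | `~_ln` | any single character not a logical newline. |
| `\w` | `_w` | a word character, equivalent to `set[alnum \| '_']`. |
| `\W` | `~_w` | not a word character, equivalent to `~set[alnum \| '_']`. |
| `\d` | `_d` | a digit character. |
| `\D` | `~_d` | not a digit character. |
| `\s` | `_s` | a space character. |
| `\S` | `~_s` | not a space character. |
| `[:alnum:]` | `alnum` | an alpha-numeric character. |
| `[:alpha:]` | `alpha` | an alphabetic character. |
| `[:blank:]` | `blank` | a horizontal white-space character. |
| `[:cntrl:]` | `cntrl` | a control character. |
| `[:digit:]` | `digit` | a digit character. |
| `[:graph:]` | `graph` | a graphable character. |
| `[:lower:]` | `lower` | a lower-case character. |
| `[:print:]` | `print` | a printing character. |
| `[:punct:]` | `punct` | a punctuation character. |
| `[:space:]` | `space` | a white-space character. |
| `[:upper:]` | `upper` | an upper-case character. |
| `[:xdigit:]` | `xdigit` | a hexadecimal digit character. |
| `[0-9]` | `range('0','9')` | characters in range `'0'` through `'9'`. |
| `[abc]` | `as_xpr('a') \| 'b' \| 'c'` | characters `'a'`, `'b'`, or `'c'`. |
| `[abc]` | `(set= 'a','b','c')` | same as above |
| `[0-9abc]` | `set[ range('0','9') \| 'a' \| 'b' \| 'c' ]` | characters `'a'`, `'b'`, `'c'` or in range `'0'` through `'9'`. |
| `[0-9abc]` | `set[ range('0','9') \| (set= 'a','b','c') ]` | same as above |
| `[^abc]` | `~(set= 'a','b','c')` | not characters `'a'`, `'b'`, or `'c'`. |
| `(?i:stuff)` | `icase(stuff)` | match stuff disregarding case. |
| `(?>stuff)` | `keep(stuff)` | independent sub-expression, match stuff and turn off backtracking. |
| `(?=stuff)` | `before(stuff)` | positive look-ahead assertion, match if before stuff but don't include stuff in the match. |
| `(?!stuff)` | `~before(stuff)` | negative look-ahead assertion, match if not before stuff. |
| `(?<=stuff)` | `after(stuff)` | positive look-behind assertion, match if after stuff but don't include stuff in the match. (stuff must be constant-width.) |
| `(?<!stuff)` | `~after(stuff)` | negative look-behind assertion, match if not after stuff. (stuff must be constant-width.) |
| `(?P<name>stuff)` | `mark_tag name(n);` ... `(name= stuff)` | Create a named capture. |
| `(?P=name)` | `mark_tag name(n);` ... `name` | Refer back to a previously created named capture. |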
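This article used two examples to show how to parse and extract specific patterns in text, such as race times and HTML tags, with the regular-expression facilities of Boost.Xpressive.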