Downloading a File from XShell to a Local Machine and Processing It with C++

Original post · last modified 2024-11-19 14:01:52

This content is licensed under CC 4.0 BY-SA.

GEO检测

First published 2024-11-18 10:36:47

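Run pwd to get the current path, for example: /local/home/project1

Then run sz /local/home/project1/your_file.log

Choose the directory to save it in, and the file is now on your local machine.

Next the file needs processing: only specific lines should be kept. Create a C++ project for this.

The code is as follows: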

#include <iostream>
#include <fstream>
#include <string>
#include <regex>

int main() {
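	// Raw string literal, so the backslash in the Windows path is not parsed as an escape sequence.
	std::string logFilePath = R"(C:\xxxxx.log)";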
	std::string outputFilePath = "flow3.txt";
	std::regex flowRegex(R"((\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\|INFO\|\|CatchEvent\|\s*flow\s*=(\d+))");

	// Open the input log file.
	std::ifstream logFile(logFilePath);
	std::ofstream outputFile(outputFilePath);

	if (!logFile.is_open() || !outputFile.is_open()) {
		std::cerr << "Unable to open file." << std::endl;
		return 1;
	}

	std::string line;
	while (std::getline(logFile, line)) {
		std::smatch matches;
		if (std::regex_search(line, matches, flowRegex) && matches.size() == 3) {
			// Write the timestamp and flow value to the output file.
			outputFile << matches[1].str() << " " << matches[2].str() << std::endl;
		}
	}

	logFile.close();
	outputFile.close();

	std::cout << "Flow data has been extracted to " << outputFilePath << std::endl;
	return 0;
}

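A regular expression is used here. It matches a timestamp (format YYYY-MM-DD HH:MM:SS.sss), followed by "|INFO||CatchEvent|", then the keyword "flow", an equals sign, and finally the digits (the flow value).

std::smatch matches declares a variable that holds the match results.

std::regex_search(line, matches, flowRegex) searches the current line for the pattern defined by the regular expression.

The condition checks that the search succeeded and that the result has exactly three sub-matches: the whole match plus the two capture groups.

matches.size() == 3 confirms that the result contains the two capture groups (timestamp and flow value) plus the whole match. Because the pattern has two capture groups, size() is always 3 after a successful search, so this check is a safeguard rather than a filter.

If the match succeeds, the timestamp and flow value are written to the output file.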
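To make the pattern easier to try out on its own, here is a minimal sketch that applies the same regular expression to a single line. The names FlowRecord and parseFlowLine are hypothetical helpers introduced only for illustration, and the sample log line in the usage note is an assumed format inferred from the pattern, not a line taken from the real log.

```cpp
#include <regex>
#include <string>

// Result of parsing one log line; `found` is false when the line does not match.
struct FlowRecord {
    bool found;
    std::string timestamp;
    std::string flow;
};

// Applies the same pattern as the program above to a single line.
FlowRecord parseFlowLine(const std::string& line) {
    // Compiled once and reused on every call.
    static const std::regex flowRegex(
        R"((\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\|INFO\|\|CatchEvent\|\s*flow\s*=(\d+))");
    std::smatch matches;
    if (std::regex_search(line, matches, flowRegex)) {
        // matches[1] is the timestamp, matches[2] is the flow value.
        return FlowRecord{true, matches[1].str(), matches[2].str()};
    }
    return FlowRecord{false, "", ""};
}
```

Calling parseFlowLine("2024-11-18 10:36:47.123|INFO||CatchEvent| flow =42") would yield the timestamp and "42", while a line with a different level or event name is reported as not found. Note that the pattern allows whitespace before the equals sign but not after it.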

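std::smatch is a type in the C++ standard library that stores the results of the regular expression search algorithms. It is a typedef for std::match_results<std::string::const_iterator>, the specialization of the std::match_results class template used for matches against std::string. After a search, a std::smatch holds the matched substring and the capture groups.

After a regular expression search, a std::smatch object can contain the following:

The whole match: the full portion of the string matched by the regular expression.
Capture groups: the substrings matched by the parenthesized () sub-expressions of the regular expression.

std::smatch provides the following methods:

size(): returns the number of sub-matches, counting the whole match and all capture groups (0 if the search failed).
empty(): returns true if the search failed and no results are stored.
str(n): returns the sub-match at index n as a string (index 0, the default, is the whole match).
length(n): returns the length of the sub-match at index n.
position(n): returns the starting position of the sub-match at index n within the searched string.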
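The methods listed above can be seen together in a small sketch. MatchInfo and inspectMatch are hypothetical names used only for illustration, and the "key=digits" pattern is an assumed stand-in for the log pattern, chosen to keep the example short.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// What std::smatch reports after searching `text` for the first "key=digits" pair.
struct MatchInfo {
    bool empty;               // m.empty(): true when the search failed
    std::size_t size;         // m.size(): whole match + capture groups, 0 on failure
    std::string whole;        // m.str(0): the whole match
    std::string value;        // m.str(2): second capture group
    std::ptrdiff_t valuePos;  // m.position(2): offset of group 2 in `text`
    std::ptrdiff_t valueLen;  // m.length(2): length of group 2
};

MatchInfo inspectMatch(const std::string& text) {
    static const std::regex pairRegex(R"((\w+)=(\d+))");
    std::smatch m;
    std::regex_search(text, m, pairRegex);

    // Defaults describe a failed search; they are overwritten on success.
    MatchInfo info{m.empty(), m.size(), "", "", -1, 0};
    if (!m.empty()) {
        info.whole = m.str(0);
        info.value = m.str(2);
        info.valuePos = m.position(2);
        info.valueLen = m.length(2);
    }
    return info;
}
```

For the input "id: flow=128;" the whole match is "flow=128" and the second group is "128", starting at offset 9 with length 3. For a string with no such pair, empty is true and size is 0.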

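In the code, std::smatch stores the result of std::regex_search, which is called on each line of the log file to look for a match. When a match is found, the std::smatch object matches holds the whole match and two capture groups: the timestamp and the flow value. matches[1].str() and matches[2].str() then access those groups so they can be written to the output file.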

std::smatch matches;
if (std::regex_search(line, matches, flowRegex) && matches.size() == 3) {
    // matches[0] contains the whole match
    // matches[1] contains the first capture group (timestamp)
    // matches[2] contains the second capture group (flow value)
    outputFile << matches[1].str() << " " << matches[2].str() << std::endl;
}

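In this example, matches[0] always holds the whole matched string, while matches[1] and matches[2] hold the results for the first and second capture groups. Each pair of parentheses () in the regular expression defines a capture group, so (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) and (\d+) are the two capture groups.

std::regex_search is a function template in the C++ standard library that searches a given string for a part matching a regular expression. It belongs to the regular expression library, which supports complex text matching and searching.

Function prototype (the iterator-range overload; the code above uses the overload that takes a std::string directly)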

template <class BidirectionalIterator, class Allocator, class charT, class traits>
bool regex_search (BidirectionalIterator first, BidirectionalIterator last,
                  match_results<BidirectionalIterator, Allocator>& m,
                  const basic_regex<charT, traits>& e,
                  regex_constants::match_flag_type flags = regex_constants::match_default);

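Parameters

first, last: iterators that delimit the range of characters to search.
m: a match_results object that receives the match results.
e: the regular expression to search for.
flags: search flags that control matching behavior (optional, defaults to match_default).

Return value

true: a match was found in the given range.
false: no match was found in the given range.

Usage example

Below is a usage example of std::regex_search that searches a string for a part matching a regular expression: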

#include <iostream>
#include <string>
#include <regex>

int main() {
    std::string text = "Hello, world!";
    std::regex pattern("world");  // the regular expression pattern
    std::smatch matches;          // receives the match results

    if (std::regex_search(text, matches, pattern)) {
        std::cout << "Match found: " << matches.str(0) << std::endl;
    } else {
        std::cout << "No match found." << std::endl;
    }

    return 0;
}

In this example, std::regex_search searches the string text for the pattern defined by pattern. If a match is found, the matches object holds the result, and matches.str(0) returns the whole matched string.
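The example above uses the std::string overload. The iterator-range form shown in the prototype is useful when the search should resume after each match. The following is a minimal sketch of that idea; findAllNumbers is a hypothetical helper name, and the \d+ pattern is chosen only for illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collects every run of digits in `text` using the iterator-range overload.
std::vector<std::string> findAllNumbers(const std::string& text) {
    static const std::regex numberRegex(R"(\d+)");
    std::vector<std::string> results;
    std::smatch m;
    std::string::const_iterator first = text.begin();
    const std::string::const_iterator last = text.end();
    while (std::regex_search(first, last, m, numberRegex)) {
        results.push_back(m.str(0));
        first = m[0].second;  // resume just past the end of the current match
    }
    return results;
}
```

For "flow=12, retry=3, id=4567" this collects "12", "3" and "4567". The loop terminates because \d+ never matches an empty string; for patterns that can match nothing, the standard std::sregex_iterator is usually the safer way to walk through all matches.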

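Match flags

std::regex_search accepts an optional flags argument that controls matching behavior. Commonly used flags include:

regex_constants::match_default: the default matching behavior.
regex_constants::match_not_null: do not match an empty sequence.
regex_constants::match_continuous: only accept a match that begins at first, the start of the searched range.
regex_constants::match_prev_avail: states that --first is a valid iterator position, i.e. there is a character before the start of the range; when set, match_not_bol and match_not_bow are ignored.

The flags are bitmask values and can be combined with | to fine-tune matching behavior for specific needs.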
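A short sketch of two of these flags in use. The helper names startsWithNumber and containsNonEmptyRunOfA are hypothetical, and the patterns are chosen only to make the effect of each flag visible.

```cpp
#include <regex>
#include <string>

// True only if a run of digits starts at the very first character of `text`.
// Without match_continuous, the search would also accept digits found later on.
bool startsWithNumber(const std::string& text) {
    static const std::regex digits(R"(\d+)");
    return std::regex_search(text, digits, std::regex_constants::match_continuous);
}

// True only if `a*` matches at least one character somewhere in `text`.
// Without match_not_null, `a*` succeeds on any input by matching nothing.
bool containsNonEmptyRunOfA(const std::string& text) {
    static const std::regex runOfA("a*");
    return std::regex_search(text, runOfA, std::regex_constants::match_not_null);
}
```

With these definitions, startsWithNumber accepts "123abc" and rejects "abc123", while containsNonEmptyRunOfA accepts "bab" and rejects "bbb".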

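std::regex_search is a very powerful tool for processing text data; it makes complex text searching and matching straightforward in C++ programs.

However, this C++ program ran quite slowly. To speed things up, let's try Python.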

import re

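# File paths
input_file_path = r'C:xxxx'
output_file_path = r'C:xxxxxx'

# Open the input text file and the output text file
with open(input_file_path, 'r') as infile, open(output_file_path, 'w') as outfile:
    # Read the log file line by line
    for line in infile:
        # Use a regular expression to find lines that contain a flow value
        if re.search(r'flow\s*=\s*\d+', line):
            # Split the line into its '|'-separated fields
            parts = line.split('|')
            if len(parts) > 2 and '=' in parts[-1]:
                # The timestamp is the first field; strip surrounding whitespace
                timestamp = parts[0].strip()
                # The flow value is in the last field; strip whitespace and take the text after '='
                flow_value = parts[-1].strip().split('=')[1]
                # Write the timestamp and flow value to the output file
                outfile.write(f'{timestamp} {flow_value}\n')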

print(f"Flow data has been extracted to {output_file_path}.")

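The Python version is concise and noticeably faster here. The std::regex library in C++ and the re module in Python are implemented differently, which can affect matching speed; Python's re is usually quick on simple patterns, while std::regex is often reported to be slow, especially in unoptimized Debug builds. Note that the C++ code above does not reopen the files for each match: like the Python code, it opens both files once before the loop and closes them once after it, so file opening is not the source of the gap. Two differences that may matter more are that the C++ code writes std::endl on every match, which flushes the output stream each time, and that the Python code uses a much simpler pattern plus plain string splitting instead of a longer pattern with capture groups.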

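A walkthrough of the Python code: it imports Python's regular expression module re, then uses a with statement to open two files at once: the input log file for reading ('r' mode) and the output text file for writing ('w' mode). The with statement ensures the files are closed properly when the block ends. infile and outfile are the names bound to the two file objects.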
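Returning to the speed comparison: the split-based approach of the Python script can also be written in C++ without std::regex. The following is only a sketch under assumptions. FlowFields and splitFlowLine are hypothetical names, the logic is a simplified mirror of the Python code rather than an exact port (it does not strip whitespace around the timestamp), and whether it is actually faster on a given log would need to be measured.

```cpp
#include <string>

struct FlowFields {
    bool found;
    std::string timestamp;
    std::string flow;
};

// Timestamp: text before the first '|'.
// Flow value: digits after "flow ... =" in the last '|'-separated field.
FlowFields splitFlowLine(const std::string& line) {
    const FlowFields none{false, "", ""};
    const std::size_t npos = std::string::npos;

    const std::size_t firstBar = line.find('|');
    const std::size_t lastBar = line.rfind('|');
    // Require at least two separators, like len(parts) > 2 in the Python code.
    if (firstBar == npos || firstBar == lastBar) return none;

    const std::string lastField = line.substr(lastBar + 1);
    const std::size_t keyPos = lastField.find("flow");
    if (keyPos == npos) return none;
    const std::size_t eqPos = lastField.find('=', keyPos);
    if (eqPos == npos) return none;

    // Skip optional whitespace after '=', then take the run of digits.
    const std::size_t digitsBegin = lastField.find_first_not_of(" \t", eqPos + 1);
    if (digitsBegin == npos) return none;
    const std::size_t digitsEnd = lastField.find_first_not_of("0123456789", digitsBegin);
    if (digitsEnd == digitsBegin) return none;  // no digits after '='

    const std::size_t count = (digitsEnd == npos) ? npos : digitsEnd - digitsBegin;
    return FlowFields{true, line.substr(0, firstBar), lastField.substr(digitsBegin, count)};
}
```

In the main loop this would replace the std::regex_search call, and writing '\n' instead of std::endl would avoid flushing the output file on every line.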