Parsing XML Data with ElementTree in Python

This content is licensed under CC 4.0 BY-SA.


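This article describes how to parse XML files with Python's ElementTree library: walking the nodes of an XML tree, extracting data from specific nodes, and batch-processing multiple XML files into a CSV file. Worked examples show how find, findall, iter and related functions are used.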

Attributes and methods of the Element class (XML nodes)

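1 tag: the tag name of the current node
2 attrib: the attributes of the current node, as a dictionary
3 text: the text content of the current node (the text before its first child)
4 append: add one child node at the end
5 clear: reset the node, removing all children, attributes, text and tail
6 extend: add several child nodes from an iterable
7 find: return the first matching child node, or None if there is none
8 findall: return a list of all matching child nodes
9 findtext: return the text of the first matching child node
10 get: return the value of one attribute of the current node
11 insert: insert a child node at the given position
12 items: return all attributes of the current node as (key, value) pairs, like items() on a dictionary
13 iter: return an iterator over the current node and all its descendants, optionally filtered by tag name
14 iterfind: like findall, but return an iterator instead of a list
15 itertext: return an iterator over the text of the current node and all its descendants, in document order
16 keys: return the attribute names (keys) of the current node
17 makeelement: create a new node of the same type as the current one
18 remove: remove a direct child node
19 set: set an attribute on the current node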
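
As a quick illustration of several entries in the list above, here is a minimal sketch. The sample XML, including the eNB and cell tag names and the vendor attribute, is made up for the demonstration and is not taken from the files used later in this article.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
XML_TEXT = """<root>
  <eNB id="1001" userLabel="site-A">
    <cell id="1">10 20</cell>
    <cell id="2">30 40</cell>
  </eNB>
</root>"""

root = ET.fromstring(XML_TEXT)
enb = root[0]                    # first child of the root

tag_name = enb.tag               # tag name of the node
attributes = dict(enb.attrib)    # copy of the attribute dictionary
label = enb.get("userLabel")     # value of a single attribute
attr_keys = sorted(enb.keys())   # attribute names

# set() adds or overwrites an attribute on the current node.
enb.set("vendor", "demo")

# append() adds one child node at the end.
extra = ET.Element("cell", {"id": "3"})
extra.text = "50 60"
enb.append(extra)

# remove() deletes a direct child of the node it is called on.
enb.remove(enb[0])

remaining_ids = [cell.get("id") for cell in enb]
print(tag_name, label, attr_keys, remaining_ids)
```

ET.fromstring parses a string and returns the root node directly, which is convenient for small experiments; for files, ET.parse followed by getroot() is the equivalent.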

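When an XML file is large or contains many different child tags, fetching the nodes one by one is tedious. The find('nodeName') and findall('nodeName') methods look up nodes by tag instead.

find('nodeName'): returns the first direct child of the node whose tag is nodeName, or None if there is none.
findall('nodeName'): returns a list of all direct children of the node whose tag is nodeName. To search deeper levels, pass a path such as './/nodeName'.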
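
The difference between a bare tag name and a './/' path is easy to miss, so here is a small sketch of it. The XML snippet and its tag names are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
XML_TEXT = """<root>
  <eNB id="1001">
    <cell id="1"><value>10</value></cell>
    <cell id="2"><value>20</value></cell>
  </eNB>
</root>"""

root = ET.fromstring(XML_TEXT)

# A bare tag name only matches direct children of root,
# and <cell> sits one level deeper, so nothing is found.
direct = root.find("cell")

# The ".//" prefix searches all levels below the node.
first_cell = root.find(".//cell")
all_cells = root.findall(".//cell")

# findtext returns the text of the first match, or the default.
first_value = root.findtext(".//value")
missing = root.findtext("nothing", default="")

print(direct, first_cell.get("id"), len(all_cells), first_value)
```

Because find returns None when nothing matches, it is worth checking the result before reading .text or .attrib from it.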

from xml.etree import ElementTree as ET  # import the ElementTree module
tree = ET.parse("test.xml")
root = tree.getroot()
# Walk the tree level by level, printing each node's tag, text and attributes
for child in root:
    print(child.tag, child.text, child.attrib)
    for sub in child:
        print(sub.tag, sub.text, sub.attrib)
        for sub1 in sub:
            print(sub1.tag, sub1.text, sub1.attrib)
            for sub2 in sub1:
                print(sub2.tag, sub2.text, sub2.attrib)
                # ... and so on, one more nested loop per level
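
When the depth of the tree is not known in advance, the nested loops can be replaced by a recursive generator. The following is a minimal sketch of that idea; the walk function and the sample XML are hypothetical names introduced here, not part of ElementTree.

```python
import xml.etree.ElementTree as ET


def walk(node, depth=0):
    """Yield (depth, tag, text, attrib) for node and every descendant."""
    text = (node.text or "").strip()   # text is None for empty nodes
    yield depth, node.tag, text, dict(node.attrib)
    for child in node:
        yield from walk(child, depth + 1)


# Hypothetical sample document, used only for illustration.
XML_TEXT = "<root><a x='1'><b>hello</b></a><c/></root>"
root = ET.fromstring(XML_TEXT)

rows = list(walk(root))
for depth, tag, text, attrib in rows:
    # Indent each line by its depth to show the tree structure.
    print("  " * depth + tag, text, attrib)
```

If the depth does not matter, root.iter() visits the same nodes in the same order without any helper function.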

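Extracting specific data
Case 1:
for value in root.iter("object"):   # visit every node tagged "object", at any depth
    print(value.tag, value.text, value.attrib)
Case 2:
for child in root:
    print(child.tag)   # tag of each direct child of the root
print(root[0].tag)     # indexes 0, 1, 2, 3, ... select the n-th child; tag is the node's name, text its content, attrib its attributes
You can also write root[0].get('id') to read a single attribute, or chain indexes such as root[0][1] to go one level deeper. Note that get is a method of the node, not of the tag string, so root[0].tag.get('eNB') raises AttributeError.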
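
To make the difference between iter, findall and positional indexing concrete, here is a short sketch on a made-up document. The group and object tags and their id values are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
XML_TEXT = (
    "<root>"
    "<group><object id='a'>x</object><object id='b'>y</object></group>"
    "<object id='c'>z</object>"
    "</root>"
)
root = ET.fromstring(XML_TEXT)

# iter() searches every level, in document order.
iter_ids = [node.get("id") for node in root.iter("object")]

# findall() with a bare tag name only sees direct children of root.
findall_ids = [node.get("id") for node in root.findall("object")]

# Chained indexes walk down by position: first child, then its second child.
second_in_group = root[0][1].get("id")

# itertext() collects the text of the node and all its descendants.
all_text = list(root.itertext())

print(iter_ids, findall_ids, second_in_group, all_text)
```

Positional indexing is the shortest to write but breaks as soon as the layout of the file changes, whereas iter keeps working as long as the tag name stays the same.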
Case 3: batch-import XML files and write the data to TXT or CSV
import glob
import os
import xml.etree.ElementTree as ET

indir='D:\\(D-drive)\\GZ\\VOLTE专项\\TOOL\\P编程\\程序\\2019\\XML\\inputfile\\'

outdir='D:\\(D-drive)\\GZ\\VOLTE专项\\TOOL\\P编程\\程序\\2019\\XML\\outputfile\\'

# Collect every .xml file in the input directory
infile = glob.glob(os.path.join(indir, '*.xml'))
file_txt = os.path.join(outdir, 'test.csv')

# Open the output once, outside the loop, so that each input file
# does not overwrite the rows written for the previous one
with open(file_txt, 'w') as f_w:
    for file in infile:
        print(file)

        # actual parsing
        tree = ET.parse(file)
        root = tree.getroot()
        root1 = root[1][0]   # by position: second child of the root, then its first child
        for sub1 in root1:
            eci = sub1.attrib.get('id', '')
            for sub2 in sub1:
                vl = sub2.text or ''   # text is None for empty nodes
                print(eci, vl)
                f_w.write(eci + "," + vl + '\n')
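
The example above builds each CSV line by string concatenation, which produces broken rows if a value itself contains a comma. One possible alternative is the standard csv module, sketched below. The sample XML, the rows_from_xml helper and the column names are assumptions for illustration, and an in-memory buffer stands in for the real output file.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
XML_TEXT = """<root>
  <cell id="1"><v>10 20</v><v>a,b</v></cell>
  <cell id="2"><v>30 40</v></cell>
</root>"""


def rows_from_xml(root):
    """Yield one [id, value] row per child of every <cell> node."""
    for cell in root.iter("cell"):
        eci = cell.get("id", "")
        for value in cell:
            yield [eci, (value.text or "").strip()]


root = ET.fromstring(XML_TEXT)

# io.StringIO stands in for open(path, "w", newline="") in this sketch.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["eci", "value"])        # assumed header names
writer.writerows(rows_from_xml(root))

csv_text = buffer.getvalue()
print(csv_text)
```

csv.writer quotes the value a,b automatically, so the row still has two columns when the file is read back.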

Another example

import glob
import os
import xml.etree.ElementTree as ET

indir='D:\\(D-drive)\\GZ\\VOLTE专项\\TOOL\\P编程\\程序\\2019\\XML\\inputfile\\'

outdir='D:\\(D-drive)\\GZ\\VOLTE专项\\TOOL\\P编程\\程序\\2019\\XML\\outputfile\\'
infile = glob.glob(os.path.join(indir, '*.xml'))
file_txt = os.path.join(outdir, 'test.csv')

# Write the header row once (the column names are elided in the original)
with open(file_txt, 'w') as f_w:
    f_w.write("。。。。。。。。。。。。。。。。。。。" + '\n')

# Positions of the whitespace-separated fields to keep from each value
keep = (0, 1, 4, 5, 7, 8, 9, 10, 11)

# Append the data rows of every input file to the same output file
with open(file_txt, 'a') as f_w:
    for file in infile:
        print("Parsing file: " + file)

        # actual parsing
        tree = ET.parse(file)
        root = tree.getroot()

        # Station label and id; if several eNB nodes exist, the last one wins
        enb = enbid = ''
        for obj1 in root.iter('eNB'):
            enb = obj1.attrib.get('userLabel', '')
            enbid = obj1.attrib.get('id', '')

        root1 = root[1][0]
        for sub1 in root1:
            eci = sub1.attrib.get('id', '')
            rtime = sub1.attrib.get("TimeStamp", '')
            for sub2 in sub1:
                fields = (sub2.text or '').split()   # split the text only once
                v1 = ",".join(fields[k] for k in keep)
                #print(eci,v1)
                f_w.write(rtime + "," + enb + "," + enbid + "," + eci + "," + v1 + '\n')
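
Selecting fields by position fails with an IndexError when a value has fewer fields than expected. A minimal sketch of a more defensive selection follows; the pick_fields helper is a hypothetical name, and returning None for short or empty values is just one possible policy.

```python
# Positions of the whitespace-separated fields to keep,
# the same positions used in the example above.
WANTED = (0, 1, 4, 5, 7, 8, 9, 10, 11)


def pick_fields(text, wanted=WANTED):
    """Return the selected fields, or None if the text is too short."""
    fields = (text or "").split()
    if len(fields) <= max(wanted):
        return None            # not enough fields: let the caller skip the row
    return [fields[i] for i in wanted]


# Made-up input: twelve fields numbered 0 to 11.
sample = " ".join(str(n) for n in range(12))
picked = pick_fields(sample)
too_short = pick_fields("1 2 3")
empty = pick_fields(None)

print(picked, too_short, empty)
```

In the parsing loop, a row would then be written only when pick_fields returns a list, so one malformed value does not abort the whole batch.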
