Creating and Using Hive Partitioned Tables

This content is licensed under CC 4.0 BY-SA.

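This article covers the concept and use of partitioned tables in Hive: creating them, loading data, querying, and dynamic partitioning. It shows how partitioning improves query efficiency and how to create and work with multi-level partitions.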

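Each partition of a partitioned table corresponds to a separate directory on HDFS, and that directory holds all of the partition's data files. Partitioning in Hive is simply splitting by directory: a large dataset is divided into smaller ones according to business needs. When the WHERE clause of a query selects only the partitions it needs, Hive reads just those directories, which makes the query much more efficient.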

Statement for creating a partitioned table in Hive:

create table dept_partition(
 deptno int, dname string, loc string
 )
 partitioned by (month string)
 row format delimited fields terminated by '\t';

Loading data into the partitioned table

hive> load data local inpath '/opt/datas/dept.txt' into table dept_partition partition(month='201907');
hive> load data local inpath '/opt/datas/dept.txt' into table dept_partition partition(month='201908');
hive> load data local inpath '/opt/datas/dept.txt' into table dept_partition partition(month='201909');
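
Each load above creates one partition directory. One way to see this from the Hive CLI is the dfs command. The sketch below assumes the table lives in the default database under the default warehouse location /user/hive/warehouse; adjust the path if hive.metastore.warehouse.dir is set differently.

```sql
-- Assumed path: default warehouse directory, table in the `default` database.
dfs -ls /user/hive/warehouse/dept_partition;
-- Each partition appears as a subdirectory named month=<value>,
-- for example .../dept_partition/month=201907
```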

Querying data in the partitioned table

-- Query a single partition
hive> select * from dept_partition where month='201909';
-- Query several partitions combined with UNION ALL
hive> select * from dept_partition where month='201909'
 union all select * from dept_partition where month='201908'
 union all select * from dept_partition where month='201907';
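
The same rows can usually be fetched without UNION ALL by listing the partition values in a single predicate. This is a minimal sketch of that alternative; Hive still reads only the three listed partitions because the filter is on the partition column.

```sql
-- One scan over three partitions instead of three unioned queries.
select * from dept_partition
where month in ('201907', '201908', '201909');
```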

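Adding partitions

-- The partition directories are visible in HDFS; adding or dropping a partition adds or removes the corresponding directory
-- Add a single partition
hive> alter table dept_partition add partition(month='201906');
-- Add several partitions at once (the partition specs are separated by spaces)
hive> alter table dept_partition
 add partition(month='201905') partition(month='201904');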
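
Dropping works the same way in reverse. The statements below are a short sketch using the partition values added above; because dept_partition is a managed table, dropping a partition also deletes its data directory.

```sql
-- Drop a single partition; IF EXISTS avoids an error when it is absent.
alter table dept_partition drop if exists partition (month='201906');
-- Drop several partitions at once; unlike ADD, the specs are comma-separated.
alter table dept_partition drop if exists partition (month='201905'), partition (month='201904');
```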

Listing the partitions of a table

hive> show partitions dept_partition;

Viewing the structure of a partitioned table

hive> desc formatted dept_partition;

Creating a two-level partitioned table (deeper levels are also possible):

hive> create table dept_partition2(
 deptno int, dname string, loc string
 )
 partitioned by (month string, day string)
 row format delimited fields terminated by '\t';

(1) Load data into the two-level partitioned table

hive> load data local inpath '/opt/datas/dept.txt' 
into table default.dept_partition2 partition(month='201909', day='13');

(2) Query partition data

hive> select * from dept_partition2 where month='201909' and day='13';
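
A query does not have to name every partition level. As a small sketch, filtering on month alone reads all day subdirectories under that month, and SHOW PARTITIONS accepts a partial spec to list them.

```sql
-- Filter on the first level only: every day partition under month=201909 is read.
select * from dept_partition2 where month='201909';

-- List only the partitions that belong to month=201909.
show partitions dept_partition2 partition(month='201909');
```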

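Dynamic partitioning

With static partitioning, you must specify concrete values for the partition columns when inserting data into a partitioned table. Hive also supports supplying partition values dynamically: the insert statement names only the partition columns, and the values come from the data being inserted. Dynamic partitioning was disabled by default in early Hive releases and is enabled by default in later ones (including Hive 2.3.4); where it is disabled, set hive.exec.dynamic.partition to true. By default, the user must also specify at least one static partition column, which guards against accidentally overwriting partitions. To lift this restriction, switch the partition mode to non-strict by setting hive.exec.dynamic.partition.mode to nonstrict (the default is strict).
Both settings can be applied in the command-line session: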

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

1. Prepare the data file people.txt

001,tom,23,2019-03-16
002,jack,12,2019-03-13
003,robin,14,2018-08-13
004,justin,34,2018-10-12
005,jarry,24,2017-11-11
006,jasper,24,2017-12-12

2. Create the tables
(1) Create a plain table to load the data into

create table people(id int,name string,age int,start_date date)
row format delimited
fields terminated by ',';

(2) Create the partitioned table

create table dynamic_people(id int,name string,age int,start_date date)
partitioned by (year string,month string)
row format delimited
fields terminated by ',';

3. Load the data
(1) Load data into the plain table

load data local inpath '/opt/datas/people.txt' into table people;

(2) Insert data into the partitioned table dynamically

insert into dynamic_people
partition(year,month)
select id,name,age,start_date,year(start_date),month(start_date)
from people;
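
Dynamic partition columns are matched by position, not by name: the last columns of the select list feed the partition columns in the order declared in the partition clause. For comparison, here is a sketch of a mixed insert that also runs in strict mode, with year fixed and only month dynamic. It is meant as an alternative to the fully dynamic insert; since insert into appends, running both would load the 2019 rows twice.

```sql
-- Works even with hive.exec.dynamic.partition.mode=strict:
-- year is static, month is taken per row from the last select column.
-- Static partition columns must come before dynamic ones.
insert into table dynamic_people
partition(year='2019', month)
select id, name, age, start_date, month(start_date)
from people
where year(start_date) = 2019;

-- List the partitions that now exist, one per distinct (year, month) pair.
show partitions dynamic_people;
```

Note that month(start_date) returns an integer, so the partition value for March is 3 rather than 03.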

4. Query the data
(1) Query the rows with year=2018

select * from dynamic_people where year = 2018;

(2) Query the rows with year=2017 and month=11

select * from dynamic_people where year = 2017 and month = 11;
