Big Data Learning Notes: Hive Programming

Originally published on 2025-04-27 11:33:16

This content is licensed under CC 4.0 BY-SA.

This article covers the basic usage and programming of Hive, the data warehouse of the big data stack. Operating system: Ubuntu 24.04.

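Common Hive Shell Commands

Hive Basic Data Types

First, an overview of the basic data types in HiveQL.

Hive supports primitive data types and complex data types. The primitive types mainly comprise numeric types (INT, FLOAT, DOUBLE), BOOLEAN, and STRING. There are three complex types: ARRAY, MAP, and STRUCT.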

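(1) Primitive data types

TINYINT: 1 byte

SMALLINT: 2 bytes

INT: 4 bytes

BIGINT: 8 bytes

BOOLEAN: TRUE/FALSE

FLOAT: 4 bytes, single-precision floating point

DOUBLE: 8 bytes, double-precision floating point

STRING: character string

(2) Complex data types

ARRAY: an ordered collection of elements

MAP: an unordered collection of key-value pairs

STRUCT: a group of named fields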
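
The byte widths listed above line up with Java's primitive types, which is handy when reading Hive values from Java. The following is a minimal sketch, not part of the original article: the class name HiveTypeWidths is hypothetical, and the pairing of each Hive type with the Java primitive of the same width is shown for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: width in bytes of Hive's numeric types, paired with the
// Java primitive of the same width. HiveTypeWidths is a hypothetical name.
final class HiveTypeWidths {
    // Widths are taken from the constants of the matching Java wrapper classes.
    static Map<String, Integer> widths() {
        Map<String, Integer> m = new LinkedHashMap<>();
        m.put("TINYINT", Byte.BYTES);      // byte
        m.put("SMALLINT", Short.BYTES);    // short
        m.put("INT", Integer.BYTES);       // int
        m.put("BIGINT", Long.BYTES);       // long
        m.put("FLOAT", Float.BYTES);       // float
        m.put("DOUBLE", Double.BYTES);     // double
        return m;
    }

    public static void main(String[] args) {
        widths().forEach((type, bytes) ->
            System.out.println(type + ": " + bytes + " byte(s)"));
        // Value range of a 1-byte signed integer such as TINYINT
        System.out.println("TINYINT range: " + Byte.MIN_VALUE + " to " + Byte.MAX_VALUE);
    }
}
```

A value that does not fit the declared width, for example 300 in a TINYINT column, cannot be stored as is, so it is worth choosing the column type with the expected value range in mind.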

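Common HiveQL Commands

The commonly used HiveQL commands fall into two groups: data definition and data manipulation. The commands and their usage are described below.

(1) create: create databases and tables

Create database hive:

hive> create database hive;

Create database hive again. Because hive already exists, a plain create database statement throws an exception; adding the if not exists keyword avoids the exception:

hive> create database if not exists hive;

Create a table. The general syntax is: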

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
[(col_name data_type [COMMENT col_comment], ...)]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...)
[SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[ROW FORMAT row_format]
[STORED AS file_format]
[LOCATION hdfs_path]
[TBLPROPERTIES (property_name=property_value, ...)]
[AS select_statement]

In database hive, create table usr with three columns: id, name, and age:

hive> use hive;

hive> create table if not exists usr(id bigint,name string,age int);

In database hive, create table usr with three columns id, name, and age, stored at the path "/usr/local/hive/warehouse/hive/usr":

hive> create table if not exists hive.usr(id bigint,name string,age int)

    > location '/usr/local/hive/warehouse/hive/usr';

In database hive, create external table usr with three columns id, name, and age, which reads comma-delimited data under the path "/usr/local/data":

hive> create external table if not exists hive.usr(id bigint,name string,age int)

    > row format delimited fields terminated by ','

    > location '/usr/local/data';

In database hive, create partitioned table usr with three columns id, name, and age, plus the partition column sex:

hive> create table hive.usr(id bigint,name string,age int) partitioned by(sex boolean);

In database hive, create table usr1 by copying table usr (like copies the table definition, not the data):

hive> create table if not exists usr1 like usr;
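
The clause order of the CREATE TABLE syntax can be sketched as a small string builder. This is an illustrative sketch only, covering a handful of the optional clauses: the class CreateTableSketch and its method names are hypothetical, and the builder does no quoting or escaping of its inputs, so it should only be fed trusted values.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Sketch: assembles a CREATE TABLE statement in the clause order of the
// syntax above. CreateTableSketch is a hypothetical helper, not a Hive API.
final class CreateTableSketch {
    // Renders "name type, name type" from an ordered column map.
    static String columns(Map<String, String> cols) {
        StringJoiner sj = new StringJoiner(", ");
        cols.forEach((name, type) -> sj.add(name + " " + type));
        return sj.toString();
    }

    // Optional clauses are skipped when their argument is null (or empty).
    static String build(boolean external, String table, Map<String, String> cols,
                        Map<String, String> partitions, String fieldDelimiter,
                        String location) {
        StringBuilder sb = new StringBuilder("CREATE ");
        if (external) {
            sb.append("EXTERNAL ");
        }
        sb.append("TABLE IF NOT EXISTS ").append(table)
          .append(" (").append(columns(cols)).append(")");
        if (partitions != null && !partitions.isEmpty()) {
            sb.append(" PARTITIONED BY (").append(columns(partitions)).append(")");
        }
        if (fieldDelimiter != null) {
            sb.append(" ROW FORMAT DELIMITED FIELDS TERMINATED BY '")
              .append(fieldDelimiter).append("'");
        }
        if (location != null) {
            sb.append(" LOCATION '").append(location).append("'");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> cols = new LinkedHashMap<>();
        cols.put("id", "bigint");
        cols.put("name", "string");
        cols.put("age", "int");
        // Rebuilds the external-table example from the text.
        System.out.println(build(true, "hive.usr", cols, null, ",", "/usr/local/data"));
    }
}
```

The point of the sketch is the ordering: PARTITIONED BY comes before ROW FORMAT, which comes before LOCATION. Hive rejects statements whose clauses appear in a different order.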

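(2) drop: delete databases and tables

Delete database hive; an error is reported if it does not exist:

hive> drop database hive;

Delete database hive; with the if exists keyword, no exception is thrown even if the database does not exist:

hive> drop database if exists hive;

Delete database hive; adding the cascade keyword deletes the database together with the tables in it:

hive> drop database if exists hive cascade;

Dropping a database that still contains tables without cascade produces an error.

The error occurs because database "hive" still contains one or more tables when the drop is attempted, so the operation cannot be carried out.

Unless cascade is given, all tables in a database must be dropped before the database itself can be dropped. If the tables are no longer needed, the following steps resolve the problem:

1) Before dropping the database, drop every table in it with the following command:

DROP TABLE IF EXISTS <database_name>.<table_name>;

2) Drop the database: once no tables remain in the database, drop it with the following command:

DROP DATABASE IF EXISTS <database_name>;

Alternatively, use the third statement above (the one with cascade) directly.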
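
The two manual steps above amount to an ordered list of statements: every table first, the database last. The sketch below only generates that list as strings; the class name DropDatabasePlan is hypothetical, the table names are supplied by the caller, and nothing here talks to Hive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: produces the DROP statements in the order described above.
// DropDatabasePlan is a hypothetical helper; it does not execute anything.
final class DropDatabasePlan {
    // Returns the statements in execution order: tables first, database last.
    static List<String> statements(String database, List<String> tables) {
        List<String> out = new ArrayList<>();
        for (String table : tables) {
            out.add("DROP TABLE IF EXISTS " + database + "." + table);
        }
        out.add("DROP DATABASE IF EXISTS " + database);
        return out;
    }

    public static void main(String[] args) {
        // usr and usr1 are the tables created earlier in the article.
        statements("hive", Arrays.asList("usr", "usr1")).forEach(System.out::println);
    }
}
```

Each returned string could then be passed to a JDBC Statement in turn. The cascade form does the same in a single statement.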

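(3) alter: modify databases and tables

Set dbproperties key-value pairs on database hive to describe its properties:

hive> alter database hive set dbproperties('edited-by'='lily');

Modify tables

Rename table usr to user:

hive> alter table usr rename to user;

Written as is, this statement tends to fail because user is a keyword. Prefix the table names with the database name and enclose the identifiers in backticks to avoid the error:

alter table `hive`.`usr` rename to `hive`.`user`;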
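
When such statements are generated from Java, the backtick quoting can be applied by a small helper. This is a minimal sketch under stated assumptions: HiveIdentifiers, quote, and renameTable are hypothetical names, and the helper simply rejects identifiers that themselves contain a backtick instead of trying to escape them.

```java
// Sketch: wraps identifiers in backticks so that keywords such as "user"
// can be used as table names. HiveIdentifiers is a hypothetical helper.
final class HiveIdentifiers {
    // Rejects empty names and names containing a backtick (no escaping attempted).
    static String quote(String identifier) {
        if (identifier == null || identifier.isEmpty() || identifier.indexOf('`') >= 0) {
            throw new IllegalArgumentException("unsupported identifier: " + identifier);
        }
        return "`" + identifier + "`";
    }

    // Builds the database-qualified rename statement shown in the text.
    static String renameTable(String database, String from, String to) {
        return "alter table " + quote(database) + "." + quote(from)
             + " rename to " + quote(database) + "." + quote(to);
    }

    public static void main(String[] args) {
        // prints: alter table `hive`.`usr` rename to `hive`.`user`
        System.out.println(renameTable("hive", "usr", "user"));
    }
}
```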

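Add a new partition to table usr (the partition spec must name a partition column of the table, so this assumes usr is partitioned by age):

hive> alter table usr add if not exists partition(age=10);

Drop the partition from table usr:

hive> alter table usr drop if exists partition(age=10);

Rename column name of table usr to username and place it after column age (this statement no longer works):

hive> alter table usr change name username string after age;

Add a new column gender to table usr, placed before the partition columns:

hive> alter table usr add columns(gender boolean);

Remove all columns of table usr and define the new columns newid, newname, and newage:

hive> alter table usr replace columns(newid bigint,newname string,newage int);

Set tblproperties key-value pairs on table usr to describe its properties:

hive> alter table usr set tblproperties('notes'='the columns in usr may be null except id');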

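(4) show: view databases and tables

View databases

List all databases in Hive:

hive> show databases;

List all databases in Hive whose names start with h:

hive> show databases like 'h.*';

View tables and views

Switch to database hive, then list all tables and views in it:

hive> use hive;

hive> show tables;

List all tables and views in database hive whose names start with u:

hive> show tables in hive like 'u.*';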

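(5) load: load data into a table

Load the data files under the directory '/usr/local/data' into table usr, overwriting the existing data. First create the directory:

mkdir -p /usr/local/data

cd /usr/local/data

Create any data file:

vim fileA.txt

hive> load data local inpath '/usr/local/data' overwrite into table usr;

Load the data files under the directory '/usr/local/data' into table usr without overwriting the existing data:

hive> load data local inpath '/usr/local/data' into table usr;

Load the data files under the distributed file system directory 'hdfs://master_srever/usr/local/data' into table usr, overwriting the existing data:

hive> load data inpath 'hdfs://master_srever/usr/local/data'

    > overwrite into table usr;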
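
load data copies files into the table without parsing them, so the contents of the file have to match the table's row format. As a sketch of what the file created with vim might contain, the following Java snippet writes comma-delimited rows for the columns id, name, and age. It assumes the target table was declared with row format delimited fields terminated by ','; without that clause Hive uses its default field delimiter (Ctrl-A) and comma-separated rows would not split into columns. The class UsrDataFile and the sample rows are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Sketch: writes rows for a table (id bigint, name string, age int) whose
// fields are terminated by ','. UsrDataFile is a hypothetical helper.
final class UsrDataFile {
    // Formats one row; rejects names that would break the simple delimiter format.
    static String row(long id, String name, int age) {
        if (name.contains(",") || name.contains("\n")) {
            throw new IllegalArgumentException("name must not contain ',' or a newline");
        }
        return id + "," + name + "," + age;
    }

    // Writes one row per line in UTF-8 and returns the file path.
    static Path write(Path file, List<String> rows) throws IOException {
        return Files.write(file, rows, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for /usr/local/data/fileA.txt.
        Path file = Files.createTempFile("fileA", ".txt");
        write(file, Arrays.asList(row(1, "Alice", 10), row(2, "Bob", 25)));
        System.out.println("wrote " + file);
    }
}
```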

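(6) insert: insert data into a table or export data from a table

Insert data from table usr into table usr1, overwriting the existing data:

hive> insert overwrite table usr1 select * from usr where age=10;

This statement takes a very long time to execute.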

Hive Java Programming

Open the project and add the dependencies to pom.xml. With Hive 4.0.1 and later, the Thrift dependency also needs to be added.

<!-- Thrift dependency -->
<dependency>
    <groupId>org.apache.thrift</groupId>
    <artifactId>libthrift</artifactId>
    <version>0.16.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-jdbc</artifactId>
    <version>4.0.1</version>
</dependency>
<!-- Additional Hive dependencies, to avoid missing-class errors -->
<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-service</artifactId>
    <version>4.0.1</version>
</dependency>
<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-service-rpc</artifactId>
    <version>4.0.1</version>
</dependency>
<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-common</artifactId>
    <version>4.0.1</version>
</dependency>
<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>8.0.32</version>
</dependency>

Write the program:

package Hive;

import java.sql.*;

public class HiveAPI {
    private static final String HIVE_DRIVER = "org.apache.hive.jdbc.HiveDriver";
    private static final String HIVE_CONNECTION_URL = "jdbc:hive2://192.168.179.150:10000/default";
    private static final String HIVE_USERNAME = "root";
    private static final String HIVE_PASSWORD = "259303";

    public static void main(String[] args) {
        try {
            // Load the Hive JDBC driver
            Class.forName(HIVE_DRIVER);

            // Connect to the Hive server; try-with-resources closes the
            // connection even if one of the steps below fails
            try (Connection connection = DriverManager.getConnection(
                    HIVE_CONNECTION_URL, HIVE_USERNAME, HIVE_PASSWORD)) {
                // Create the table
                createTable(connection);

                // Add/import data
                loadData(connection);

                // Query the data
                queryData(connection);
            }
        } catch (ClassNotFoundException | SQLException e) {
            e.printStackTrace();
        }
    }

    private static void createTable(Connection connection) throws SQLException {
        // try-with-resources closes the Statement even if execution fails
        try (Statement stmt = connection.createStatement()) {
            String sql = "create table if not exists my_table (id int, name string)";
            stmt.execute(sql);
            System.out.println("Table created successfully");
        } catch (SQLException e) {
            System.out.println("Failed to create table: " + e.getMessage());
            throw e;
        }
    }

    private static void loadData(Connection connection) throws SQLException {
        // try-with-resources closes the Statement even if execution fails
        try (Statement stmt = connection.createStatement()) {
            // Alternative: load rows from local files instead of inserting literals
            // String sql = "load data local inpath '/usr/local/data' into table my_table";
            String sql = "insert into table my_table values (1, 'Alice'), (2, 'Bob'), (3, 'Charlie')";
            stmt.execute(sql);
            System.out.println("Data added successfully");
        } catch (SQLException e) {
            System.out.println("Failed to add data: " + e.getMessage());
            throw e;
        }
    }

    private static void queryData(Connection connection) throws SQLException {
        String sql = "select * from my_table";
        // Both the Statement and the ResultSet are closed automatically
        try (Statement stmt = connection.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            System.out.println("Query result:");
            while (rs.next()) {
                int id = rs.getInt("id");
                String name = rs.getString("name");
                System.out.println("id: " + id + ", name: " + name);
            }
        } catch (SQLException e) {
            System.out.println("Query failed: " + e.getMessage());
            throw e;
        }
    }
}

The program takes a long time to run, but it succeeds.
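
The connection URL in the program follows the pattern jdbc:hive2://host:port/database. As a small sketch of how the hard-coded address could be made configurable, the helper below assembles the URL from its parts and falls back to the values used in the article. The class HiveJdbcUrl and the variable names HIVE_HOST, HIVE_PORT, and HIVE_DB are hypothetical, chosen only for this illustration.

```java
import java.util.Map;

// Sketch: builds a HiveServer2 JDBC URL from host, port, and database.
// HiveJdbcUrl and the HIVE_* variable names are hypothetical.
final class HiveJdbcUrl {
    static String of(String host, int port, String database) {
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    // Reads the parts from a map (for example System.getenv()), falling back
    // to the values used in the article's program when a key is absent.
    static String fromEnv(Map<String, String> env) {
        String host = env.getOrDefault("HIVE_HOST", "192.168.179.150");
        int port = Integer.parseInt(env.getOrDefault("HIVE_PORT", "10000"));
        String database = env.getOrDefault("HIVE_DB", "default");
        return of(host, port, database);
    }

    public static void main(String[] args) {
        System.out.println(fromEnv(System.getenv()));
    }
}
```

The returned string can be passed to DriverManager.getConnection in place of the constant, which keeps the server address out of the source code.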