Hive Sql基本操作
1.hive里面不显示列名
2.注释问题2.1 MySQL中的注释2.2 Hive中的注释3.乱码问题3.1 修改表字段注解和表注解3.2 修改分区字段注解3.3 修改索引注解3.4 修改metastore的连接URL4.数据库基本操作4.1 创建带属性的库4.2 显示数据库详情:4.3 查看正在使用哪个库4.4 查看数据库的详情语句5.删除数据库5.1 删除库原则5.2 删除不含表的数据库5.3 删除含表数据库6.切换库及创建表6.1 切换库6.2 创建表7.表详情及表操作7.1 表详情7.2 表操作8.分区8.1 查看分区8.2 添加分区8.3 修改分区8.4 删除分区
1.hive里面不显示列名
问题:
修改hive-site
配置:
<property>
<name>hive.cli.print.current.db</name> <!--默认是true-->
<value>true</value>
</property>
<property>
<name>hive.cli.print.header</name> <!--默认是false-->
<value>true</value>
</property>
修改后:
2.注释问题
Hive中支持创建数据库时带注释。
create database if not exists t2 comment 'l h';
MySQL中是表带注释。当然Hive中也行。
2.1 MySQL中的注释
MySQL创建带列与表注释的表:
create table test(id int primary key auto_increment comment '设置主键自增',name varchar(100) comment '列注释')comment '表注释';
查看列注释
查看全部列注释:
mysql> show full columns from test;
+-------+--------------+-------------------+------+-----+---------+----------------+---------------------------------+--------------------+
| Field | Type | Collation | Null | Key | Default | Extra | Privileges | Comment |
+-------+--------------+-------------------+------+-----+---------+----------------+---------------------------------+--------------------+
| id | int(11) | NULL | NO | PRI | NULL | auto_increment | select,insert,update,references | 设置主键自增 |
| name | varchar(100) | latin1_swedish_ci | YES | | NULL | | select,insert,update,references | 列注释 |
+-------+--------------+-------------------+------+-----+---------+----------------+---------------------------------+--------------------+
2 rows in set (0.00 sec)
查看特定列注释:
mysql> show full columns from test where field='id';
+-------+---------+-----------+------+-----+---------+----------------+---------------------------------+--------------------+
| Field | Type | Collation | Null | Key | Default | Extra | Privileges | Comment |
+-------+---------+-----------+------+-----+---------+----------------+---------------------------------+--------------------+
| id | int(11) | NULL | NO | PRI | NULL | auto_increment | select,insert,update,references | 设置主键自增 |
+-------+---------+-----------+------+-----+---------+----------------+---------------------------------+--------------------+
1 row in set (0.01 sec)
查看表注释
第一种:
mysql> show create table test;
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| test | CREATE TABLE `test` (`id` int(11) NOT NULL AUTO_INCREMENT COMMENT '设置主键自增',`name` varchar(100) DEFAULT NULL COMMENT '列注释',PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COMMENT='表注释' |
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
第二种:
mysql> SELECT table_name,table_comment FROM information_schema.tables WHERE table_schema = 'test' AND table_name ='test'-> ;
+------------+---------------+
| table_name | table_comment |
+------------+---------------+
| test | 表注释 |
+------------+---------------+
1 row in set (0.01 sec)
修改表注释
mysql> alter table test comment '表注释修改';
Query OK, 0 rows affected (0.11 sec)
Records: 0 Duplicates: 0 Warnings: 0mysql> SELECT table_name,table_comment FROM information_schema.tables WHERE table_schema = 'test' AND table_name ='test'-> ;
+------------+-----------------+
| table_name | table_comment |
+------------+-----------------+
| test | 表注释修改 |
+------------+-----------------+
1 row in set (0.00 sec)
修改列注释
mysql> ALTER TABLE test MODIFY COLUMN name VARCHAR(100) COMMENT '修改列注释';
Query OK, 0 rows affected (0.10 sec)
Records: 0 Duplicates: 0 Warnings: 0mysql> show full columns from test where field='name';
+-------+--------------+-------------------+------+-----+---------+-------+---------------------------------+-----------------+
| Field | Type | Collation | Null | Key | Default | Extra | Privileges | Comment |
+-------+--------------+-------------------+------+-----+---------+-------+---------------------------------+-----------------+
| name | varchar(100) | latin1_swedish_ci | YES | | NULL | | select,insert,update,references | 修改列注释 |
+-------+--------------+-------------------+------+-----+---------+-------+---------------------------------+-----------------+
1 row in set (0.00 sec)
2.2 Hive中的注释
创建表注释与列注释
hive (t1)> create table test(id int comment '用户ID',price string comment '价格') comment '测试表' ;
OK
Time taken: 1.146 seconds
hive (t1)> desc test;
OK
col_name data_type comment
id int 用户ID
price string 价格
修改注释
hive (t1)> alter table test change id id int comment 'id';
OK
Time taken: 0.355 seconds
hive (t1)> desc test;
OK
col_name data_type comment
id int id
price string 价格
Time taken: 0.039 seconds, Fetched: 2 row(s)
3.乱码问题
当添加注释如果是中文,会乱码,因为MySQL中的元数据出现乱码。需要修改metastore中的表、分区、视图的编码。
3.1 修改表字段注解和表注解
alter table COLUMNS_V2 modify column COMMENT varchar(256) character set utf8;
alter table TABLE_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
3.2 修改分区字段注解
alter table PARTITION_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8 ;
alter table PARTITION_KEYS modify column PKEY_COMMENT varchar(4000) character set utf8;
3.3 修改索引注解
alter table INDEX_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
3.4 修改metastore的连接URL
修改hive-site
配置文件:
<property><name>javax.jdo.option.ConnectionURL</name><value>jdbc:mysql://city:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false&amp;useUnicode=true&amp;characterEncoding=UTF-8</value><description>JDBC connect string for a JDBC metastore</description>
</property>
经过上述操作,就可以完成中文显示。
4.数据库基本操作
4.1 创建带属性的库
create database if not exists t3 with dbproperties('creator'='hadoop','date'='2019-06-11');
4.2 显示数据库详情:
desc database [extended] dbname;
示例:
hive (t1)> desc database t3;
OK
db_name comment location owner_name owner_type parameters
t3 hdfs://city:9012/user/hive/warehouse/t3.db hadoop USER
Time taken: 0.009 seconds, Fetched: 1 row(s)
hive (t1)> desc database extended t3;
OK
db_name comment location owner_name owner_type parameters
t3 hdfs://city:9012/user/hive/warehouse/t3.db hadoop USER {date=2019-06-11, creator=hadoop}
Time taken: 0.011 seconds, Fetched: 1 row(s)
4.3 查看正在使用哪个库
hive (t1)> select current_database();
OK
_c0
t1
Time taken: 0.954 seconds, Fetched: 1 row(s)
4.4 查看数据库的详情语句
hive (t1)> show create database t3;
OK
createdb_stmt
CREATE DATABASE `t3`
LOCATION'hdfs://city:9012/user/hive/warehouse/t3.db'
WITH DBPROPERTIES ('creator'='hadoop', 'date'='2019-06-11')
Time taken: 0.016 seconds, Fetched: 6 row(s)
5.1 删除库原则
hive不允许删除包含表的数据库
hive (t1)> drop database t1;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. InvalidOperationException(message:Database t1 is not empty. One or more tables exist.)
两种解决方法:
-
手动删除表,再删除库;
-
使用cascade关键字
5.2 删除不含表的数据库
检查是否包含表
hive (t1)> show tables in t1;
OK
tab_name
test
Time taken: 0.02 seconds, Fetched: 1 row(s)
hive (t1)> show tables in t2;
OK
tab_name
Time taken: 0.017 seconds
发现t1数据库中有表,t2数据库中无表。
直接删除数据库t2:
hive (t1)> drop database t2;
OK
Time taken: 0.223 seconds
5.3 删除含表数据库
hive (t1)> drop database if exists t1 cascade;
OK
Time taken: 0.611 seconds
6.1 切换库
使用use
即可。
6.2 创建表
分为内部表、外部表、分区表、分桶表。默认为内部表。
内部表
create table student(id int, name string, sex string, age int,department string) row format delimited fields terminated by ",";
这种就属于默认的内部表。
外部表
create external table student_ext
(id int, name string, sex string, age int,department string) row format delimited fields terminated by "," location "/hive/student";
内部表与外部表删除的一个区别:当使用drop table xxxx
时候,hdfs文件系统上如果是内部表,则直接会被删除,同时本地数据库里面的表也会被删除;而如果是外部表,则只有本地数据库表删除,hdfs文件系统上不会被删除。
存储路径不一样:内部表存储在/user/hive/warehouse/xx.db
里面,而外部表存储在自定指定的location里面。(注:该location为hdfs目录文件系统)。如果创建时候不指定,默认跟内部表的hdfs路径一致,当然内部表也可以指定位置。
分区表
分区表实际上就是一种外部表,如果某张表是分区表。那么每个分区的定义,其实就表现为了这张表的数据存储目录下的一个子目录。如果是分区表。那么数据文件一定要存储在某个分区中,而不能直接存储在表中。
hive (t1)> create external table student_ptn(id int, name string, sex string, age int,department string) partitioned by (city string) location "/hive/student_ptn";
OK
Time taken: 0.455 seconds
hive (t1)> alter table student_ptn add partition(city="beijing");
OK
Time taken: 0.406 seconds
hive (t1)> alter table student_ptn add partition(city="shenzhen");
OK
Time taken: 0.525 seconds
分桶表
create external table student_bck(id int, name string, sex string, age int,department string) clustered by(id) sorted by (id asc, name desc) into 4 buckets location "/hive/student_bck";
Hive使用CTAS创建表
读取本地文件:
注意:如果local不写,则从hdfs中查找文件。
load data local inpath "/home/light/mysql/student.txt" into table student;
使用CTAS创建表:
create table student_ctas as select * from student where id < 95012;
MySQL使用CTAS创建表
load data local inpath
得变为load data local infile
。
load data local infile "/home/light/mysql/student.txt" into table student fields terminated by ',';
CTAS创建如下:
hive (t1)> load data local inpath "/home/light/mysql/student.txt" into table student;
Loading data to table t1.student
OK
Time taken: 0.637 seconds
hive (t1)> select * from student;
OK
student.id student.name student.sex student.age student.department
95002 刘晨 女 19 IS
95017 王风娟 女 18 IS
95018 王一 女 19 IS
95013 冯伟 男 21 CS
95014 王小丽 女 19 CS
95019 邢小丽 女 19 IS
95020 赵钱 男 21 IS
95003 王敏 女 22 MA
95004 张立 男 19 IS
95012 孙花 女 20 CS
95010 孔小涛 男 19 CS
95005 刘刚 男 18 MA
95006 孙庆 男 23 CS
95007 易思玲 女 19 MA
95008 李娜 女 18 CS
95021 周二 男 17 MA
95022 郑明 男 20 MA
95001 李勇 男 20 CS
95011 包小柏 男 18 MA
95009 梦圆圆 女 18 MA
95015 王君 男 18 MA
Time taken: 1.302 seconds, Fetched: 21 row(s)
hive (t1)> create table student_ctas as select * from student where id < 95012;
Query ID = hadoop_20190611151034_31ae197c-8ad7-4e81-9d23-ce78cfbbb1e5
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1560231670908_0001, Tracking URL = http://localhost:8088/proxy/application_1560231670908_0001/
Kill Command = /usr/local/hadoop/bin/mapred job -kill job_1560231670908_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-06-11 15:10:46,629 Stage-1 map = 0%, reduce = 0%
2019-06-11 15:10:52,875 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.17 sec
MapReduce Total cumulative CPU time: 3 seconds 170 msec
Ended Job = job_1560231670908_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://city:9012/user/hive/warehouse/t1.db/.hive-staging_hive_2019-06-11_15-10-34_904_5157590666630521651-1/-ext-10002
Moving data to directory hdfs://city:9012/user/hive/warehouse/t1.db/student_ctas
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 3.17 sec HDFS Read: 6199 HDFS Write: 337 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 170 msec
OK
student.id student.name student.sex student.age student.department
Time taken: 20.827 seconds
hive (t1)> select * from student_ctas;
OK
student_ctas.id student_ctas.name student_ctas.sex student_ctas.agestudent_ctas.department
95002 刘晨 女 19 IS
95003 王敏 女 22 MA
95004 张立 男 19 IS
95010 孔小涛 男 19 CS
95005 刘刚 男 18 MA
95006 孙庆 男 23 CS
95007 易思玲 女 19 MA
95008 李娜 女 18 CS
95001 李勇 男 20 CS
95011 包小柏 男 18 MA
95009 梦圆圆 女 18 MA
Time taken: 0.165 seconds, Fetched: 11 row(s)
上述的CTAS创建表在MySQL中也是适用,如下所示:
mysql> create table student_ctas as select * from student where id<95012;
Query OK, 11 rows affected (0.51 sec)
Records: 11 Duplicates: 0 Warnings: 0mysql> select * from student_ctas;
+-------+-----------+------+------+------------+
| id | name | sex | age | department |
+-------+-----------+------+------+------------+
| 95002 | 刘晨 | 女 | 19 | IS |
| 95003 | 王敏 | 女 | 22 | MA |
| 95004 | 张立 | 男 | 19 | IS |
| 95010 | 孔小涛 | 男 | 19 | CS |
| 95005 | 刘刚 | 男 | 18 | MA |
| 95006 | 孙庆 | 男 | 23 | CS |
| 95007 | 易思玲 | 女 | 19 | MA |
| 95008 | 李娜 | 女 | 18 | CS |
| 95001 | 李勇 | 男 | 20 | CS |
| 95011 | 包小柏 | 男 | 18 | MA |
| 95009 | 梦圆圆 | 女 | 18 | MA |
+-------+-----------+------+------+------------+
11 rows in set (0.00 sec)
复制表结构
在MySQL中也有一样的语法。
create table a like student;
7.表详情及表操作
7.1 表详情
show tables in t1;
OK
tab_name
a
student
student_bck
student_ctas
student_ext
student_ptn
test
Time taken: 0.018 seconds, Fetched: 7 row(s)
匹配:查看以xxx开头或结尾的所有表。
查看以student开头的所有表:
hive (t1)> show tables like 'student*';
OK
tab_name
student
student_bck
student_ctas
student_ext
student_ptn
Time taken: 0.011 seconds, Fetched: 5 row(s)
查看以dent结尾的所有表:
hive (t1)> show tables like '*dent';
OK
tab_name
student
Time taken: 0.017 seconds, Fetched: 1 row(s)
表详情1:desc xxx
。
表详情2:
hive (t1)> desc formatted student;
OK
col_name data_type comment
# col_name data_type comment
id int
name string
sex string
age int
department string # Detailed Table Information
Database: t1
OwnerType: USER
Owner: hadoop
CreateTime: Tue Jun 11 15:09:13 CST 2019
LastAccessTime: UNKNOWN
Retention: 0
Location: hdfs://city:9012/user/hive/warehouse/t1.db/student
Table Type: MANAGED_TABLE
...
...
...
表详情3:
hive (t1)> desc extended student;
OK
col_name data_type comment
id int
name string
sex string
age int
department string Detailed Table Information Table(tableName:student, dbName:t1, owner:hadoop, createTime:1560236953, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:id, type:int, comment:null), FieldSchema(name:name, type:string, comment:null), FieldSchema(name:sex, type:string, comment:null), FieldSchema(name:age, type:int, comment:null), FieldSchema(name:department, type:string, comment:null)], location:hdfs://city:9012/user/hive/warehouse/t1.db/student, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=,, field.delim=,}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{totalSize=504, numRows=0, rawDataSize=0, numFiles=1, transient_lastDdlTime=1560236957, bucketing_version=2}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE, rewriteEnabled:false, catName:hive, ownerType:USER)
Time taken: 0.049 seconds, Fetched: 7 row(s)
7.2 表操作
对比MySQL与Hive修改表名
在Hive中必须有to
,而MySQL中可以不要!
alter table student rename to ab;
添加字段
alter table student add columns(sc int)
MySQL中columns
不能加s。
alter table ba add column(sc1 int,sc2 int);
修改字段的定义
alter table student change name name1 string;
MySQL语法同上。
删除一个字段不支持!
但可以通过replace
替换掉!
替换所有字段
删除name
列:
hive (t1)> select * from student;
OK
student.id student.address student.name
95002 刘晨 女
95017 王风娟 女
95018 王一 女
95013 冯伟 男
95014 王小丽 女
95019 邢小丽 女
95020 赵钱 男
95003 王敏 女
95004 张立 男
95012 孙花 女
95010 孔小涛 男
95005 刘刚 男
95006 孙庆 男
95007 易思玲 女
95008 李娜 女
95021 周二 男
95022 郑明 男
95001 李勇 男
95011 包小柏 男
95009 梦圆圆 女
95015 王君 男
Time taken: 0.159 seconds, Fetched: 21 row(s)
hive (t1)> alter table student replace columns (id int,address string);
OK
Time taken: 0.25 seconds
hive (t1)> select * from student;
OK
student.id student.address
95002 刘晨
95017 王风娟
95018 王一
95013 冯伟
95014 王小丽
95019 邢小丽
95020 赵钱
95003 王敏
95004 张立
95012 孙花
95010 孔小涛
95005 刘刚
95006 孙庆
95007 易思玲
95008 李娜
95021 周二
95022 郑明
95001 李勇
95011 包小柏
95009 梦圆圆
95015 王君
Time taken: 0.162 seconds, Fetched: 21 row(s)
删除表:
drop table xxx
清空表:
truncate table xxx;
8.分区
8.1 查看分区
查看分区信息
hive (t1)> show partitions student_ptn;
OK
partition
city=beijing
city=shenzhen
Time taken: 0.103 seconds, Fetched: 2 row(s)
8.2 添加分区
添加静态分区
hive (t1)> alter table student_ptn add partition(city="sc") partition(city="bj");
OK
Time taken: 0.722 seconds
hive (t1)> show partitions student_ptn;
OK
partition
city=beijing
city=bj
city=cq
city=sc
city=shenzhen
Time taken: 0.073 seconds, Fetched: 5 row(s)
添加动态分区
Hive导入数据成功,但是查询结果为NULL,且未报错。
这是因为创建表格时没有对导入的数据格式没有处理,比如每行数据以tab键隔开,以换行键结尾,就要以如下语句创建表格。
比如:当时创建的分区表语句:
create external table student_ptn(id int, name string, sex string, age int,department string) partitioned by (city string) location "/hive/student_ptn";
应该改为:
create external table student_ptn(id int, name string, sex string, age int,department string) partitioned by (city string) row format delimited fields terminated by "," location "/hive/student_ptn";
先向student_ptn表中插入数据:
load data local inpath "/home/light/mysql/student.txt" into table student_ptn partition(city="cq");
其中where city='cq'
正是分区字段。如果不加where筛选,直接select *
那么会返回所有分区结果信息。
hive (t1)> select * from student_ptn where city='cq';
OK
student_ptn.id student_ptn.name student_ptn.sex student_ptn.age student_ptn.department student_ptn.city
95002 刘晨 女 19 IS cq
95017 王风娟 女 18 IS cq
95018 王一 女 19 IS cq
95013 冯伟 男 21 CS cq
95014 王小丽 女 19 CS cq
95019 邢小丽 女 19 IS cq
95020 赵钱 男 21 IS cq
95003 王敏 女 22 MA cq
95004 张立 男 19 IS cq
95012 孙花 女 20 CS cq
95010 孔小涛 男 19 CS cq
95005 刘刚 男 18 MA cq
95006 孙庆 男 23 CS cq
95007 易思玲 女 19 MA cq
95008 李娜 女 18 CS cq
95021 周二 男 17 MA cq
95022 郑明 男 20 MA cq
95001 李勇 男 20 CS cq
95011 包小柏 男 18 MA cq
95009 梦圆圆 女 18 MA cq
95015 王君 男 18 MA cq
Time taken: 0.21 seconds, Fetched: 21 row(s)
现在我把这张表的内容直接插入到另一张表student_ptn_age中,并实现age为动态分区(不指定到底是哪中性别,让系统自己分配决定)。
create table student_ptn_age(id int,name string,sex string,department string) partitioned by (age int);
从student_ptn表中查询数据并插入student_ptn_age表中
hive (t1)> insert overwrite table student_ptn_age partition(age) select id,name,sex,department,age from student_ptn;
FAILED: SemanticException [Error 10096]: Dynamic partition strict mode requires at least one static partition column. To turn this off set hive.exec.dynamic.partition.mode=nonstrict
此时报错了,需要在当前命令窗口处输入:
hive (t1)> set hive.exec.dynamic.partition=true;
hive (t1)> set hive.exec.dynamic.partition.mode=nostrick;
然后再运行上述插入命令即可。
8.3 修改分区
修改分区一般来说就是修改分区的数据存储目录。
在添加分区的时候,直接指定当前分区的数据存储目录
alter table student_ptn add if not exists partition(city='beijing') location '/student_ptn_beijing' partition(city='cc') location '/student_cc';
修改已经指定好的分区的数据存储目录:
alter table student_ptn partition (city='beijing') set location '/student_ptn_beijing';
此时原先的分区文件夹仍存在,但是在往分区添加数据时,只会添加到新的分区目录
load data local inpath "/home/light/mysql/student.txt" into table student_ptn partition(city="beijing");
8.4 删除分区
hive (t1)> alter table student_ptn drop partition (city='beijing');
Dropped the partition city=beijing
OK
Time taken: 0.459 seconds
hive (t1)> show partitions student_ptn;
OK
partition
city=cc
city=cq
Time taken: 0.075 seconds, Fetched: 2 row(s)
最后,学习参考了:https://www.cnblogs.com/qingyunzong/p/8723271.html