Continued from: Big Data - Hadoop+Hive+Spark Cluster Installation (Part 2)
1. Hive Commands:
beeline
!connect jdbc:hive2://localhost:10000
Log in with the username/password hadoop, 123456789 (see the hive-site.xml configuration).
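The same connection can also be made in a single command (a sketch, assuming the host and credentials above):
beeline -u jdbc:hive2://localhost:10000 -n hadoop -p 123456789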
Locally, you can also run the hive command directly.
Create a table:
CREATE TABLE goods_ids(
id string,
goods_id BIGINT,
source int,
cat_id_1 int,
cat_id_2 int,
cat_id_3 int,
cat_id_4 int,
cat_name_1 string,
cat_name_2 string,
cat_name_3 string,
cat_name_4 string,
category string
) row format delimited fields terminated by "," stored as orc;
ORC is a compressed, fast columnar storage format; it is an improved version of RCFile and more efficient than it. Just append the stored as orc clause as above; if it is omitted, the table defaults to the TextFile format.
If the table is in TextFile format, a txt data file can be loaded into it directly:
load data local inpath '/data/data/goods_ids.txt' into table goods_ids1;
A common pattern is to create the ORC table plus a TextFile table goods_ids1, load the file into goods_ids1, copy the data into the ORC table with SQL, and then drop the TextFile table: insert into goods_ids select * from goods_ids1;
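Put together, the staging workflow looks roughly like this (a sketch; it assumes goods_ids1 mirrors the columns of goods_ids):
-- TextFile staging table with the same columns as the ORC table
CREATE TABLE goods_ids1(
id string,
goods_id BIGINT,
source int,
cat_id_1 int,
cat_id_2 int,
cat_id_3 int,
cat_id_4 int,
cat_name_1 string,
cat_name_2 string,
cat_name_3 string,
cat_name_4 string,
category string
) row format delimited fields terminated by "," stored as textfile;
-- load the raw file into the staging table
load data local inpath '/data/data/goods_ids.txt' into table goods_ids1;
-- copy into the ORC table, then drop the staging table
insert into goods_ids select * from goods_ids1;
drop table goods_ids1;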
Indexes:
-- create the index (deferred rebuild: no data is indexed yet)
create index goods_ids_goodsid_index on table goods_ids (goods_id)
as 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
with deferred rebuild
in table goods_ids_goodsid_index_table;
-- rebuild to (re)populate the index
alter index goods_ids_goodsid_index on goods_ids rebuild;
-- list the indexes on the table
show index on goods_ids;
-- drop the index
drop index goods_ids_goodsid_index on goods_ids;
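Whether a query actually consults the index depends on the optimizer. A minimal sketch, assuming the index built above and the hive.optimize.index.filter setting available in Hive 2.x (the goods_id value is only an illustration):
-- let the optimizer rewrite filters using available indexes
set hive.optimize.index.filter=true;
select * from goods_ids where goods_id = 10001;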
If the following error appears on startup, edit the VERSION table in the MySQL metastore database (hivedb) and set SCHEMA_VERSION to 2.3.0:
Exception in thread "main" MetaException(message:Hive Schema version 2.3.0 does not match metastore's schema version 1.2.0 Metastore is not upgraded or corrupt)
Or, as a permanent fix, add the following to hive-site.xml:
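A common permanent fix is to disable metastore schema verification; a sketch of what the hive-site.xml addition would look like (an assumption, not necessarily the author's exact snippet):
<!-- assumption: skip the metastore schema version check at startup -->
<property>
  <name>hive.metastore.schema.verification</name>
  <value>false</value>
</property>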
2. Optimization (1)
Partitioning on a coarse, frequently filtered column can take the place of an index: indexes are cumbersome to use, and the index table does not update automatically when new data is added, so it has to be rebuilt. A partitioned table stores its data in separate directories keyed by the virtual partition column.
CREATE TABLE goods_info_daysales(
id string,
goods_id BIGINT,
goods_name string,
sales int,
`date` date
) PARTITIONED BY (day string) row format delimited fields terminated by "," stored as orc;
day is the virtual partition column.
insert into goods_info_daysales partition(day='2020-08-24') select * from goods_info_daysales1 where `date`='2020-08-24';
select * from goods_info_daysales where day='2020-08-24';
Filtering on the partition column lets Hive read only the matching partition directories, which speeds up the query.
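Each new day does not need its own static insert; dynamic partitioning can route rows into partitions automatically. A sketch, assuming goods_info_daysales1 carries the same five non-partition columns:
-- allow dynamic partition inserts
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
-- the last select column supplies the value of the day partition
insert into table goods_info_daysales partition(day)
select id, goods_id, goods_name, sales, `date`, cast(`date` as string) as day
from goods_info_daysales1;
-- list the partitions that were created
show partitions goods_info_daysales;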
3. Operations
-- drop a database
drop database if exists db_name;
-- force-drop a database (together with its tables)
drop database if exists db_name cascade;
-- drop a table
drop table if exists employee;
-- truncate a table
truncate table employee;
-- truncate a table, alternative way
insert overwrite table employee select * from employee where 1=0;
-- drop partitions
alter table employee_table drop partition (stat_year_month>='2018-01');
-- delete rows by condition (the where clause selects the rows to keep)
insert overwrite table employee_table select * from employee_table where id>'180203a15f';
4. Java
Run a class from a jar, putting the jar itself and every jar under jars/ on the classpath:
java -cp "yourFile.jar:jars/*" package.className
5. Reference Configuration