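HBase (Hadoop Database) is a highly reliable, high-performance, column-oriented, scalable distributed storage system. With HBase, a large-scale structured storage cluster can be built on inexpensive PC servers. Unlike commercial big-data products such as FUJITSU Cliq, HBase is an open-source implementation of Google Bigtable. Just as Bigtable uses GFS as its file storage system, HBase uses Hadoop HDFS; just as Google runs MapReduce to process the massive data in Bigtable, HBase uses Hadoop MapReduce to process the massive data in HBase; and where Bigtable uses Chubby as its coordination service, HBase uses ZooKeeper as the counterpart.
HBase is a distributed, column-oriented open-source database and an Apache top-level project, suited to storing unstructured data. Within the Hadoop family, many products provide services for HBase:
- Hadoop HDFS provides HBase with highly reliable underlying storage;
- Hadoop MapReduce provides HBase with high-performance computing capability;
- ZooKeeper provides HBase with stable service and a failover mechanism;
- Pig and Hive provide high-level language support for HBase, which makes statistical processing of data in HBase very simple;
- Sqoop provides HBase with convenient RDBMS data import, which makes migrating data from traditional databases into HBase straightforward.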
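1 Installation
1.1 Download and extract
Find and download the latest stable release; this article uses hbase-0.98.6.1-hadoop2-bin.tar.gz.
Extract the archive, then change into the extracted directory: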
$ tar xzvf hbase-0.98.6.1-hadoop2-bin.tar.gz
$ cd hbase-0.98.6.1-hadoop2/
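1.2 Basic configuration
This step is optional.
The setting to configure here is hbase.rootdir in $HBASE_HOME/conf/hbase-site.xml, which is the directory where HBase stores its data. If it is not configured, hbase.rootdir defaults to /tmp/hbase-${user.name}; because the /tmp directory is cleaned when the system reboots, the data is lost after a restart. In a distributed deployment, provide a directory location on HDFS instead.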
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:/home/lxh/hadoop/hbase</value>
  </property>
</configuration>
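As a minimal sketch, the same file can be generated from the command line. The function name write_hbase_site and its two arguments are hypothetical and not part of HBase; the function writes only the single property shown above and overwrites any existing hbase-site.xml in the target directory.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with HBase): write a minimal hbase-site.xml.
#   $1 = conf directory, e.g. "$HBASE_HOME/conf"
#   $2 = value for hbase.rootdir, e.g. file:/home/lxh/hadoop/hbase
# Note: this overwrites an existing hbase-site.xml in that directory.
write_hbase_site() {
    conf_dir=$1
    root_dir=$2
    mkdir -p "$conf_dir" || return 1
    # Unquoted heredoc delimiter so that $root_dir is expanded.
    cat > "$conf_dir/hbase-site.xml" <<EOF
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>$root_dir</value>
  </property>
</configuration>
EOF
}
```

A call such as write_hbase_site "$HBASE_HOME/conf" "file:/home/lxh/hadoop/hbase" would reproduce the configuration used in this article, assuming HBASE_HOME points at the extracted directory.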
2 Starting HBase
Start HBase directly with the start-hbase.sh script:
$ ./bin/start-hbase.sh
When startup succeeds, the log file $HBASE_HOME/logs/hbase-lxh-master-ubuntu.log contains the following line:
2014-10-14 09:47:07,189 INFO [M:0;ubuntu:40435] master.HMaster: Master has completed initialization
Checking the running processes with jps shows a new HMaster process:
2694 HMaster
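The log check described above can be scripted. The following is only a sketch: the function name wait_for_master and its arguments are hypothetical, and it simply polls a log file for the initialization message quoted earlier. Because the log may still contain that message from an earlier run, a match does not by itself prove that the current start succeeded.

```shell
#!/bin/sh
# Hypothetical helper (not part of HBase): poll a master log until it
# reports that initialization has completed.
#   $1 = path to the master log file
#   $2 = number of one-second attempts (default 30)
# Returns 0 if the message was found, 1 otherwise.
wait_for_master() {
    log_file=$1
    tries=${2:-30}
    while [ "$tries" -gt 0 ]; do
        if [ -f "$log_file" ] &&
           grep -q 'Master has completed initialization' "$log_file"; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}
```

With the paths used in this article, the call would look like wait_for_master "$HBASE_HOME/logs/hbase-lxh-master-ubuntu.log" 60; the log file name depends on the user and host names, so it will differ on other machines.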
3 A first look at HBase
3.1 Starting the shell
Enter the shell that HBase provides to try things out.
$ ./bin/hbase shell
2014-10-14 10:14:55,859 INFO [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.98.6.1-hadoop2, r96a1af660b33879f19a47e9113bf802ad59c7146, Sun Sep 14 21:27:25 PDT 2014
hbase(main):001:0>
3.2 Viewing help
Type the help command to see the commands available in the HBase shell.
hbase(main):001:0> help
HBase Shell, version 0.98.6.1-hadoop2, r96a1af660b33879f19a47e9113bf802ad59c7146, Sun Sep 14 21:27:25 PDT 2014
Type 'help "COMMAND"', (e.g. 'help "get"' -- the quotes are necessary) for help on a specific command.
Commands are grouped. Type 'help "COMMAND_GROUP"', (e.g. 'help "general"') for help on a command group.
COMMAND GROUPS:
Group name: general
Commands: status, table_help, version, whoami
Group name: ddl
Commands: alter, alter_async, alter_status, create, describe, disable, disable_all, drop, drop_all, enable, enable_all, exists, get_table, is_disabled, is_enabled, list, show_filters
Group name: namespace
Commands: alter_namespace, create_namespace, describe_namespace, drop_namespace, list_namespace, list_namespace_tables
Group name: dml
Commands: append, count, delete, deleteall, get, get_counter, incr, put, scan, truncate, truncate_preserve
Group name: tools
Commands: assign, balance_switch, balancer, catalogjanitor_enabled, catalogjanitor_run, catalogjanitor_switch, close_region, compact, flush, hlog_roll, major_compact, merge_region, move, split, trace, unassign, zk_dump
Group name: replication
Commands: add_peer, disable_peer, enable_peer, list_peers, list_replicated_tables, remove_peer, set_peer_tableCFs, show_peer_tableCFs
Group name: snapshots
Commands: clone_snapshot, delete_snapshot, list_snapshots, rename_snapshot, restore_snapshot, snapshot
Group name: security
Commands: grant, revoke, user_permission
Group name: visibility labels
Commands: add_labels, clear_auths, get_auths, set_auths, set_visibility
SHELL USAGE:
Quote all names in HBase Shell such as table and column names. Commas delimit
command parameters. Type <RETURN> after entering a command to run it.
Dictionaries of configuration used in the creation and alteration of tables are
Ruby Hashes. They look like this:
{'key1' => 'value1', 'key2' => 'value2', ...}
and are opened and closed with curley-braces. Key/values are delimited by the
'=>' character combination. Usually keys are predefined constants such as
NAME, VERSIONS, COMPRESSION, etc. Constants do not need to be quoted. Type
'Object.constants' to see a (messy) list of all constants in the environment.
If you are using binary keys or values and need to enter them in the shell, use
double-quote'd hexadecimal representation. For example:
hbase> get 't1', "key\x03\x3f\xcd"
hbase> get 't1', "key\003\023\011"
hbase> put 't1', "test\xef\xff", 'f1:', "\x01\x33\x40"
The HBase shell is the (J)Ruby IRB with the above HBase-specific commands added.
For more on the HBase Shell, see http://hbase.apache.org/docs/current/book.html
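3.3 create: creating a table
First create a table named test with a single column family, cf. The list command lists all tables and can be used to confirm that the table was created.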
hbase(main):002:0> create 'test', 'cf'
0 row(s) in 0.4330 seconds
=> Hbase::Table - test
hbase(main):003:0> list
TABLE
test
1 row(s) in 0.0590 seconds
=> ["test"]
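3.4 put: inserting data
With the test table created, insert data into it using put 'table', 'row', 'col-pre:col-name', 'value'. Here table is the table name, row is the key of the row, col-pre is the column family prefix, col-name is the column name, and value is the value to store; the column family prefix and the column name are separated by a colon. If only the column family is given, as in the put for row3 with 'cf', the value is stored under an empty column name and appears as cf: in later output.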
hbase(main):005:0> put 'test', 'row1', 'cf:a', 'value1'
0 row(s) in 0.1380 seconds
hbase(main):006:0> put 'test', 'row2', 'cf:b', 'value2-b'
0 row(s) in 0.0130 seconds
hbase(main):007:0> put 'test', 'row2', 'cf:c', 'value2-c'
0 row(s) in 0.0100 seconds
hbase(main):008:0> put 'test', 'row3', 'cf', 'value3'
0 row(s) in 0.0110 seconds
hbase(main):011:0> put 'test', 'row3', 'cf:e', 'value3-e'
0 row(s) in 0.0060 seconds
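When many cells need to be inserted, the put commands can be generated rather than typed. The sketch below is an illustration only: gen_puts is a hypothetical helper that turns tab-separated lines of the form row, column, value into put commands in the syntax described above. It does no escaping, so values containing single quotes would need extra handling.

```shell
#!/bin/sh
# Hypothetical helper (not part of HBase): read "row<TAB>column<TAB>value"
# lines from standard input and print one put command per line.
#   $1 = table name
# Limitation: values are not escaped, so a single quote in the input
# would produce an invalid command.
gen_puts() {
    table=$1
    tab=$(printf '\t')
    while IFS="$tab" read -r row col value; do
        # Skip blank lines.
        [ -n "$row" ] || continue
        printf "put '%s', '%s', '%s', '%s'\n" "$table" "$row" "$col" "$value"
    done
}
```

Assuming the HBase shell accepts commands on standard input, the generated commands could be piped into it, for example gen_puts test < rows.tsv | ./bin/hbase shell, where rows.tsv is a hypothetical input file.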
3.5 scan: scanning the whole table
Query the data in the test table with the scan 'table' command:
hbase(main):012:0> scan 'test'
ROW COLUMN+CELL
row1 column=cf:a, timestamp=1413253976039, value=value1
row2 column=cf:b, timestamp=1413253980776, value=value2-b
row2 column=cf:c, timestamp=1413253985691, value=value2-c
row3 column=cf:, timestamp=1413253990953, value=value3
row3 column=cf:e, timestamp=1413254206302, value=value3-e
3 row(s) in 0.0430 seconds
3.6 get: querying a single row
Query the data of a single row with the get 'table', 'row' command:
hbase(main):013:0> get 'test', 'row1'
COLUMN CELL
cf:a timestamp=1413253976039, value=value1
1 row(s) in 0.0150 seconds
hbase(main):014:0> get 'test', 'row2'
COLUMN CELL
cf:b timestamp=1413253980776, value=value2-b
cf:c timestamp=1413253985691, value=value2-c
2 row(s) in 0.0120 seconds
hbase(main):015:0> get 'test', 'row3'
COLUMN CELL
cf: timestamp=1413253990953, value=value3
cf:e timestamp=1413254206302, value=value3-e
2 row(s) in 0.0050 seconds
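3.7 disable: disabling a table
The disable 'table' command disables a table. The table is not deleted, but operations such as queries can no longer be run against it.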
hbase(main):017:0> disable 'test'
0 row(s) in 1.4850 seconds
Querying with get 'table', 'row' at this point produces an error:
hbase(main):018:0> get 'test', 'row3'
COLUMN CELL
ERROR: test is disabled.
3.8 enable: enabling a table
A disabled table can be enabled again with the enable 'table' command, after which the usual table operations work again:
hbase(main):020:0> enable 'test'
0 row(s) in 0.5540 seconds
hbase(main):021:0> get 'test', 'row3'
COLUMN CELL
cf: timestamp=1413253990953, value=value3
cf:e timestamp=1413254206302, value=value3-e
2 row(s) in 0.0160 seconds
3.9 drop: deleting a table
The drop 'table' command deletes a table. The table must already be disabled, that is, disable 'table' must have been run on it first.
hbase(main):030:0> drop 'test'
0 row(s) in 0.2300 seconds
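Since a table has to be disabled before it can be dropped, the two commands always go together. As a small sketch, the hypothetical helper gen_drop below prints both commands in the required order for a given table name; it only produces text and does not contact HBase itself.

```shell
#!/bin/sh
# Hypothetical helper (not part of HBase): print the commands that remove
# a table, disabling it first because drop requires a disabled table.
#   $1 = table name
gen_drop() {
    printf "disable '%s'\n" "$1"
    printf "drop '%s'\n" "$1"
}
```

Assuming the HBase shell accepts commands on standard input, gen_drop test | ./bin/hbase shell would run both steps for the test table.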
3.10 Exiting the shell
As with other shells, the command to leave the shell is exit:
hbase(main):031:0> exit
4 Stopping HBase
Stop HBase directly with the stop-hbase.sh script:
$ ./bin/stop-hbase.sh
stopping hbase....................