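HBase – Hadoop Database is a highly reliable, high-performance, column-oriented, scalable distributed storage system. With HBase, large-scale structured storage clusters can be built on inexpensive PC servers. Unlike commercial big data products such as FUJITSU Cliq, HBase is an open-source implementation of Google Bigtable. Bigtable uses GFS as its file storage system, and HBase uses Hadoop HDFS. Google runs MapReduce to process the massive data in Bigtable, and HBase likewise uses Hadoop MapReduce to process the massive data it stores. Bigtable uses Chubby as its coordination service, and HBase uses ZooKeeper as the counterpart.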

Setting up an HBase environment

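  HBase is a distributed, column-oriented, open-source database. It is a top-level Apache project and is suited to storing unstructured data. Many products in the Hadoop family provide services to HBase: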

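  • Hadoop HDFS provides HBase with highly reliable underlying storage;
  • Hadoop MapReduce provides HBase with high-performance computation;
  • ZooKeeper provides HBase with stable coordination services and a failover mechanism;
  • Pig and Hive provide high-level language support on top of HBase, which makes statistical processing of HBase data very simple;
  • Sqoop provides convenient import from RDBMSs into HBase, which makes migrating data from traditional databases into HBase very easy.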

1 Installation

1.1 Download and extract

Download the latest stable release from the Apache HBase download page. This article uses hbase-0.98.6.1-hadoop2-bin.tar.gz.

Extract the archive, then change into the extracted directory:

$ tar xzvf hbase-0.98.6.1-hadoop2-bin.tar.gz
$ cd hbase-0.98.6.1-hadoop2/

1.2 Basic configuration

This step is optional.

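The setting to configure here is hbase.rootdir in $HBASE_HOME/conf/hbase-site.xml, the directory in which HBase stores its data. If it is left unset, hbase.rootdir defaults to /tmp/hbase-${user.name}. Many systems clear /tmp on reboot, so the data would be lost after a restart. In a distributed deployment, set it to a directory on HDFS instead. The example below uses a local directory: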

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:/home/lxh/hadoop/hbase</value>
  </property>
</configuration>
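To make the effect of this setting concrete, here is a minimal Python sketch (standard library only) that writes a hbase-site.xml like the one above, reads a property back, and flags an hbase.rootdir that would not survive a reboot. The helper names build_site_xml, get_property and rootdir_survives_reboot are hypothetical, and the /tmp rule only restates the reasoning of the previous paragraph; it is not a check HBase itself performs.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional
from urllib.parse import urlparse


def build_site_xml(props: Dict[str, str]) -> str:
    """Render name/value pairs as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode")


def get_property(xml_text: str, name: str) -> Optional[str]:
    """Return the <value> of the property called `name`, or None if it is absent."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None


def rootdir_survives_reboot(xml_text: str) -> bool:
    """Heuristic: False if hbase.rootdir is unset or points at a local /tmp path."""
    value = get_property(xml_text, "hbase.rootdir")
    if value is None:
        return False  # HBase falls back to a default location under /tmp
    parsed = urlparse(value)
    is_local = parsed.scheme in ("", "file")
    under_tmp = parsed.path == "/tmp" or parsed.path.startswith("/tmp/")
    return not (is_local and under_tmp)
```

For example, get_property(open("conf/hbase-site.xml").read(), "hbase.rootdir") reads the setting back from a real configuration file.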

2 Starting HBase

Start HBase with the start-hbase.sh script:

$ ./bin/start-hbase.sh

When startup succeeds, the master log $HBASE_HOME/logs/hbase-lxh-master-ubuntu.log (named hbase-<user>-master-<hostname>.log) contains the following line:

2014-10-14 09:47:07,189 INFO  [M:0;ubuntu:40435] master.HMaster: Master has completed initialization

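Running jps now shows a new HMaster process. In standalone mode the master, a RegionServer and ZooKeeper all run inside this single JVM: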

2694 HMaster
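The two startup checks above (the log line and the jps listing) can be scripted. This is a minimal Python sketch that assumes the log text and the jps output are passed in as strings; master_initialized, jps_pids and run_jps are hypothetical helper names, and run_jps only works if the JDK's jps tool is on the PATH.

```python
import subprocess
from typing import List

# The line the master log shows once startup has finished (taken from the log above).
READY_MARKER = "Master has completed initialization"


def master_initialized(log_text: str) -> bool:
    """True if any log line contains the initialization marker."""
    return any(READY_MARKER in line for line in log_text.splitlines())


def jps_pids(jps_output: str, process_name: str) -> List[int]:
    """Return the PIDs that jps reports for the given main-class name."""
    pids = []
    for line in jps_output.splitlines():
        fields = line.split()  # jps prints "<pid> <name>" per line
        if len(fields) >= 2 and fields[1] == process_name:
            pids.append(int(fields[0]))
    return pids


def run_jps() -> str:
    """Run the JDK's jps tool and return its standard output."""
    return subprocess.run(["jps"], capture_output=True, text=True, check=True).stdout
```

A script could call jps_pids(run_jps(), "HMaster") after start-hbase.sh and then poll the log file until master_initialized returns True.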

3 A first look at HBase

3.1 Starting the shell

Open the shell that ships with HBase to try things out:

$ ./bin/hbase shell
2014-10-14 10:14:55,859 INFO  [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.98.6.1-hadoop2, r96a1af660b33879f19a47e9113bf802ad59c7146, Sun Sep 14 21:27:25 PDT 2014
hbase(main):001:0>

3.2 Viewing help

Type help to list the commands available in the HBase shell:

hbase(main):001:0> help
HBase Shell, version 0.98.6.1-hadoop2, r96a1af660b33879f19a47e9113bf802ad59c7146, Sun Sep 14 21:27:25 PDT 2014
Type 'help "COMMAND"', (e.g. 'help "get"' -- the quotes are necessary) for help on a specific command.
Commands are grouped. Type 'help "COMMAND_GROUP"', (e.g. 'help "general"') for help on a command group.

COMMAND GROUPS:
  Group name: general
  Commands: status, table_help, version, whoami

  Group name: ddl
  Commands: alter, alter_async, alter_status, create, describe, disable, disable_all, drop, drop_all, enable, enable_all, exists, get_table, is_disabled, is_enabled, list, show_filters

  Group name: namespace
  Commands: alter_namespace, create_namespace, describe_namespace, drop_namespace, list_namespace, list_namespace_tables

  Group name: dml
  Commands: append, count, delete, deleteall, get, get_counter, incr, put, scan, truncate, truncate_preserve

  Group name: tools
  Commands: assign, balance_switch, balancer, catalogjanitor_enabled, catalogjanitor_run, catalogjanitor_switch, close_region, compact, flush, hlog_roll, major_compact, merge_region, move, split, trace, unassign, zk_dump

  Group name: replication
  Commands: add_peer, disable_peer, enable_peer, list_peers, list_replicated_tables, remove_peer, set_peer_tableCFs, show_peer_tableCFs

  Group name: snapshots
  Commands: clone_snapshot, delete_snapshot, list_snapshots, rename_snapshot, restore_snapshot, snapshot

  Group name: security
  Commands: grant, revoke, user_permission

  Group name: visibility labels
  Commands: add_labels, clear_auths, get_auths, set_auths, set_visibility

SHELL USAGE:
Quote all names in HBase Shell such as table and column names.  Commas delimit
command parameters.  Type <RETURN> after entering a command to run it.
Dictionaries of configuration used in the creation and alteration of tables are
Ruby Hashes. They look like this:

  {'key1' => 'value1', 'key2' => 'value2', ...}

and are opened and closed with curley-braces.  Key/values are delimited by the
'=>' character combination.  Usually keys are predefined constants such as
NAME, VERSIONS, COMPRESSION, etc.  Constants do not need to be quoted.  Type
'Object.constants' to see a (messy) list of all constants in the environment.

If you are using binary keys or values and need to enter them in the shell, use
double-quote'd hexadecimal representation. For example:

  hbase> get 't1', "key\x03\x3f\xcd"
  hbase> get 't1', "key\003\023\011"
  hbase> put 't1', "test\xef\xff", 'f1:', "\x01\x33\x40"

The HBase shell is the (J)Ruby IRB with the above HBase-specific commands added.
For more on the HBase Shell, see http://hbase.apache.org/docs/current/book.html
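The help text's advice on binary keys can be illustrated with a small converter. The Python sketch below uses hypothetical helpers: to_shell_literal turns raw bytes into a double-quoted literal with \xHH escapes, and from_shell_literal decodes the \xHH and octal \ooo forms used in the help examples. It escapes everything except letters, digits and -_.:, which is more conservative than the shell strictly requires; other escapes such as \n are not decoded.

```python
# Bytes printed as-is; everything else becomes \xHH (a deliberately conservative choice).
SAFE = set(b"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_.:")
HEX_DIGITS = "0123456789abcdefABCDEF"
OCT_DIGITS = "01234567"


def to_shell_literal(data: bytes) -> str:
    """Render raw bytes as a double-quoted literal for pasting into the shell."""
    body = "".join(chr(b) if b in SAFE else "\\x%02x" % b for b in data)
    return '"' + body + '"'


def from_shell_literal(body: str) -> bytes:
    """Decode \\xHH (1-2 hex digits) and \\ooo (1-3 octal digits) escapes.

    `body` is the text between the double quotes. Any other backslash
    sequence is kept literally.
    """
    out = bytearray()
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            if nxt == "x":
                j = i + 2
                while j < len(body) and j < i + 4 and body[j] in HEX_DIGITS:
                    j += 1
                if j == i + 2:
                    raise ValueError("\\x must be followed by a hex digit")
                out.append(int(body[i + 2:j], 16))
                i = j
                continue
            if nxt in OCT_DIGITS:
                j = i + 1
                while j < len(body) and j < i + 4 and body[j] in OCT_DIGITS:
                    j += 1
                out.append(int(body[i + 1:j], 8) & 0xFF)
                i = j
                continue
        out.extend(body[i].encode("utf-8"))
        i += 1
    return bytes(out)
```

For example, to_shell_literal(b"test\xef\xff") produces the "test\xef\xff" key used in the put example above.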

3.3 create: creating a table

First create a table named test with a single column family, cf. Then use the list command, which lists all tables, to confirm that it was created.

hbase(main):002:0> create 'test', 'cf'
0 row(s) in 0.4330 seconds
=> Hbase::Table - test

hbase(main):003:0> list
TABLE                                                                                                                                          
test                                                                                                                                           
1 row(s) in 0.0590 seconds
=> ["test"]

3.4 put: inserting data

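With the test table created, insert data with put 'table', 'row', 'col-pre:col-name', 'value'. Here table is the table name, row is the row key, col-pre is the column family, col-name is the column qualifier (family and qualifier are separated by a colon), and value is the cell value. If the qualifier is omitted, as in put 'test', 'row3', 'cf', 'value3' below, the cell is stored with an empty qualifier and later shows up as cf:.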

hbase(main):005:0> put 'test', 'row1', 'cf:a', 'value1'
0 row(s) in 0.1380 seconds

hbase(main):006:0> put 'test', 'row2', 'cf:b', 'value2-b'
0 row(s) in 0.0130 seconds

hbase(main):007:0> put 'test', 'row2', 'cf:c', 'value2-c'
0 row(s) in 0.0100 seconds

hbase(main):008:0> put 'test', 'row3', 'cf', 'value3'
0 row(s) in 0.0110 seconds

hbase(main):011:0> put 'test', 'row3', 'cf:e', 'value3-e'
0 row(s) in 0.0060 seconds
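The column argument of put splits into a column family and a qualifier at the colon, and only the family has to exist in advance (HBase rejects puts to a family the table does not have). A minimal Python sketch of that rule, with hypothetical names parse_column and put and a plain dict standing in for the table:

```python
from typing import Dict, Set, Tuple


def parse_column(spec: str) -> Tuple[str, str]:
    """Split 'family:qualifier' at the first colon; a bare 'family' gets an empty qualifier."""
    family, _, qualifier = spec.partition(":")
    if not family:
        raise ValueError(f"missing column family in {spec!r}")
    return family, qualifier


def put(cells: Dict[Tuple[str, str, str], str], families: Set[str],
        row: str, spec: str, value: str) -> None:
    """Store one cell, rejecting families that the table was not created with."""
    family, qualifier = parse_column(spec)
    if family not in families:
        raise KeyError(f"unknown column family {family!r}")
    cells[(row, family, qualifier)] = value  # qualifiers need no declaration
```

put(cells, {"cf"}, "row3", "cf", "value3") stores the cell under the key ("row3", "cf", ""), which is why scan later prints it as cf:.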

3.5 scan: scanning the whole table

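Read all the data in table test with scan 'table'. Note that the summary line counts distinct row keys (3 here), not cells (5):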

hbase(main):012:0> scan 'test'
ROW                                  COLUMN+CELL                                                                                               
 row1                                column=cf:a, timestamp=1413253976039, value=value1                                                        
 row2                                column=cf:b, timestamp=1413253980776, value=value2-b                                                      
 row2                                column=cf:c, timestamp=1413253985691, value=value2-c                                                      
 row3                                column=cf:, timestamp=1413253990953, value=value3                                                         
 row3                                column=cf:e, timestamp=1413254206302, value=value3-e                                                      
3 row(s) in 0.0430 seconds
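Conceptually, HBase keeps cells sorted by row key and then by column, which is why scan returns them in the order above. The Python sketch below models the test table as a dict keyed by (row, family, qualifier) and reproduces both the ordering and the row-versus-cell count. It is a toy model with hypothetical names, not HBase's storage format. Python compares strings by code point, which matches byte-wise comparison of their UTF-8 encodings, so the sort agrees with HBase's lexicographic byte order for these keys.

```python
from typing import Dict, List, Tuple

# (row key, column family, qualifier) -> (timestamp in ms, value)
Cells = Dict[Tuple[str, str, str], Tuple[int, str]]

# The cells written in section 3.4, as shown by the scan output above.
TEST_TABLE: Cells = {
    ("row1", "cf", "a"): (1413253976039, "value1"),
    ("row2", "cf", "b"): (1413253980776, "value2-b"),
    ("row2", "cf", "c"): (1413253985691, "value2-c"),
    ("row3", "cf", ""): (1413253990953, "value3"),
    ("row3", "cf", "e"): (1413254206302, "value3-e"),
}


def scan(cells: Cells) -> List[str]:
    """One formatted line per cell, ordered by row key, then family, then qualifier."""
    return [
        f"{row} column={family}:{qualifier}, timestamp={ts}, value={value}"
        for (row, family, qualifier), (ts, value) in sorted(cells.items())
    ]


def count_rows(cells: Cells) -> int:
    """What the scan summary reports: the number of distinct row keys."""
    return len({row for row, _, _ in cells})
```

One consequence of lexicographic ordering is that row keys row1, row10 and row2 sort in exactly that order, so numeric suffixes are usually zero-padded when scan order matters.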

3.6 get: querying a single row

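Fetch a single row with get 'table', 'row'. For get, the "row(s)" summary actually reports the number of cells returned, as the two-cell results for row2 and row3 show: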

hbase(main):013:0> get 'test', 'row1'
COLUMN                               CELL                                                                                                      
 cf:a                                timestamp=1413253976039, value=value1                                                                     
1 row(s) in 0.0150 seconds

hbase(main):014:0> get 'test', 'row2'
COLUMN                               CELL                                                                                                      
 cf:b                                timestamp=1413253980776, value=value2-b                                                                   
 cf:c                                timestamp=1413253985691, value=value2-c                                                                   
2 row(s) in 0.0120 seconds

hbase(main):015:0> get 'test', 'row3'
COLUMN                               CELL                                                                                                      
 cf:                                 timestamp=1413253990953, value=value3                                                                     
 cf:e                                timestamp=1413254206302, value=value3-e                                                                   
2 row(s) in 0.0050 seconds
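get reads a single row, and every cell carries a timestamp in milliseconds since the Unix epoch. A minimal Python sketch, using a hypothetical (row, family, qualifier) -> (timestamp, value) dict as a stand-in for the table, that extracts one row and converts a timestamp to UTC; get_row and ts_to_utc are illustrative names:

```python
from datetime import datetime, timezone
from typing import Dict, Tuple

# (row key, column family, qualifier) -> (timestamp in ms, value)
Cells = Dict[Tuple[str, str, str], Tuple[int, str]]


def get_row(cells: Cells, row: str) -> Dict[str, Tuple[int, str]]:
    """Return {'family:qualifier': (timestamp, value)} for one row, in column order."""
    return {
        f"{family}:{qualifier}": cell
        for (r, family, qualifier), cell in sorted(cells.items())
        if r == row
    }


def ts_to_utc(ts_ms: int) -> datetime:
    """Convert a cell timestamp (milliseconds since the Unix epoch) to a UTC datetime."""
    seconds, millis = divmod(ts_ms, 1000)  # integer split avoids float rounding
    return datetime.fromtimestamp(seconds, tz=timezone.utc).replace(microsecond=millis * 1000)
```

For example, the timestamp 1413253976039 of cf:a in row1 corresponds to 2014-10-14 02:32:56.039 UTC.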

3.7 disable: disabling a table

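disable 'table' disables a table. The table is not deleted, but it can no longer be queried or otherwise operated on until it is enabled again.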

hbase(main):017:0> disable 'test'
0 row(s) in 1.4850 seconds

Querying it now with get 'table', 'row' reports an error:

hbase(main):018:0> get 'test', 'row3'
COLUMN                               CELL                                                                                                      

ERROR: test is disabled.

3.8 enable: enabling a table

A disabled table can be re-enabled with enable 'table', after which the usual table operations work again:

hbase(main):020:0> enable 'test'
0 row(s) in 0.5540 seconds

hbase(main):021:0> get 'test', 'row3'
COLUMN                               CELL                                                                                                      
 cf:                                 timestamp=1413253990953, value=value3                                                                     
 cf:e                                timestamp=1413254206302, value=value3-e                                                                   
2 row(s) in 0.0160 seconds

3.9 drop: deleting a table

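drop 'table' deletes a table. The table must already be disabled, i.e. disable 'table' must have been run on it first. The transcript below omits that step: test was re-enabled in 3.8, so it had to be disabled again before this drop.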

hbase(main):030:0> drop 'test'
0 row(s) in 0.2300 seconds
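The rules in 3.7 to 3.9 amount to a small state machine: a newly created table is enabled, reads fail while it is disabled, and drop is refused unless the table is disabled. The Python sketch below models only those rules. ToyCatalog and its methods are hypothetical, and the error messages are illustrative rather than HBase's exact wording.

```python
from enum import Enum
from typing import Dict, List


class TableState(Enum):
    ENABLED = "enabled"
    DISABLED = "disabled"


class ToyCatalog:
    """Hypothetical in-memory model of the create/disable/enable/drop rules."""

    def __init__(self) -> None:
        self._tables: Dict[str, TableState] = {}

    def _state(self, name: str) -> TableState:
        if name not in self._tables:
            raise KeyError(f"table {name!r} does not exist")
        return self._tables[name]

    def create(self, name: str, *families: str) -> None:
        if name in self._tables:
            raise ValueError(f"table {name!r} already exists")
        if not families:
            raise ValueError("a table needs at least one column family")
        self._tables[name] = TableState.ENABLED  # new tables start enabled

    def disable(self, name: str) -> None:
        self._state(name)
        self._tables[name] = TableState.DISABLED

    def enable(self, name: str) -> None:
        self._state(name)
        self._tables[name] = TableState.ENABLED

    def check_readable(self, name: str) -> None:
        """get/scan/put are refused while the table is disabled."""
        if self._state(name) is TableState.DISABLED:
            raise RuntimeError(f"{name} is disabled.")

    def drop(self, name: str) -> None:
        if self._state(name) is not TableState.DISABLED:
            raise RuntimeError(f"{name} must be disabled before it can be dropped")
        del self._tables[name]

    def list_tables(self) -> List[str]:
        return sorted(self._tables)
```

Replaying the session against this model (create, disable, enable, disable, drop) leaves list_tables() empty, while calling drop right after enable raises an error.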

3.10 Exiting the shell

As in other shells, exit leaves the HBase shell:

hbase(main):031:0> exit

4 Stopping HBase

Stop HBase with the stop-hbase.sh script:

$ ./bin/stop-hbase.sh 
stopping hbase....................