(Hadoop) Analysis of Hive Metastore Entities Using a Hook Function

Recently, I have been working on a project to store the metadata of Hive programs. Basically, I keep a SQL database that is updated with the metadata of every executed HQL statement, using an internal hook.

Basically, there are two important sets of Entity classes:
Set<ReadEntity> inputs and Set<WriteEntity> outputs

In this article, I only introduce the data inside the entities above; the exact structure of a Hive program will be introduced in another article.

In the source code of org.apache.hadoop.hive.ql.plan.HiveOperation, you can find dozens of different Hive operations. For our goal, a metadata store system, I only care about the operations related to metadata.
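As a minimal sketch of that filtering step, the class below checks an operation name against the operations covered in this article. The class name MetadataOperationFilter and the choice of tracked names are assumptions for illustration; a real hook would pass in whatever operation name it receives from Hive.

```java
import java.util.Set;

/** Sketch: decide whether an operation name is one the metadata store should record. */
final class MetadataOperationFilter {

    // Operation names discussed in this article; the selection itself is an assumption.
    private static final Set<String> TRACKED = Set.of(
            "CREATETABLE",
            "DROPTABLE",
            "ALTERTABLE_RENAME",
            "ALTERTABLE_RENAMECOL",
            "ALTERTABLE_REPLACECOLS",
            "ALTERTABLE_RENAMEPART",
            "ALTERPARTITION_LOCATION");

    /** Returns true when the operation changes metadata we want to store. */
    static boolean isTracked(String operationName) {
        // Set.of(...) rejects null lookups, so guard against null first.
        return operationName != null && TRACKED.contains(operationName);
    }

    public static void main(String[] args) {
        System.out.println(isTracked("DROPTABLE")); // true
        System.out.println(isTracked("QUERY"));     // false
    }
}
```

Keeping the list in one place makes it easy to extend when more operations need to be recorded.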

Notice:
The EXPLAIN AUTHORIZATION command can show INPUTS, OUTPUTS, CURRENT_USER and OPERATION.

  1. CREATETABLE

input: none, or the location if one is specified when creating the table.
output: new table, current database
log: operation is CREATETABLE,inputs :[],outputs:[db@tml_2, database:db]

  2. DROPTABLE

input: deleted table
output: deleted table
log: operation is DROPTABLE,inputs :[db@tml_1],outputs:[db@tml_1]

  3. ALTERTABLE_RENAME

input: old table
output: old table, new table
log: operation is ALTERTABLE_RENAME,inputs :[db@tml_2],outputs:[db@tml_2, db@tml_3]

  4. ALTERTABLE_RENAMECOL

input: none
output: altered table
log: operation is ALTERTABLE_RENAMECOL,inputs :[],outputs:[db@tml_3]

  5. ALTERTABLE_REPLACECOLS

input: none
output: altered table
log: operation is ALTERTABLE_RENAMECOL,inputs :[],outputs:[db@tml_3]

  6. ALTERTABLE_RENAMEPART

input: table, old partition
output: old partition, new partition
log: operation is ALTERTABLE_RENAMEPART,inputs :[db@tml_part, ks_xs@tml_part@dt=2008-08-08/hour=14],outputs:[db@tml_part@dt=2008-08-08/hour=14, db@tml_part@dt=2008-08-08/hour=15]

  7. ALTERPARTITION_LOCATION

input: table, partition
output: location, partition
log: operation is ALTERPARTITION_LOCATION,inputs :[db@tml_part, db@tml_part@dt=2008-08-08/hour=15],outputs:[viewfs://hadoop-lt-cluster/home/dp/data/userprofile/db.db/tml_part/dt, db@tml_part@dt=2008-08-08/hour=15]
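The log lines above all share the shape "operation is X,inputs :[...],outputs:[...]", so they can be split back into an operation name and two entity lists. The following is only a sketch: the class name HookLogLine is hypothetical, and the pattern is derived from the sample lines in this article rather than from any format guaranteed by Hive. It also assumes that entity names contain no commas.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: parse one hook log line of the form shown in this article. */
final class HookLogLine {

    // Pattern inferred from the sample lines; not an official Hive format.
    private static final Pattern LINE = Pattern.compile(
            "operation is (\\w+),inputs :\\[(.*?)\\],outputs:\\[(.*)\\]");

    final String operation;
    final List<String> inputs;
    final List<String> outputs;

    private HookLogLine(String operation, List<String> inputs, List<String> outputs) {
        this.operation = operation;
        this.inputs = inputs;
        this.outputs = outputs;
    }

    /** Parses a line that starts with "operation is"; throws if the shape differs. */
    static HookLogLine parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized log line: " + line);
        }
        return new HookLogLine(m.group(1), splitEntities(m.group(2)), splitEntities(m.group(3)));
    }

    // Entities are separated by commas; an empty bracket pair yields an empty list.
    private static List<String> splitEntities(String body) {
        List<String> result = new ArrayList<>();
        for (String part : body.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        HookLogLine parsed = parse(
                "operation is ALTERTABLE_RENAME,inputs :[db@tml_2],outputs:[db@tml_2, db@tml_3]");
        System.out.println(parsed.operation + " " + parsed.inputs + " " + parsed.outputs);
    }
}
```

Inside a real hook there is no need to parse text, since the inputs and outputs are available as entity objects; the parser is mainly useful for checking collected logs against the tables above.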

Conclusion

In org.apache.hadoop.hive.ql.hooks.Entity, you can find all the entity types.

  /**
   * The type of the entity.
   */
  public static enum Type {
    DATABASE, TABLE, PARTITION, DUMMYPARTITION, DFS_DIR, LOCAL_DIR, FUNCTION
  }

What’s strange is that there is no COLUMN type among them. So when we want to catch add/rename/replace column operations, we have to get the data from the parent table.
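One way to recover column changes from the parent table is to compare the column list already saved in the metadata store with the table's column list after the operation. The sketch below assumes both lists are available as plain column names; the class name ColumnDiff is hypothetical, and how the lists are loaded is left out.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Sketch: derive column-level changes by comparing two column lists of one table. */
final class ColumnDiff {

    final Set<String> added;
    final Set<String> removed;

    private ColumnDiff(Set<String> added, Set<String> removed) {
        this.added = added;
        this.removed = removed;
    }

    /**
     * Compares the stored column names (before) with the current ones (after).
     * A renamed column shows up as one removed name plus one added name.
     */
    static ColumnDiff between(List<String> before, List<String> after) {
        // LinkedHashSet keeps the original column order in the result.
        Set<String> added = new LinkedHashSet<>(after);
        added.removeAll(before);
        Set<String> removed = new LinkedHashSet<>(before);
        removed.removeAll(after);
        return new ColumnDiff(added, removed);
    }

    public static void main(String[] args) {
        ColumnDiff diff = between(List.of("id", "name"), List.of("id", "full_name", "age"));
        System.out.println("added=" + diff.added + " removed=" + diff.removed);
    }
}
```

Comparing names alone cannot tell a rename from a drop followed by an add, so the operation name (for example ALTERTABLE_RENAMECOL) is still needed to interpret the result.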

Besides, once we know an entity's type, we can easily get the corresponding metadata.
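To illustrate routing by type without depending on Hive classes, the sketch below infers a kind from the string forms seen in the logs of this article ("database:db", "db@table", "db@table@partition", and a path containing "://"). The class EntityKinds and its Kind enum are hypothetical and cover only these four forms; inside a real hook the type comes from the entity object itself, so the string rules here are assumptions drawn from the samples.

```java
/** Sketch: infer the kind of an entity from its string form in the hook logs. */
final class EntityKinds {

    // Local stand-in for the subset of entity types that appear in the sample logs.
    enum Kind { DATABASE, TABLE, PARTITION, DFS_DIR }

    static Kind classify(String entity) {
        if (entity.startsWith("database:")) {
            return Kind.DATABASE;
        }
        // Assumption: anything with a URI scheme separator is a directory location.
        if (entity.contains("://")) {
            return Kind.DFS_DIR;
        }
        // db@table has one '@'; db@table@partition-spec has two.
        long separators = entity.chars().filter(c -> c == '@').count();
        if (separators == 1) {
            return Kind.TABLE;
        }
        if (separators >= 2) {
            return Kind.PARTITION;
        }
        throw new IllegalArgumentException("Cannot classify entity: " + entity);
    }

    public static void main(String[] args) {
        System.out.println(classify("database:db"));
        System.out.println(classify("db@tml_3"));
        System.out.println(classify("db@tml_part@dt=2008-08-08/hour=15"));
    }
}
```

A switch over the resulting kind can then decide which record in the metadata store to create, update, or delete.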