Overall structure of RabbitMQ

RabbitMQ is an open-source message broker created in 2007 and now maintained by GoPivotal.

Most operations are performed in memory; RabbitMQ is not “disk-oriented”. Messages are received by the broker through an exchange (i.e. a logical entry point that decides, based on some criteria, in which queue(s) the broker should place a message) and are then pushed to the registered consumers. The broker pushes queued messages to consumers in no guaranteed order, so consumers receive unordered messages and do not need to remember anything about the queue state (since messages are unordered and pushed by the broker, consumers do not, and cannot, fetch specific messages on their own). Messages are paged out to disk only if no more memory is available, or if they are explicitly marked to be stored.
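
As a rough illustration, here is a minimal sketch with the RabbitMQ Java client; the host, the "logs" exchange name and the message handling are assumptions for the example, not part of any particular setup. The point is that the broker pushes deliveries to the consumer, which never fetches specific messages itself:

    import java.nio.charset.StandardCharsets;
    import com.rabbitmq.client.Channel;
    import com.rabbitmq.client.Connection;
    import com.rabbitmq.client.ConnectionFactory;

    public class PushConsumer {
      public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost"); // assumption: a local broker
        try (Connection conn = factory.newConnection();
             Channel channel = conn.createChannel()) {
          // The exchange is the logical entry point that routes messages to queues.
          channel.exchangeDeclare("logs", "fanout"); // "logs" is a made-up name
          // Server-named queue, bound to the exchange.
          String queue = channel.queueDeclare().getQueue();
          channel.queueBind(queue, "logs", "");
          // The broker pushes deliveries to this callback.
          channel.basicConsume(queue, true,
              (consumerTag, delivery) -> System.out.println(
                  new String(delivery.getBody(), StandardCharsets.UTF_8)),
              consumerTag -> { });
          Thread.sleep(60_000); // stay connected to keep receiving pushes
        }
      }
    }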


(Hadoop) Analysis of Hive Metastore Entities Using a Hook Function

Recently, I have been working on a project to save the metadata of Hive programs. Basically, I keep a SQL database to save and update the metadata of every executed HQL statement, using an internal hook.

Basically, there are two important entity sets:
Set<ReadEntity> inputs and Set<WriteEntity> outputs
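
For reference, a minimal post-execution hook could look like the sketch below; my real hook writes to SQL, while this one only prints, and the class name MetadataHook is made up. It would be registered through the hive.exec.post.hooks property:

    import java.util.Set;

    import org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext;
    import org.apache.hadoop.hive.ql.hooks.HookContext;
    import org.apache.hadoop.hive.ql.hooks.ReadEntity;
    import org.apache.hadoop.hive.ql.hooks.WriteEntity;

    public class MetadataHook implements ExecuteWithHookContext {
      @Override
      public void run(HookContext context) throws Exception {
        // Each executed HQL statement exposes its operation name plus the
        // input/output entity sets shown in the logs below.
        String operation = context.getQueryPlan().getOperationName();
        Set<ReadEntity> inputs = context.getInputs();
        Set<WriteEntity> outputs = context.getOutputs();
        System.out.println("operation is " + operation
            + ",inputs :" + inputs + ",outputs:" + outputs);
      }
    }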

In this article I only introduce the data inside those entities; the exact structure of a Hive program will be introduced in another article.

In the source code of org.apache.hadoop.hive.ql.plan.HiveOperation, you can find dozens of different Hive operations. For our goal, a metadata store system, I only care about the operations related to metadata, e.g. via a whitelist like the one sketched below.
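
A simple way to do the filtering (the class name OperationFilter is hypothetical, and the set only lists the operations covered in this article):

    import java.util.Arrays;
    import java.util.HashSet;
    import java.util.Set;

    public class OperationFilter {
      // Metadata-related operations discussed in this article; extend as needed.
      private static final Set<String> TRACKED = new HashSet<>(Arrays.asList(
          "CREATETABLE", "DROPTABLE",
          "ALTERTABLE_RENAME", "ALTERTABLE_RENAMECOL",
          "ALTERTABLE_REPLACECOLS", "ALTERTABLE_RENAMEPART",
          "ALTERPARTITION_LOCATION"));

      public static boolean isMetadataOperation(String operationName) {
        return TRACKED.contains(operationName);
      }
    }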

Notice:
The EXPLAIN AUTHORIZATION command can show INPUTS, OUTPUTS, CURRENT_USER and OPERATION.

  1. CREATETABLE

input: null, or the location if one is set while creating the table.
output: new table, current database
log: operation is CREATETABLE,inputs :[],outputs:[db@tml_2, database:db]

  2. DROPTABLE

input: deleted table
output: deleted table
log: operation is DROPTABLE,inputs :[db@tml_1],outputs:[db@tml_1]

  3. ALTERTABLE_RENAME

input: old table
output: old table, new table
log: operation is ALTERTABLE_RENAME,inputs :[db@tml_2],outputs:[db@tml_2, db@tml_3]

  4. ALTERTABLE_RENAMECOL

input: null
output: new table
log: operation is ALTERTABLE_RENAMECOL,inputs :[],outputs:[db@tml_3]

  5. ALTERTABLE_REPLACECOLS

input: null
output: new table
log: operation is ALTERTABLE_REPLACECOLS,inputs :[],outputs:[db@tml_3]

  6. ALTERTABLE_RENAMEPART

input: table, old partition
output: old partition, new partition
log: operation is ALTERTABLE_RENAMEPART,inputs :[db@tml_part, ks_xs@tml_part@dt=2008-08-08/hour=14],outputs:[db@tml_part@dt=2008-08-08/hour=14, db@tml_part@dt=2008-08-08/hour=15]

  7. ALTERPARTITION_LOCATION

input: partition
output: location, partition
log: operation is ALTERPARTITION_LOCATION,inputs :[db@tml_part, db@tml_part@dt=2008-08-08/hour=15],outputs:[viewfs://hadoop-lt-cluster/home/dp/data/userprofile/db.db/tml_part/dt, db@tml_part@dt=2008-08-08/hour=15]

Conclusion

In org.apache.hadoop.hive.ql.hooks.Entity, you can find all the entity types:

  /**
   * The type of the entity.
   */
  public static enum Type {
    DATABASE, TABLE, PARTITION, DUMMYPARTITION, DFS_DIR, LOCAL_DIR, FUNCTION
  }

What’s strange is that there is no COLUMN among them. So when we try to catch add/rename/replace-column operations, we have to get the data from the parent table.

Besides, we can easily get the metadata for a specific entity type, as the sketch below shows.
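
For instance, inside the hook one might branch on the entity type (a sketch; the class name EntityDispatcher is made up, and getTable()/getPartition() return the full metadata objects for the matching types):

    import org.apache.hadoop.hive.ql.hooks.WriteEntity;
    import org.apache.hadoop.hive.ql.metadata.Partition;
    import org.apache.hadoop.hive.ql.metadata.Table;

    public class EntityDispatcher {
      public static void handle(WriteEntity output) {
        switch (output.getType()) {
          case TABLE:
            Table table = output.getTable(); // full table metadata
            System.out.println("table: " + table.getTableName());
            break;
          case PARTITION:
            Partition partition = output.getPartition(); // partition metadata
            System.out.println("partition: " + partition.getName());
            break;
          default:
            break; // DATABASE, DFS_DIR, etc. can be handled similarly
        }
      }
    }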

(crawler4j) SLF4J: Failed to load class “org.slf4j.impl.StaticLoggerBinder”.

While using crawler4j to crawl some data, I met the following errors:
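
    SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
    SLF4J: Defaulting to no-operation (NOP) logger implementation
    SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.

This is the standard warning SLF4J prints when it cannot find a logger binding on the classpath.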

Based on the official documentation, I found the solution:

https://www.slf4j.org/codes.html#StaticLoggerBinder

Then I downloaded Logback and imported logback-classic.jar.
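
A quick way to verify the binding works (a throwaway sketch; the class name LoggingCheck is made up):

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class LoggingCheck {
      private static final Logger LOG = LoggerFactory.getLogger(LoggingCheck.class);

      public static void main(String[] args) {
        // With logback-classic on the classpath, this line is handled by Logback
        // instead of being swallowed by the NOP fallback.
        LOG.info("SLF4J binding found");
      }
    }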

Perfect!