Apache JMeter Introduction on macOS

Command-Line Version

You can install it with Homebrew.

  1. Install Homebrew

To install Homebrew, run:

/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

It should take only a couple of minutes. Before installing JMeter, let’s update Homebrew’s package definitions:

brew update

Continue reading “Apache JMeter Introduction on macOS”

(Hadoop) Analysis of Hive Meta Store Entities Using a Hook Function

Recently, I have been working on a project to store the metadata of Hive programs. Basically, I keep a SQL database that saves and updates the metadata of every executed HQL statement, collected through an internal hook.

Basically, there are two important entity sets in the hook context:
Set&lt;ReadEntity&gt; inputs and Set&lt;WriteEntity&gt; outputs

In this article, I only introduce the data inside those entities; the exact structure of a Hive program will be covered in another article.

In the source code of org.apache.hadoop.hive.ql.plan.HiveOperation, you can find dozens of different Hive operations. For our goal, a metadata store system, I only care about the operations related to metadata.

Notice:
The EXPLAIN AUTHORIZATION command can show INPUTS, OUTPUTS, CURRENT_USER and OPERATION.
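
For reference, here is a minimal post-execution hook sketch that prints the same operation/inputs/outputs information you will see in the log lines below. The package and class name are my own placeholders, and I assume the hook is registered through hive.exec.post.hooks; treat this as a sketch, not the exact hook used in my project.

package com.example.hooks; // hypothetical package name

import org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext;
import org.apache.hadoop.hive.ql.hooks.HookContext;

// Logs the operation name together with the inputs/outputs entity sets
// for every executed HQL statement.
public class MetaStoreLogHook implements ExecuteWithHookContext {

  @Override
  public void run(HookContext hookContext) throws Exception {
    String operation = hookContext.getQueryPlan().getOperationName();
    // inputs is a Set<ReadEntity>, outputs is a Set<WriteEntity>
    System.out.println("operation is " + operation
        + ",inputs :" + hookContext.getInputs()
        + ",outputs:" + hookContext.getOutputs());
  }
}

Registering it is just a matter of appending the class name to hive.exec.post.hooks in hive-site.xml.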

  1. CREATETABLE

input: null, or the location if LOCATION is set while creating the table.
output: new table, current database
log: operation is CREATETABLE,inputs :[],outputs:[db@tml_2, database:db]

  2. DROPTABLE

input: deleted table
output: deleted table
log: operation is DROPTABLE,inputs :[db@tml_1],outputs:[db@tml_1]

  3. ALTERTABLE_RENAME

input: old table
output: old table, new table
log: operation is ALTERTABLE_RENAME,inputs :[db@tml_2],outputs:[db@tml_2, db@tml_3]

  4. ALTERTABLE_RENAMECOL

input: null
output: new table
log: operation is ALTERTABLE_RENAMECOL,inputs :[],outputs:[db@tml_3]

  5. ALTERTABLE_REPLACECOLS

input: null
output: new table
log: operation is ALTERTABLE_RENAMECOL,inputs :[],outputs:[db@tml_3]

  6. ALTERTABLE_RENAMEPART

input: table, old partition
output: old partition, new partition
log: operation is ALTERTABLE_RENAMEPART,inputs :[db@tml_part, ks_xs@tml_part@dt=2008-08-08/hour=14],outputs:[db@tml_part@dt=2008-08-08/hour=14, db@tml_part@dt=2008-08-08/hour=15]

  7. ALTERPARTITION_LOCATION

input: partition
output: location, partition
log: operation is ALTERPARTITION_LOCATION,inputs :[db@tml_part, db@tml_part@dt=2008-08-08/hour=15],outputs:[viewfs://hadoop-lt-cluster/home/dp/data/userprofile/db.db/tml_part/dt, db@tml_part@dt=2008-08-08/hour=15]

Conclusion

In org.apache.hadoop.hive.ql.hooks.Entity, you can find all the entity types:

  /**
   * The type of the entity.
   */
  public static enum Type {
    DATABASE, TABLE, PARTITION, DUMMYPARTITION, DFS_DIR, LOCAL_DIR, FUNCTION
  }

What’s strange is that there is no COLUMN among them, so when we try to catch add/rename/replace column operations, we have to get the column data from the parent table.

Besides, we can easily get the metadata for a specific entity type.
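
As a small sketch of that idea (the helper class below is hypothetical, not part of Hive or of my project), filtering the hook entities by type could look like this:

import java.util.Set;

import org.apache.hadoop.hive.ql.hooks.Entity;
import org.apache.hadoop.hive.ql.hooks.WriteEntity;

// Hypothetical helper: keep only the TABLE entities from a hook's outputs.
public class TableEntityFilter {

  public static void printTables(Set<WriteEntity> outputs) {
    for (WriteEntity output : outputs) {
      if (output.getType() == Entity.Type.TABLE) {
        // ReadEntity/WriteEntity extend Entity, so getTable() is available here
        System.out.println(output.getTable().getDbName()
            + "@" + output.getTable().getTableName());
      }
    }
  }
}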

Some experience with uploading/downloading files on HDFS using the Java Spring framework

  1. Using streams to transfer data

At first, I tried an easy way to transfer files: read them into memory as byte arrays and then upload them to HDFS. However, I ran into java.lang.OutOfMemoryError: Java heap space. The reason is that I used redundant calls such as MultipartFile.getBytes(), which occupied too much memory.

Then I fixed the problem by using two streams: one from the client (browser) to the web server, and one from the web server to HDFS.

  • From Browser To Web Server

We can use the normal approach, a MultipartFile parameter, to receive the uploaded file.

The file contents are either stored in memory or temporarily on disk. In either case, the user is responsible for copying file contents to a session-level or persistent store as and if desired. The temporary storage will be cleared at the end of request processing.

Developers can configure the threshold that decides whether the temporary file is kept in memory or written to disk by modifying the property file as follows.

spring.http.multipart.file-size-threshold=10MB

That way we can support a high level of concurrency.

  • From Web Server to HDFS Server

After the FileSystem has been initialized by a configuration class, the process is easy:

// create() overwrites the target file if it already exists
FSDataOutputStream out = fs.create(path, true);
// copy with a 1024-byte buffer; the final "true" closes both streams when done
IOUtils.copyBytes(file.getInputStream(), out, 1024, true);
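
For completeness, here is a minimal controller sketch that ties the two legs together. The @PostMapping path, the class name, and the injected FileSystem bean are my own assumptions, not details from the original project.

import java.io.IOException;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.multipart.MultipartFile;

@RestController
public class HdfsUploadController {

  private final FileSystem fs; // assumed to be provided by a configuration class

  public HdfsUploadController(FileSystem fs) {
    this.fs = fs;
  }

  @PostMapping("/upload")
  public String upload(@RequestParam("file") MultipartFile file,
                       @RequestParam String path) throws IOException {
    // stream straight from the multipart request into HDFS
    FSDataOutputStream out = fs.create(new Path(path), true);
    IOUtils.copyBytes(file.getInputStream(), out, 1024, true);
    return "ok";
  }
}

Because the multipart input stream is copied directly into the HDFS output stream, the whole file never has to sit in the web server’s heap.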
  • From HDFS to Web Server

When downloading a file, Hadoop offers a powerful API: given only the file path, we can create an InputStream.

// open() needs only the HDFS path and returns a stream we can read from
FSDataInputStream in = currentVfs.open(srcPath);

  • From Web Server to Browser

Normally, in the Spring framework, developers can use a Resource instance to transfer a file. Conveniently, Spring provides an InputStreamResource(InputStream) constructor, so we can stream files from HDFS to the browser directly.
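
Here is a minimal sketch of that download path. Again, the mapping path, the class name, and the injected FileSystem are my own assumptions.

import java.io.IOException;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.springframework.core.io.InputStreamResource;
import org.springframework.core.io.Resource;
import org.springframework.http.HttpHeaders;
import org.springframework.http.MediaType;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class HdfsDownloadController {

  private final FileSystem fs; // assumed to be provided by a configuration class

  public HdfsDownloadController(FileSystem fs) {
    this.fs = fs;
  }

  @GetMapping("/download")
  public ResponseEntity<Resource> download(@RequestParam String path) throws IOException {
    // Hadoop gives us an InputStream directly from the HDFS path
    FSDataInputStream in = fs.open(new Path(path));
    // InputStreamResource lets Spring stream the bytes to the browser without buffering
    return ResponseEntity.ok()
        .contentType(MediaType.APPLICATION_OCTET_STREAM)
        .header(HttpHeaders.CONTENT_DISPOSITION, "attachment")
        .body(new InputStreamResource(in));
  }
}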

  2. File Name Encoding Problem

This problem mainly affects Chinese developers: Chinese characters get garbled in the HTTP header. The solution is to re-encode the file name as ISO-8859-1:

// take the UTF-8 bytes and wrap them in an ISO-8859-1 string so the header carries the raw bytes
String fileName = new String(fileName.getBytes("UTF-8"), "ISO-8859-1");
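
As a small usage sketch (the helper class below is hypothetical), the re-encoded name is what ends up in the Content-Disposition header of the download response:

import java.io.UnsupportedEncodingException;

import org.springframework.http.HttpHeaders;

// Hypothetical helper: build response headers whose Content-Disposition
// carries the re-encoded file name.
public class DownloadHeaders {

  public static HttpHeaders attachment(String originalName) throws UnsupportedEncodingException {
    String fileName = new String(originalName.getBytes("UTF-8"), "ISO-8859-1");
    HttpHeaders headers = new HttpHeaders();
    headers.set(HttpHeaders.CONTENT_DISPOSITION, "attachment; filename=\"" + fileName + "\"");
    return headers;
  }
}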

(MongoDB) How to use --fork on Windows for multiple servers

Although deploying a database on Windows is not advised for most developers, beginners like me still need to do some practice on Windows.

Currently I am studying replica sets, which requires me to run multiple servers at the same time with the following commands.

Continue reading “(MongoDB) How to use --fork on Windows for multiple servers”

Logistic Regression with a Neural Network mindset

Welcome to your first (required) programming assignment! You will build a logistic regression classifier to recognize cats. This assignment will step you through how to do this with a Neural Network mindset, and so will also hone your intuitions about deep learning.
Continue reading “Logistic Regression with a Neural Network mindset”