Hadoop Note 1: Database Choices

First class on Hadoop

Understanding RDBMS Limits

  1. Scalability
  2. Speed
  3. Queryability
  4. Sophisticated processing

Database Choices

  1. File Systems
  • Other fields
  • HDFS (Hadoop Distributed File System)
  2. Databases
  • NoSQL (key/value, columnstore, etc.)
    * Some parts of the Hadoop ecosystem can be categorized as NoSQL, but Hadoop itself is not a database.
  • RDBMS (SQL Server, Oracle, MySQL)
    * Hadoop is designed to handle kinds of data sets that an RDBMS was not designed to handle, so neither one replaces the other.

Hadoop and HBase

  • Hadoop uses an alternative file system (HDFS)
  • HBase is a NoSQL database (wide columnstore)
    In HBase, data is saved as key/value pairs: a row can have any number of columns (keys), each of which has a value. (Technically, each column can hold multiple values, distinguished by timestamp.)
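
The data model described above can be sketched in a few lines of plain Python. This is only a toy in-memory illustration, not the HBase client API: the class name `WideColumnTable`, its `put`/`get` methods, and the sample row keys and column names are all assumptions made up for this note.

```python
from collections import defaultdict


class WideColumnTable:
    """Toy in-memory model of a wide-column table (NOT the HBase API)."""

    def __init__(self):
        # row key -> column name -> {timestamp: value}
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, column, value, timestamp):
        # Each write adds a new timestamped version instead of overwriting.
        self._rows[row_key][column][timestamp] = value

    def get(self, row_key, column, timestamp=None):
        # Look up without creating empty rows or columns as a side effect.
        versions = self._rows.get(row_key, {}).get(column, {})
        if timestamp is not None:
            # Only consider versions written at or before the given time.
            versions = {ts: v for ts, v in versions.items() if ts <= timestamp}
        if not versions:
            return None
        # Return the most recent remaining version.
        return versions[max(versions)]


# Hypothetical sample data: rows do not need to share the same columns.
table = WideColumnTable()
table.put("user1", "email", "old@example.com", timestamp=1)
table.put("user1", "email", "new@example.com", timestamp=2)
table.put("user2", "phone", "555-0100", timestamp=1)
```

Here `table.get("user1", "email")` returns the newest value, while passing `timestamp=1` returns the older version. Note that `user2` has a `phone` column that `user1` does not have at all, which is what "any number of columns" means in practice.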

CAP Theorem

  1. Consistency (a.k.a. transactions)
  2. Availability
  • Up-time
  3. Partitioning
  • Scalability
    * Hard for RDBMS

Where Hadoop Fits

  1. Scalability (Partitioning)
  • Commodity hardware for data storage. Works well with many interchangeable servers.
  2. Flexibility (Availability)
  • Commodity hardware for distributed processing.
    * Consistency is not what Hadoop was originally designed for.
  3. LOB (line-of-business) data: usually transactional, so not a good fit for Hadoop
  4. Behavioral data: a good fit for Hadoop
  • This kind of data is processed as a group rather than queried individually, e.g. healthcare data.
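
To make "processed as a group" concrete, here is a minimal single-process sketch of a map-and-reduce style aggregation, the processing pattern Hadoop is known for. It is plain Python, not Hadoop code: the record layout, the `diagnosis` field, and all function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_phase(records):
    # Emit one (key, 1) pair per record; no record is looked up individually.
    for record in records:
        yield record["diagnosis"], 1


def shuffle(pairs):
    # Group all emitted values by key, as the framework would between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    # Collapse each group to a single aggregate value.
    return {key: sum(values) for key, values in groups.items()}


def count_by_diagnosis(records):
    return reduce_phase(shuffle(map_phase(records)))


# Hypothetical sample records (made-up data, for illustration only).
visits = [
    {"patient": "p1", "diagnosis": "flu"},
    {"patient": "p2", "diagnosis": "asthma"},
    {"patient": "p3", "diagnosis": "flu"},
]
counts = count_by_diagnosis(visits)
```

The whole data set is scanned once and summarized, which is the opposite of a transactional workload that reads or updates one row at a time.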
