Understanding RDBMS Limits
- Scalability
- Speed
- Queryability
- Sophisticated processing
Database Choices
- File Systems
– HDFS (Hadoop Distributed File System)
– Other file systems
- Databases
– NoSQL (key/value, column store, etc.)
– RDBMS (SQL Server, Oracle, MySQL)
* Some components of the Hadoop ecosystem (such as HBase) can be categorized as NoSQL, but Hadoop itself is not a database.
* Hadoop is designed to solve specific data problems that an RDBMS is not designed for, so neither can replace the other.
Hadoop and HBase
- Hadoop stores data in its own distributed file system (HDFS) rather than in a database
- HBase is a NoSQL database (wide-column store)
In HBase, data is stored as key-value pairs: each row can have any number of columns (keys), each of which has a value. (Technically, each column can hold multiple values, each with a different timestamp.)
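To make this data model concrete, here is a minimal in-memory sketch in Python. The class `WideColumnTable` and its `put`/`get`/`columns` methods are hypothetical and are not the HBase client API; the `family:qualifier` column names only mimic HBase's column-family naming convention. Each row maps column names to a set of timestamped versions, so different rows can have different columns, and a read returns the newest version at or before a given timestamp.

```python
from collections import defaultdict


class WideColumnTable:
    """Hypothetical in-memory model of a wide-column table (not the HBase API).

    Layout: row key -> column name -> {timestamp: value}
    """

    def __init__(self):
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, column, value, timestamp):
        # Writing an existing cell with a new timestamp adds a version
        # instead of overwriting the older value.
        self._rows[row_key][column][timestamp] = value

    def get(self, row_key, column, as_of=None):
        """Return the newest value at or before `as_of` (newest overall if None)."""
        versions = self._rows.get(row_key, {}).get(column, {})
        eligible = [ts for ts in versions if as_of is None or ts <= as_of]
        return versions[max(eligible)] if eligible else None

    def columns(self, row_key):
        """List the columns present in one row; rows need not share columns."""
        return sorted(self._rows.get(row_key, {}))


table = WideColumnTable()
table.put("patient#42", "info:name", "Ada", timestamp=1)
table.put("patient#42", "vitals:heart_rate", 72, timestamp=1)
table.put("patient#42", "vitals:heart_rate", 80, timestamp=5)
table.put("patient#7", "info:name", "Grace", timestamp=2)

print(table.get("patient#42", "vitals:heart_rate"))           # 80
print(table.get("patient#42", "vitals:heart_rate", as_of=3))  # 72
print(table.columns("patient#42"))  # ['info:name', 'vitals:heart_rate']
print(table.columns("patient#7"))   # ['info:name']
```

The two rows deliberately carry different column sets, which a fixed relational schema would not allow without NULL-filled columns.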
CAP Theorem
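- Consistency (every read sees the most recent write; in RDBMS terms, transactions)
- Availability
– Up-time
- Partitioning (partition tolerance)
– Scalability
* Partitioning data across many servers is hard for an RDBMS.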
Where Hadoop Fits
- Scalability (Partitioning)
– Commodity hardware for data storage; servers are interchangeable and easy to add.
- Flexibility (Availability)
– Commodity hardware for distributed processing.
* Hadoop was not originally designed for consistency.
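A rough illustration of why HDFS scales on commodity hardware: a file is split into large fixed-size blocks, and each block is copied to several machines. The sketch below assumes the common defaults of a 128 MB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`). The function names `plan_blocks` and `readable_after_failure` are hypothetical, and the round-robin placement is a simplification, since real HDFS uses a rack-aware placement policy.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # assumed default dfs.blocksize (128 MB)
REPLICATION = 3                 # assumed default dfs.replication


def plan_blocks(file_size, nodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Split a file into fixed-size blocks and pick `replication` distinct nodes per block.

    Hypothetical helper: uses simple round-robin placement, not HDFS's rack-aware policy.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    num_blocks = -(-file_size // block_size)  # ceiling division
    plan = []
    for block_id in range(num_blocks):
        length = min(block_size, file_size - block_id * block_size)
        # Consecutive nodes (wrapping around) hold the copies of this block.
        replicas = [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        plan.append((block_id, length, replicas))
    return plan


def readable_after_failure(plan, failed_node):
    """True if every block still has at least one copy on a surviving node."""
    return all(any(node != failed_node for node in replicas) for _, _, replicas in plan)


MB = 1024 * 1024
nodes = ["node1", "node2", "node3", "node4"]
plan = plan_blocks(300 * MB, nodes)
for block_id, length, replicas in plan:
    print(block_id, length // MB, "MB", replicas)
# 0 128 MB ['node1', 'node2', 'node3']
# 1 128 MB ['node2', 'node3', 'node4']
# 2 44 MB ['node3', 'node4', 'node1']
print(readable_after_failure(plan, "node2"))  # True
```

Adding a node adds both storage and processing capacity, and because every block lives on three different nodes, losing any single node does not make data unreadable.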
3. LOB (line-of-business) data: usually transactional, so not a good fit for Hadoop
4. Behavioral data: a good fit for Hadoop
– This kind of data is processed in aggregate rather than queried record by record, e.g., healthcare data.
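Processing data as a group is what MapReduce-style jobs do: every record is mapped to a key, records with the same key are grouped (the "shuffle"), and each group is reduced to a summary. The following in-memory Python sketch only imitates those three phases; it does not use Hadoop's API, and the event log and the function names `map_phase`, `shuffle`, `reduce_phase`, and `run_job` are hypothetical.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical behavioral event log: (user_id, action)
events = [
    ("u1", "view"), ("u2", "view"), ("u1", "click"),
    ("u3", "view"), ("u1", "view"), ("u2", "click"),
]


def map_phase(record):
    """Emit (action, 1) for each event, keyed by action type."""
    _user_id, action = record
    yield action, 1


def shuffle(pairs):
    """Group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse one group into a single summary value."""
    return key, sum(values)


def run_job(records):
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, vs) for k, vs in sorted(grouped.items()))


print(run_job(events))  # {'click': 2, 'view': 4}
```

No single event is ever looked up on its own; the answer only exists at the level of the whole group, which is why this workload suits Hadoop better than an RDBMS tuned for individual transactional queries.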