What is Hadoop
Two core components plus related projects
- Open-source data storage: HDFS (Hadoop Distributed File System)
- Processing API: MapReduce
- Other projects/libraries: HBase, Hive, Pig, etc.
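The MapReduce processing model can be illustrated with the classic word-count example. The following is a minimal single-process Python sketch of the map, shuffle and reduce steps. It is not Hadoop's actual (Java) MapReduce API. The function names `map_words`, `shuffle`, `reduce_counts` and `word_count` are hypothetical, and the input "splits" are plain strings standing in for chunks of a file stored in HDFS.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


# Map step (hypothetical helper): emit (word, 1) for every word in one input split.
def map_words(split: str) -> Iterator[Tuple[str, int]]:
    for word in split.lower().split():
        yield word, 1


# Shuffle step (hypothetical helper): group all emitted values by key,
# which the framework does between the map and reduce phases.
def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


# Reduce step (hypothetical helper): combine all values for one key.
def reduce_counts(key: str, values: List[int]) -> Tuple[str, int]:
    return key, sum(values)


def word_count(splits: List[str]) -> Dict[str, int]:
    # Each split is mapped independently. On a cluster these map tasks could
    # run on different nodes in parallel; here they simply run in sequence.
    mapped = (pair for split in splits for pair in map_words(split))
    grouped = shuffle(mapped)
    return dict(reduce_counts(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    splits = ["the quick brown fox", "the lazy dog", "The fox"]
    print(word_count(splits))
    # prints {'brown': 1, 'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 3}
```

Because each split is mapped without looking at any other split, the map step is what lets the work be spread across many machines. Only the shuffle needs to bring matching keys together.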
Hadoop Distributions
- Open Source
  - Apache Hadoop
- Commercial
  - Cloudera
  - Hortonworks
  - MapR
- Cloud
  - AWS
  - Windows Azure
Why Use Hadoop
- Cheaper: scales to petabytes or more
- Faster: parallel data processing
- Better: suited to particular types of ‘Big Data’
Hadoop Business Problems
- Risk modeling
- Customer churn analysis
- Recommendation engine
- Ad targeting
- Transactional analysis
- Threat analysis
- Search quality