Hadoop Note 2: Introducing Hadoop

What is Hadoop

Two core components, plus related projects

  • Open-source data storage: HDFS (Hadoop Distributed File System)
  • Processing API: MapReduce
  • Other projects/libraries: HBase, Hive, Pig, etc.
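A MapReduce job has three stages: a map step that emits intermediate key/value pairs, a shuffle (run by the framework) that groups those pairs by key, and a reduce step that combines each group. As a rough illustration, the sketch below simulates the classic word-count job in plain Python on a single machine. It does not use the Hadoop API (real jobs are usually written against Hadoop's Java `Mapper`/`Reducer` classes or run through Hadoop Streaming), and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit (word, 1) for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map -> shuffle -> reduce over an iterable of text lines."""
    intermediate = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    docs = ["Hadoop stores data in HDFS", "Hadoop processes data with MapReduce"]
    print(word_count(docs))
```

On a real cluster the map calls run in parallel on different machines, each close to the data it reads, and the shuffle moves intermediate pairs over the network to the reducers.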

Hadoop Distributions

  1. Open Source
       • Apache Hadoop
  2. Commercial
       • Cloudera
       • Hortonworks
       • MapR
  3. Cloud
       • AWS
       • Windows Azure

Why Use Hadoop

  • Cheaper: scales to petabytes or more on clusters of commodity hardware
  • Faster: processes data in parallel across the cluster
  • Better: suited to particular types of ‘Big Data’
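HDFS stores a large file as fixed-size blocks spread across the cluster and keeps several copies of each block; with the default input format, MapReduce typically schedules roughly one map task per block, which is where much of the parallelism comes from. The sketch below does the back-of-the-envelope arithmetic, assuming the common defaults of a 128 MiB block size (`dfs.blocksize` in Hadoop 2.x and later) and a replication factor of 3 (`dfs.replication`). Real clusters often tune both, and `hdfs_layout` is a hypothetical helper, not part of Hadoop.

```python
# Hypothetical helper: estimates how HDFS lays out one file.
# Assumes the common defaults of 128 MiB blocks (dfs.blocksize, Hadoop 2.x+)
# and 3 replicas (dfs.replication); real clusters may override both.
DEFAULT_BLOCK_SIZE = 128 * 1024 * 1024  # bytes
DEFAULT_REPLICATION = 3


def hdfs_layout(file_size, block_size=DEFAULT_BLOCK_SIZE,
                replication=DEFAULT_REPLICATION):
    """Return (number_of_blocks, raw_bytes_on_disk) for a file of file_size bytes."""
    if file_size < 0 or block_size <= 0 or replication <= 0:
        raise ValueError("file_size must be non-negative; block_size and replication positive")
    # Integer ceiling division: a partial final block still counts as one block.
    blocks = -(-file_size // block_size)
    # HDFS does not pad the last block, so raw usage is file size times replicas.
    raw_bytes = file_size * replication
    return blocks, raw_bytes


if __name__ == "__main__":
    one_tib = 2 ** 40
    blocks, raw = hdfs_layout(one_tib)
    print(f"{blocks} blocks, {raw / 2 ** 40:.0f} TiB raw")
```

Under these assumptions a 1 TiB file becomes 8192 blocks (so on the order of 8192 map tasks that can run in parallel) and occupies 3 TiB of raw disk.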

Hadoop Business Problems

  • Risk modeling
  • Customer churn analysis
  • Recommendation engine
  • Ad targeting
  • Transactional analysis
  • Threat analysis
  • Search quality

 
