Hadoop Features in HDFS and MR across releases
News on releases : Oct 2012 :
Common
A set of components and interfaces for distributed filesystems and general I/O
(serialization, Java RPC, persistent data structures).
Avro
A serialization system for efficient, cross-language RPC and persistent data
storage.
MapReduce
A distributed data processing model and execution environment that runs on large
clusters of commodity machines.
HDFS
A distributed filesystem that runs on large clusters of commodity machines.
Pig
A data flow language and execution environment for exploring very large datasets.
Pig runs on HDFS and MapReduce clusters.
Hive
A distributed data warehouse. Hive manages data stored in HDFS and provides a
query language based on SQL (and which is translated by the run time engine to
MapReduce jobs) for querying the data.
HBase
A distributed, column-oriented database. HBase uses HDFS for its underlying
storage, and supports both batch-style computations using MapReduce and point
queries (random reads).
ZooKeeper
A distributed, highly available coordination service. ZooKeeper provides primitives
such as distributed locks that can be used for building distributed applications.
Sqoop
A tool for efficient bulk transfer of data between structured data stores (such as
relational databases) and HDFS.
Oozie
A service for running and scheduling workflows of Hadoop jobs (including Map-
Reduce, Pig, Hive, and Sqoop jobs).
Note referenece : Hadoop The Definitive GUIDE (3rd Edition) & Hadoop
News on releases : Oct 2012 :
9 October, 2012: Release 2.0.2-alpha available
This is the second (alpha) version in the hadoop-2.x series.
This delivers significant enhancements to HDFS HA. Also it has a significantly more stable version of YARN which, at the time of release, has already been deployed on a 2000 node cluster.
Please see the Hadoop 2.0.2-alpha Release Notes for details.
Latest Hadoop : http://hadoop.apache.org/docs/current/ : 2.0.2
Latest Stable Release : http://hadoop.apache.org/docs/stable/ : 1.0.4
Common
A set of components and interfaces for distributed filesystems and general I/O
(serialization, Java RPC, persistent data structures).
Avro
A serialization system for efficient, cross-language RPC and persistent data
storage.
MapReduce
A distributed data processing model and execution environment that runs on large
clusters of commodity machines.
HDFS
A distributed filesystem that runs on large clusters of commodity machines.
Pig
A data flow language and execution environment for exploring very large datasets.
Pig runs on HDFS and MapReduce clusters.
Hive
A distributed data warehouse. Hive manages data stored in HDFS and provides a
query language based on SQL (and which is translated by the run time engine to
MapReduce jobs) for querying the data.
HBase
A distributed, column-oriented database. HBase uses HDFS for its underlying
storage, and supports both batch-style computations using MapReduce and point
queries (random reads).
ZooKeeper
A distributed, highly available coordination service. ZooKeeper provides primitives
such as distributed locks that can be used for building distributed applications.
Sqoop
A tool for efficient bulk transfer of data between structured data stores (such as
relational databases) and HDFS.
Oozie
A service for running and scheduling workflows of Hadoop jobs (including Map-
Reduce, Pig, Hive, and Sqoop jobs).
Note referenece : Hadoop The Definitive GUIDE (3rd Edition) & Hadoop
No comments:
Post a Comment