Saturday, November 10, 2012

GOF : Creational 1. Factory Method Design pattern


What is a factory?

  • A factory is an object for creating other objects. 
  • It is an abstraction of a constructor, and 
  • it can be used to implement various allocation schemes. For example, under this definition, singletons implemented by the Singleton pattern are formal factories.
  • A factory object typically has a method for every kind of object it is capable of creating. These methods optionally accept parameters defining how the object is created, and then return the created object.
  • Advanced use: the factory object might decide to create the object's class (if applicable) dynamically, return it from an object pool, perform complex configuration on the object, or do other things.
  • These kinds of objects have proven useful and several design patterns have been developed to implement them in many languages. For example, several "GoF patterns", like the "Factory method pattern", the "Builder" or even the "Singleton" are implementations of this concept. The "Abstract factory pattern" instead is a method to build collections of factories.
  • Factory objects are common in toolkits and frameworks where library code needs to create objects of types which may be subclassed by applications using the framework. They are also used in test-driven development to allow classes to be put under test.

Use of factory : Where and When

  • Factories determine the actual concrete type of the object to be created;
  • it is inside the factory that the object is actually created.
  • Because the factory returns only an abstract pointer (or interface reference), the client code does not know - and is not burdened by - the actual concrete type of the object that was just created. However, the concrete type is known to the factory. In particular, this means:

  • The client code has no knowledge whatsoever of the concrete type, so it does not need to include any header files or class declarations relating to the concrete type. 
  • The client code deals only with the abstract type. 
  • Objects of a concrete type are indeed created by the factory, but the client code accesses such objects only through their abstract interface.
  • Adding new concrete types is done by modifying the client code to use a different factory, a modification which is typically one line in one file. This is significantly easier than modifying the client code to instantiate a new type, which would require changing every location in the code where a new object is created.

Factory Method Pattern 

Factory methods are static methods that return an instance of the class they are defined in (or of one of its subtypes). Examples in the JDK include LogManager.getLogManager(), Pattern.compile(String), Integer.valueOf(int) and Calendar.getInstance().
Factory methods :
  • have names, unlike constructors, which can clarify code.
  • do not need to create a new object upon each invocation - objects can be cached and reused, if necessary (see the caching sketch after this list).
  • can return a subtype of their return type - in particular, can return an object whose implementation class is unknown to the caller. This is a very valuable and widely used feature in many frameworks which use interfaces as the return type of static factory methods.
  • are useful when the creation of an object requires access to information or resources that should not be contained within the composing class.
  • Common names for factory methods include getInstance and valueOf. These names are not mandatory - choose whatever makes sense for each case.
  • When factory methods are used to disambiguate between construction paths that take identical parameter lists (as in the Complex example below), the constructor is often made private to force clients to use the factory methods.
  • Factory methods encapsulate the creation of objects. This can be useful if the creation process is very complex, for example if it depends on settings in configuration files or on user input.
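
To illustrate the caching point above, here is a minimal sketch (not from the original post; the Bool class is a made-up stand-in for the caching behaviour of java.lang.Boolean.valueOf(boolean)):

public final class Bool {

    // Pre-built instances; the factory hands these out instead of allocating new objects.
    private static final Bool TRUE = new Bool(true);
    private static final Bool FALSE = new Bool(false);

    private final boolean value;

    // Private constructor forces clients through the factory method.
    private Bool(boolean value) {
        this.value = value;
    }

    // Static factory method: no new object is created per call.
    public static Bool valueOf(boolean value) {
        return value ? TRUE : FALSE;
    }
}

The Complex examples that follow show factory methods in practice: the C# Complex class uses named factories to disambiguate Cartesian and polar construction, and the Java ComplexNumber class pairs a valueOf factory with a private constructor.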


public class Complex
{
    public double real;
    public double imaginary;

    // Named factory methods disambiguate two ways of building a Complex that would
    // otherwise require two constructors with identical parameter lists.
    public static Complex FromCartesianFactory(double real, double imaginary)
    {
        return new Complex(real, imaginary);
    }

    public static Complex FromPolarFactory(double modulus, double angle)
    {
        return new Complex(modulus * Math.Cos(angle), modulus * Math.Sin(angle));
    }

    // Private constructor: instances can only be created through the factories.
    private Complex(double real, double imaginary)
    {
        this.real = real;
        this.imaginary = imaginary;
    }
}
 
Complex product = Complex.FromPolarFactory(1, Math.PI);


public class ComplexNumber {

  /**
  * Static factory method returns an object of this class.
  */
  public static ComplexNumber valueOf(float aReal, float aImaginary) {
    return new ComplexNumber(aReal, aImaginary);
  }

  /**
  * Caller cannot see this private constructor.
  *
  * The only way to build a ComplexNumber is by calling the static 
  * factory method.
  */
  private ComplexNumber (float aReal, float aImaginary) {
    fReal = aReal;
    fImaginary = aImaginary;
  }

  private float fReal;
  private float fImaginary;

  //..elided
} 
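
Client code can then obtain instances only through the factory method, for example:

ComplexNumber c = ComplexNumber.valueOf(1.0f, 2.0f);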

Limitations

There are three limitations associated with the use of the factory method. The first relates to refactoring existing code; the other two relate to extending a class.
  • The first limitation is that refactoring an existing class to use factories breaks existing clients. For example, if class Complex were a standard class, it might have numerous clients with code like:
Complex c = new Complex(-1, 0);
Once we realize that two different factories are needed, we change the class (to the code shown earlier). But since the constructor is now private, the existing client code no longer compiles.
  • The second limitation is that, since the pattern relies on using a private constructor, the class cannot be extended. Any subclass must invoke the inherited constructor, but this cannot be done if that constructor is private.
  • The third limitation is that, if we do extend the class (e.g., by making the constructor protected, which is risky but feasible), the subclass must provide its own re-implementation of all factory methods with exactly the same signatures. For example, if class StrangeComplex extends Complex, then unless StrangeComplex provides its own version of all factory methods, the call
    StrangeComplex.FromPolarFactory(1, Math.PI);
    
    will yield an instance of Complex (the superclass) rather than the expected instance of the subclass; a sketch of the required re-implementation follows below. The reflection features of some languages can obviate this issue.
All three problems could be alleviated by altering the underlying programming language to make factories first-class class members (see also Virtual class).[4]
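
A minimal Java sketch of the third limitation (not from the original article; it assumes the constructor has been relaxed to protected and mirrors a simplified Complex with a single factory method, names adapted to Java conventions):

public class Complex {

    protected final double real;
    protected final double imaginary;

    // Relaxed from private to protected so that subclassing is possible at all.
    protected Complex(double real, double imaginary) {
        this.real = real;
        this.imaginary = imaginary;
    }

    public static Complex fromPolar(double modulus, double angle) {
        return new Complex(modulus * Math.cos(angle), modulus * Math.sin(angle));
    }
}

class StrangeComplex extends Complex {

    protected StrangeComplex(double real, double imaginary) {
        super(real, imaginary);
    }

    // Without this re-declaration, StrangeComplex.fromPolar(1, Math.PI) would resolve
    // to the inherited static method and return a plain Complex, not a StrangeComplex.
    public static StrangeComplex fromPolar(double modulus, double angle) {
        return new StrangeComplex(modulus * Math.cos(angle), modulus * Math.sin(angle));
    }
}

Every factory method has to be repeated this way in every subclass, which is exactly the maintenance burden the third limitation describes.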

      Thursday, November 8, 2012

      Quickies for Quick Results : Jar and JUnit

      1. Updating a Jar File

      The Jar tool provides a u option which you can use to update the contents of an existing JAR file by modifying its manifest or by adding files.

      The basic command for adding files has this format:
      jar uf jar-file input-file(s)
      
      In this command:
      • The u option indicates that you want to update an existing JAR file.
      • The f option indicates that the JAR file to update is specified on the command line.
      • jar-file is the existing JAR file that's to be updated.
      • input-file(s) is a space-delimited list of one or more files that you want to add to the JAR file.
      Also, please remember the following before executing the command :
      • Any files already in the archive having the same pathname as a file being added will be overwritten.
      • When creating or updating a JAR file, you can optionally use the -C option to indicate a change of directory.
      If you are using the Windows command line, put the class file you want to replace in the same directory hierarchy (relative to where you run the jar command) as it has inside the JAR file. Otherwise the file will either be added at some other location or will not replace the existing entry at all.
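
      For example, if the class to replace is org/apache/hadoop/mapred/abc.class (as in the note below) and that relative path exists under the current directory, the update command would look like this (the JAR file name here is just an example):
      jar uf hadoop-core.jar org/apache/hadoop/mapred/abc.class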

      Also, use Zip or 7-Zip to look inside the JAR file. This will help you to:
      • Obtain the fully qualified name of the class file you want to replace (using it you can recreate the matching folder structure, e.g. org/apache/hadoop/mapred/abc.class, for the file on disk).
      • Recheck the timestamp of the file in the updated JAR after running the jar command. If it is still the old one, the entry was not replaced and you need to redo the update.
      2. Running tests using JUnit (TestCase):

      If your test case class is packaged in a JAR file, use one of the following commands depending on your JUnit version.
      [test class name] is the fully qualified name of your test class.
      For JUnit 4.x:
      java -cp /usr/share/java/junit.jar:{any other jar files/ your jar file where your test case resides} org.junit.runner.JUnitCore [test class name]
      
      But if you are using JUnit 3.x, please note that the runner class name is different:
      java -cp /usr/share/java/junit.jar:{any other jar files/ your jar file where your test case resides} junit.textui.TestRunner [test class name]
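
      For reference, a minimal JUnit 4 test class that the JUnit 4.x command above could run (the package and class names are made up; you would pass com.example.SimpleTest as the test class name):

      package com.example;

      import static org.junit.Assert.assertEquals;

      import org.junit.Test;

      public class SimpleTest {

          // JUnitCore discovers and runs this method via the @Test annotation.
          @Test
          public void additionWorks() {
              assertEquals(4, 2 + 2);
          }
      }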

      Hadoop Releases, Projects and features in a nutshell

      Hadoop Features in HDFS and MR across releases



      News on releases : Oct 2012 : 

      9 October, 2012: Release 2.0.2-alpha available

      This is the second (alpha) version in the hadoop-2.x series.
      This delivers significant enhancements to HDFS HA. It also has a significantly more stable version of YARN which, at the time of release, had already been deployed on a 2,000-node cluster.
      Please see the Hadoop 2.0.2-alpha Release Notes for details.
      Latest Hadoop : http://hadoop.apache.org/docs/current/ : 2.0.2
      Latest Stable Release : http://hadoop.apache.org/docs/stable/ : 1.0.4

      Common
      A set of components and interfaces for distributed filesystems and general I/O
      (serialization, Java RPC, persistent data structures).


      Avro
      A serialization system for efficient, cross-language RPC and persistent data
      storage.

      MapReduce
      A distributed data processing model and execution environment that runs on large
      clusters of commodity machines.

      HDFS
      A distributed filesystem that runs on large clusters of commodity machines.

      Pig
      A data flow language and execution environment for exploring very large datasets.
      Pig runs on HDFS and MapReduce clusters.

      Hive
      A distributed data warehouse. Hive manages data stored in HDFS and provides a
      query language based on SQL (and which is translated by the run time engine to
      MapReduce jobs) for querying the data.

      HBase
      A distributed, column-oriented database. HBase uses HDFS for its underlying
      storage, and supports both batch-style computations using MapReduce and point
      queries (random reads).

      ZooKeeper
      A distributed, highly available coordination service. ZooKeeper provides primitives
      such as distributed locks that can be used for building distributed applications.


      Sqoop
      A tool for efficient bulk transfer of data between structured data stores (such as
      relational databases) and HDFS.

      Oozie
      A service for running and scheduling workflows of Hadoop jobs (including Map-
      Reduce, Pig, Hive, and Sqoop jobs).

      Note reference: Hadoop: The Definitive Guide (3rd Edition) & the Apache Hadoop site (hadoop.apache.org)

      2. SQL statements interview questions: a must know list

      The JOIN concept
      JOIN is a query clause that can be used with SELECT, UPDATE, and DELETE statements to simultaneously affect rows from multiple tables. There are several distinct types of JOIN that return different result sets.

      Joined tables must each include at least one field in both tables that contain comparable data. For example, if you want to join a Customer table and a Transaction table, they both must contain a common element, such as a CustomerID column, to serve as a key on which the data can be matched. Tables can be joined on multiple columns so long as the columns have the potential to supply matching information. Column names across tables don't have to be the same, although for readability this standard is generally preferred.

      Now that we’ve examined the basic theory, let’s take a look at the various types of joins and examples of each.

      The basic JOIN statement
      A basic JOIN statement has the following format:
      SELECT Customer.CustomerID, TransID, TransAmt
      FROM Customer JOIN Transaction
      ON Customer.CustomerID = Transaction.CustomerID;


      In practice, you would rarely use the example above because the type of join is not specified; in this case, SQL Server assumes an INNER JOIN. You can get the equivalent of this query by using the older comma-separated syntax (note that the WHERE clause is required, otherwise you get a cross product):
      SELECT Customer.CustomerID, TransID, TransAmt
      FROM Customer, Transaction
      WHERE Customer.CustomerID = Transaction.CustomerID;

      However, the example is useful to point out a few noteworthy concepts:
      • TransID and TransAmt do not require fully qualified names because they exist in only one of the tables. You can use fully qualified names for readability if you wish.
      • The Customer table is considered to be the “left” table because it was called first. Likewise, the Transaction table is the “right” table.
      • You can use more than two tables, in which case each one is “naturally” joined to the cumulative result in the order they are listed, unless controlled by other functionality such as “join hints” or parentheses.
      • You may use WHERE and ORDER BY clauses with any JOIN statement to limit the scope of your results. Note that these clauses are applied to the results of your JOIN statement.
      • SQL Server does not require the semicolon (;), but I use it in the included examples to denote the end of each statement, as would be expected by most other RDBMSs.
      The INNER JOIN drops rows
      When you perform an INNER JOIN, only rows that match up are returned. Any time a row from either table doesn’t have corresponding values from the other table, it is disregarded. Because stray rows aren’t included, you don’t have any of the “left” and “right” nonsense to deal with and the order in which you present tables matters only if you have more than two to compare. Since this is a simple concept, here’s a simple example:

      SELECT CustomerName, TransDate
      FROM Customer INNER JOIN Transaction
      ON Customer.CustomerID = Transaction.CustomerID;


      If a row in the Transaction table contains a CustomerID that’s not listed in the Customer table, that row will not be returned as part of the result set. Likewise, if the Customer table has a CustomerID with no corresponding rows in the Transaction table, the row from the Customer table won’t be returned.


      The OUTER JOIN can include mismatched rows
      OUTER JOINs, sometimes called “complex joins,” aren’t actually complicated. They are so called because SQL Server performs two functions for each OUTER JOIN.

      The first function performed is an INNER JOIN. The second function includes the rows that the INNER JOIN would have dropped. Which rows are included depends on the type of OUTER JOIN that is used and the order the tables were presented.

      There are three types of an OUTER JOIN: LEFT, RIGHT, and FULL. As you’ve probably guessed, the LEFT OUTER JOIN keeps the stray rows from the “left” table (the one listed first in your query statement). In the result set, columns from the other table that have no corresponding data are filled with NULL values. Similarly, the RIGHT OUTER JOIN keeps stray rows from the right table, filling columns from the left table with NULL values. The FULL OUTER JOIN keeps all stray rows as part of the result set. Here is your example:
      SELECT CustomerName, TransDate, TransAmt
      FROM Customer LEFT OUTER JOIN Transaction
      ON Customer.CustomerID = Transaction.CustomerID;

      Customer names that have no associated transactions will still be displayed. However, transactions with no corresponding customers will not, because we used a LEFT OUTER JOIN and the Customer table was listed first.

      In SQL Server, the word OUTER is actually optional. The clauses LEFT JOIN, RIGHT JOIN, and FULL JOIN are equivalent to LEFT OUTER JOIN, RIGHT OUTER JOIN, and FULL OUTER JOIN, respectively.
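
      For completeness, here is the same query rewritten as the FULL OUTER JOIN described above, which keeps stray rows from both tables (columns from the missing side are filled with NULL values):
      SELECT CustomerName, TransDate, TransAmt
      FROM Customer FULL OUTER JOIN Transaction
      ON Customer.CustomerID = Transaction.CustomerID;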

      1. SQL statements interview questions: a must know list


      1. Sometimes you will want to list only the different (distinct) values in a column of a table.
      The DISTINCT keyword can be used to return only distinct (different) values.

      SQL SELECT DISTINCT Syntax

      SELECT DISTINCT column_name(s)
      FROM table_name
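
      For example, using the Persons table shown further below, the following returns each city only once:

      SELECT DISTINCT City FROM Persons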

      2. SQL UNIQUE Constraint
      The UNIQUE constraint ensures that all values in a column (or combination of columns) are different, so it uniquely identifies each record in a database table.
      The UNIQUE and PRIMARY KEY constraints both provide a guarantee for uniqueness for a column or set of columns.
      A PRIMARY KEY constraint automatically has a UNIQUE constraint defined on it.
      Note that you can have many UNIQUE constraints per table, but only one PRIMARY KEY constraint per table.

      SQL UNIQUE Constraint on CREATE TABLE

      The following SQL creates a UNIQUE constraint on the P_Id and LastName columns when the "Persons" table is created:
      MySQL:
      CREATE TABLE Persons
      (
      P_Id int NOT NULL,
      LastName varchar(255) NOT NULL,
      FirstName varchar(255),
      Address varchar(255),
      City varchar(255),
      CONSTRAINT uc_PersonID UNIQUE (P_Id,LastName)
      )

      To add the same UNIQUE constraint when the "Persons" table already exists, use ALTER TABLE:
      ALTER TABLE Persons
      ADD CONSTRAINT uc_PersonID UNIQUE (P_Id,LastName)


      3. The ORDER BY Keyword

      The ORDER BY keyword is used to sort the result-set by a specified column.
      The ORDER BY keyword sorts the records in ascending order by default.
      If you want to sort the records in a descending order, you can use the DESC keyword.

      SQL ORDER BY Syntax

      SELECT column_name(s)
      FROM table_name
      ORDER BY column_name(s) ASC|DESC

      SELECT * FROM Persons
      ORDER BY LastName
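
      To sort the same result in descending order, append the DESC keyword mentioned above:

      SELECT * FROM Persons
      ORDER BY LastName DESC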


      4. The LIKE Operator
      The LIKE operator is used to search for a specified pattern in a column.

      The "Persons" table:
      P_Id  LastName    FirstName  Address       City
      1     Hansen      Ola        Timoteivn 10  Sandnes
      2     Svendson    Tove       Borgvn 23     Sandnes
      3     Pettersen   Kari       Storgt 20     Stavanger


      We use the following SELECT statement:
      SELECT * FROM Persons
      WHERE City LIKE '%s'
      The result-set will look like this:
      P_Id  LastName    FirstName  Address       City
      1     Hansen      Ola        Timoteivn 10  Sandnes
      2     Svendson    Tove       Borgvn 23     Sandnes



      We use the following SELECT statement:
      SELECT * FROM Persons
      WHERE City LIKE '%tav%'
      The result-set will look like this:
      P_Id  LastName    FirstName  Address       City
      3     Pettersen   Kari       Storgt 20     Stavanger



      It is also possible to select the persons living in a city that does NOT contain the pattern "tav" from the "Persons" table, by using the NOT keyword.
      We use the following SELECT statement:
      SELECT * FROM Persons
      WHERE City NOT LIKE '%tav%'
      The result-set will look like this:
      P_Id  LastName    FirstName  Address       City
      1     Hansen      Ola        Timoteivn 10  Sandnes
      2     Svendson    Tove       Borgvn 23     Sandnes


      Tuesday, October 23, 2012

      Hadoop Quest 1 : ERROR : IBM BIGINSIGHTS on Cloud not accessible from web links on control panel

      Using commands

      1. I could not get to the web pages for NameNode status, JobTracker status, etc. from the web page of the IBM master node instance. Only one of the links, "BigInsights web console", takes me to the next web page. What should I do in this regard?

      Diagnosis :
      I started with an assumption: from the nature of the problem I thought that restarting the instance would fix it. But first I needed to make sure which part of Hadoop was not working -> NameNode, JobTracker or TaskTracker? -> All were up and running.

      Following is a snapshot of IBM's BigInsights console showing that the NameNode is also acting as the Secondary NameNode and JobTracker:



      So I started with the UserGuide : http://hadoop.apache.org/docs/stable/hdfs_user_guide.html#Shell+Commands

      Info is like :

      Shell Commands

      Hadoop includes various shell-like commands that directly interact with HDFS and other file systems that Hadoop supports. The command bin/hdfs dfs -help lists the commands supported by the Hadoop shell. Furthermore, the command bin/hdfs dfs -help command-name displays more detailed help for a command. These commands support most of the normal file system operations like copying files, changing file permissions, etc. It also supports a few HDFS-specific operations like changing replication of files. For more information see the File System Shell Guide.

      DFSAdmin Command

      The bin/hadoop dfsadmin command supports a few HDFS administration related operations. The bin/hadoop dfsadmin -help command lists all the commands currently supported. For e.g.:
      • -report : reports basic statistics of HDFS. Some of this information is also available on the NameNode front page.
      • -safemode : though usually not required, an administrator can manually enter or leave Safemode.
      • -finalizeUpgrade : removes previous backup of the cluster made during last upgrade.
      • -refreshNodes : Updates the set of hosts allowed to connect to the namenode. Re-reads the config file to update values defined by dfs.hosts and dfs.hosts.exclude and reads the entries (hostnames) in those files. Each entry not defined in dfs.hosts but in dfs.hosts.exclude is decommissioned. Each entry defined in dfs.hosts and also in dfs.hosts.exclude is stopped from decommissioning if it has already been marked for decommission. Entries not present in both the lists are decommissioned.
      • -printTopology : Print the topology of the cluster. Display a tree of racks and datanodes attached to the racks as viewed by the NameNode.
      So I used hadoop dfsadmin -report to get the report on the HDFS name and data nodes. It showed me the complete details of all the Hadoop nodes (active + dead) -> this way I was able to diagnose that one of the nodes, "Data Node 3" (with an IP like xxx.xxx.xxx.37), was dead. I removed it and created another node in its place. "Pretty cool stuff right?"

      Now I stopped Hadoop with stop-all.sh and started all the nodes with start-all.sh, but found out that Hive was not started, so:

      1. I started one more name node 
      2. Installed winscp
      3. Configured winscp for ssh between my laptop and newly created name node.
      4. Copied the /mnt/biginsights/opt/ibm/biginsights/hive folder to a local directory on my laptop.
      5. Started one more winscp for ssh between my laptop and old name node.
      6. Copied the Hive files into the same location as they were earlier.
      7. Now go to the hive directory,
      8. then into its /conf subdirectory,
      9. and open hive-site.xml (vi hive-site.xml).
      10. In the site configuration, change the host IP xxx.xxx.xxx.xxx to your NameNode's IP xxx.xxx.xxx.xxx.
      11. Start hadoop by start-all.sh
      12. It starts successfully.


      But still the sites from the link are not opening up.

      SO NOW I looked into the log files. By default $BIGINSIGHTS_VAR points to your biginsights folder, so I did the following:

      1. cd $BIGINSIGHTS_VAR/console/log
      2. vi console-wasce.log 
      3. Found following error : 
      2012-10-22 15:50:48,218 ERROR [[JobServlet]] Servlet.service() for servlet JobServlet threw exception
      java.lang.NullPointerException
              at com.ibm.xap.console.job.JobUtil.jobOperationExceptionHandler(Unknown Source)
              at com.ibm.xap.console.job.JobUtil.createJob(Unknown Source)
              at com.ibm.xap.console.job.JobUtil.handleJobCmd(Unknown Source)
              at com.ibm.xap.console.job.JobOperationHandler.handleJobCmd(Unknown Source)
              at com.ibm.xap.console.job.JobOperationHandler.handleCmd(Unknown Source)
              at com.ibm.xap.console.servlet.JobServlet.doGet(Unknown Source)
              at com.ibm.xap.console.servlet.JobServlet.doPost(Unknown Source)
              at javax.servlet.http.HttpServlet.service(HttpServlet.java:713)
              at javax.servlet.http.HttpServlet.service(HttpServlet.java:806)
              at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
              at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
              at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
              at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
              at org.apache.geronimo.tomcat.valve.DefaultSubjectValve.invoke(DefaultSubjectValve.java:56)
              at org.apache.geronimo.tomcat.GeronimoStandardContext$SystemMethodValve.invoke(GeronimoStandardContext.java:406)
              at org.apache.geronimo.tomcat.valve.GeronimoBeforeAfterValve.invoke(GeronimoBeforeAfterValve.java:47)
              at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
              at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
              at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
              at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:555)
              at org.apache.geronimo.tomcat.valve.ThreadCleanerValve.invoke(ThreadCleanerValve.java:40)
              at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
              at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:852)
              at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
              at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
              at java.lang.Thread.run(Thread.java:736)


      This is the root cause of all the problems. Let's figure out what's happening in Http11ConnectionHandler.process...

      I searched a lot for the JobUtil source but could not find it. So ultimately I looked into two of the folders under /mnt/BI/opt/ibm/BI/ (the home directory of the BigInsights Hadoop installation; BI here is short for biginsights):
      1. hadoop-conf -> all files pertaining to Hadoop -> hadoop.sh, hadoop-conf.xml, etc.
      2. conf -> biginsights-conf.sh -> all environment variables like $BIGINSIGHTS_VAR and $BIGINSIGHTS_HOME
      Finally, I looked into the console folder under $BIGINSIGHTS_HOME, which has a folder wascs -> it contains information about the BIconsole.WAR file -> I thought this might have become corrupted or something like that. 

      Also I did tests and ran wordcount 2-3 times with no problem => Hadoop is fine; its web console WAR is what is corrupt. DON'T KNOW WHAT TO DO NEXT....

      UPDATE ON 10/23/2012

      SO I finally resolved the problem. Apparently I found the following exception in $BIGINSIGHTS_VAR or /mnt/bI/var/ibm/BI/hadoop/logs/hadoop-<username>-namenode-<hostname>.log

      2012-10-23 13:39:48,297 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 9000, call delete(/hadoop/mapred/system/job_201210221816_0013, true) from 170.224.161.37:36870: error: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete /hadoop/mapred/system/job_201210221816_0013. Name node is in safe mode.
      The ratio of reported blocks 1.0000 has reached the threshold 0.9990. Safe mode will be turned off automatically in 23 seconds.
      org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete /hadoop/mapred/system/job_201210221816_0013. Name node is in safe mode.
      The ratio of reported blocks 1.0000 has reached the threshold 0.9990. Safe mode will be turned off automatically in 23 seconds.
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:1700)
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:1680)
      at org.apache.hadoop.hdfs.server.namenode.NameNode.delete(NameNode.java:517)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
      at java.lang.reflect.Method.invoke(Method.java:611)
      at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
      at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
      at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
      at java.security.AccessController.doPrivileged(AccessController.java:284)
      at javax.security.auth.Subject.doAs(Subject.java:573)
      at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)

      The solution is simple: this error occurs when some blocks were never reported in. So I had to force the NameNode to leave safe mode (hadoop dfsadmin -safemode leave) and then (optionally) run an fsck to delete the missing files; the commands are shown below.
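
      For reference, the two commands (the fsck path here is the HDFS root; adjust it to the directory you care about):

      hadoop dfsadmin -safemode leave
      hadoop fsck / -delete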

      I only ran the leave-safemode command and then clicked on the console link from the BigInsights web console. Everything is working now.








      Friday, October 12, 2012

      Papers on Map Reduce

      Following are a few papers that might interest you if you are in the field of Machine Learning, Data Mining and Big Data.

      Atbrox is a startup company providing technology and services for Search and MapReduce/Hadoop. Our background is from Google, IBM and research. Contact us if you need help with algorithms for MapReduce.
      This posting is the May 2010 update to the similar posting from February 2010, with 30 new papers compared to the prior posting; new ones are marked with *.
      Motivation
      Learn from the academic literature about how the MapReduce parallel model and the Hadoop implementation are used to solve algorithmic problems.
      Which areas do the papers cover?
        Ads Analysis
        For an example of Parallel Machine Learning with Hadoop/Mapreduce, check out our previous blog post.

      Who wrote the above papers?
      Companies: China Mobile, eBay, Google, Hewlett Packard and Intel, Microsoft, Wikipedia, Yahoo and Yandex.
      Government Institutions and Universities: US National Security Agency (NSA), Carnegie Mellon University, TU Dresden, University of Pennsylvania, University of Central Florida, National University of Ireland, University of Missouri, University of Arizona, University of Glasgow, Berkeley University and National Tsing Hua University, University of California, Poznan University, Florida International University, Zhejiang University, Texas A&M University, University of California at Irvine, University of Illinois, Chinese Academy of Sciences, Vrije Universiteit, Engenharia University, State University of New York, Palacky University, University of Texas at Dallas