Tuesday, January 8, 2013

Gartner: Six Best Practices for Apache Hadoop Pilot

Gartner outlines best practices that help cross-functional teams deploy a Hadoop pilot project and assist IT and business leaders in avoiding common pitfalls.

1. Define the use case(s) well
2. Enlist and build a competent team
3. Choose the appropriate distribution vendor
4. Pilot, test, and scale for price/performance
5. Plan for data integration
6. Perform a thorough post-pilot analysis

Key Challenges
Key challenges for undertaking Apache Hadoop pilot projects include:
  • Finding an appropriate use case that aligns well with the goals of business teams and is feasible to implement.
  • Enlisting a competent team in the face of an acute shortage of Hadoop-related skills.
  • Choosing an appropriate distribution, given the multitude of Hadoop projects and version releases.
  • Dealing with data ingestion and integration challenges that can result in poor analytical outcomes.
Recommendations
  • Identify current skunkworks projects to find skills and experience within the organization, and build a cross-functional team to tackle a pilot.
  • Define a use case that leverages Hadoop's strengths and has measurable business outcomes.
  • Identify skill gaps that should be mitigated by either training or engaging external consultants.
  • Choose a Hadoop software distribution based on the use case rather than vice versa, and consider future scalability when running pilot projects.
  • Identify future integration requirements and opportunities to connect newly exploited data with existing analytics teams and tools (a minimal ingestion sketch follows below).

(A skunkworks project is one typically developed by a small, loosely structured group of people who research and develop a project primarily for the sake of radical innovation. Source: Wikipedia.)
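
To make the data-integration recommendation concrete, below is a minimal, hypothetical sketch (not part of the Gartner report) of one ingestion step a pilot team might automate with Hadoop's Java FileSystem API: copying a local data extract into HDFS so downstream jobs and existing analytics tools can reach it. The namenode address and file paths are placeholder assumptions.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Minimal sketch: land a local extract in HDFS for downstream Hadoop jobs.
    // The cluster address and paths are illustrative placeholders only.
    public class IngestExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Point at the pilot cluster's namenode (placeholder address).
            conf.set("fs.defaultFS", "hdfs://pilot-namenode:8020");

            FileSystem fs = FileSystem.get(conf);
            Path local = new Path("/data/exports/sales_extract.csv");
            Path remote = new Path("/user/pilot/raw/sales/sales_extract.csv");

            // copyFromLocalFile(delSrc, overwrite, src, dst):
            // keep the local copy, overwrite any existing file in HDFS.
            fs.copyFromLocalFile(false, true, local, remote);
            fs.close();
        }
    }

In practice the same step is often handled by command-line tools or a dedicated ingestion framework; the point here is simply that landing locations and formats should be agreed with existing analytics teams as part of the pilot's integration plan.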
