Guides

  • User Guide
    • Tutorials: Tutorials on how to use Data Preparation and Data Pipelines for common data science use cases
    • Data Preparation: Documentation on usage of Data Preparation and Data Preparation transforms
    • Data Pipelines: Documentation on usage of the Data Pipelines UI
    • Analytics: Interactive, UI-driven analytics and machine learning
  • Developer Manual
    • Getting Started Developing: A quick, hands-on introduction to developing with CDAP
    • Overview: The overall architecture, abstractions, modes, and components behind CDAP
    • Building Blocks: The two core abstractions in CDAP: Data and Applications, and their components
    • Metadata: CDAP can automatically capture metadata and let you see how data is flowing
    • Pipelines: A capability of CDAP that allows users to build, deploy, and manage data pipelines
    • Cloud Runtimes: Set up profiles to run pipelines in different cloud environments
    • Security: Perimeter security, configuration and client authentication
    • Testing and Debugging: Test framework plus tools and practices for debugging your applications
    • Ingesting Data: Different techniques for ingesting data into CDAP
    • Advanced Topics: Adding a custom logback, best practices for CDAP development, class loading in CDAP, configuring program resources and retry policies
  • Administration Manual
    • Installation: Putting CDAP into production, with installation, configuration and upgrading for different distributions
    • Security: CDAP supports securing clusters using a perimeter security model
    • Operations: Logging, monitoring, metrics, runtime arguments, scaling instances, resource guarantees, and transaction service maintenance, plus an introduction to the CDAP UI
    • Appendices: Covers the CDAP installation and security configuration files
  • Integrations
    • Hub: A source for re-usable applications, data, and code for all CDAP users
    • Cloudera: Integrating CDAP into Cloudera, using Cloudera Manager, running interactive queries with Impala, and bridging CDAP Metadata with Cloudera's data management tool, Navigator
    • Apache Sentry: Configuring and integrating CDAP with Apache Sentry
    • Apache Ranger: Configuring and integrating CDAP with Apache Ranger
    • Apache Hadoop KMS: Configuring and integrating CDAP with Apache Hadoop Key Management Service (KMS)
    • JDBC: The CDAP JDBC driver, included with CDAP
    • ODBC: The ODBC driver available for CDAP
    • Pentaho: Pentaho Data Integration, a business intelligence tool that can be used with CDAP
    • Squirrel: SQuirreL SQL, a simple JDBC client that can be integrated with CDAP