Installation on Amazon EMR using Bootstrap Actions


This section describes installing CDAP on Amazon EMR clusters using the Amazon EMR "Run If" Bootstrap Action to:

  • Install necessary EMR components;
  • Restrict CDAP installation to the EMR master node;
  • Download, install, and automatically configure CDAP for EMR; and
  • Run all services as the 'cdap' user.

Information on Amazon EMR is available online.

CDAP 5.1 is compatible with Amazon EMR 4.6.0 through 5.3.1.
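As a minimal sketch of the compatibility range stated above, the following Python helper checks whether an EMR release label falls between 4.6.0 and 5.3.1 inclusive. The function names are hypothetical and are not part of CDAP or any AWS tool. The sketch assumes release labels of the form emr-X.Y.Z.

```python
import re

# Bounds taken from the compatibility statement above.
MIN_EMR = (4, 6, 0)
MAX_EMR = (5, 3, 1)


def parse_release_label(label):
    """Turn 'emr-5.3.1' (or '5.3.1') into the tuple (5, 3, 1)."""
    match = re.fullmatch(r"(?:emr-)?(\d+)\.(\d+)\.(\d+)", label.strip())
    if match is None:
        raise ValueError(f"unrecognized EMR release label: {label!r}")
    return tuple(int(part) for part in match.groups())


def is_supported_release(label):
    """Return True if the label lies within the stated range, inclusive."""
    return MIN_EMR <= parse_release_label(label) <= MAX_EMR
```

Comparing tuples of integers avoids the string-comparison pitfall where "4.10.0" would sort before "4.6.0".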

Using the Create Cluster Wizard

Note: For any settings not listed or specified below, we recommend using the default settings.

  1. Open the Amazon EMR console.

  2. Choose "Create cluster."

  3. In the Advanced Options, Step 1: Software and Steps, set:

    • Vendor: Amazon
    • Release: any release from emr-4.6.0 through emr-5.3.1
    • Software: Hadoop, HBase, Hive, Spark
    • No auto-terminate

    EMR Create Cluster Wizard: Step 1: Software and Steps

  4. In Step 2: Hardware, set:

    • Network: use defaults
    • EC2 Subnet: use defaults
    • Master
      • EC2 Instance type: m3.xlarge
      • Instance count: 1
    • Core
      • EC2 Instance type: m3.xlarge
      • Instance count: 4 (as a minimum)
    • Task
      • Instance count: 0 (not required)

    EMR Create Cluster Wizard: Step 2: Hardware

  5. In Step 3: General Cluster Settings, enable:

    • Logging
    • Debugging
    • Termination protection (no auto-terminate)

    EMR Create Cluster Wizard: Step 3: General Cluster Settings

  6. In Step 3: General Cluster Settings, add a Bootstrap Action:

    • Type: Run If

    • Optional arguments:

      instance.isMaster=true "curl | sudo bash -s"

    EMR Create Cluster Wizard: Add Bootstrap Action

  7. In Step 4: Security, keep the default settings, and then add a security group (described in the next step).


    EMR Create Cluster Wizard: Step 4: Security

  8. In Step 4: Security, assign additional EC2 Security Groups to the master node:

    • Master (one of the following):
      • A Security Group with ports 11011/11015 open; or
      • An SSH Tunnel

    EMR Create Cluster Wizard: Assigning additional security group to master node
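For readers who script cluster creation instead of using the wizard, the settings above can be sketched as a request dictionary in the shape accepted by the boto3 EMR client's run_job_flow call. This is an illustrative sketch, not an official CDAP procedure, and it rests on several assumptions. The function name and its parameters are hypothetical. The Run If script location and the default role names are assumed to match your region and account. The full install command from step 6 and the log location must be supplied by the caller. The Debugging option from step 5 is not covered.

```python
# Assumed location of the EMR "Run If" bootstrap script; verify for your region.
RUN_IF_SCRIPT = "s3://elasticmapreduce/bootstrap-actions/run-if"


def build_cluster_request(name, install_command, log_uri,
                          master_security_group_id=None,
                          release_label="emr-5.3.1", core_count=4):
    """Build run_job_flow keyword arguments mirroring the wizard steps."""
    if core_count < 4:
        raise ValueError("the steps above call for at least 4 core instances")

    instances = {
        "InstanceGroups": [
            {"Name": "Master", "InstanceRole": "MASTER",
             "InstanceType": "m3.xlarge", "InstanceCount": 1},
            {"Name": "Core", "InstanceRole": "CORE",
             "InstanceType": "m3.xlarge", "InstanceCount": core_count},
        ],
        # "No auto-terminate" and "Termination protection" in the wizard.
        "KeepJobFlowAliveWhenNoSteps": True,
        "TerminationProtected": True,
    }
    # Omit the extra group when an SSH tunnel is used instead.
    if master_security_group_id is not None:
        instances["AdditionalMasterSecurityGroups"] = [master_security_group_id]

    return {
        "Name": name,
        "ReleaseLabel": release_label,
        "LogUri": log_uri,
        "Applications": [{"Name": app}
                         for app in ("Hadoop", "HBase", "Hive", "Spark")],
        "Instances": instances,
        "BootstrapActions": [{
            "Name": "Install CDAP on master",
            "ScriptBootstrapAction": {
                "Path": RUN_IF_SCRIPT,
                # Run If evaluates the condition, then runs the command.
                "Args": ["instance.isMaster=true", install_command],
            },
        }],
        # Assumed default EMR roles; replace if your account uses others.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }
```

The resulting dictionary could then be passed as boto3.client("emr").run_job_flow(**request). Building the request separately from the call makes it possible to inspect the settings before any cluster is launched.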

Once the cluster is created, the CDAP services start up. Allow about 10 minutes after the cluster reaches the Waiting state for startup to complete.
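A script can detect the Waiting state by polling the cluster status. The sketch below is one hedged way to do that. The function name is hypothetical, and the state getter is passed in so that the loop does not depend on any particular client. It assumes the EMR state names WAITING, TERMINATING, TERMINATED, and TERMINATED_WITH_ERRORS.

```python
import time

# States from which the cluster will never reach WAITING.
TERMINAL_STATES = {"TERMINATING", "TERMINATED", "TERMINATED_WITH_ERRORS"}


def wait_until_waiting(get_state, poll_seconds=30, timeout_seconds=3600,
                       sleep=time.sleep):
    """Poll get_state() until it returns 'WAITING'.

    Returns the number of seconds spent sleeping between polls.
    """
    waited = 0
    while True:
        state = get_state()
        if state == "WAITING":
            return waited
        if state in TERMINAL_STATES:
            raise RuntimeError(f"cluster ended in state {state}")
        if waited >= timeout_seconds:
            raise TimeoutError(f"cluster still {state} after {waited} seconds")
        sleep(poll_seconds)
        waited += poll_seconds


# With boto3, the getter might look like this (not executed here):
#   emr = boto3.client("emr")
#   get_state = lambda: emr.describe_cluster(
#       ClusterId=cluster_id)["Cluster"]["Status"]["State"]
```

Reaching Waiting only means that EMR has finished provisioning. The CDAP services still need the additional startup time described above.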


CDAP Smoke Test

The CDAP UI may initially show errors while the CDAP YARN containers are still starting up. Allow a few minutes for all of them to start.

The Administration page of the CDAP UI shows the status of the CDAP services. It can be reached at http://<cdap-host>:11011/cdap/administration, replacing <cdap-host> with the host name or IP address of the CDAP server:


CDAP UI: Administration page, showing started-up services.
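The reachability part of this check can be scripted. The following sketch builds the Administration page URL from the host and tests whether the UI answers with HTTP 200. The function names are hypothetical, and the opener parameter exists only so that the check can be exercised without a live cluster. A successful response shows that the UI is reachable. It does not show that every CDAP service has started.

```python
import urllib.error
import urllib.request


def administration_url(cdap_host, port=11011):
    """Return the Administration page URL for the given CDAP host."""
    return f"http://{cdap_host}:{port}/cdap/administration"


def ui_is_up(url, timeout=5, opener=urllib.request.urlopen):
    """Return True if the URL answers with HTTP 200, False otherwise."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False
```

For example, ui_is_up(administration_url("<cdap-host>")) could be retried every few seconds while the YARN containers start. The request must come from a machine that can reach port 11011, whether through the security group or through an SSH tunnel.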

Further instructions for verifying your installation are contained in Verification.