An Application is a collection of building blocks that read and write data through the data abstraction layer in CDAP.

Applications are composed from Programs, Services, and Schedules.

Programs, which include Flows, MapReduce programs, Workflows, Spark programs, and Workers, are used to process data. Services are used to serve data.

Data abstractions include Streams and Datasets.

Applications are created using an Artifact and optional configuration. An Artifact is a JAR file that packages the Java Application class that defines how the Programs, Services, Schedules, Streams, and Datasets interact. It also packages any dependent classes and libraries needed to run the Application.

Implementing an Application Class

To implement an application class, extend the AbstractApplication class, specifying the application metadata and declaring and configuring each of the application components:

public class MyApp extends AbstractApplication {
  @Override
  public void configure() {
    setDescription("My Sample Application");
    addStream(new Stream("myAppStream"));
    createDataset("myAppDataset", Table.class);
    addFlow(new MyAppFlow());
    addService(new MyService());
    addMapReduce(new MyMapReduce());
    addWorkflow(new MyAppWorkflow());
  }
}

Notice that Streams are defined using the provided Stream class, and Datasets are defined by passing a Table class; both are referenced by name.

Other components are defined using user-written classes that implement the corresponding interfaces. They are referenced by passing an object, in addition to being assigned a unique name.

Names used for streams and datasets need to be unique across the CDAP namespace, while names used for programs and services need to be unique only to the application.

A Typical CDAP Application Class

A typical design of a CDAP application class consists of:

  • Streams to ingest data into CDAP;
  • Flows, consisting of Flowlets linked together, to process the ingested data in real time or batch;
  • MapReduce programs, Spark programs, and Workflows for batch processing tasks;
  • Workers for processing data in an ad-hoc manner that doesn't fit into real-time or batch paradigms;
  • Datasets for storage of data, either raw or the processed results; and
  • Services for serving data and processed results.

Of course, not all components are required: it depends on the application. A minimal application could include a stream, a flow, a flowlet, and a dataset. A stream may not be needed if other methods of bringing in data are used. On the following pages, we'll look at these components and their interactions.

Application Version

Applications can be created with a version string. This can be useful when a newer version of the same application needs to be created, and you need to distinguish them and run them at the same time. Programs of a specific version of an application can be started and stopped using the calls of the version-aware Lifecycle HTTP RESTful APIs.

If a version is not provided while creating an application (i.e., the application is created using a non-version-aware API), a default version of "-SNAPSHOT" is used.

If an application version is specified that matches one that already exists, it will be overwritten only if the version string ends with "-SNAPSHOT". Otherwise, versions are immutable, and the only way to change a version is to delete the application of that version and then redeploy it.
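
The default-version and overwrite rules above can be summarized in a few lines of plain Java. This is only a sketch of the rules as stated in this section: `AppVersionRules`, `effectiveVersion`, and `canOverwrite` are hypothetical names introduced for illustration and are not part of the CDAP API.

```java
/** Hypothetical helper mirroring the version rules described above; not a CDAP class. */
final class AppVersionRules {
  // Default version used when an application is created without a version.
  static final String DEFAULT_VERSION = "-SNAPSHOT";

  /** Returns the requested version, or the default when none (null or empty) was supplied. */
  static String effectiveVersion(String requested) {
    return (requested == null || requested.isEmpty()) ? DEFAULT_VERSION : requested;
  }

  /** An existing version may be overwritten only if its version string ends with "-SNAPSHOT". */
  static boolean canOverwrite(String version) {
    return effectiveVersion(version).endsWith("-SNAPSHOT");
  }

  public static void main(String[] args) {
    System.out.println(canOverwrite("1.0.0-SNAPSHOT"));
    System.out.println(canOverwrite("1.0.0"));
    System.out.println(canOverwrite(null));
  }
}
```

Under these rules, a version such as 1.0.0-SNAPSHOT can be redeployed in place, while 1.0.0 must be deleted before it can be redeployed.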

Information about the version-aware RESTful APIs to create, list, and delete applications using versions can be found in the Lifecycle HTTP RESTful API documentation.

Application Configuration

Application classes can use a Config class to receive a configuration when an Application is created. For example, configuration can be used to specify—at application creation time—a stream to be created or a dataset to be read, rather than having them hard-coded in the AbstractApplication's configure method. The configuration class needs to be the type parameter of the AbstractApplication class. It should also extend the Config class present in the CDAP API. The configuration is provided as part of the request body to create an application. It is available during configuration time through the getConfig() method in AbstractApplication.

Information about the RESTful call is available in the Lifecycle HTTP RESTful API documentation.

We can modify the MyApp class above to accept a configuration of type MyApp.MyAppConfig:

public class MyApp extends AbstractApplication<MyApp.MyAppConfig> {

  public static class MyAppConfig extends Config {
    String streamName;
    String datasetName;

    public MyAppConfig() {
      // Default values
      this.streamName = "myAppStream";
      this.datasetName = "myAppDataset";
    }
  }

  @Override
  public void configure() {
    // Configuration supplied when the application was created
    MyAppConfig config = getConfig();
    setDescription("My Sample Application");
    addStream(new Stream(config.streamName));
    createDataset(config.datasetName, Table.class);
    addFlow(new MyAppFlow(config));
    addService(new MyService(config.datasetName));
    addMapReduce(new MyMapReduce(config.datasetName));
    addWorkflow(new MyAppWorkflow());
  }
}
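
To make the link between the request body and MyAppConfig concrete, the following plain-Java sketch builds a sample body for creating the application. The JSON layout (an "artifact" object next to a "config" object) is an assumption here and should be checked against the Lifecycle HTTP RESTful API documentation. The class name `CreateAppRequestSketch`, the "user" scope, and the artifact name and version are hypothetical placeholders. Only the "streamName" and "datasetName" keys are taken from the MyAppConfig fields above.

```java
/** Hypothetical sketch that builds a sample create-application request body. */
final class CreateAppRequestSketch {

  /**
   * Returns a JSON string whose "config" keys match the MyAppConfig field names.
   * Assumes the arguments contain no characters that need JSON escaping.
   */
  static String requestBody(String artifactName, String artifactVersion,
                            String streamName, String datasetName) {
    return String.format(
        "{\"artifact\":{\"name\":\"%s\",\"version\":\"%s\",\"scope\":\"user\"},"
            + "\"config\":{\"streamName\":\"%s\",\"datasetName\":\"%s\"}}",
        artifactName, artifactVersion, streamName, datasetName);
  }

  public static void main(String[] args) {
    // Placeholder values: these override the defaults set in the MyAppConfig constructor.
    System.out.println(requestBody("MyApp", "1.0.0", "customStream", "customDataset"));
  }
}
```

With a body like this, getConfig() would be expected to return a MyAppConfig whose streamName and datasetName hold the supplied values instead of the constructor defaults.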

To use the configuration in programs, pass it to the individual programs through their constructors. If a configuration parameter is also required at runtime, you can use the @Property annotation. In the example below, uniqueCountTableName is used in the configure method to register the usage of the dataset. It is also used at runtime to get the dataset instance using the getDataset() method:

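public class UniqueCounter extends AbstractFlowlet {
  // @Property makes the value available at runtime as well as at configure time
  @Property
  private final String uniqueCountTableName;

  private UniqueCountTable uniqueCountTable;

  @Override
  public void configure(FlowletConfigurer configurer) {
    super.configure(configurer);
    // Register the usage of the dataset named by uniqueCountTableName here
  }

  public UniqueCounter(String uniqueCountTableName) {
    this.uniqueCountTableName = uniqueCountTableName;
  }

  @Override
  public void initialize(FlowletContext context) throws Exception {
    super.initialize(context);
    uniqueCountTable = context.getDataset(uniqueCountTableName);
  }

  @ProcessInput
  public void process(String word) {
    // Update uniqueCountTable for the incoming word
  }
}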

Application Example

Application classes are included in just about every CDAP tutorial, guide, or example.

An example demonstrating the usage of a configuration is the WordCount example.