...

Cask Data Application Platform (CDAP) provides an abstraction, called Datasets, to store data. In this guide, you will learn how to integrate and analyze Datasets with BI (Business Intelligence) tools.

...

This guide will take you through building a CDAP application that processes purchase events from a Stream, persists the results in a Dataset, and then analyzes them using a BI tool. You will:

  • build a CDAP Application that consumes purchase events from a Stream and stores them in a Dataset;

  • build a Flowlet that processes purchase events in realtime, writing the events into a Dataset; and

  • finally, access this Dataset from a BI tool to run queries by joining purchase events in the Dataset with a product catalog (a local data source in the BI tool).

...

Let’s Build It!

The following sections will guide you through building an application from scratch. If you are interested in deploying and running the application right away, you can clone its source code and binaries from this GitHub repository. In that case, feel free to skip the next two sections and jump right to the Build and Run Application section.

Application Design

In this example, we will learn how to explore purchase events using the Pentaho BI Tool. We can ask questions such as "What is the total spend of a customer for a given day?"
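To make the example question concrete, here is a minimal plain-Java sketch of the aggregation behind "total spend of a customer for a given day". It does not use any CDAP or Pentaho API. The `PurchaseRecord` class, its field names, and the sample data are assumptions made for illustration; they are not the guide's actual Purchase class or dataset schema.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

public class TotalSpendSketch {

    /** Hypothetical purchase record; the fields are assumptions, not the guide's Purchase class. */
    static final class PurchaseRecord {
        final String customer;
        final String product;
        final int quantity;
        final long unitPriceCents;
        final LocalDate day;

        PurchaseRecord(String customer, String product, int quantity,
                       long unitPriceCents, LocalDate day) {
            this.customer = customer;
            this.product = product;
            this.quantity = quantity;
            this.unitPriceCents = unitPriceCents;
            this.day = day;
        }
    }

    /** Sums quantity * unit price over all purchases made by one customer on one day. */
    static long totalSpendCents(List<PurchaseRecord> purchases, String customer, LocalDate day) {
        long total = 0;
        for (PurchaseRecord p : purchases) {
            if (p.customer.equals(customer) && p.day.equals(day)) {
                total += (long) p.quantity * p.unitPriceCents;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        LocalDate day = LocalDate.of(2015, 1, 15);
        List<PurchaseRecord> purchases = new ArrayList<>();
        purchases.add(new PurchaseRecord("alice", "apple", 3, 50, day));             // counted: 150
        purchases.add(new PurchaseRecord("alice", "pear", 1, 120, day));             // counted: 120
        purchases.add(new PurchaseRecord("alice", "apple", 2, 50, day.plusDays(1))); // different day
        purchases.add(new PurchaseRecord("bob", "apple", 4, 50, day));               // different customer

        System.out.println(totalSpendCents(purchases, "alice", day)); // prints 270
    }
}
```

In the guide itself, this kind of question is answered by the BI tool querying the Dataset rather than by application code; the sketch only shows the filter-and-sum logic such a query expresses.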

...

Purchase events are injected into the purchases Stream. The sink Flowlet reads events from the Stream and writes them into the PurchasesDataset. The PurchasesDataset has Hive integration enabled and can be queried, like any regular Database table, from a BI tool by using the CDAP JDBC Driver.

Implementation

The first step is to get our application structure set up. We will use a standard Maven project structure for all of the source code files:

...

The application is identified by the PurchaseApp class. This class extends AbstractApplication, and overrides the configure() method to define all of the application components:

...

We also need a place to store the purchase event records that we receive; PurchaseApp next creates a Dataset to store the processed data. PurchaseApp uses an ObjectStore Dataset to store the purchase events. The purchase events are represented as a Java class, Purchase:

...