...
Cask Data Application Platform (CDAP) provides an abstraction, called Datasets, to store data. In this guide, you will learn how to integrate and analyze Datasets with BI (Business Intelligence) tools.
...
This guide will take you through building a CDAP application that processes purchase events from a Stream, persists the results in a Dataset, and then analyzes them using a BI tool. You will:
build a CDAP Application that consumes purchase events from a Stream and stores them in a Dataset;
build a Flowlet that processes purchase events in realtime, writing the events into a Dataset; and
finally, access this Dataset from a BI tool to run queries by joining purchase events in the Dataset with a product catalog—a local data source in the BI tool.
...
Let’s Build It!
The following sections will guide you through building an application from scratch. If you are interested in deploying and running the application right away, you can clone its source code and binaries from this GitHub repository. In that case, feel free to skip the next two sections and jump right to the Build and Run Application section.
Application Design
In this example, we will learn how to explore purchase events using the Pentaho BI Tool. We can ask questions such as "What is the total spend of a customer for a given day?"
...
Purchase events are injected into the purchases Stream. The sink Flowlet reads events from the Stream and writes them into the PurchasesDataset. The PurchasesDataset has Hive integration enabled and can be queried, like any regular database table, from a BI tool by using the CDAP JDBC Driver.
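The question posed earlier, "What is the total spend of a customer for a given day?", comes down to joining purchase events with a product catalog and summing per customer and day. The following is a minimal in-memory Java sketch of that logic only; it does not use CDAP, Hive, or Pentaho. The class name, the event fields (customer, product, quantity, timestamp), and the catalog prices are all assumptions made for illustration and may not match the guide's actual schema.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SpendSketch {

    // Hypothetical purchase event; these fields are assumptions, not the guide's schema.
    static final class PurchaseEvent {
        final String customer;
        final String product;
        final int quantity;
        final long epochMillis;

        PurchaseEvent(String customer, String product, int quantity, long epochMillis) {
            this.customer = customer;
            this.product = product;
            this.quantity = quantity;
            this.epochMillis = epochMillis;
        }
    }

    // Joins each event with its catalog price and sums the spend (in cents)
    // for one customer on one UTC calendar day. Events whose product is
    // missing from the catalog are skipped, as an inner join would do.
    static long totalSpendCents(List<PurchaseEvent> events,
                                Map<String, Long> catalogPriceCents,
                                String customer,
                                LocalDate day) {
        long total = 0;
        for (PurchaseEvent e : events) {
            LocalDate eventDay = Instant.ofEpochMilli(e.epochMillis)
                    .atZone(ZoneOffset.UTC)
                    .toLocalDate();
            Long price = catalogPriceCents.get(e.product);
            if (price != null && e.customer.equals(customer) && eventDay.equals(day)) {
                total += price * e.quantity;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up catalog standing in for the BI tool's local data source.
        Map<String, Long> catalog = new HashMap<>();
        catalog.put("apple", 50L);
        catalog.put("pear", 75L);

        List<PurchaseEvent> events = new ArrayList<>();
        events.add(new PurchaseEvent("alice", "apple", 3, 0L));          // 1970-01-01 UTC
        events.add(new PurchaseEvent("alice", "pear", 2, 3_600_000L));   // 1970-01-01 UTC
        events.add(new PurchaseEvent("alice", "apple", 1, 86_400_000L)); // 1970-01-02 UTC
        events.add(new PurchaseEvent("bob", "pear", 4, 0L));             // 1970-01-01 UTC

        // Prints 300 (3 * 50 + 2 * 75) for alice on 1970-01-01.
        System.out.println(totalSpendCents(events, catalog, "alice", LocalDate.of(1970, 1, 1)));
    }
}
```

In the guide's setup, the BI tool would perform the equivalent join and aggregation as a query against the Dataset and its local catalog; the loop above only makes the intended result concrete.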
Implementation
The first step is to get our application structure set up. We will use a standard Maven project structure for all of the source code files:
...
The application is identified by the PurchaseApp class. This class extends AbstractApplication, and overrides the configure() method to define all of the application components:
...
We also need a place to store the purchase event records that we receive; PurchaseApp next creates a Dataset to store the processed data. PurchaseApp uses an ObjectStore Dataset to store the purchase events. The purchase events are represented as a Java class, Purchase:
...
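For readers who want a concrete picture of a purchase record as a plain Java class, here is a small standalone sketch. It is not the guide's Purchase class: the field set, the comma-separated event format, and the names PurchaseSketch and parse are hypothetical, chosen only to show how a sink step might turn a raw event body into an object before storing it.

```java
public class PurchaseSketch {

    // Hypothetical stand-in for a purchase record; the real class may have different fields.
    static final class Purchase {
        private final String customer;
        private final String product;
        private final int quantity;
        private final long purchaseTime;

        Purchase(String customer, String product, int quantity, long purchaseTime) {
            this.customer = customer;
            this.product = product;
            this.quantity = quantity;
            this.purchaseTime = purchaseTime;
        }

        String getCustomer() { return customer; }
        String getProduct() { return product; }
        int getQuantity() { return quantity; }
        long getPurchaseTime() { return purchaseTime; }
    }

    // Parses an event body in the assumed form "customer,product,quantity".
    // The timestamp is supplied by the caller, e.g. the time the event was received.
    static Purchase parse(String body, long purchaseTime) {
        String[] parts = body.split(",");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected 3 comma-separated fields: " + body);
        }
        return new Purchase(parts[0].trim(), parts[1].trim(),
                Integer.parseInt(parts[2].trim()), purchaseTime);
    }

    public static void main(String[] args) {
        Purchase p = parse("alice, apple, 3", System.currentTimeMillis());
        System.out.println(p.getCustomer() + " bought " + p.getQuantity() + " x " + p.getProduct());
    }
}
```

Keeping the record immutable and the parsing in one place makes malformed events fail early, before anything is written to storage.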
Congratulations! You have now learned how to analyze CDAP Datasets from a BI tool. Please continue to experiment and extend this sample application.
Related Topics
...