...

This guide takes you through building a simple CDAP application that ingests web logs, aggregates request counts for different combinations of fields, and can then be queried for the request volume over a time period. From these aggregates you can retrieve insights into a web site's traffic and health. You will:

  • Use a Stream to ingest real-time log data;

  • Build a Flow to process log entries as they are received into multidimensional facts;

  • Use a Dataset to store the aggregated numbers; and

  • Build a Service to query the aggregated data across multiple dimensions.
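The idea behind these steps can be sketched in plain Java, without any CDAP classes. This is a minimal illustration rather than the guide's implementation: each log entry becomes a fact (a timestamp plus dimension values), a count is kept for each configured combination of dimensions per one-minute bucket, and a query sums the buckets in a time range. The class name, the dimension names `ip` and `response_status`, and the one-minute resolution are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only; no CDAP classes are used here.
class RequestCountSketch {
    private static final long RESOLUTION_SECONDS = 60;

    // Hypothetical dimension combinations to aggregate by (names in sorted order).
    private final List<List<String>> aggregations = Arrays.asList(
        Arrays.asList("ip"),
        Arrays.asList("response_status"),
        Arrays.asList("ip", "response_status"));

    // Count per "dimension values @ time bucket".
    private final Map<String, Long> counts = new HashMap<>();

    // Record one request: increment the count of every configured aggregation.
    void add(long timestampSeconds, Map<String, String> dimensions) {
        long bucket = timestampSeconds - (timestampSeconds % RESOLUTION_SECONDS);
        for (List<String> names : aggregations) {
            counts.merge(key(names, dimensions) + "@" + bucket, 1L, Long::sum);
        }
    }

    // Sum the counts for the given dimension values over [start, end].
    // The dimension names must match one of the configured aggregations.
    long query(long startSeconds, long endSeconds, Map<String, String> dimensions) {
        List<String> names = new ArrayList<>(new TreeMap<>(dimensions).keySet());
        String prefix = key(names, dimensions);
        long total = 0;
        long first = startSeconds - (startSeconds % RESOLUTION_SECONDS);
        for (long bucket = first; bucket <= endSeconds; bucket += RESOLUTION_SECONDS) {
            total += counts.getOrDefault(prefix + "@" + bucket, 0L);
        }
        return total;
    }

    private static String key(List<String> names, Map<String, String> dimensions) {
        StringBuilder sb = new StringBuilder();
        for (String name : names) {
            sb.append(name).append('=').append(dimensions.get(name)).append(',');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        RequestCountSketch sketch = new RequestCountSketch();
        Map<String, String> fact = new HashMap<>();
        fact.put("ip", "1.2.3.4");
        fact.put("response_status", "200");
        sketch.add(60, fact);
        sketch.add(70, fact);

        Map<String, String> byIp = new HashMap<>();
        byIp.put("ip", "1.2.3.4");
        System.out.println("requests from 1.2.3.4: " + sketch.query(0, 120, byIp));
    }
}
```

Counting every combination up front trades storage for query speed: a query only has to read the counts that were already aggregated for its dimensions, instead of scanning the raw log entries.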

...

The following sections will guide you through building an application from scratch. If you are interested in deploying and running the application right away, you can clone its source code from this GitHub repository. In that case, feel free to skip the next two sections and jump right to the Build and Run Application section.

Application Design

For this guide, we will assume that we are processing the logs of a web site, produced by an Apache web server. The data could be collected from multiple servers and then sent to our application over HTTP. There are a number of tools that can help you with the ingestion task. We'll skip over the details of ingesting the data (as this is covered elsewhere) and instead focus on storing and retrieving the data.
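To make the input concrete, here is a minimal plain-Java sketch of turning one Apache access-log line into the fields a fact would be built from. It assumes the Apache Common Log Format (host, identity, user, time, request line, status, size); the class and field names are hypothetical and are not taken from the application's source.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: parses the leading Common Log Format fields of one line.
class ApacheLogLineSketch {
    // host ident user [time] "request" status bytes
    private static final Pattern CLF = Pattern.compile(
        "^(\\S+) \\S+ \\S+ \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) \\S+");

    // Example time field: 10/Oct/2000:13:55:36 -0700
    private static final DateTimeFormatter TIME =
        DateTimeFormatter.ofPattern("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH);

    final String ip;
    final long timestampSeconds;
    final String request;
    final String status;

    private ApacheLogLineSketch(String ip, long timestampSeconds, String request, String status) {
        this.ip = ip;
        this.timestampSeconds = timestampSeconds;
        this.request = request;
        this.status = status;
    }

    static ApacheLogLineSketch parse(String line) {
        Matcher m = CLF.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("Not a Common Log Format line: " + line);
        }
        long seconds = OffsetDateTime.parse(m.group(2), TIME).toEpochSecond();
        return new ApacheLogLineSketch(m.group(1), seconds, m.group(3), m.group(4));
    }

    public static void main(String[] args) {
        ApacheLogLineSketch entry = parse(
            "127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] \"GET /apache_pb.gif HTTP/1.0\" 200 2326");
        System.out.println(entry.ip + " " + entry.status + " " + entry.timestampSeconds);
    }
}
```

Because the pattern only anchors at the start of the line, lines in the longer Combined Log Format (with referrer and user agent appended) should also match; those trailing fields are simply ignored here.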

...