Skip to end of metadata
Go to start of metadata

You are viewing an old version of this page. View the current version.

Compare with Current View Page History

« Previous Version 3 Current »

Checklist

  • User Stories Documented
  • User Stories Reviewed
  • Design Reviewed
  • APIs reviewed
  • Release priorities assigned
  • Test cases reviewed
  • Blog post

Introduction 

Metrics are written to the metrics table with a row key that is composed from all the dimensions in the metrics context and the metric name. The metric context contains the instance id of the container, which, in a MapReduce job, is the task ID of the mapper or reducer. A large MapReduce that runs 1000 mappers and emits 10 different metrics would therefore write to 10000 rows every second. Because the metrics table is not salted or otherwise spread out, all these writes (having the same prefix of namespace, app, program id, run id) go to the same region. This region goes under very heavy load, which can impact performance of other regions served on the same region server.

Goals

The goal is to salt following metrics tables to avoid hot spotting.

  • cdap_system:metrics.v2.table.ts.1
  • cdap_system:metrics.v2.table.ts.2147483647
  • cdap_system:metrics.v2.table.ts.3600
  • cdap_system:metrics.v2.table.ts.60

User Stories 

  • CDAP system should be able to salt metrics tables so that all the writes does not go to same region.

Design

In order to salt HBase metrics tables, we can create a new salted metrics tables:

  • cdap_system:metrics.v3.table.ts.1
  • cdap_system:metrics.v3.table.ts.2147483647
  • cdap_system:metrics.v3.table.ts.3600
  • cdap_system:metrics.v3.table.ts.60

Tables that do not require salting:

  • cdap_system:metrics.kafka.meta
  • cdap_system:metrics.v2.entity

Once the new salted tables are created, all the writes will go to salted tables. However, for reading, we can use CombinedHBaseMetricsTable which encapsulates both unsalted an salted metrics tables. Any read query on tables will be performed on both the tables and the results will be merged. While any write queries are only performed on salted HBase tables.

 

public class CombinedHBaseMetricsTable implements MetricsTable {

  private final MetricsTable unsaltedHBaseTable;
  private final MetricsTable saltedHBaseTable;

  @Nullable
  @Override
  public byte[] get(byte[] row, byte[] column) {
	// read from old and new tables
    return new byte[0];
  }

  @Override
  public void put(SortedMap<byte[], ? extends SortedMap<byte[], Long>> updates) {
    // write only to saltedHBaseTable
  }

  @Override
  public void putBytes(SortedMap<byte[], ? extends SortedMap<byte[], byte[]>> updates) {
    // write only to saltedHBaseTable
  }

  @Override
  public boolean swap(byte[] row, byte[] column, byte[] oldValue, byte[] newValue) {
    return false;
  }

  @Override
  public void increment(byte[] row, Map<byte[], Long> increments) {

  }

  @Override
  public void increment(NavigableMap<byte[], NavigableMap<byte[], Long>> updates) {
     // write only to saltedHBaseTable
  }

  @Override
  public long incrementAndGet(byte[] row, byte[] column, long delta) {
    return 0;
  }

  @Override
  public void delete(byte[] row, byte[][] columns) {
    // delete records from both the tables
  }

  @Override
  public Scanner scan(@Nullable byte[] start, @Nullable byte[] stop, @Nullable FuzzyRowFilter filter) {
    // scan and merge records from both the tables
    return null;
  }

  @Override
  public void close() throws IOException {

  }
}

 

Deletion of Existing Unsalted Tables

Metrics Table with resolution 1s has retention of 2 hours by default while 1m and 1h resolution tables have retention of 30 days by default. Now, once data in these tables gets expired, a background thread running in metrics processor can delete these tables. However, totals resolution table retains metrics for Integer.MAX_VALUE time. This table we can copy each row from unsalted table to new table with additional column `u`. This additional column can store last timestamp copied by the background thread. If a thread dies in the middle, it can start copying cell versions from last timestamp stored.

Approach

Approach #1

Approach #2

API changes

New Programmatic APIs

New Java APIs introduced (both user facing and internal)

Deprecated Programmatic APIs

New REST APIs

PathMethodDescriptionResponse CodeResponse
/v3/apps/<app-id>GETReturns the application spec for a given application

200 - On success

404 - When application is not available

500 - Any internal errors

 

     

Deprecated REST API

PathMethodDescription
/v3/apps/<app-id>GETReturns the application spec for a given application

CLI Impact or Changes

  • Impact #1
  • Impact #2
  • Impact #3

UI Impact or Changes

  • Impact #1
  • Impact #2
  • Impact #3

Security Impact 

What's the impact on Authorization and how does the design take care of this aspect

Impact on Infrastructure Outages 

System behavior (if applicable - document impact on downstream [ YARN, HBase etc ] component failures) and how does the design take care of these aspect

Test Scenarios

Test IDTest DescriptionExpected Results
   
   
   
   

Releases

Release X.Y.Z

Release X.Y.Z

Related Work

  • Work #1
  • Work #2
  • Work #3

 

Future work

  • No labels