...

Cleanup Strategy
  • HBase: Coprocessors attached to the tables (one for MessageTable and another for PayloadTable). No cleanup is required for MetadataTable.
  • LevelDB: A periodically scheduled thread cleans up data in MessageTable and PayloadTable. The schedule frequency is configurable via cdap-site.xml.

TTL Expiration (MessageTable and PayloadTable)
  • HBase: Get the TableId (namespace, topic) from the rowKey and get the TopicMetadata from the MetadataTable (cached). From the cell rowKey, get the write timestamp and use the topic's TTL to determine whether the cell should be skipped.
  • LevelDB: Scan the tables topic by topic and remove rows that have exceeded the TTL given in the TopicMetadata.

Older Generation (MessageTable and PayloadTable)
  • HBase: Compare the generation id of the row with the one from the MetadataTable. If it is the current generation, do nothing. If it is an older generation (gen < abs(currentgen) || gen == -1*currentgen), skip the cell.
  • LevelDB: Same logic as in HBase. While pruning messages, the scan should start with generation '1' of that topic. The only difference from the HBase implementation in old-generation cleanup is that in LevelDB, if a topic is deleted and never recreated, its entries will not be deleted (not even by TTL). This is probably OK for the SDK. We can address this in the next release if required.

Invalid Transactions (MessageTable only)
  • HBase: This requires a (periodically refreshed) tx.snapshot in the Coprocessor. If the cell belongs to a TX_COL column, get the tx id from it. If the transaction id is present in the invalid list (from the tx.snapshot), invert the sign (-1 * tx_id) and put the value back. This way, the data is still visible to non-tx consumption and will eventually be cleared by TTL or when the topic is deleted.
  • LevelDB: Not necessary, since pruning invalid transactions is not supported in the SDK.

Latest min tx timestamp (for manual invalid TX pruning, required only for MessageTable)
  • HBase: The last used tx.snapshot info can be used directly to prune the invalid transaction list, so that info needs to be logged. A TMS table debugger tool that can print this info also needs to be written.
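
The three per-cell decisions above (TTL expiry, older generation, invalid transaction) can be sketched as plain functions. This is only an illustrative sketch, not the actual coprocessor code: the class name CleanupSketch, the method names, and the parameter types are hypothetical, the TTL is assumed to be in seconds and timestamps in milliseconds, and a negative current generation is assumed to mark a deleted topic.

```java
import java.util.Set;

/** Hypothetical sketch of the cleanup decisions; none of these names are CDAP APIs. */
final class CleanupSketch {

  private CleanupSketch() { }

  /**
   * TTL check: the cell is expired when more than ttlSeconds have passed
   * since its write timestamp (assumed units: millis for times, seconds for TTL).
   */
  static boolean isExpired(long writeTimeMillis, long ttlSeconds, long nowMillis) {
    return nowMillis - writeTimeMillis > ttlSeconds * 1000L;
  }

  /**
   * Generation check, using the condition from the text:
   * gen < abs(currentGen) || gen == -1 * currentGen.
   * Assumption: a negative currentGen means the topic was deleted, so the
   * last live generation also counts as old.
   */
  static boolean isOlderGeneration(int gen, int currentGen) {
    return gen < Math.abs(currentGen) || gen == -currentGen;
  }

  /**
   * Invalid-transaction handling: if the tx id is in the invalid list taken
   * from the snapshot, return it with the sign inverted; otherwise unchanged.
   */
  static long markIfInvalid(long txId, Set<Long> invalidTxIds) {
    return invalidTxIds.contains(txId) ? -txId : txId;
  }
}
```

In this sketch, a cell would be skipped when isExpired or isOlderGeneration returns true, while markIfInvalid only rewrites the stored tx id so the data stays readable by non-transactional consumers.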

...