Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Introduction:

...

SNAPSHOT dataset is a good solution for this.


User stories:

 Creating adapters with SNAPSHOT source/sink:

...

Users can read from the SNAPSHOT datasets using other applications. They should see the data with the current latest snapshot. They will not be able to see any updates in the dataset unless they try to get the dataset again. Therefore, the users should use the SNAPSHOT dataset for batch programs. For long run programs such as flow, the user can only see one SNAPSHOT and cannot see any updates.


Design:

My test process and results for the existing table sink:

1. I created an adapter with a DB source and a table sink. Initially the mysql table had 4 entries.

...

   the table sink correctly updated the inserted entry but the deleted entry was still there.


SNAPSHOT dataset design (TBD):

The SNAPSHOT dataset will be a subclass of a table dataset, since the functionality of a Snapshot dataset will be similar to a table dataset. It basically will have all the functionalities a Table dataset has.

...

  1. SnapshotDataset.java, which provides implementation of data operations that can be performed on this dataset instance.

  2. SnapshotDatasetAdmin.java, which defines the administrative operations the Snapshot dataset will support.

  3. SnapshotDatasetDefinition.java, which defines a way to configure the snapshot dataset and a way to perform administrative or data manipulation operations on the dataset instance by providing implementation of SnapshotDatasetAdmin and implementation of SnapshotDataset.

  4. SnapshotModule.java, which defines the dataset type.


Approach:

  • Description:

Have two tables for the SNAPSHOT dataset, one is metadata table use to store the current version, the other is use to store the data. When querying for the dataset, we will first try to get the current version from the metadata table and filter the other table for the corresponding version. Use TTL or minVersion to maintain the out-dated data.

...