...

  1. Provide NoopOutputFormat for the sink
    1. To make sure no write happens in real space, we will not let the sink write anything to the datasets.
    2. With a NoopOutputFormat, the sink performs no writes, while all the logic in the transforms is still executed and tested by the preview run.
  2. Expose whether the pipeline is running in preview mode to the source plugins
    1. Since our pipeline accepts runtime arguments for plugin properties, we sometimes do not know the name of the dataset until runtime. Letting the plugin know that the pipeline is running in preview mode therefore helps us read and create the dataset.
    2. Some sources create datasets at runtime in order to write to them, e.g. FileBatchSource has a timeTable which records the last read time. We need to make sure we do not create these datasets in real space while running in preview mode.
    3. For datasets that require reading, we check whether the dataset exists in real space. If it does, we read from it; if not, we create one in preview space. For datasets that require writing, we ONLY create them in preview space.
  3. Improve PreviewStore performance
    1. Since the records are StructuredRecords, they are easy to serialize.
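
The dataset rule described in item 2.3 can be sketched as a small lookup. This is only an illustration under stated assumptions: PreviewDatasetResolver, Space, and Access are hypothetical names invented for this sketch and are not part of the CDAP API, and the real and preview spaces are modelled as plain in-memory sets of dataset names.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch of the preview-mode dataset rule.
 * None of these names come from the actual CDAP API.
 */
public class PreviewDatasetResolver {

  /** Where a dataset access is routed while the pipeline runs in preview mode. */
  public enum Space { REAL, PREVIEW }

  /** Kind of access the plugin needs. */
  public enum Access { READ, WRITE }

  // Assumption: names of datasets that already exist in real space.
  private final Set<String> realDatasets;
  // Names of datasets created in preview space during this preview run.
  private final Set<String> previewDatasets = new HashSet<>();

  public PreviewDatasetResolver(Set<String> realDatasets) {
    this.realDatasets = new HashSet<>(realDatasets);
  }

  /**
   * Reads go to real space only when the dataset already exists there.
   * Every other access, including all writes, is served from preview
   * space, creating the dataset there if needed.
   */
  public Space resolve(String name, Access access) {
    if (access == Access.READ && realDatasets.contains(name)) {
      return Space.REAL;
    }
    previewDatasets.add(name);
    return Space.PREVIEW;
  }

  /** True if the dataset has been created in preview space. */
  public boolean existsInPreview(String name) {
    return previewDatasets.contains(name);
  }

  /** True if the dataset exists in real space; preview runs never add to this set. */
  public boolean existsInReal(String name) {
    return realDatasets.contains(name);
  }
}
```

With this shape, a source such as FileBatchSource that writes to its timeTable during a preview run would always be routed to preview space, so no new dataset appears in real space.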

...