MarkLogic plugins

Introduction

MarkLogic Server is a software solution for harnessing all of your digital content in a single database. MarkLogic enables you to build complex applications that interact with large volumes of JSON, XML, SGML, HTML, RDF triples, binary files, and other popular content formats. The architecture of MarkLogic is designed to keep your applications both scalable and high-performance, delivering query results at search-engine speeds while providing transactional integrity over the underlying database. These plugins allow you to integrate data in MarkLogic with the rest of your data using CDAP.

User Stories

  • As a pipeline developer, I would like to read data in MarkLogic in batch using CDAP, so that I can integrate it easily with the rest of my data.
  • As a pipeline developer, I would like to write complex structures (XML, JSON, SGML, HTML, RDF triples, binary data, etc.) to MarkLogic in batch using CDAP, so that I do not have to develop custom code to load my data into MarkLogic and can take advantage of the standardization that CDAP offers.
  • As a pipeline developer, I would like CDAP to support ELT in MarkLogic, so that I can take advantage of MarkLogic's powerful search and analytics features after loading the data, while still maintaining standardization and lineage in CDAP.

Plugin Type

  • Batch Source
  • Batch Sink 
  • Real-time Source
  • Real-time Sink
  • Action
  • Post-Run Action
  • Aggregate
  • Join
  • Spark Model
  • Spark Compute

Configurables

MarkLogic batch source.

Category | User Facing Name | Type | Description | Constraints
Basic | Host | text | The host running the MarkLogic REST Server | Should validate URL
Basic | Port | number | The port that the MarkLogic REST Server listens on |
Basic | Database | text | Database |
Basic | Input method | radio button | Method to get files: QUERY or PATH |
Basic | Path | text | Path to read documents from |
Basic | Input Query | text | Query for data search |
Credentials | User | text | The user to perform operations as. The user should have appropriate read privileges |
Credentials | Password | password | The password for the user |
Connection | Authentication Type | radio button | The type of authentication to use - Digest or |
Connection | Connection Type | radio button | The type of connection to use - Direct or Gateway |
Advanced | Format | select | Type of document (AUTO/JSON/XML/TEXT/BLOB/DELIMITED), default: BLOB |
Advanced | Delimiter | text | Delimiter if the format is 'delimited' |
Advanced | Bounding Query | text | Query for splits generation |
Advanced | Max Splits | number | Maximum number of splits |
Advanced | File Name Field | text | Field to store information about the file |
Advanced | Payload Field | text | Field to store data from Binary and Text files |
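
The dependencies between the source properties (Input method decides whether Path or Input Query is needed, and Delimiter only matters for the delimited format) can be illustrated with a small validation sketch. This is not the plugin's implementation: the class name `SourceConfig`, the function `validate_source_config`, and the exact rules are assumptions inferred from the descriptions above, and the host check only tests for a non-empty value rather than performing the full URL validation named in the Constraints column.

```python
from dataclasses import dataclass
from typing import List, Optional

# Formats listed for the batch source in the Configurables section.
FORMATS = {"AUTO", "JSON", "XML", "TEXT", "BLOB", "DELIMITED"}


@dataclass
class SourceConfig:
    """Hypothetical container for a subset of the batch source properties."""
    host: str
    port: int
    database: str
    input_method: str                  # "QUERY" or "PATH"
    path: Optional[str] = None
    input_query: Optional[str] = None
    doc_format: str = "BLOB"           # default taken from the table
    delimiter: Optional[str] = None


def validate_source_config(cfg: SourceConfig) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    if not cfg.host.strip():
        errors.append("Host must not be empty")
    if not 1 <= cfg.port <= 65535:
        errors.append("Port must be between 1 and 65535")

    # Assumed rule: the selected input method determines the required field.
    method = cfg.input_method.upper()
    if method == "PATH":
        if not cfg.path:
            errors.append("Path is required when Input method is PATH")
    elif method == "QUERY":
        if not cfg.input_query:
            errors.append("Input Query is required when Input method is QUERY")
    else:
        errors.append("Input method must be QUERY or PATH")

    # Assumed rule: a delimiter is only required for the delimited format.
    fmt = cfg.doc_format.upper()
    if fmt not in FORMATS:
        errors.append("Unknown format: " + cfg.doc_format)
    elif fmt == "DELIMITED" and not cfg.delimiter:
        errors.append("Delimiter is required when Format is DELIMITED")
    return errors
```

Collecting all problems in a list, instead of failing on the first one, lets a pipeline developer fix every misconfigured property in one pass.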

MarkLogic batch sink.

Category | User Facing Name | Type | Description | Constraints
Basic | Host | text | The host running the MarkLogic REST Server | Should validate URL
Basic | Port | number | The port that the MarkLogic REST Server listens on |
Basic | Database | text | Database |
Basic | Path | text | Path to document folder |
Basic | File Name Field | text | The input field used to generate the file name. If this field is not set, a UUID is generated |
Credentials | User | text | The user to perform operations as. The user should have appropriate read privileges |
Credentials | Password | password | The password for the user |
Connection | Authentication Type | radio button | The type of authentication to use - Digest or |
Connection | Connection Type | radio button | The type of connection to use - Direct or Gateway |
Advanced | Batch size | number | The batch size for writing to MarkLogic |
Advanced | Max retries | number | The maximum number of retries for requests to MarkLogic |
Advanced | Format | select | Type of document, default: JSON |
Advanced | Delimiter | text | Delimiter if the format is 'delimited' |
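
The File Name Field and Batch size properties can be sketched as follows. The helper names `document_uri` and `batches` are hypothetical and do not come from the plugin. The sketch assumes that the document name is appended to Path with a single slash, and that a UUID is also used when the configured field is missing or empty in a record; the property description only covers the case where the field is not set at all.

```python
import uuid
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Optional


def document_uri(record: Dict[str, object], path: str,
                 file_name_field: Optional[str] = None) -> str:
    """Build the target document URI for one record (hypothetical helper)."""
    name = None
    if file_name_field:
        value = record.get(file_name_field)
        if value not in (None, ""):
            name = str(value)
    if name is None:
        # No usable field value: fall back to a random UUID.
        name = str(uuid.uuid4())
    return path.rstrip("/") + "/" + name


def batches(records: Iterable[dict], batch_size: int) -> Iterator[List[dict]]:
    """Yield lists of at most batch_size records, preserving input order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, batch_size))
        if not chunk:
            return
        yield chunk
```

For example, `document_uri({"id": "a1"}, "/orders/", "id")` yields `/orders/a1`, and five records with a batch size of 2 are written as batches of 2, 2, and 1.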

MarkLogic query executor action.

Category | User Facing Name | Type | Description | Constraints
Basic | Host | text | The host running the MarkLogic REST Server | Should validate URL
Basic | Port | number | The port that the MarkLogic REST Server listens on |
Basic | Database | text | Database |
Basic | Query | textarea | The query to execute in MarkLogic |
Credentials | User | text | The user to perform operations as. The user should have appropriate read privileges |
Credentials | Password | password | The password for the user |
Connection | Authentication Type | radio button | The type of authentication to use - Digest or |
Connection | Connection Type | radio button | The type of connection to use - Direct or Gateway |
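
One possible way for the action to run its query is the `/v1/eval` endpoint of the MarkLogic REST API, which accepts a form-encoded `xquery` parameter and a `database` URL parameter. The sketch below only builds the request and does not send it. That the action uses this endpoint, that the query is XQuery, and that the connection uses plain HTTP are all assumptions, and `build_eval_request` is a hypothetical helper name. Authentication is deliberately left out.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_eval_request(host: str, port: int, database: str, query: str,
                       scheme: str = "http") -> Request:
    """Build (but do not send) a POST request for the /v1/eval endpoint."""
    # The target database is passed as a URL parameter.
    url = f"{scheme}://{host}:{port}/v1/eval?" + urlencode({"database": database})
    # Assumption: the query is XQuery, sent as a form-encoded field.
    body = urlencode({"xquery": query}).encode("ascii")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

Keeping request construction separate from sending makes the mapping from Host, Port, Database, and Query to the HTTP call easy to test without a running server.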

Design / Implementation Tips

Design

Approach(s)

Properties

Security

Limitation(s)

Future Work

  • Some future work – HYDRATOR-99999
  • Another future work – HYDRATOR-99999

Test Case(s)

  • Test case #1
  • Test case #2

Sample Pipeline

Please attach one or more sample pipeline(s) and associated data. 

Pipeline #1

Pipeline #2



Checklist

  • User stories documented 
  • User stories reviewed 
  • Design documented 
  • Design reviewed 
  • Feature merged 
  • Examples and guides 
  • Integration tests 
  • Documentation for feature 
  • Short video demonstrating the feature