MarkLogic plugins
- Bhooshan Mogal
- Maksym Lozbin
Owned by Bhooshan Mogal
Introduction
MarkLogic Server is a database for managing your digital content in a single place. It lets you build complex applications that work with large volumes of JSON, XML, SGML, HTML, RDF triples, binary files, and other popular content formats. Its architecture is designed to keep applications scalable and fast, returning query results at search-engine speeds while preserving transactional integrity over the underlying database. These plugins let you integrate data in MarkLogic with the rest of your data using CDAP.
User Stories
- As a pipeline developer, I would like to read data from MarkLogic in batch using CDAP, so that I can integrate it easily with the rest of my data.
- As a pipeline developer, I would like to write complex structures (XML, JSON, SGML, HTML, RDF triples, binary data, etc.) to MarkLogic in batch using CDAP, so that I do not have to develop custom code to load my data into MarkLogic and can take advantage of the standardization that CDAP offers.
- As a pipeline developer, I would like CDAP to support ELT in MarkLogic, so that I can take advantage of MarkLogic's powerful search and analytics features after loading the data, while still maintaining standardization and lineage in CDAP.
Plugin Type
- Batch Source
- Batch Sink
- Real-time Source
- Real-time Sink
- Action
- Post-Run Action
- Aggregate
- Join
- Spark Model
- Spark Compute
Configurables
MarkLogic batch source.
Category | User Facing Name | Type | Description | Constraints |
---|---|---|---|---|
Basic | Host | text | The host running the MarkLogic REST Server | Should validate URL |
Basic | Port | number | The port that the MarkLogic REST Server listens on | |
Basic | Database | text | The database to read from | |
Basic | Input method | radio button | Method to get files: QUERY or PATH | |
Basic | Path | text | Path to read documents from | |
Basic | Input Query | text | Query for data search | |
Credentials | User | text | The user to perform operations as. The user should have appropriate read privileges | |
Credentials | Password | password | The password for the user | |
Connection | Authentication Type | radio button | The type of authentication to use - Digest or | |
Connection | Connection Type | radio button | The type of connection to use - Direct or Gateway | |
Advanced | Format | select | Type of document (AUTO/JSON/XML/TEXT/BLOB/DELIMITED), default: BLOB | |
Advanced | Delimiter | text | Delimiter to use if the format is DELIMITED | |
Advanced | Bounding Query | text | Query for splits generation | |
Advanced | Max Splits | number | Maximum number of splits | |
Advanced | File Name Field | text | Field to store information about the file | |
Advanced | Payload Field | text | Field to store data from binary and text files | |
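The dependencies between the source properties (Path or Input Query depending on the input method, a delimiter only for the DELIMITED format) can be sketched as a small validation routine. This is an illustration only, not the plugin's implementation: the `SourceConfig` class, its field names, and the exact rules are assumptions derived from the property descriptions above.

```python
from dataclasses import dataclass
from typing import List, Optional

# Allowed values come from the property table; everything else is hypothetical.
INPUT_METHODS = {"QUERY", "PATH"}
FORMATS = {"AUTO", "JSON", "XML", "TEXT", "BLOB", "DELIMITED"}


@dataclass
class SourceConfig:
    """Hypothetical in-memory model of the batch source properties."""
    host: str
    port: int
    database: str
    input_method: str
    path: Optional[str] = None
    input_query: Optional[str] = None
    format: str = "BLOB"  # default stated in the table
    delimiter: Optional[str] = None

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the config is usable."""
        errors = []
        if not self.host.strip():
            errors.append("Host must not be empty")
        if not 1 <= self.port <= 65535:
            errors.append("Port must be between 1 and 65535")
        if self.input_method not in INPUT_METHODS:
            errors.append("Input method must be QUERY or PATH")
        elif self.input_method == "PATH" and not self.path:
            errors.append("Path is required when input method is PATH")
        elif self.input_method == "QUERY" and not self.input_query:
            errors.append("Input Query is required when input method is QUERY")
        if self.format not in FORMATS:
            errors.append("Unknown format: " + self.format)
        elif self.format == "DELIMITED" and not self.delimiter:
            errors.append("Delimiter is required when format is DELIMITED")
        return errors
```

Collecting all problems in a list, rather than raising on the first one, lets a pipeline developer fix every misconfigured property in one pass.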
MarkLogic batch sink.
Category | User Facing Name | Type | Description | Constraints |
---|---|---|---|---|
Basic | Host | text | The host running the MarkLogic REST Server | Should validate URL |
Basic | Port | number | The port that the MarkLogic REST Server listens on | |
Basic | Database | text | The database to write to | |
Basic | Path | text | Path to the document folder | |
Basic | File Name Field | text | The input field used to generate the file name. If this field is not set, then a UUID will be generated | |
Credentials | User | text | The user to perform operations as. The user should have appropriate write privileges | |
Credentials | Password | password | The password for the user | |
Connection | Authentication Type | radio button | The type of authentication to use - Digest or | |
Connection | Connection Type | radio button | The type of connection to use - Direct or Gateway | |
Advanced | Batch size | number | The batch size for writing to MarkLogic | |
Advanced | Max retries | number | The maximum number of retries for requests to MarkLogic | |
Advanced | Format | select | Type of document, default: JSON | |
Advanced | Delimiter | text | Delimiter to use if the format is DELIMITED | |
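How the Path, File Name Field, Batch size, and Max retries properties could interact is sketched below. The function names are hypothetical, and several behaviors are assumptions rather than documented plugin behavior: falling back to a UUID when a record lacks the configured field, treating `OSError` as the retryable failure, and counting retries in addition to the first attempt.

```python
import uuid
from typing import Callable, Dict, Iterable, Iterator, List, Optional


def document_uri(record: Dict[str, object], path: str,
                 file_name_field: Optional[str]) -> str:
    """Build the target URI for one record.

    Uses the value of file_name_field when it is configured and present in
    the record; otherwise generates a random UUID as the file name.
    """
    name = record.get(file_name_field) if file_name_field else None
    if name is None or str(name) == "":
        name = uuid.uuid4()
    return path.rstrip("/") + "/" + str(name)


def batches(records: Iterable[dict], batch_size: int) -> Iterator[List[dict]]:
    """Group records into lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield batch


def write_with_retries(send: Callable[[list], object], batch: list,
                       max_retries: int) -> object:
    """Call send(batch), retrying up to max_retries times on OSError."""
    attempt = 0
    while True:
        try:
            return send(batch)
        except OSError:
            if attempt >= max_retries:
                raise
            attempt += 1
```

Here `send` stands in for whatever client call actually writes a batch to MarkLogic; it is passed in so the batching and retry logic can be exercised without a running server.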
MarkLogic query executor action.
Category | User Facing Name | Type | Description | Constraints |
---|---|---|---|---|
Basic | Host | text | The host running the MarkLogic REST Server | Should validate URL |
Basic | Port | number | The port that the MarkLogic REST Server listens on | |
Basic | Database | text | The database to run the query against | |
Basic | Query | textarea | The query to execute in MarkLogic | |
Credentials | User | text | The user to perform operations as. The user should have appropriate read privileges | |
Credentials | Password | password | The password for the user | |
Connection | Authentication Type | radio button | The type of authentication to use - Digest or | |
Connection | Connection Type | radio button | The type of connection to use - Direct or Gateway | |
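One way the Host, Port, Database, and Query properties could map onto an HTTP call is sketched below. It assumes the action goes through the MarkLogic REST API's `/v1/eval` endpoint, that the query is XQuery, and that the server is reached over plain HTTP; none of this is confirmed by the design above, and the helper name `build_eval_request` is hypothetical. The request is only constructed, not sent, and authentication is left out.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_eval_request(host: str, port: int, database: str, query: str) -> Request:
    """Prepare (but do not send) a POST request that carries the query.

    Assumption: the query is submitted as the form parameter "xquery" and
    the target database is selected with the "database" URL parameter.
    """
    url = "http://{}:{}/v1/eval?{}".format(
        host, port, urlencode({"database": database}))
    body = urlencode({"xquery": query}).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

To send the request with Digest authentication, the standard library's `urllib.request.HTTPDigestAuthHandler` could be installed in an opener built with `urllib.request.build_opener`.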
Design / Implementation Tips
- Use the MarkLogic Server Docker image for testing
- MarkLogic Java Client
- Tutorial
- MarkLogic Hadoop Connector
Design
Approach(es)
Properties
Security
Limitation(s)
Future Work
- Some future work – HYDRATOR-99999
- Another future work – HYDRATOR-99999
Test Case(s)
- Test case #1
- Test case #2
Sample Pipeline
Please attach one or more sample pipeline(s) and associated data.
Pipeline #1
Pipeline #2
Checklist
- User stories documented
- User stories reviewed
- Design documented
- Design reviewed
- Feature merged
- Examples and guides
- Integration tests
- Documentation for feature
- Short video demonstrating the feature