Introduction

A separate database plugin to support Neo4j-specific features and configurations.

Use-case

Users can choose and install Neo4j source and sink plugins.
Users should see the Neo4j logo on the plugin configuration page for a better experience.
Users should get relevant information from the tool tip:
- The tool tip should describe accurately what each field is used for.
Users should not have to specify any redundant configuration.
Users should get field-level lineage for the source and sink that are being used.
Reference documentation should be updated to account for the changes.
The source code for the Neo4j database plugin should be placed in a repo under data-integrations.org.
Data pipelines using the source and sink plugins should run on both the MapReduce and Spark engines.

User Stories

Users should be able to install Neo4j-specific database source and sink plugins from the Hub.
Users should have each tool tip accurately describe what each field does.
Users should get field-level lineage information for the Neo4j source and sink.
Users should be able to set up a pipeline without specifying redundant information.
Users should get an updated reference document for the Neo4j source and sink.
Users should be able to read all the DB types.

Plugin Type

Batch Source
Batch Sink
Real-time Source
Real-time Sink
Action
Post-Run Action
Aggregate
Join
Spark Model
Spark Compute

Design Tips

Reference to the Neo4j driver manual: https://neo4j.com/docs/driver-manual/1.7/
Reference to the Cypher Query Language manual: https://neo4j.com/docs/cypher-manual/current/

Design

Neo4j Overview

Neo4j is a graph database management system with native graph storage and processing. In Neo4j, everything is stored in the form of an edge, node, or attribute. Each node and edge can have any number of attributes. Both nodes and edges can be labelled. Labels can be used to narrow searches.

Cypher Query Language

Cypher is a declarative graph query language that allows for expressive and efficient querying and updating of the graph.
Cypher is inspired by a number of different approaches and builds on established practices for expressive querying. Many of the keywords, such as WHERE and ORDER BY, are inspired by SQL. Pattern matching borrows expression approaches from SPARQL. Some of the list semantics are borrowed from languages such as Haskell and Python.

Here are a few clauses used to read from the graph:

MATCH: The graph pattern to match. This is the most common way to get data from the graph.
WHERE: Not a clause in its own right, but rather part of MATCH, OPTIONAL MATCH and WITH. Adds constraints to a pattern, or filters the intermediate result passing through WITH.
RETURN: What to return.

Here’s an example of a simple Cypher query that returns every node in the graph:

`MATCH (n) RETURN n`
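
To show how the three read clauses listed above fit together, here is a minimal Java sketch that assembles them in the order Cypher expects. The `CypherReadQuery` class and its `build` method are hypothetical helpers written for illustration only; they are not part of the plugin or of the Neo4j driver.

```java
// Hypothetical helper for illustration; not part of the plugin or the Neo4j driver.
final class CypherReadQuery {
    private CypherReadQuery() { }

    /**
     * Builds "MATCH <pattern> [WHERE <predicate>] RETURN <projection>".
     * The WHERE part is omitted when the predicate is null or blank,
     * because WHERE only makes sense attached to a MATCH pattern.
     */
    static String build(String pattern, String predicate, String projection) {
        StringBuilder query = new StringBuilder("MATCH ").append(pattern);
        if (predicate != null && !predicate.trim().isEmpty()) {
            query.append(" WHERE ").append(predicate);
        }
        return query.append(" RETURN ").append(projection).toString();
    }
}
```

Calling `CypherReadQuery.build("(n)", null, "n")` reproduces the query above. With an assumed `Person` label and assumed `age` and `name` attributes, `CypherReadQuery.build("(p:Person)", "p.age > 30", "p.name")` gives `MATCH (p:Person) WHERE p.age > 30 RETURN p.name`.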

Source Properties

User Facing Name	Widget Type	Description	Constraints
Label	textbox	Label for UI.
Reference Name	textbox	Name that uniquely identifies this source for lineage.
Connection String	textbox	Neo4j connection string.	Required
Username	textbox	User identity for connecting to Neo4j.
Password	password	Password to use to connect to Neo4j.
Input Query	textbox	Neo4j query used to import data.	Required
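
The required constraints in the table above could be checked along the following lines. This is a plain-Java sketch under stated assumptions: the `Neo4jSourceConfig` class, its field names, and the error messages are hypothetical, and a real plugin would most likely express this through the CDAP plugin configuration API rather than a standalone class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java holder for the source properties (illustration only).
final class Neo4jSourceConfig {
    final String referenceName;
    final String connectionString; // marked Required in the table
    final String username;         // no constraint listed
    final String password;         // no constraint listed
    final String inputQuery;       // marked Required in the table

    Neo4jSourceConfig(String referenceName, String connectionString,
                      String username, String password, String inputQuery) {
        this.referenceName = referenceName;
        this.connectionString = connectionString;
        this.username = username;
        this.password = password;
        this.inputQuery = inputQuery;
    }

    /** Returns one message per missing required property; empty when valid. */
    List<String> validate() {
        List<String> errors = new ArrayList<>();
        if (isBlank(connectionString)) {
            errors.add("Connection String is required.");
        }
        if (isBlank(inputQuery)) {
            errors.add("Input Query is required.");
        }
        return errors;
    }

    private static boolean isBlank(String value) {
        return value == null || value.trim().isEmpty();
    }
}
```

Collecting every problem in one pass, instead of failing on the first, lets a configuration page report all missing fields at once.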

Source Data Types Mapping

CDAP Schema Data Types	Neo4j Data Types
null	null
array	List
map	Map
boolean	Boolean
long	Integer
double	Float
string	String
bytes	ByteArray
date	Date
time-micros	Time
time-micros	LocalTime
timestamp-micros	DateTime
timestamp-micros	LocalDateTime
	Duration
-	Point
	Node
	Relationship
	Path
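
One way to encode the mapping table above is a simple lookup keyed by the Neo4j type name. The sketch below is an assumption about structure, not the plugin's actual code: the `Neo4jSourceTypeMapping` class is hypothetical, and the CDAP types are represented as plain strings to keep the example free of external dependencies. Neo4j types for which the table lists no CDAP type produce an empty result, since the table does not say how they should be handled.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical lookup mirroring the source mapping table (illustration only).
final class Neo4jSourceTypeMapping {
    private static final Map<String, String> NEO4J_TO_CDAP;

    static {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("null", "null");
        m.put("List", "array");
        m.put("Map", "map");
        m.put("Boolean", "boolean");
        m.put("Integer", "long");
        m.put("Float", "double");
        m.put("String", "string");
        m.put("ByteArray", "bytes");
        m.put("Date", "date");
        m.put("Time", "time-micros");
        m.put("LocalTime", "time-micros");
        m.put("DateTime", "timestamp-micros");
        m.put("LocalDateTime", "timestamp-micros");
        // Duration, Point, Node, Relationship and Path have no CDAP type
        // in the table, so they are deliberately left out of the map.
        NEO4J_TO_CDAP = Collections.unmodifiableMap(m);
    }

    private Neo4jSourceTypeMapping() { }

    /** Returns the CDAP schema type name, or empty if the table lists none. */
    static Optional<String> toCdapType(String neo4jType) {
        return Optional.ofNullable(NEO4J_TO_CDAP.get(neo4jType));
    }
}
```

Returning `Optional` forces callers to decide explicitly what to do with types such as `Path`, rather than silently receiving a null.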

Sink Properties

User Facing Name	Widget Type	Description	Constraints
Label	textbox	Label for UI.
Reference Name	textbox	Name that uniquely identifies this sink for lineage.
Connection String	textbox	Neo4j connection string.	Required
Username	textbox	User identity for connecting to Neo4j.
Password	password	Password to use to connect to Neo4j.
Output Query	textbox	Neo4j query used to export data.	Required

Sink Data Types Mapping

CDAP Schema Data Types	Neo4j Data Types
null	null
array	List
map	Map
boolean	Boolean
long	Integer
double	Float
string	String
bytes	ByteArray
date	Date
time-micros	Time
time-micros	LocalTime
timestamp-micros	DateTime
timestamp-micros	LocalDateTime
	Duration
	Point

Approach

Create a new Maven project in its own repository.

Pipeline Samples

Please attach one or more sample pipeline(s) and associated data.

Releases

Release X.Y.Z

Related Work

Database plugin enhancements

Neo4j database plugin
