
...

Here’s an example of a simple Cypher query:

Code Block
MATCH (n) RETURN n

Example of Neo4j data

The Neo4j database contains information about persons, movies, and the relationships between them.
The following CQL query retrieves the movies related to the person 'Meg Ryan':

Code Block
MATCH (person:Person {name: "Meg Ryan"})-[rel]-(movie) RETURN person, rel, movie

The result of this query is:

Text view:

"person"
"rel"
"movie"
{"name":"Meg Ryan","born":1961}
{"roles":["DeDe","Angelica Graynamore","Patricia Graynamore"]}
{"title":"Joe Versus the Volcano",
"tagline":"A story of love, lava andburning desire.",
"released":1990}
{"name":"Meg Ryan","born":1961}
{"roles":["Sally Albright"]}
{"title":"When Harry Met Sally",
"tagline":"Can two friends sleep toget her and still love each other in the morning?",
"released":1998}
{"name":"Meg Ryan","born":1961}
{"roles":["Kathleen Kelly"]}
{"title":"You've Got Mail",
"tagline":"At odds in life... in love on-line.",
"released":1998}
{"name":"Meg Ryan","born":1961}
{"roles":["Carole"]}
{"title":"Top Gun",
"tagline":"I feel the need, the need for speed.",
"released":1986}
{"name":"Meg Ryan","born":1961}
{"roles":["Annie Reed"]}
{"title":"Sleepless in Seattle",
"tagline":"What if someone you never met, someone you never saw, someone you never
knew was the only someone for you?",
"released":1993}

Source Splitter

We propose adding a "Splits Number" source configuration property that specifies the desired number of splits into which the query is divided when reading from Neo4j.
Fewer splits may be created if the query cannot be divided into the desired number of splits.
We can also use '0' as the default value for this property, in which case the number of splits is determined by the number of map tasks (controlled by the "mapreduce.job.maps" property):

Code Block
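import java.io.IOException;
import java.util.List;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.MRJobConfig;

public List<InputSplit> getSplits(JobContext job) throws IOException {
    ...
    // When "Splits Number" is 0, fall back to the number of map tasks
    // ("mapreduce.job.maps", i.e. MRJobConfig.NUM_MAPS), defaulting to 1.
    int targetNumTasks = job.getConfiguration().getInt(MRJobConfig.NUM_MAPS, 1);
    ...
}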
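The split-count rule described above (a default of '0' meaning "use the number of map tasks", and possibly fewer splits than requested) can be sketched as a small helper. This is a hypothetical sketch, not part of the proposal: the class and method names are illustrative, and it assumes the only reason to create fewer splits is a query that returns fewer records than the requested number of splits.

```java
/**
 * Hypothetical helper (not part of the plugin): resolves how many splits to create.
 * Assumes fewer splits are created only when the query returns fewer records
 * than the requested number of splits.
 */
final class SplitCountResolver {

  private SplitCountResolver() {
  }

  /**
   * @param configuredSplits value of the proposed "Splits Number" property (0 means "use map tasks")
   * @param numMapTasks      value of "mapreduce.job.maps"
   * @param totalRecords     total number of records returned by the input query
   * @return the number of splits to generate, always at least 1
   */
  static int resolve(int configuredSplits, int numMapTasks, long totalRecords) {
    if (configuredSplits < 0) {
      throw new IllegalArgumentException("Splits Number must not be negative");
    }
    // 0 is the proposed default: fall back to the number of map tasks.
    int desired = configuredSplits == 0 ? Math.max(1, numMapTasks) : configuredSplits;
    // Never create more splits than there are records, but always at least one.
    return (int) Math.min(desired, Math.max(1L, totalRecords));
  }
}
```

With Splits Number left at 0 and "mapreduce.job.maps" set to 4, a query returning 100 records gets 4 splits; with Splits Number set to 10 and only 3 records, 3 splits are created.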

A 'MATCH ... RETURN COUNT(*)' CQL query can be used to get the total number of records, which are then divided between splits using 'SKIP' and 'LIMIT'.
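As a sketch of the 'SKIP'/'LIMIT' approach, the hypothetical helper below turns a record count into one query per split. It assumes the input query ends with a RETURN clause (so ORDER BY, SKIP and LIMIT can be appended) and that the ordering expression comes from the "Order By" property. Ordering matters because Cypher does not guarantee a stable row order without ORDER BY, so consecutive SKIP/LIMIT windows could otherwise overlap or miss rows.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: divides a Cypher query into SKIP/LIMIT splits.
 * Assumes the query ends with a RETURN clause and that totalRecords was
 * obtained with a "MATCH ... RETURN COUNT(*)" query.
 */
final class SkipLimitSplitter {

  private SkipLimitSplitter() {
  }

  static List<String> splitQueries(String query, String orderBy, long totalRecords, int numSplits) {
    if (numSplits < 1) {
      throw new IllegalArgumentException("numSplits must be >= 1");
    }
    List<String> queries = new ArrayList<>();
    if (numSplits == 1) {
      // A single split reads everything, so no ordering is required.
      queries.add(query);
      return queries;
    }
    if (orderBy == null || orderBy.trim().isEmpty()) {
      throw new IllegalArgumentException("orderBy is required when numSplits > 1");
    }
    long base = totalRecords / numSplits;
    long remainder = totalRecords % numSplits;
    long skip = 0;
    for (int i = 0; i < numSplits; i++) {
      // The first 'remainder' splits take one extra record so all records are covered.
      long limit = base + (i < remainder ? 1 : 0);
      queries.add(query + " ORDER BY " + orderBy + " SKIP " + skip + " LIMIT " + limit);
      skip += limit;
    }
    return queries;
  }
}
```

For a count of 10 records and 3 splits, the helper produces windows of 4, 3 and 3 records (SKIP 0, 4 and 7). If the split count can exceed the record count, some windows get LIMIT 0, which is why capping the split count by the number of records is useful.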

Source Properties

Section | User Facing Name | Widget Type | Description | Constraints
General | Label | textbox | Label for UI. |
General | Reference Name | textbox | Name used to uniquely identify this source for lineage. | Required
General | Neo4j Host | textbox | Neo4j database host. | Required
General | Neo4j Port | number | Neo4j database port. | Required
General | Input Query | textbox | The query to use to import data from the Neo4j database. Query example: 'MATCH (n:Label) RETURN n.property_1, n.property_2'. | Required
Credentials | Username | textbox | User identity for connecting to Neo4j. | Required
Credentials | Password | password | Password to use to connect to Neo4j. | Required
Advanced | Splits Number | number | The number of splits to generate. If set to 1, orderBy is not needed. |
Advanced | Order By | textbox | Name of the field used for ordering during split generation. Required unless numSplits is set to 1. |
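The constraints in the table above could be checked in a single validation step. The sketch below is hypothetical: the class, method and parameter names are illustrative and do not come from the plugin, and the 1 to 65535 port range check is an added assumption beyond "Required".

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical validation of the source properties listed in the table.
 * Returns a list of human-readable errors; an empty list means the config is valid.
 */
final class Neo4jSourceConfigCheck {

  private Neo4jSourceConfigCheck() {
  }

  static List<String> validate(String referenceName, String host, Integer port, String inputQuery,
                               String username, String password, Integer splitsNumber, String orderBy) {
    List<String> errors = new ArrayList<>();
    requireNonEmpty(errors, "Reference Name", referenceName);
    requireNonEmpty(errors, "Neo4j Host", host);
    // Assumption: besides being required, the port must be a valid TCP port.
    if (port == null || port < 1 || port > 65535) {
      errors.add("Neo4j Port must be a number between 1 and 65535");
    }
    requireNonEmpty(errors, "Input Query", inputQuery);
    requireNonEmpty(errors, "Username", username);
    requireNonEmpty(errors, "Password", password);
    // Order By may be omitted only when exactly one split is requested.
    boolean singleSplit = splitsNumber != null && splitsNumber == 1;
    if (!singleSplit && (orderBy == null || orderBy.trim().isEmpty())) {
      errors.add("Order By is required unless Splits Number is set to 1");
    }
    return errors;
  }

  private static void requireNonEmpty(List<String> errors, String name, String value) {
    if (value == null || value.trim().isEmpty()) {
      errors.add(name + " is required");
    }
  }
}
```

Collecting all errors instead of failing on the first one lets the UI report every misconfigured property at once.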


...

Neo4j Data Type | CDAP Schema Data Type
null | null
List | array
Map | record
Boolean | boolean
Integer | long
Float | double
String | string
ByteArray | bytes
Date | date
Time | time-micros
LocalTime | time-micros
DateTime | timestamp-micros
LocalDateTime | timestamp-micros
Node | record

Schema example:

Code Block
{"name": "n", "type": {
	"type": "record", "name": "n", "fields": [
		{"name": "born", "type": "long"}, 
		{"name": "name", "type": "string"}, 
		{"name": "_id", "type": "long"}, 
		{"name": "_labels", "type": {"type": "array", "items": "string"}}
	]
}}
Relationship | record


Schema example:

Code Block
{"name": "r", "type": {
    "type": "record", "name": "r", "fields": [
        {"name": "_startId", "type": "long"},
        {"name": "roles", "type": {"type": "array", "items": "string"}},
        {"name": "_type", "type": "string"},
        {"name": "_endId", "type": "long"},
        {"name": "_id", "type": "long"}
    ]
}}

Duration | record
A Duration represents a temporal amount, capturing the difference in time between two instants; it can be negative.

Schema example:

Code Block
{"name": "dr", "type": {
    "type": "record", "name": "dr", "fields": [
        {"name": "duration", "type": "string"},
        {"name": "seconds", "type": "long"},
        {"name": "months", "type": "long"},
        {"name": "days", "type": "long"},
        {"name": "nanoseconds", "type": "stringint"}
    ]
}}
Point | record

Schema example:

Code Block
{"name": "p", "type": {
    "type": "record", "name": "p", "fields": [
        {"name": "crs", "type": "string"},
        {"name": "x", "type": "double"},
        {"name": "y", "type": "double"},
        {"name": "srid", "type": "string"}
    ]
}}
Path

...
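The type mappings listed in the table above can be summarized as a simple lookup. This is a hypothetical sketch that keys on Neo4j type names as plain strings; an actual implementation would inspect the values returned by the Neo4j driver. Path is left out because its mapping is not shown here.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical lookup mirroring the Neo4j to CDAP type-mapping table.
 * Type names are plain strings for illustration only.
 */
final class Neo4jTypeMapping {

  private static final Map<String, String> NEO4J_TO_CDAP = new HashMap<>();

  static {
    NEO4J_TO_CDAP.put("null", "null");
    NEO4J_TO_CDAP.put("List", "array");
    NEO4J_TO_CDAP.put("Map", "record");
    NEO4J_TO_CDAP.put("Boolean", "boolean");
    NEO4J_TO_CDAP.put("Integer", "long");
    NEO4J_TO_CDAP.put("Float", "double");
    NEO4J_TO_CDAP.put("String", "string");
    NEO4J_TO_CDAP.put("ByteArray", "bytes");
    NEO4J_TO_CDAP.put("Date", "date");
    NEO4J_TO_CDAP.put("Time", "time-micros");
    NEO4J_TO_CDAP.put("LocalTime", "time-micros");
    NEO4J_TO_CDAP.put("DateTime", "timestamp-micros");
    NEO4J_TO_CDAP.put("LocalDateTime", "timestamp-micros");
    // Structural types become records (see the schema examples in the table).
    NEO4J_TO_CDAP.put("Node", "record");
    NEO4J_TO_CDAP.put("Relationship", "record");
    NEO4J_TO_CDAP.put("Duration", "record");
    NEO4J_TO_CDAP.put("Point", "record");
  }

  private Neo4jTypeMapping() {
  }

  /** Returns the CDAP schema type name for a Neo4j type name, or throws if unmapped. */
  static String cdapTypeFor(String neo4jType) {
    String cdapType = NEO4J_TO_CDAP.get(neo4jType);
    if (cdapType == null) {
      throw new IllegalArgumentException("Unsupported Neo4j type: " + neo4jType);
    }
    return cdapType;
  }
}
```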

Section | User Facing Name | Widget Type | Description | Constraints
General | Label | textbox | Label for UI. |
General | Reference Name | textbox | Name used to uniquely identify this plugin for lineage. | Required
General | Neo4j Host | textbox | Neo4j database host. | Required
General | Neo4j Port | number | Neo4j database port. | Required
General | Output Query | textbox | The query to use to export data to the Neo4j database. Query example: 'CREATE (n:<label_field> {property_1, property_2})'. | Required
Credentials | Username | textbox | User identity for connecting to Neo4j. | Required
Credentials | Password | password | Password to use to connect to Neo4j. | Required

...