...
This also includes removing the temporary datasets that may be created as part of workflows.
API Changes:
Approach Choices for APIs:
...
#1 Programtype=workflow: With this API, programs in all relations are simply replaced by their workflows, where applicable. As a result, this lineage view does not retain a mapping of programs to workflows.
- JSON Response Changes:
- In the "programs" section: the ProgramId of a program is replaced by the ProgramId of its workflow if the program is associated with a workflow. Otherwise the program is shown as is.
- In the "relations" section: the "program" field carries the name of the workflow, if applicable.
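The replacement described for approach #1 can be sketched as follows. This is a minimal illustration, not the actual implementation: the function name, the `program_to_workflow` lookup, the workflow id `ExampleWorkflow`, and the `Standalone.job` program are all hypothetical, and the relations are reduced to the two fields that matter here.

```python
from typing import Dict, List

# Hypothetical workflow id, used only for illustration.
WORKFLOW = "workflow.default.EmployeePipe_Long_copy.ExampleWorkflow"


def replace_programs_with_workflows(
    relations: List[dict], program_to_workflow: Dict[str, str]
) -> List[dict]:
    """Return copies of the relations with each program swapped for its workflow.

    Programs without an associated workflow are kept as is. The original
    program id is dropped, so the program-to-workflow mapping is not retained.
    """
    replaced = []
    for relation in relations:
        updated = dict(relation)  # shallow copy; the input is left untouched
        program = relation["program"]
        updated["program"] = program_to_workflow.get(program, program)
        replaced.append(updated)
    return replaced


relations = [
    {"data": "dataset.default.EmpAgg",
     "program": "mapreduce.default.EmployeePipe_Long_copy.phase-2"},
    {"data": "dataset.default.EmpAgg",
     "program": "mapreduce.default.Standalone.job"},  # hypothetical program
]
program_to_workflow = {
    "mapreduce.default.EmployeePipe_Long_copy.phase-2": WORKFLOW,
}

result = replace_programs_with_workflows(relations, program_to_workflow)
for relation in result:
    print(relation["data"], "<-", relation["program"])
```

After the replacement, nothing in the output says which program stood behind the workflow, which is the trade-off noted for this approach.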
#2 Collapse=program: Programs are collapsed into workflows, where applicable, and the mapping of program to workflow is maintained. This approach extends the collapse approach used today. However, since workflows are also programs, collapsing on "programs" is ambiguous because the result still shows workflows.
...
- JSON Response Changes:
- In the "programs" section: the ProgramId of a program is replaced by the ProgramId of its workflow if the program is associated with a workflow. Otherwise the program is shown as is.
- In the "relations" section: a new field called "workflow" is added to every relation that can be collapsed based on workflows. The "program" field becomes a collection of all the programs that were collapsed to form this workflow.
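The collapse described for approach #2 can be sketched as follows. This is only an illustration under stated assumptions: the function name, the `program_to_workflow` lookup, the workflow id `ExampleWorkflow`, the `Standalone.job` program, the shortened dataset id `conn-0`, and the run ids are hypothetical. Merging `accesses` and `runs` as ordered unions is also an assumption about how collapsed relations would be combined.

```python
from typing import Dict, List, Tuple

# Hypothetical workflow id, used only for illustration.
WORKFLOW = "workflow.default.EmployeePipe_Long_copy.ExampleWorkflow"


def collapse_by_workflow(
    relations: List[dict], program_to_workflow: Dict[str, str]
) -> List[dict]:
    """Collapse relations on the same dataset whose programs share a workflow.

    Collapsed relations get a "workflow" field and a list of the programs
    that were merged. Programs without a workflow keep their relation
    unchanged, with an empty "workflow" field.
    """
    collapsed: Dict[Tuple[str, str], dict] = {}
    result: List[dict] = []
    for relation in relations:
        workflow = program_to_workflow.get(relation["program"], "")
        if not workflow:
            result.append({**relation, "workflow": ""})
            continue
        key = (relation["data"], workflow)
        if key not in collapsed:
            collapsed[key] = {"data": relation["data"], "workflow": workflow,
                              "program": [], "accesses": [], "runs": []}
            result.append(collapsed[key])
        entry = collapsed[key]
        if relation["program"] not in entry["program"]:
            entry["program"].append(relation["program"])
        # Assumption: accesses and runs are merged as ordered unions.
        for field in ("accesses", "runs"):
            for value in relation.get(field, []):
                if value not in entry[field]:
                    entry[field].append(value)
    return result


relations = [
    {"data": "dataset.default.conn-0", "accesses": ["read"], "runs": ["run-1"],
     "program": "mapreduce.default.EmployeePipe_Long_copy.phase-2"},
    {"data": "dataset.default.conn-0", "accesses": ["write"], "runs": ["run-1"],
     "program": "mapreduce.default.EmployeePipe_Long_copy.phase-1"},
    {"data": "dataset.default.EmpAgg", "accesses": ["write"], "runs": ["run-2"],
     "program": "mapreduce.default.Standalone.job"},  # hypothetical program
]
program_to_workflow = {
    "mapreduce.default.EmployeePipe_Long_copy.phase-2": WORKFLOW,
    "mapreduce.default.EmployeePipe_Long_copy.phase-1": WORKFLOW,
}

result = collapse_by_workflow(relations, program_to_workflow)
for relation in result:
    print(relation)
```

Unlike the replacement in approach #1, each collapsed relation still lists the programs behind the workflow, so the mapping is preserved.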
#3 [ NOT considering now ] Group-by on Workflow: In addition to collapse, a group-by parameter can be provided as group-by=workflow; programs that don't have an associated workflow are left as is. This approach shares some of the ambiguity listed above, but is extensible if a future requirement calls for group-by on application. [Collapse on workflow is confusing if no workflows are present.]
Implementation Changes:
- ....
Code Block |
---|
curl "http://127.0.0.1:10000/v3/namespaces/default/datasets/EmpAgg/lineage?collapse=access&collapse=run&collapse=component&programtype=workflow&end=now&levels=1&start=now-7d" | python -m json.tool
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 1006 100 1006 0 0 121k 0 --:--:-- --:--:-- --:--:-- 140k
{
    "data": {
        "dataset.default.EmpAgg": {
            "entityId": {
                "id": {
                    "instanceId": "EmpAgg",
                    "namespace": {
                        "id": "default"
                    }
                },
                "type": "datasetinstance"
            }
        },
        "dataset.default.conn-0.e0591f36-9661-11e6-af4a-0000007182af": {
            "entityId": {
                "id": {
                    "instanceId": "conn-0.e0591f36-9661-11e6-af4a-0000007182af",
                    "namespace": {
                        "id": "default"
                    }
                },
                "type": "datasetinstance"
            }
        }
    },
    "end": 1476926333,
    "programs": {
        "<workflow name>": {
            "entityId": {
                "id": {
                    "application": {
                        "applicationId": "EmployeePipe_Long_copy",
                        "namespace": {
                            "id": "default"
                        }
                    },
                    "id": "phase-2",
                    "type": "Mapreduce"
                },
                "type": "<type workflow>"
            }
        }
    },
    "relations": [
        {
            "accesses": [
                "read",
                "write"
            ],
            "components": [],
            "data": "dataset.default.conn-0.e0591f36-9661-11e6-af4a-0000007182af",
            "program": "<workflow name>",
            "runs": [
                "e4051038-9661-11e6-8060-000000d79ea8",
                "e4051038-9661-11e6-8060-000000d79ea8"
            ]
        },
        {
            "accesses": [
                "write",
                "read"
            ],
            "components": [],
            "data": "dataset.default.EmpAgg",
            "program": "mapreduce.default.EmployeePipe_Long_copy.phase-2",
            "runs": [
                "e4051038-9661-11e6-8060-000000d79ea8"
            ]
        }
    ],
    "start": 1476321533
}
With the collapse approach (#2), the "program" lines of the two relations above are replaced, respectively, by:
            "workflow": "<workflow name>",
            "program": [
                "mapreduce.default.EmployeePipe_Long_copy.phase-2",
                "mapreduce.default.EmployeePipe_Long_copy.phase-1"
            ],
and
            "workflow": "",
            "program": "mapreduce.default.EmployeePipe_Long_copy.phase-2",
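For clients that build this request programmatically, the query string of the curl call above can be assembled as in the sketch below. The helper name `lineage_url` and its signature are hypothetical, and `programtype` is the parameter proposed in this document rather than an existing one; the host, path, and remaining parameters are taken from the example request.

```python
from typing import Iterable, Optional
from urllib.parse import urlencode


def lineage_url(base: str, namespace: str, dataset: str,
                collapse: Iterable[str],
                programtype: Optional[str] = None,
                start: str = "now-7d", end: str = "now",
                levels: int = 1) -> str:
    """Build a dataset lineage URL with repeated collapse parameters.

    A list of pairs is used instead of a dict so that "collapse" can
    appear several times and the parameter order stays predictable.
    """
    params = [("collapse", value) for value in collapse]
    if programtype:
        # Proposed parameter (approach #1); omitted when not requested.
        params.append(("programtype", programtype))
    params += [("end", end), ("levels", levels), ("start", start)]
    path = f"/v3/namespaces/{namespace}/datasets/{dataset}/lineage"
    return f"{base}{path}?{urlencode(params)}"


url = lineage_url(
    "http://127.0.0.1:10000", "default", "EmpAgg",
    collapse=["access", "run", "component"],
    programtype="workflow",
)
print(url)
```

Passing `programtype=None` yields the same URL without the proposed parameter.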