Goal
The Row Denormalizer plugin denormalizes data and can also convert data types.
Checklist
- User stories documented
- User stories reviewed
- Design documented
- Design reviewed
- Feature merged
- Examples and guides
- Integration tests
- Documentation for feature
- Short video demonstrating the feature
Use-case
Imagine that a source database has a table that stores a variable set of custom attributes for an entity like the one defined below. This model allows adding any number of attributes to an entity.
Key Field | Field Name | Field Value |
---|---|---|
joltie | FIRST_NAME | Nitin |
joltie | LAST_NAME | Motgi |
joltie | ADDRESS | 150 Grant Ave, Suite A |
joltie | CITY | Palo Alto |
Once the data passes through the Row Denormalizer, it should map to the following, which makes the data easy to query:
Key | FIRST_NAME | LAST_NAME | addr | CITY |
---|---|---|---|---|
joltie | Nitin | Motgi | 150 Grant Ave, Suite A | Palo Alto |
Conditions
If the value of a field is not present, it is considered NULL.
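The NULL condition can be illustrated with a minimal Python sketch. This is not the plugin's implementation: the function name `denormalize` and the tuple-based row layout are assumptions made for illustration. Rows are grouped by key, and any requested field with no matching input row stays `None`.

```python
def denormalize(rows, fields):
    """Pivot (key, field_name, field_value) rows into one dict per key.

    Hypothetical helper for illustration only. Fields that never appear
    for a key stay None, mirroring the NULL condition above.
    """
    out = {}
    for key, name, value in rows:
        # Create the output record on first sight of a key, with every
        # requested field initialised to None.
        record = out.setdefault(key, {"Key": key, **{f: None for f in fields}})
        if name in fields:
            record[name] = value
    return list(out.values())


rows = [
    ("joltie", "FIRST_NAME", "Nitin"),
    ("joltie", "LAST_NAME", "Motgi"),
    ("joltie", "ADDRESS", "150 Grant Ave, Suite A"),
]
result = denormalize(rows, ["FIRST_NAME", "LAST_NAME", "ADDRESS", "CITY"])
print(result[0]["CITY"])  # None, because no CITY row was supplied
```

Here the CITY row is deliberately left out of the input, so the output record carries `None` for that field.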
Options
The user can specify the Key Field, which must be a field in the Input Schema. This is the key of the output row. In the example above, it is “Key Field”.
The user can specify the list of fields that form a denormalized record. In the example above, these are ‘FIRST_NAME’, ‘LAST_NAME’, ‘ADDRESS’ and ‘CITY’.
The user can specify the output field name for each field through a mapping. In the example above, ‘ADDRESS’ in the input is mapped to ‘addr’ in the output schema.
Simple type conversions should also be attempted: {int, long, float, double} -> string.
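The mapping and type-conversion options could look roughly like the Python sketch below. The function `apply_mapping`, the dict-based mapping format, and the `AGE` field are hypothetical and exist only to show a numeric value being converted. Python has no separate long or double types, so `int` and `float` stand in for all four numeric types listed above.

```python
def apply_mapping(record, mapping):
    """Rename fields per `mapping` and convert numeric values to strings.

    `mapping` maps input field names to output field names (an assumed
    format, not the plugin's). Missing fields come through as None.
    """
    out = {}
    for in_name, out_name in mapping.items():
        value = record.get(in_name)
        # bool is excluded because it is a subclass of int in Python.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            value = str(value)
        out[out_name] = value
    return out


# AGE is a made-up field used only to demonstrate int -> string.
record = {"FIRST_NAME": "Nitin", "ADDRESS": "150 Grant Ave, Suite A", "AGE": 42}
mapping = {"FIRST_NAME": "FIRST_NAME", "ADDRESS": "addr", "AGE": "AGE"}
mapped = apply_mapping(record, mapping)
print(mapped)
```

The output record uses `addr` in place of `ADDRESS`, and the integer value is emitted as a string.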
Design
Examples
Properties:
- datasetName: name of the database table to be denormalized.
- keyField: key field on which input records are denormalized. This field must be included in the input schema.
- outputFieldSchema: list of fields and their mappings to be included in the denormalized output. For example, ADDRESS (in input) to Addr (in output).
Example:
{
  "name": "RowDenormaliser",
  "type": "transform",
  "properties": {
    "keyField": "",
    "datasetName": "",
    "outputFieldSchema": " {..output table schema ...}",
    "inputSchema": "{.. input table schema..}"
  }
}
The transform takes records from a database table as input. Each record has a key field (column name) specified by the user. The transform denormalizes the records on the basis of this field and returns a denormalized table according to the output schema specified by the user.
For example, if it receives the following input records:
Key Field | Field Name | Field Value |
---|---|---|
joltie | FIRST_NAME | Nitin |
joltie | LAST_NAME | Motgi |
joltie | ADDRESS | 150 Grant Ave, Suite A |
joltie | CITY | Palo Alto |
it will transform the input into this output record on the basis of the Key Field value "joltie":
Key | FIRST_NAME | LAST_NAME | Addr | CITY |
---|---|---|---|---|
joltie | Nitin | Motgi | 150 Grant Ave, Suite A | Palo Alto |