This page documents the plan for redesigning GitHub statistics collection in the Caskalytics app.
...
Current Implementation of GitHub Metrics
- Use a Workflow Custom Action to make periodic REST calls to the GitHub API (see the sketch after this list).
- Results will be written into the GitHub partition of the Fileset.
- A MapReduce job will periodically read from the GitHub partition of the Fileset and update the Cube dataset.
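For illustration, a minimal sketch of the polling step is shown below. Only the REST call itself is sketched; the surrounding CDAP Custom Action class, scheduling, and the Fileset write are omitted, and the endpoint, token handling, and class name are assumptions for illustration, not the existing implementation.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

/**
 * Sketch of the REST call a Workflow Custom Action might make on each run.
 * The endpoint (repository events) and headers are assumptions; a real action
 * would also handle paging, rate limits, and ETag caching, and would write the
 * response into the GitHub partition of the Fileset.
 */
public final class GitHubPoller {

  public static String fetchRepoEvents(String owner, String repo, String token) throws Exception {
    URL url = new URL(String.format("https://api.github.com/repos/%s/%s/events", owner, repo));
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("GET");
    conn.setRequestProperty("Accept", "application/vnd.github.v3+json");
    conn.setRequestProperty("Authorization", "token " + token);
    conn.setRequestProperty("User-Agent", "caskalytics");

    StringBuilder body = new StringBuilder();
    try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
      String line;
      while ((line = reader.readLine()) != null) {
        body.append(line);
      }
    } finally {
      conn.disconnect();
    }
    return body.toString();  // raw JSON, to be written to the Fileset partition
  }
}
```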
...
- The dataset will contain two stores: a Table to hold the raw messages and a Cube to hold the metrics.
- As raw data is written to the Table store, the metrics in the Cube will be updated as needed.
- The JSON message is first flattened, and each value is inserted as a column in the Table. A final column called rawPayload is also written to capture the full payload (see the flattening sketch after this list).
- The row key for the Table will be <fullRepoName>-<messageType>-<timestampInSeconds>-<X-GitHub-Delivery>. This allows scanning by message type and by time.
- The Cube will have the following properties (see the configuration sketch after this list):
  - Resolutions: 60, 3600, 86400, 604800 (seconds; i.e. one minute, one hour, one day, one week)
  - Dimension combinations (one aggregation per line):
    - repository
    - message_type
    - repository, message_type
    - sender
    - sender, message_type
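As a sketch of the Table write described above (flattened JSON columns plus a rawPayload column, keyed by <fullRepoName>-<messageType>-<timestampInSeconds>-<X-GitHub-Delivery>), assuming Gson for JSON parsing. The sample payload, repo name, and delivery id are placeholders, and the commented Table put follows CDAP's Table API from memory, so the class names should be checked against the CDAP version in use.

```java
import com.google.gson.JsonElement;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Sketch: flatten an incoming GitHub JSON message into column/value pairs and
 * build the row key <fullRepoName>-<messageType>-<timestampInSeconds>-<deliveryId>.
 */
public final class GitHubMessageFlattener {

  /** Recursively flattens nested JSON objects into dot-separated column names. */
  public static Map<String, String> flatten(JsonObject json, String prefix) {
    Map<String, String> columns = new LinkedHashMap<>();
    for (Map.Entry<String, JsonElement> entry : json.entrySet()) {
      String name = prefix.isEmpty() ? entry.getKey() : prefix + "." + entry.getKey();
      JsonElement value = entry.getValue();
      if (value.isJsonObject()) {
        columns.putAll(flatten(value.getAsJsonObject(), name));
      } else if (value.isJsonPrimitive()) {
        columns.put(name, value.getAsString());
      } else if (!value.isJsonNull()) {
        columns.put(name, value.toString());  // arrays are kept as their JSON string
      }
    }
    return columns;
  }

  public static String rowKey(String fullRepoName, String messageType,
                              long timestampInSeconds, String deliveryId) {
    return fullRepoName + "-" + messageType + "-" + timestampInSeconds + "-" + deliveryId;
  }

  public static void main(String[] args) {
    String payload = "{\"repository\":{\"full_name\":\"caskdata/cdap\"},\"action\":\"opened\"}";
    JsonObject json = new JsonParser().parse(payload).getAsJsonObject();
    Map<String, String> columns = flatten(json, "");
    columns.put("rawPayload", payload);  // keep the full payload alongside the flattened columns
    String key = rowKey("caskdata/cdap", "pull_request", 1454000000L, "72d3162e-cc78-11e3");
    // With the CDAP Table API the write would look roughly like (names may differ by version):
    //   Put put = new Put(key);
    //   for (Map.Entry<String, String> e : columns.entrySet()) { put.add(e.getKey(), e.getValue()); }
    //   table.put(put);
    System.out.println(key + " -> " + columns);
  }
}
```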
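The Cube itself might be declared roughly as follows. This is a sketch only: the dataset name, aggregation names, and property keys ("dataset.cube.resolutions", "dataset.cube.aggregation.<name>.dimensions") are recalled from the CDAP Cube dataset documentation and should be verified against the CDAP version Caskalytics targets.

```java
import co.cask.cdap.api.app.AbstractApplication;
import co.cask.cdap.api.dataset.DatasetProperties;
import co.cask.cdap.api.dataset.lib.cube.Cube;

/**
 * Sketch: declaring the Cube with the resolutions and dimension combinations
 * listed above. Property keys follow the CDAP Cube dataset conventions and
 * may need adjusting for the CDAP version in use.
 */
public class GitHubMetricsApp extends AbstractApplication {

  @Override
  public void configure() {
    setName("GitHubMetrics");
    createDataset("githubCube", Cube.class, DatasetProperties.builder()
      .add("dataset.cube.resolutions", "60,3600,86400,604800")
      .add("dataset.cube.aggregation.byRepo.dimensions", "repository")
      .add("dataset.cube.aggregation.byType.dimensions", "message_type")
      .add("dataset.cube.aggregation.byRepoType.dimensions", "repository,message_type")
      .add("dataset.cube.aggregation.bySender.dimensions", "sender")
      .add("dataset.cube.aggregation.bySenderType.dimensions", "sender,message_type")
      .build());
    // The raw-message Table, the MapReduce job that populates the Cube, and the
    // service exposing the stats endpoints would also be registered here.
    // The MapReduce job would then write facts, e.g. (Cube API, names may differ):
    //   cube.add(new CubeFact(tsSeconds)
    //     .addDimensionValue("repository", repo)
    //     .addDimensionValue("message_type", type)
    //     .addMeasurement("count", MeasureType.COUNTER, 1L));
  }
}
```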
Integrating with Caskalytics
- Caskalytics code will need to be updated to call the new repo stats endpoints as needed.
- From what I can tell, we would only need to update the front end to query these endpoints; the backend logic used to query data from GitHub can be removed.
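As a rough illustration of what a repo stats endpoint could look like on the CDAP side, the sketch below uses an HTTP service handler. The handler name, dataset name, path, and response shape are assumptions made for this page, not an existing API, and the CDAP service classes are recalled from the CDAP docs and should be double-checked.

```java
import co.cask.cdap.api.annotation.UseDataSet;
import co.cask.cdap.api.dataset.lib.cube.Cube;
import co.cask.cdap.api.service.http.AbstractHttpServiceHandler;
import co.cask.cdap.api.service.http.HttpServiceRequest;
import co.cask.cdap.api.service.http.HttpServiceResponder;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;

/**
 * Sketch of a repo stats endpoint the Caskalytics front end could call directly.
 * A real implementation would query the Cube and return aggregated metrics.
 */
public class RepoStatsHandler extends AbstractHttpServiceHandler {

  @UseDataSet("githubCube")
  private Cube cube;

  @GET
  @Path("repos/{owner}/{repo}/stats")
  public void repoStats(HttpServiceRequest request, HttpServiceResponder responder,
                        @PathParam("owner") String owner, @PathParam("repo") String repo) {
    // A real implementation would build a Cube query for the "repository" dimension
    // (value owner + "/" + repo) at the requested resolution and return the time series.
    responder.sendString("stats for " + owner + "/" + repo);
  }
}
```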