This page documents the plan for redesigning GitHub statistics collection in the Caskalytics app.
...
Current Implementation of GitHub Metrics
- Use a Workflow Custom Action to make periodic REST calls to the GitHub API (see the sketch after this list).
- Results will be written into the GitHub partition of the Fileset.
- A MapReduce job will periodically read from the GitHub partition of the Fileset and update the Cube dataset.
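For illustration, a minimal sketch of the polling step is shown below. Only the REST call itself is sketched; the surrounding CDAP Custom Action class, scheduling, and the Fileset write are omitted, and the endpoint, token handling, and class name are assumptions for illustration, not the existing implementation.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

/**
 * Sketch of the REST call a Workflow Custom Action might make on each run.
 * The endpoint (repository events) and headers are assumptions; a real action
 * would also handle paging, rate limits, and ETag caching, and would write the
 * response into the GitHub partition of the Fileset.
 */
public final class GitHubPoller {

  public static String fetchRepoEvents(String owner, String repo, String token) throws Exception {
    URL url = new URL(String.format("https://api.github.com/repos/%s/%s/events", owner, repo));
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("GET");
    conn.setRequestProperty("Accept", "application/vnd.github.v3+json");
    conn.setRequestProperty("Authorization", "token " + token);
    conn.setRequestProperty("User-Agent", "caskalytics");

    StringBuilder body = new StringBuilder();
    try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
      String line;
      while ((line = reader.readLine()) != null) {
        body.append(line);
      }
    } finally {
      conn.disconnect();
    }
    return body.toString();  // raw JSON, to be written to the Fileset partition
  }
}
```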
...
- The dataset will contain two stores: a Table to hold the raw messages and a Cube to hold the metrics.
- As raw data is written to the Table store, the metrics in the Cube will be updated as needed.
- The JSON message is first flattened, and each value is inserted as a column in the Table. A final column called rawPayload is also written to capture the full payload (see the flattening sketch after this list).
- The row key for the Table will be <fullRepoName>-<messageType>-<timestampInSeconds>-<X-GitHub-Delivery>. This allows scanning by message type and by time.
- The Cube will have the following properties (see the configuration sketch after this list):
  - Resolutions: 60, 3600, 86400, 604800 (seconds; i.e. one minute, one hour, one day, one week)
  - Dimension combinations (one aggregation per line):
    - repository
    - message_type
    - repository, message_type
    - sender
    - sender, message_type
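As a sketch of the Table write described above (flattened JSON columns plus a rawPayload column, keyed by <fullRepoName>-<messageType>-<timestampInSeconds>-<X-GitHub-Delivery>), assuming Gson for JSON parsing. The sample payload, repo name, and delivery id are placeholders, and the commented Table put follows CDAP's Table API from memory, so the class names should be checked against the CDAP version in use.

```java
import com.google.gson.JsonElement;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Sketch: flatten an incoming GitHub JSON message into column/value pairs and
 * build the row key <fullRepoName>-<messageType>-<timestampInSeconds>-<deliveryId>.
 */
public final class GitHubMessageFlattener {

  /** Recursively flattens nested JSON objects into dot-separated column names. */
  public static Map<String, String> flatten(JsonObject json, String prefix) {
    Map<String, String> columns = new LinkedHashMap<>();
    for (Map.Entry<String, JsonElement> entry : json.entrySet()) {
      String name = prefix.isEmpty() ? entry.getKey() : prefix + "." + entry.getKey();
      JsonElement value = entry.getValue();
      if (value.isJsonObject()) {
        columns.putAll(flatten(value.getAsJsonObject(), name));
      } else if (value.isJsonPrimitive()) {
        columns.put(name, value.getAsString());
      } else if (!value.isJsonNull()) {
        columns.put(name, value.toString());  // arrays are kept as their JSON string
      }
    }
    return columns;
  }

  public static String rowKey(String fullRepoName, String messageType,
                              long timestampInSeconds, String deliveryId) {
    return fullRepoName + "-" + messageType + "-" + timestampInSeconds + "-" + deliveryId;
  }

  public static void main(String[] args) {
    String payload = "{\"repository\":{\"full_name\":\"caskdata/cdap\"},\"action\":\"opened\"}";
    JsonObject json = new JsonParser().parse(payload).getAsJsonObject();
    Map<String, String> columns = flatten(json, "");
    columns.put("rawPayload", payload);  // keep the full payload alongside the flattened columns
    String key = rowKey("caskdata/cdap", "pull_request", 1454000000L, "72d3162e-cc78-11e3");
    // With the CDAP Table API the write would look roughly like (names may differ by version):
    //   Put put = new Put(key);
    //   for (Map.Entry<String, String> e : columns.entrySet()) { put.add(e.getKey(), e.getValue()); }
    //   table.put(put);
    System.out.println(key + " -> " + columns);
  }
}
```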
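The Cube itself might be declared roughly as follows. This is a sketch only: the dataset name, aggregation names, and property keys ("dataset.cube.resolutions", "dataset.cube.aggregation.<name>.dimensions") are recalled from the CDAP Cube dataset documentation and should be verified against the CDAP version Caskalytics targets.

```java
import co.cask.cdap.api.app.AbstractApplication;
import co.cask.cdap.api.dataset.DatasetProperties;
import co.cask.cdap.api.dataset.lib.cube.Cube;

/**
 * Sketch: declaring the Cube with the resolutions and dimension combinations
 * listed above. Property keys follow the CDAP Cube dataset conventions and
 * may need adjusting for the CDAP version in use.
 */
public class GitHubMetricsApp extends AbstractApplication {

  @Override
  public void configure() {
    setName("GitHubMetrics");
    createDataset("githubCube", Cube.class, DatasetProperties.builder()
      .add("dataset.cube.resolutions", "60,3600,86400,604800")
      .add("dataset.cube.aggregation.byRepo.dimensions", "repository")
      .add("dataset.cube.aggregation.byType.dimensions", "message_type")
      .add("dataset.cube.aggregation.byRepoType.dimensions", "repository,message_type")
      .add("dataset.cube.aggregation.bySender.dimensions", "sender")
      .add("dataset.cube.aggregation.bySenderType.dimensions", "sender,message_type")
      .build());
    // The raw-message Table, the MapReduce job that populates the Cube, and the
    // service exposing the stats endpoints would also be registered here.
    // The MapReduce job would then write facts, e.g. (Cube API, names may differ):
    //   cube.add(new CubeFact(tsSeconds)
    //     .addDimensionValue("repository", repo)
    //     .addDimensionValue("message_type", type)
    //     .addMeasurement("count", MeasureType.COUNTER, 1L));
  }
}
```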
Integrating with Caskalytics
- Caskalytics code will need to be updated to call the new repo stats endpoints as needed.
- From what I can tell, we would only need to update the front end to query these endpoints; the backend logic used to query data from GitHub can be removed.
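As a rough illustration of what a repo stats endpoint could look like on the CDAP side, the sketch below uses an HTTP service handler. The handler name, dataset name, path, and response shape are assumptions made for this page, not an existing API, and the CDAP service classes are recalled from the CDAP docs and should be double-checked.

```java
import co.cask.cdap.api.annotation.UseDataSet;
import co.cask.cdap.api.dataset.lib.cube.Cube;
import co.cask.cdap.api.service.http.AbstractHttpServiceHandler;
import co.cask.cdap.api.service.http.HttpServiceRequest;
import co.cask.cdap.api.service.http.HttpServiceResponder;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;

/**
 * Sketch of a repo stats endpoint the Caskalytics front end could call directly.
 * A real implementation would query the Cube and return aggregated metrics.
 */
public class RepoStatsHandler extends AbstractHttpServiceHandler {

  @UseDataSet("githubCube")
  private Cube cube;

  @GET
  @Path("repos/{owner}/{repo}/stats")
  public void repoStats(HttpServiceRequest request, HttpServiceResponder responder,
                        @PathParam("owner") String owner, @PathParam("repo") String repo) {
    // A real implementation would build a Cube query for the "repository" dimension
    // (value owner + "/" + repo) at the requested resolution and return the time series.
    responder.sendString("stats for " + owner + "/" + repo);
  }
}
```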