This page documents the plan for redesigning GitHub statistics collection in the Caskalytics app.
...
Current Implementation of GitHub Metrics
- A Workflow custom action makes periodic REST calls to the GitHub API.
- The results are written to the GitHub partition of the Fileset.
- A MapReduce job periodically reads from the GitHub partition of the Fileset and updates the Cube dataset.
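The collection step of the current implementation can be sketched with the JDK's `java.net.http` client (Java 11+). This is a minimal sketch, not the actual custom action: `GitHubRepoFetcher` and its methods are hypothetical, it assumes the data being collected comes from GitHub's repository endpoint `https://api.github.com/repos/{owner}/{repo}`, and writing the returned body into the Fileset partition is left to the caller.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper for the periodic collection step; not part of the CDAP API.
final class GitHubRepoFetcher {
    private static final String API_BASE = "https://api.github.com";

    // Builds a GET request for GitHub's repository endpoint /repos/{owner}/{repo}.
    static HttpRequest buildRepoRequest(String org, String repo) {
        return HttpRequest.newBuilder(URI.create(API_BASE + "/repos/" + org + "/" + repo))
                .header("Accept", "application/vnd.github+json")
                .timeout(Duration.ofSeconds(30))
                .GET()
                .build();
    }

    // Sends the request and returns the raw JSON body; the caller is expected to
    // write it into the GitHub partition of the Fileset.
    static String fetchRepoJson(HttpClient client, String org, String repo)
            throws IOException, InterruptedException {
        HttpResponse<String> response =
                client.send(buildRepoRequest(org, repo), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("GitHub API returned HTTP " + response.statusCode());
        }
        return response.body();
    }
}
```

A periodic job would call `GitHubRepoFetcher.fetchRepoJson(HttpClient.newHttpClient(), "russorat", "savage-leads")` on each run. Unauthenticated requests are limited to 60 per hour, so a scheduled collector would normally also send an access token.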
...
- The following REST endpoints will be used to get repository stats for Caskalytics.
  - GET /{org}/{repo}/stats
    Returns the stats of the given repo.
    Parameters:
      - org (String, required): the org for the repo
      - repo (String, required): the name of the repo
    Response:
      {
        "name": "russorat/savage-leads",
        "size": 481,
        "forks": 0,
        "watchers": 1,
        "stargazers": 1,
        "openIssues": 3,
        "totalPullRequests": 2
      }
  - GET /{org}/{repo}/messages/{messageType}
    Returns the messages of the given type for the given repo.
    Parameters:
      - org (String, required): the org for the repo
      - repo (String, required): the name of the repo
      - messageType (String, required): the type of message to return
      - startTime (optional, default 0): start of the time range to search, in seconds
      - endTime (optional, default now): end of the time range to search, in seconds
    Response:
      {
        "totalMessages": 2,
        "messages": ["{...}", "{...}"]
      }
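The defaulting rules for `startTime` and `endTime` on the messages endpoint can be sketched as follows. `TimeRange` is a hypothetical class; the sketch assumes that a missing or empty query parameter means "use the default" and that a negative or inverted range should be rejected, neither of which the endpoint description specifies.

```java
import java.time.Instant;

// Hypothetical holder for the optional startTime/endTime query parameters of
// GET /{org}/{repo}/messages/{messageType}; both values are in seconds.
final class TimeRange {
    final long startSeconds;
    final long endSeconds;

    private TimeRange(long startSeconds, long endSeconds) {
        this.startSeconds = startSeconds;
        this.endSeconds = endSeconds;
    }

    // A null or empty parameter means "not supplied": startTime defaults to 0 and
    // endTime defaults to nowSeconds. Throws IllegalArgumentException (or its
    // subclass NumberFormatException) on non-numeric input or an invalid range.
    static TimeRange parse(String startParam, String endParam, long nowSeconds) {
        long start = isAbsent(startParam) ? 0L : Long.parseLong(startParam.trim());
        long end = isAbsent(endParam) ? nowSeconds : Long.parseLong(endParam.trim());
        if (start < 0 || end < start) {
            throw new IllegalArgumentException(
                    "invalid time range: startTime=" + start + ", endTime=" + end);
        }
        return new TimeRange(start, end);
    }

    // Convenience overload that uses the current wall-clock time as "now".
    static TimeRange parse(String startParam, String endParam) {
        return parse(startParam, endParam, Instant.now().getEpochSecond());
    }

    private static boolean isAbsent(String param) {
        return param == null || param.trim().isEmpty();
    }
}
```

Passing `nowSeconds` explicitly keeps the defaulting logic testable; the two-argument overload is what a request handler would call with the raw query-string values.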
GitHub Dataset
- The dataset will contain two stores: a Table to hold the raw messages and a Cube to hold the metrics.
- As raw messages are written to the Table, the corresponding metrics in the Cube will be updated.
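- Each JSON message is first flattened, and each resulting value is inserted as a column in the Table. A final column, rawPayload, is also written to capture the full payload.
- The row key for the Table will be <fullRepoName>-<messageType>-<timestampInSeconds>-<X-GitHub-Delivery>, where X-GitHub-Delivery is the unique delivery ID that GitHub sends as a header with each webhook. This allows scanning all messages of one type for a repository, optionally restricted to a time range.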
- The Cube will have the following properties:
  - Resolutions (in seconds): 60, 3600, 86400, 604800 (one minute, one hour, one day, one week)
  - Dimensions:
    - repository
    - message_type
    - repository, message_type (combined)
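The flattening step for the Table can be sketched as follows. This assumes the payload has already been parsed into nested Maps and Lists (for example by a JSON library such as Gson) and uses dot-separated names for nested objects and `[i]` suffixes for array elements; both naming conventions and the `PayloadFlattener` class are hypothetical, since the plan does not say how column names are formed.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical flattener: turns an already-parsed JSON payload (nested Maps and
// Lists) into column name -> value pairs, plus a rawPayload column.
final class PayloadFlattener {

    static Map<String, String> flatten(Map<String, ?> payload, String rawPayload) {
        Map<String, String> columns = new LinkedHashMap<>();
        flattenInto("", payload, columns);
        columns.put("rawPayload", rawPayload); // the full, unmodified payload
        return columns;
    }

    private static void flattenInto(String name, Object value, Map<String, String> out) {
        if (value instanceof Map) {
            // Nested object: child columns are named parent.child.
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
                String key = String.valueOf(entry.getKey());
                flattenInto(name.isEmpty() ? key : name + "." + key, entry.getValue(), out);
            }
        } else if (value instanceof List) {
            // Array: elements are named parent[0], parent[1], ...
            List<?> list = (List<?>) value;
            for (int i = 0; i < list.size(); i++) {
                flattenInto(name + "[" + i + "]", list.get(i), out);
            }
        } else {
            // Leaf (string, number, boolean, or null): stored as its string form.
            // Empty objects and arrays produce no column; they remain in rawPayload.
            out.put(name, String.valueOf(value));
        }
    }
}
```

If the payload is parsed with Gson into plain Maps, JSON numbers arrive as `Double`, so a value such as 481 would be stored as "481.0" unless numbers are normalized first.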
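The row key format for the Table, and the start and stop keys for a time-bounded scan, can be sketched as below. `GitHubRowKeys` is a hypothetical helper. It assumes row keys are compared lexicographically (byte by byte) and zero-pads the timestamp to 10 digits so that key order matches time order; epoch seconds stay at 10 digits until the year 2286, so the padding only matters for small test values.

```java
import java.util.Locale;

// Hypothetical helpers for the row key
// <fullRepoName>-<messageType>-<timestampInSeconds>-<X-GitHub-Delivery>.
final class GitHubRowKeys {

    // Zero-padded to 10 ASCII digits so lexicographic key order matches time order.
    private static String ts(long seconds) {
        return String.format(Locale.ROOT, "%010d", seconds);
    }

    static String rowKey(String fullRepoName, String messageType, long tsSeconds, String deliveryId) {
        return fullRepoName + "-" + messageType + "-" + ts(tsSeconds) + "-" + deliveryId;
    }

    // Start key (inclusive) and stop key (exclusive) covering every message of one
    // type for one repository with startSeconds <= timestamp <= endSeconds.
    static String[] scanRange(String fullRepoName, String messageType, long startSeconds, long endSeconds) {
        String prefix = fullRepoName + "-" + messageType + "-";
        return new String[] {prefix + ts(startSeconds), prefix + ts(endSeconds + 1)};
    }
}
```

Using `endSeconds + 1` as the exclusive stop key keeps rows whose timestamp equals `endSeconds` in the scan, whatever delivery ID follows the timestamp.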
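The Cube update that accompanies each Table write can be sketched by listing the counters one message touches: one per dimension group and resolution, 12 in total for the properties above. `CubeUpdates` is hypothetical and is not the CDAP Cube API; it assumes time buckets aligned to multiples of the resolution counted from the Unix epoch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of which Cube counters one message increments;
// this is not the CDAP Cube API.
final class CubeUpdates {
    // Resolutions from the plan, in seconds: 1 minute, 1 hour, 1 day, 1 week.
    static final long[] RESOLUTIONS = {60L, 3600L, 86400L, 604800L};

    // Dimension groups from the plan.
    static final List<List<String>> AGGREGATIONS = List.of(
            List.of("repository"),
            List.of("message_type"),
            List.of("repository", "message_type"));

    // Start of the epoch-aligned bucket that contains tsSeconds.
    static long bucketStart(long tsSeconds, long resolution) {
        return Math.floorDiv(tsSeconds, resolution) * resolution;
    }

    // One counter key per (dimension group, resolution); each is incremented by 1.
    static List<String> countersFor(String repository, String messageType, long tsSeconds) {
        Map<String, String> dims = Map.of("repository", repository, "message_type", messageType);
        List<String> counters = new ArrayList<>();
        for (List<String> aggregation : AGGREGATIONS) {
            StringBuilder dimPart = new StringBuilder();
            for (String dim : aggregation) {
                dimPart.append(dim).append('=').append(dims.get(dim)).append(',');
            }
            for (long resolution : RESOLUTIONS) {
                counters.add(dimPart + "res=" + resolution + ",ts=" + bucketStart(tsSeconds, resolution));
            }
        }
        return counters;
    }
}
```

With epoch-aligned buckets, the 604800-second (weekly) buckets start on Thursdays at 00:00 UTC, because 1 January 1970 was a Thursday; if weeks should start on another day, the bucketing needs an offset.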