This page documents the plan for redesigning GitHub statistics collection in the Caskalytics app.
...
Current Implementation of GitHub Metrics
Use a Workflow Custom Action to run periodic RESTful calls to the GitHub API.
Results will be written into the GitHub partition of the Fileset.
A MapReduce job will periodically read from the GitHub partition of the Fileset, and update the Cube dataset.
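
As a rough illustration of the collection step above, the sketch below reduces one GitHub repository payload to a flat stats record and serializes it as one line of text. It is written in Python for brevity and is not the Workflow Custom Action itself. The helper names (`repo_url`, `to_stats_record`, `to_fileset_line`), the output field names, and the choice of `watchers_count` for the watchers value are assumptions made for this example; `total_pull_requests` is passed in separately because the repository payload does not carry a pull request count.

```python
import json

GITHUB_API = "https://api.github.com"  # public GitHub REST API base URL


def repo_url(org, repo):
    """Build the REST URL that returns one repository's metadata."""
    return f"{GITHUB_API}/repos/{org}/{repo}"


def to_stats_record(repo_json, total_pull_requests):
    """Reduce a GitHub repository payload to a flat stats record.

    The output field names are assumptions for this sketch.
    """
    return {
        "name": repo_json["full_name"],
        "size": repo_json["size"],
        "forks": repo_json["forks_count"],
        "watchers": repo_json["watchers_count"],
        "stargazers": repo_json["stargazers_count"],
        "openIssues": repo_json["open_issues_count"],
        "totalPullRequests": total_pull_requests,
    }


def to_fileset_line(record):
    """Serialize one record as a single JSON line, ready to append to a file."""
    return json.dumps(record, sort_keys=True)
```

Writing one JSON record per line keeps the later batch read simple, since each line can be parsed on its own.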
...
- These will be REST endpoints used to get repo stats for Caskalytics

GET /{org}/{repo}/stats
Returns the stats of the given repo.
Parameters:
- org (String, required): the org for the repo
- repo (String, required): the name of the repo
Response:
{ "name": "russorat/savage-leads", "size": 481, "forks": 0, "watchers": 1, "stargazers": 1, "openIssues": 3, "totalPullRequests": 2 }

GET /{org}/{repo}/messages/{messageType}
Returns the messages for a given repo. A list of events can be found here: https://developer.github.com/webhooks/#events
Parameters:
- org (String, required): the org for the repo
- repo (String, required): the name of the repo
- messageType (String, required): the type of message to return
- startTime (optional, default 0): start time to search for, in seconds
- endTime (optional, default now): end time to search for, in seconds
Response:
{ "totalMessages": 2, "messages": ["{...}","{...}"] }

GET /{sender}/stats
Returns statistics for a given GitHub user (sender). If no sender is found, an empty stats list is returned.
Parameters:
- sender (String, required): the GitHub username to get stats for
Response:
{ "sender": "russorat", "stats": { "issue_comment": 1, "issues": 3, "create": 1, "ping": 1, "push": 1 } }
Response when no sender is found:
{ "sender": "russoratsdfsdf", "stats": {} }

GET /topSenders/{messageType}?limit={limit}
Returns an array of the top senders for the given message type.
Parameters:
- messageType (String, required): the type of message to get the top senders for
- limit (long, optional, default 10): the number of results to return
Response:
[ { "sender": "russorat", "stats": { "push": 1 } } ]

GET /{org}/{repo}/metric?metric={metric}
Returns a given custom metric for a repo.
Parameters:
- org (String, required): the org for the repo
- repo (String, required): the name of the repo
- metric (String, required): the custom metric to return
Response:
{ "repoName": "russorat/savage-leads", "metricName": "repository.watchers", "metric": 0 }
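
The query semantics of the message and sender endpoints above can be sketched over an in-memory list of messages. This is only an illustration under stated assumptions, not the service implementation: the message fields `timestamp`, `messageType`, and `sender` are hypothetical names, and treating both time bounds as inclusive is a guess. The defaults (`startTime` 0, `endTime` now, `limit` 10) and the empty stats for an unknown sender follow the endpoint descriptions.

```python
import time
from collections import Counter


def messages_in_window(messages, start_time=0, end_time=None):
    """Keep messages whose timestamp (seconds) lies in [start_time, end_time]."""
    if end_time is None:
        end_time = int(time.time())  # the documented default of "now"
    return [m for m in messages if start_time <= m["timestamp"] <= end_time]


def sender_stats(messages, sender):
    """Count one sender's messages per message type; unknown senders get {}."""
    counts = Counter(m["messageType"] for m in messages if m["sender"] == sender)
    return {"sender": sender, "stats": dict(counts)}


def top_senders(messages, message_type, limit=10):
    """Return up to `limit` senders with the most messages of one type."""
    counts = Counter(
        m["sender"] for m in messages if m["messageType"] == message_type
    )
    return [
        {"sender": sender, "stats": {message_type: count}}
        for sender, count in counts.most_common(limit)
    ]
```

For example, `sender_stats(messages, "russoratsdfsdf")` on a list with no matching sender returns `{"sender": "russoratsdfsdf", "stats": {}}`, which matches the empty-stats response shape.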
GitHub Dataset
- The dataset will contain two stores: a Table to hold the raw messages and a Cube to hold the metrics.
- As the raw data is written to the Table store, the metrics in the Cube will be updated as needed.
- The JSON message is first flattened, and then each value is inserted as a column in the Table. A final field called rawPayload is also written to capture the full payload.
- The key to the table will be <fullRepoName>-<messageType>-<timestampInSeconds>-<X-GitHub-Delivery>. Because the key starts with the repository name, followed by the message type and the timestamp, rows can be scanned by repository, by message type, and by time.
- The Cube will have the following properties:
  - Resolutions (in seconds): 60, 3600, 86400, 604800 (1 minute, 1 hour, 1 day, 1 week)
  - Dimensions:
    - repository
    - message_type
    - repository, message_type
    - sender
    - sender, message_type
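
The write path described in this list can be sketched as follows: flatten a message into columns, keep the full payload in rawPayload, build the row key, and round the timestamp down to each Cube resolution. This is a minimal sketch under assumptions, not the dataset code. The dotted column names, the index-based naming of list elements, and the function names are all hypothetical; only the key layout, the rawPayload field, and the resolution values come from the design above.

```python
import json

RESOLUTIONS = (60, 3600, 86400, 604800)  # seconds, as listed for the Cube


def flatten(obj, prefix=""):
    """Flatten nested dicts and lists into {dotted.path: value} columns."""
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = ((str(i), v) for i, v in enumerate(obj))
    else:
        return {prefix: obj}  # leaf value: one column
    columns = {}
    for key, value in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        columns.update(flatten(value, path))
    return columns


def to_row(payload):
    """Build the Table columns: flattened values plus the full rawPayload."""
    columns = flatten(payload)
    columns["rawPayload"] = json.dumps(payload)
    return columns


def row_key(full_repo_name, message_type, timestamp_seconds, delivery_id):
    """<fullRepoName>-<messageType>-<timestampInSeconds>-<X-GitHub-Delivery>"""
    return f"{full_repo_name}-{message_type}-{timestamp_seconds}-{delivery_id}"


def bucket_starts(timestamp_seconds):
    """Round a timestamp down to the start of its bucket at each resolution."""
    return {r: timestamp_seconds - timestamp_seconds % r for r in RESOLUTIONS}
```

With this key layout, a scan over the prefix `russorat/savage-leads-push-` would cover every push message for that repo. Time-range scans within the prefix rely on timestamps having the same number of digits, since the keys compare as strings.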
...