
This page documents the plan for redesigning GitHub statistics collection in the Caskalytics app.

...

API: https://developer.github.com/v3/
A Workflow custom action will make periodic REST calls to the GitHub API.
Results will be written into the GitHub partition of the Fileset.
A MapReduce job will periodically read from the GitHub partition of the Fileset, and update the Cube dataset.
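The polling step above can be sketched as follows. This is a minimal illustration in Python, run outside the Workflow runtime. It calls the GitHub v3 "get a repository" endpoint and emits one JSON line per repository for the GitHub partition. The hourly `partition_path` layout, the `{"timestamp", "payload"}` record shape, and all function names are hypothetical. The fetcher is injectable so that the logic can be exercised without network access.

```python
import json
import time
import urllib.request
from typing import Callable, Iterable, List, Optional, Tuple

GITHUB_API = "https://api.github.com"


def repo_url(org: str, repo: str) -> str:
    """URL of the GitHub v3 'get a repository' endpoint."""
    return f"{GITHUB_API}/repos/{org}/{repo}"


def fetch_repo(org: str, repo: str, timeout: float = 10.0) -> dict:
    """Fetch one repository's metadata (unauthenticated, so rate limits apply)."""
    request = urllib.request.Request(
        repo_url(org, repo),
        headers={"Accept": "application/vnd.github.v3+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def partition_path(ts: int) -> str:
    """Hypothetical hourly (UTC) layout for files in the GitHub partition."""
    t = time.gmtime(ts)
    return f"github/{t.tm_year:04d}/{t.tm_mon:02d}/{t.tm_mday:02d}/{t.tm_hour:02d}"


def poll_once(
    repos: Iterable[Tuple[str, str]],
    fetch: Callable[[str, str], dict] = fetch_repo,
    now: Optional[int] = None,
) -> Tuple[str, List[str]]:
    """Fetch each (org, repo) once; return the target path and the JSON lines to write."""
    now = int(time.time()) if now is None else now
    lines = [
        json.dumps({"timestamp": now, "payload": fetch(org, repo)}, sort_keys=True)
        for org, repo in repos
    ]
    return partition_path(now), lines
```

For example, `path, lines = poll_once([("russorat", "savage-leads")])` returns the lines that a single run would append under `path`. The MapReduce job would later read these lines and update the Cube.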

...

The following REST endpoints will be used to retrieve repository stats for Caskalytics:

Method: GET
Endpoint: /{org}/{repo}/stats
Description: Returns the stats for the given repo.
Parameters:
Name	Description	Required
org	String - the org for the repo	Yes
repo	String - the name of the repo	Yes
Response:
```json
{
  "name": "russorat/savage-leads",
  "size": 481,
  "forks": 0,
  "watchers": 1,
  "stargazers": 1,
  "openIssues": 3,
  "totalPullRequests": 2
}
```

Method: GET
Endpoint: /{org}/{repo}/messages/{messageType}
Description: Returns the messages of the given type for the given repo.
Parameters:
Name	Description	Required	Default
org	String - the org for the repo	Yes
repo	String - the name of the repo	Yes
messageType	String - the type of message to return	Yes
startTime	Start of the time range to search, in seconds	No	0
endTime	End of the time range to search, in seconds	No	now
Response:
```json
{"totalMessages": 2, "messages": ["{...}", "{...}"]}
```
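The two response shapes above can be sketched as follows. This sketch assumes that the stats are derived from the GitHub v3 repository object. That object carries `full_name`, `size`, `forks_count`, `watchers_count`, `stargazers_count`, and `open_issues_count`, but it has no pull request total. For that reason, `total_pull_requests` is passed in separately; it could, for example, be counted from the pulls endpoint.

Two caveats about the GitHub fields:
- In the v3 API, `open_issues_count` includes open pull requests.
- `watchers_count` mirrors `stargazers_count`.

The time-range helper applies the documented defaults (startTime 0, endTime now). Rejecting an inverted range is an assumption, and all function names are hypothetical.

```python
import time
from typing import List, Optional, Tuple


def to_stats(repo: dict, total_pull_requests: int) -> dict:
    """Map a GitHub v3 repository object to the /{org}/{repo}/stats response."""
    return {
        "name": repo["full_name"],
        "size": repo["size"],
        "forks": repo["forks_count"],
        "watchers": repo["watchers_count"],
        "stargazers": repo["stargazers_count"],
        "openIssues": repo["open_issues_count"],
        "totalPullRequests": total_pull_requests,
    }


def resolve_time_range(
    start_time: Optional[str], end_time: Optional[str], now: Optional[int] = None
) -> Tuple[int, int]:
    """Apply the documented defaults: startTime=0, endTime=now (both in seconds)."""
    now = int(time.time()) if now is None else now
    start = int(start_time) if start_time is not None else 0
    end = int(end_time) if end_time is not None else now
    if start > end:
        raise ValueError("startTime must not be later than endTime")
    return start, end


def messages_response(messages: List[str]) -> dict:
    """Build the /{org}/{repo}/messages/{messageType} response body."""
    return {"totalMessages": len(messages), "messages": messages}
```

Given a repository object with the counts from the example, `to_stats(repo, 2)` reproduces the stats response shown above.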

The dataset will contain two stores: a Table to hold the raw messages and a Cube to hold the metrics.
As raw data is written to the Table, the metrics in the Cube will be updated as needed.
Each JSON message is first flattened, and each resulting value is then inserted as a column in the Table. An additional column, rawPayload, stores the full original payload.
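The flattening step can be sketched as follows. The page does not specify a column-naming scheme, so this sketch assumes the following:
- Nested keys are joined with `.` and list elements are addressed by their index.
- Strings are stored as-is, and other scalars are stored as their JSON text.

The separator and function names are hypothetical.

```python
import json
from typing import Any, Dict


def flatten(value: Any, prefix: str = "", sep: str = ".") -> Dict[str, Any]:
    """Flatten nested objects/arrays into {"a.b.0.c": scalar}."""
    out: Dict[str, Any] = {}
    if isinstance(value, dict):
        items = ((str(k), v) for k, v in value.items())
    elif isinstance(value, list):
        items = ((str(i), v) for i, v in enumerate(value))
    else:
        out[prefix] = value  # leaf value
        return out
    for key, child in items:
        out.update(flatten(child, f"{prefix}{sep}{key}" if prefix else key, sep))
    return out


def to_columns(raw_payload: str) -> Dict[str, str]:
    """Column name -> string value for one Table row, plus rawPayload."""
    columns = {
        name: v if isinstance(v, str) else json.dumps(v)
        for name, v in flatten(json.loads(raw_payload)).items()
    }
    # Written last, so it overwrites any top-level "rawPayload" key in the message.
    columns["rawPayload"] = raw_payload
    return columns
```

With this scheme, empty objects and arrays produce no columns; their contents survive only in rawPayload.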
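The row key for the Table will be <fullRepoName>-<messageType>-<timestampInSeconds>-<X-GitHub-Delivery>, where X-GitHub-Delivery is the unique delivery ID that GitHub sends with each webhook. This allows prefix scans by repository and message type, and range scans by time within them.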
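The row key and the scan bounds it enables can be sketched as follows. This sketch assumes the following:
- The timestamp is zero-padded to 10 digits, so that lexicographic order matches numeric order. Unpadded epoch seconds happen to be 10 digits today, but padding makes the ordering explicit.
- A scan takes an inclusive start row and an exclusive stop row.

The helper names are hypothetical.

```python
from typing import Tuple

TS_WIDTH = 10  # digits; enough for epoch seconds until the year 2286


def row_key(full_repo_name: str, message_type: str, ts_seconds: int, delivery_id: str) -> str:
    """<fullRepoName>-<messageType>-<timestampInSeconds>-<X-GitHub-Delivery>"""
    return f"{full_repo_name}-{message_type}-{ts_seconds:0{TS_WIDTH}d}-{delivery_id}"


def scan_range(full_repo_name: str, message_type: str, start: int, end: int) -> Tuple[str, str]:
    """[start_row, stop_row) covering timestamps start..end, both inclusive."""
    prefix = f"{full_repo_name}-{message_type}-"
    return f"{prefix}{start:0{TS_WIDTH}d}", f"{prefix}{end + 1:0{TS_WIDTH}d}"
```

GitHub repository names may themselves contain hyphens. A separator that cannot occur in repository names or message types would therefore make prefix scans unambiguous.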
The Cube will have the following properties:
- Resolutions (in seconds): 60, 3600, 86400, 604800 (one minute, one hour, one day, one week)
- Dimensions:
  - repository
  - message_type
  - repository, message_type
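The Cube increments that one incoming message would produce under these properties can be sketched as follows. For each resolution, the timestamp is truncated to the start of its bucket. A count is then recorded for each of the three dimension groups, so one message yields 4 × 3 = 12 increments.

This sketch makes a few assumptions:
- Counting messages is an assumed measure, since the page does not list the Cube's measures.
- The names used here are hypothetical.
- Buckets are aligned to the Unix epoch. With 604800-second buckets, a week therefore starts on Thursday 00:00 UTC, because 1 January 1970 was a Thursday.

```python
from collections import Counter
from itertools import product
from typing import Dict

RESOLUTIONS = (60, 3600, 86400, 604800)  # minute, hour, day, week (seconds)
AGGREGATIONS = (
    ("repository",),
    ("message_type",),
    ("repository", "message_type"),
)


def cube_increments(ts: int, dims: Dict[str, str]) -> Counter:
    """(resolution, bucket_start, dimension values) -> count for one message."""
    counts: Counter = Counter()
    for resolution, group in product(RESOLUTIONS, AGGREGATIONS):
        bucket_start = ts - ts % resolution  # epoch-aligned bucket start
        values = tuple((name, dims[name]) for name in group)
        counts[(resolution, bucket_start, values)] += 1
    return counts
```

Adding the Counters for a batch of messages (for example, `c1 + c2`) gives the per-bucket totals to apply to the Cube.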
