This page documents the plan for redesigning the GitHub statistics collection in the Caskalytics app.

...

Current Implementation of GitHub Metrics

  • API: https://developer.github.com/v3/

  • Use a Workflow Custom Action to make periodic REST calls to the GitHub API.

  • Results will be written into the GitHub partition of the Fileset.

  • A MapReduce job will periodically read from the GitHub partition of the Fileset, and update the Cube dataset.
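
The collection step above can be illustrated with a minimal Python sketch. It is not the Caskalytics code: the list of metric fields, the function names, and the one-JSON-object-per-line record layout are assumptions made for illustration. The field names themselves (full_name, stargazers_count, forks_count, open_issues_count, subscribers_count) come from the GitHub v3 repository resource. The network call is left out so that the transformation can be read and tested on its own.

```python
import json
from datetime import datetime, timezone

GITHUB_API = "https://api.github.com"

# Assumption: the page does not list which statistics are collected;
# these are fields of the GitHub v3 repository resource.
METRIC_FIELDS = (
    "stargazers_count",
    "forks_count",
    "open_issues_count",
    "subscribers_count",
)


def repo_url(full_repo_name):
    """URL of the repository resource, e.g. for 'owner/repo'."""
    return f"{GITHUB_API}/repos/{full_repo_name}"


def extract_metrics(payload, collected_at):
    """Reduce a parsed repository response to the counters of interest."""
    record = {
        "repository": payload["full_name"],
        "collected_at": int(collected_at.timestamp()),
    }
    for field in METRIC_FIELDS:
        # Missing counters default to 0 so every record has the same shape.
        record[field] = int(payload.get(field, 0))
    return record


def partition_line(record):
    """Serialize one record as a single JSON line (hypothetical layout)."""
    return json.dumps(record, sort_keys=True)
```

A periodic action would fetch repo_url(...) for each repository, pass the parsed response to extract_metrics, and append partition_line(...) to the file written into the GitHub partition.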

...

  • The Dataset will contain two stores: a Table to hold the raw messages and a Cube to hold the metrics.
  • As the raw data is written to the Table store, the metrics in the Cube will be updated as needed.
  • The JSON message is first flattened, and then each value is inserted as a column in the Table. A final field called rawPayload is also written to capture the full payload.
  • The row key for the Table will be <fullRepoName>-<messageType>-<timestampInSeconds>-<X-GitHub-Delivery>. This will allow scanning by repository and message type, and by time within those.
  • The Cube will have the following properties:
    • Resolutions (in seconds): 60, 3600, 86400, 604800 (minute, hour, day, week)
    • Dimensions:
      • repository
      • message_type
      • repository, message_type
      • sender
      • sender, message_type
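
The Table and Cube layout described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the Dataset implementation: the function names, the "." separator used for flattened column names, and the epoch-aligned truncation of timestamps are all assumptions. The sketch also assumes that the message type and delivery ID are supplied by the caller (for GitHub webhooks they arrive in the X-GitHub-Event and X-GitHub-Delivery headers) and that the payload carries repository.full_name.

```python
import json

# Cube resolutions from the list above, in seconds.
RESOLUTIONS = (60, 3600, 86400, 604800)


def flatten(obj, prefix="", sep="."):
    """Flatten nested dicts and lists into one level of column -> value.

    Empty nested containers produce no columns.
    """
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = ((str(i), v) for i, v in enumerate(obj))
    else:
        return {prefix: obj}
    columns = {}
    for key, value in items:
        name = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(value, (dict, list)):
            columns.update(flatten(value, name, sep))
        else:
            columns[name] = value
    return columns


def build_row(raw_payload, message_type, timestamp_seconds, delivery_id):
    """Return (row_key, columns) for one raw message."""
    message = json.loads(raw_payload)
    columns = flatten(message)
    # Keep the untouched payload alongside the flattened columns.
    columns["rawPayload"] = raw_payload
    full_repo_name = message["repository"]["full_name"]
    key = f"{full_repo_name}-{message_type}-{timestamp_seconds}-{delivery_id}"
    return key, columns


def bucket_starts(timestamp_seconds):
    """Start of the time bucket containing the timestamp, per resolution.

    Assumption: buckets are aligned to the Unix epoch, so weekly buckets
    start on a Thursday.
    """
    return {r: timestamp_seconds - timestamp_seconds % r for r in RESOLUTIONS}
```

With this key layout, all rows for one repository and message type share a common prefix, so a prefix scan returns them ordered by timestamp as long as the timestamps have the same number of digits.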

Integrating with Caskalytics

  • Caskalytics code will need to be updated to call the new repo stats endpoints as needed.
  • From what I can tell, we would only need to update the front end to query these endpoints. The backend logic used to query data from GitHub can be removed.