The purpose of this page is to document the plan for redesigning the GitHub statistics collection in the Caskalytics app.

Goals for Redesign

The redesign aims to create a standalone "mini" app that someone can install in their CDAP platform to passively collect GitHub webhook messages and analyze them as needed. The current implementation is limited because it periodically polls the GitHub API and gathers information only about specific repos. The redesign instead exposes an endpoint that collects and stores all information posted to it by GitHub webhooks, including comments, PRs, new issues, new repos, etc. For more information, see https://developer.github.com/webhooks/.

Ideally, an organization could configure this webhook at the org level to passively capture all changes made to their GitHub account.

Current Implementation of GitHub Metrics

  • API: https://developer.github.com/v3/

  • Use a Workflow Custom Action to run periodic RESTful calls to the GitHub API

  • Results are written into the GitHub partition of the Fileset.

  • A MapReduce job periodically reads from the GitHub partition of the Fileset and updates the Cube dataset.

New Implementation of GitHub Metrics

  • Expose a service that accepts and verifies valid webhook messages from GitHub and writes those messages to a Datatable.
    • This will store both the raw messages and a metrics table that holds stats at the repo and user level
  • Expose a RESTful endpoint to query metrics from the aggregates table and return results in JSON
  • Use the data service to create some sort of visual display of the information.
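
The "accepts and verifies" step above could be sketched as follows. This is a minimal illustration, not the Caskalytics implementation. It assumes the webhook is configured with a shared secret and that GitHub sends the hex HMAC-SHA256 of the raw request body in the X-Hub-Signature-256 header, prefixed with "sha256=" (older deliveries used an SHA-1 digest in X-Hub-Signature). The function names sign_payload and is_valid_webhook are hypothetical.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, body: bytes) -> str:
    """Build a signature header value: 'sha256=' followed by the hex HMAC of the body."""
    digest = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return "sha256=" + digest


def is_valid_webhook(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Return True only if the header matches the HMAC of the raw request body.

    Hypothetical helper: the service would call this before storing a message.
    """
    if not signature_header or not signature_header.startswith("sha256="):
        return False
    expected = sign_payload(secret, body).encode("utf-8")
    received = signature_header.encode("utf-8")
    # compare_digest runs in constant time, so it does not leak how many
    # leading characters of the signature matched.
    return hmac.compare_digest(expected, received)
```

The check has to run against the raw bytes of the request body, before any JSON parsing, because re-serializing the payload can change whitespace or key order and invalidate the digest.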

Metrics Calculated

  • Metrics will be stored in a separate dataset from the raw messages
  • Repo metrics will be overwritten each time a new message is received from GitHub
  • Per User metrics will be incremented
  • All Time Metrics
    • Per Repository
      • repo size
      • stargazers_count
      • watchers_count
      • forks_count
    • Per User
      • Issues <action> (opened, closed, reopened)
      • Issue Comment Created

Capture Endpoint

  • The capture endpoint will be a catch all endpoint where the 

 
