This page documents the plan for redesigning Github statistics collection in the Caskalytics app.

...

Current Implementation of Github Metrics

  • API: https://developer.github.com/v3/

  • Use a Workflow Custom Action to run periodic RESTful calls to the Github API

  • Results will be written into the GitHub partition of the Fileset.

  • A MapReduce job will periodically read from the GitHub partition of the Fileset, and update the Cube dataset.

...

  • Metrics will be stored in a separate dataset from the raw messages
  • Repo messages will be overwritten each time a new message is received from Github
  • Per User metrics will be incremented
  • All Time Metrics
    • Per Repository
      • repo size
      • stargazers_count
      • watchers_count
      • forks_count
    • Per User
    • Issues <action> (opened, closed, reopened)
    • Issue Comment CreatedMessage
      • count
    • Per Repo / Per Message
      • count

Capture Endpoint

  • The capture endpoint will be a catch all endpoint where the 

...

  • that accepts POST messages from Github, verifies their authenticity, and writes the message to the data store.
  • Each message should have the following headers to be considered "valid"
    • User-Agent should start with GitHub-Hookshot/<id>
    • X-GitHub-Delivery should be a UUID for the message
    • X-GitHub-Event should be the name of the message
    • X-Hub-Signature should contain a SHA-1 digest of the message for verification
    • payload should be the JSON message
  • If any required headers are missing or invalid, the response will be UNAUTHORIZED with a message stating that they are not authorized to call the service.
  • If the Event is missing, a BAD_REQUEST is returned.
  • If there is no payload, a BAD_REQUEST is returned.
  • If the payload digest does not match the one provided in the Signature header, or there is an error generating it, a BAD_REQUEST is returned.
  • When everything is successful, an OK is returned with a message that it was successfully processed.
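
The validation rules above can be sketched as a single function. This is a minimal illustration, not the Caskalytics implementation: the function name, the `secret` parameter, the order of the checks, and the response messages are assumptions. It also assumes that X-Hub-Signature carries `sha1=` followed by the hex HMAC-SHA1 of the raw request body keyed with the webhook secret, which is how GitHub signs deliveries, and that the event header is checked separately so that a missing event yields BAD_REQUEST rather than UNAUTHORIZED.

```python
import hashlib
import hmac
import uuid
from http import HTTPStatus


def _is_uuid(value):
    """Return True if value parses as a UUID string."""
    try:
        uuid.UUID(value)
        return True
    except (ValueError, AttributeError, TypeError):
        return False


def validate_delivery(headers, body, secret):
    """Return (HTTPStatus, message) for one webhook delivery.

    headers: dict of header name -> value (exact-case keys, an assumption)
    body:    raw request body as bytes
    secret:  shared webhook secret as bytes (hypothetical parameter)
    """
    agent = headers.get("User-Agent", "")
    delivery = headers.get("X-GitHub-Delivery", "")
    signature = headers.get("X-Hub-Signature", "")

    # Missing or invalid identifying headers: caller is not authorized.
    if (not agent.startswith("GitHub-Hookshot/")
            or not _is_uuid(delivery)
            or not signature.startswith("sha1=")):
        return HTTPStatus.UNAUTHORIZED, "not authorized to call this service"

    # Missing event name or payload: bad request.
    if not headers.get("X-GitHub-Event"):
        return HTTPStatus.BAD_REQUEST, "missing event"
    if not body:
        return HTTPStatus.BAD_REQUEST, "missing payload"

    # Recompute the digest and compare in constant time.
    expected = "sha1=" + hmac.new(secret, body, hashlib.sha1).hexdigest()
    if not hmac.compare_digest(expected.encode(), signature.encode()):
        return HTTPStatus.BAD_REQUEST, "signature mismatch"

    return HTTPStatus.OK, "successfully processed"
```

Comparing with `hmac.compare_digest` rather than `==` avoids leaking how many leading characters of the digest matched.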

Metrics Endpoints

  • These will be REST endpoints used to get repo stats for Caskalytics
    • Endpoint: /{org}/{repo}/stats
      Description: Returns the stats of the given repo
      Parameters:
        Name         Description                                           Required?
        org          String - the org for the repo                         Yes
        repo         String - the name of the repo                         Yes

    • Endpoint: /{org}/{repo}/messages/{messageType}
      Description: Returns the messages for a given repo
      Parameters:
        Name         Description                                           Required?
        org          String - the org for the repo                         Yes
        repo         String - the name of the repo                         Yes
        messageType  String - the type of message to return                Yes
        startTime    start time to search for, in seconds. Defaults to 0   No
        endTime      end time to search for, in seconds. Defaults to now   No
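
As a rough illustration of how a client might address these endpoints, the sketch below builds the request paths. The helper names are hypothetical, and passing startTime and endTime as query-string parameters is an assumption, since the page does not say how the optional parameters are supplied.

```python
from urllib.parse import quote, urlencode


def stats_path(org, repo):
    """Path for the repo stats endpoint: /{org}/{repo}/stats."""
    return f"/{quote(org, safe='')}/{quote(repo, safe='')}/stats"


def messages_path(org, repo, message_type, start_time=None, end_time=None):
    """Path for /{org}/{repo}/messages/{messageType}.

    start_time and end_time are in seconds; when omitted they are left out
    so the service can apply its defaults (0 and now).
    """
    parts = (org, repo, "messages", message_type)
    path = "/" + "/".join(quote(part, safe="") for part in parts)
    query = {}
    if start_time is not None:
        query["startTime"] = int(start_time)
    if end_time is not None:
        query["endTime"] = int(end_time)
    return path + ("?" + urlencode(query) if query else "")
```

For example, `messages_path("caskdata", "cdap", "issues", start_time=0)` yields `/caskdata/cdap/messages/issues?startTime=0`; the org and repo names here are only sample values.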
       
          

 

Github Dataset

  • Dataset will contain two stores: a Table to hold the raw messages and a Cube to hold the metrics.
  • As the raw data is written to the Table store, the metrics in the Cube will be updated as needed
  • The JSON message is first flattened and then each value inserted as a column in the Table. A final field called rawPayload is also written to capture the full payload.
  • The key to the table will be <messageType>-<timestampInSeconds>-<X-GitHub-Delivery>. This will allow scanning by message and by time.
  • The Cube will have the following properties
    • Resolutions: 60, 3600, 86400, 604800 seconds (1 minute, 1 hour, 1 day, 1 week)
    • Dimensions: 
      • repository
      • message_type
      • repository, message_type
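
The flattening and row-key scheme described for the Table can be sketched as follows. This is an illustration under stated assumptions: the page does not specify how nested keys are joined or how arrays are handled, so the "." separator and the use of list indexes as key segments are hypothetical choices, as are the function names.

```python
import json


def flatten(obj, prefix=""):
    """Flatten nested JSON into a single-level dict of column -> value.

    Nested keys are joined with "." and list items use their index
    (both are assumptions, not taken from the design).
    """
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = ((str(i), v) for i, v in enumerate(obj))
    else:
        return {prefix: obj}
    columns = {}
    for key, value in items:
        name = f"{prefix}.{key}" if prefix else key
        columns.update(flatten(value, name))
    return columns


def row_key(message_type, timestamp_seconds, delivery_id):
    """Build <messageType>-<timestampInSeconds>-<X-GitHub-Delivery>."""
    return f"{message_type}-{timestamp_seconds}-{delivery_id}"


def to_row(message_type, timestamp_seconds, delivery_id, raw_payload):
    """Return (row key, columns) for one raw JSON payload string."""
    columns = flatten(json.loads(raw_payload))
    # Keep the unmodified payload alongside the flattened columns.
    columns["rawPayload"] = raw_payload
    return row_key(message_type, timestamp_seconds, delivery_id), columns
```

Because the message type leads the key and the timestamp follows it, a prefix scan on `issues-` returns one message type, and a range scan between two `issues-<timestamp>` bounds narrows it by time, provided the timestamps have the same number of digits.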