The purpose of this page is to document the plan for redesigning GitHub statistics collection in the Caskalytics app.

...

Current Implementation of GitHub Metrics

  • API: https://developer.github.com/v3/

  • Use a Workflow Custom Action to run periodic RESTful calls to the GitHub API.

  • Results will be written into the GitHub partition of the Fileset.

  • A MapReduce job will periodically read from the GitHub partition of the Fileset, and update the Cube dataset.
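
As a rough illustration of the polling step, the sketch below maps one decoded GitHub API v3 repository response onto a flat stats record. The helper names `repo_url` and `extract_repo_stats` and the output field names are assumptions made for this example, not the app's actual code. The input keys are the ones returned by the GitHub `GET /repos/{owner}/{repo}` call.

```python
GITHUB_API = "https://api.github.com"


def repo_url(org, repo):
    """Build the GitHub API v3 URL for one repository (hypothetical helper)."""
    return f"{GITHUB_API}/repos/{org}/{repo}"


def extract_repo_stats(payload):
    """Pick the counters of interest out of a decoded repository response.

    The output field names are illustrative assumptions.
    """
    return {
        "name": payload["full_name"],
        "size": payload["size"],
        "forks": payload["forks_count"],
        "watchers": payload["watchers_count"],
        "stargazers": payload["stargazers_count"],
        "openIssues": payload["open_issues_count"],
    }
```

A periodic action could fetch `repo_url(org, repo)`, decode the JSON body, and write the result of `extract_repo_stats` to the GitHub partition.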

...

  • These REST endpoints will be used to get repo stats for Caskalytics:
    • GET /{org}/{repo}/stats
      Description: Returns the stats of the given repo.
      Parameters:
        Name    Description                      Required?
        org     String - the org for the repo    Yes
        repo    String - the name of the repo    Yes
      Response:
        {
          "name": "russorat/savage-leads",
          "size": 481,
          "forks": 0,
          "watchers": 1,
          "stargazers": 1,
          "openIssues": 3,
          "totalPullRequests": 2
        }
    • GET /{org}/{repo}/messages/{messageType}
      Description: Returns the messages for a given repo.
      Parameters:
        Name           Description                                            Required    Default
        org            String - the org for the repo                          Yes
        repo           String - the name of the repo                          Yes
        messageType    String - the type of message to return                 Yes
        startTime      Start time to search for, in seconds. Defaults to 0    No          0
        endTime        End time to search for, in seconds. Defaults to now    No          now
      Response:
        {
          "totalMessages": 2,
          "messages": ["{...}", "{...}"]
        }
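
To make the messages endpoint concrete, here is a minimal client-side sketch that builds its request URL. It assumes that `startTime` and `endTime` are passed as query-string parameters, which this page does not state. The function name `messages_url` and the base URL in the usage note are hypothetical.

```python
from urllib.parse import quote, urlencode


def messages_url(base_url, org, repo, message_type, start_time=0, end_time=None):
    """Build the URL for GET /{org}/{repo}/messages/{messageType}.

    Assumption: startTime and endTime travel as query parameters.
    endTime is omitted when not given, so the server default ("now") applies.
    """
    path = "/".join(
        quote(part, safe="") for part in (org, repo, "messages", message_type)
    )
    params = {"startTime": start_time}
    if end_time is not None:
        params["endTime"] = end_time
    return f"{base_url.rstrip('/')}/{path}?{urlencode(params)}"
```

For example, `messages_url("http://example.invalid", "russorat", "savage-leads", "push", 100, 200)` yields a URL ending in `/russorat/savage-leads/messages/push?startTime=100&endTime=200`. The message type `push` is only an illustrative value.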
           

 

GitHub Dataset

  • The dataset will contain two stores: a Table to hold the raw messages and a Cube to hold the metrics.
  • As the raw data is written to the Table store, the metrics in the Cube will be updated as needed.
  • The JSON message is first flattened, and each value is then inserted as a column in the Table. A final field called rawPayload is also written to capture the full payload.
  • The key to the Table will be <fullRepoName>-<messageType>-<timestampInSeconds>-<X-GitHub-Delivery>. Because the key begins with the repo name and message type, followed by the timestamp, this will allow scanning by message type and by time.
  • The Cube will have the following properties
    • Resolutions: 60, 3600, 86400, 604800 seconds (1 minute, 1 hour, 1 day, 1 week)
    • Dimensions: 
      • repository
      • message_type
      • repository, message_type
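
The write path described in this section can be sketched roughly as follows. This is an illustration under stated assumptions, not the dataset's actual code: the dotted column-name separator, the helper names (`flatten`, `to_row`, `bucket_starts`), and the use of `repository.full_name` from the payload as `<fullRepoName>` are all choices made for this example. How the Cube aligns its time buckets is not specified on this page, so `bucket_starts` simply floors the timestamp to a multiple of each listed resolution.

```python
import json

RESOLUTIONS = (60, 3600, 86400, 604800)  # seconds, as listed for the Cube


def flatten(obj, prefix=""):
    """Flatten nested dicts into dotted column names (separator is an assumption).

    Lists and scalar values are kept as-is.
    """
    columns = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            columns.update(flatten(value, name))
        else:
            columns[name] = value
    return columns


def to_row(raw_json, message_type, timestamp_s, delivery_id):
    """Return (row key, columns) for one raw message."""
    message = json.loads(raw_json)
    columns = flatten(message)
    columns["rawPayload"] = raw_json  # full payload kept alongside the flattened values
    # Assumption: the payload carries the repo name under repository.full_name.
    full_repo_name = message["repository"]["full_name"]
    key = f"{full_repo_name}-{message_type}-{timestamp_s}-{delivery_id}"
    return key, columns


def bucket_starts(timestamp_s):
    """Floor a timestamp to the start of its bucket at each resolution."""
    return {res: timestamp_s - timestamp_s % res for res in RESOLUTIONS}
```

Because keys for one repository and message type share a prefix and the timestamp comes next, a range scan over that prefix with start and end timestamps would return the messages for that window.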