This page documents the plan for redesigning GitHub statistics collection in the Caskalytics app.

...

Current Implementation of GitHub Metrics

  • API: https://developer.github.com/v3/

  • Use a Workflow Custom Action to run periodic RESTful calls to the GitHub API.

  • Results will be written into the GitHub partition of the Fileset.

  • A MapReduce job will periodically read from the GitHub partition of the Fileset, and update the Cube dataset.
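
As a rough illustration of the polling step, the sketch below builds the GitHub API v3 URL for a repository and reduces the response to a few counters. The function names, the choice of output fields, and the one-JSON-line-per-poll record layout are assumptions for illustration only; the existing Workflow Custom Action is not shown on this page and may differ.

```python
import json

GITHUB_API = "https://api.github.com"


def repo_url(org, repo):
    """URL of the GitHub API v3 call that returns a single repository."""
    return f"{GITHUB_API}/repos/{org}/{repo}"


def extract_repo_stats(payload):
    """Reduce a repository response to a handful of counters.

    The input keys are fields of GitHub's repository object; which of them
    the app actually keeps is an assumption.
    """
    return {
        "name": payload["full_name"],
        "size": payload["size"],
        "forks": payload["forks_count"],
        "watchers": payload["watchers_count"],
        "stargazers": payload["stargazers_count"],
        "openIssues": payload["open_issues_count"],
    }


def to_partition_record(payload, fetched_at):
    """Serialize one poll result as a JSON line (hypothetical record layout)."""
    record = extract_repo_stats(payload)
    record["ts"] = fetched_at  # poll time in seconds, supplied by the caller
    return json.dumps(record, sort_keys=True)
```

A periodic action would fetch `repo_url(org, repo)` with any HTTP client and append the output of `to_partition_record` to the GitHub partition.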

...

  • These will be REST endpoints used to get repo stats for Caskalytics
    • GET /{org}/{repo}/stats
      Returns the stats of the given repo.
      Parameters:
        Name   Description                     Required
        org    String - the org for the repo   Yes
        repo   String - the name of the repo   Yes
      Response:
        {
          "name": "russorat/savage-leads",
          "size": 481,
          "forks": 0,
          "watchers": 1,
          "stargazers": 1,
          "openIssues": 3,
          "totalPullRequests": 2
        }
    • GET /{org}/{repo}/messages/{messageType}
      Returns the messages for a given repo. A list of events can be found here: https://developer.github.com/webhooks/#events
      Parameters:
        Name          Description                              Required   Default
        org           String - the org for the repo            Yes
        repo          String - the name of the repo            Yes
        messageType   String - the type of message to return   Yes
        startTime     start time to search for, in seconds     No         0
        endTime       end time to search for, in seconds       No         now
      Response:
        {
          "totalMessages": 2,
          "messages": ["{...}", "{...}"]
        }
    • GET /{sender}/stats
      Returns statistics for a given GitHub user (sender). If the sender is not found, an empty stats object is returned.
      Parameters:
        Name     Description                                     Required   Default
        sender   String - the GitHub username to get stats for   Yes
      Response (sender found):
        {
          "sender": "russorat",
          "stats": {
            "issue_comment": 1,
            "issues": 3,
            "create": 1,
            "ping": 1,
            "push": 1
          }
        }
      Response (sender not found):
        {
          "sender": "russoratsdfsdf",
          "stats": {}
        }
    • GET /topSenders/{messageType}?limit={limit}
      Returns an array of the top senders for the given message type.
      Parameters:
        Name          Description                                               Required   Default
        messageType   String - the type of message to get the top senders for   Yes
        limit         long - the number of results to return                    No         10
      Response:
        [
          {
            "sender": "russorat",
            "stats": {
              "push": 1
            }
          }
        ]
    • GET /{org}/{repo}/metric?metric={metric}
      Returns a given custom metric for a repo.
      Parameters:
        Name     Description                            Required   Default
        org      String - the org for the repo          Yes
        repo     String - the name of the repo          Yes
        metric   String - the custom metric to return   Yes
      Response:
        {
          "repoName": "russorat/savage-leads",
          "metricName": "repository.watchers",
          "metric": 0
        }
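
To make the parameter defaults and the topSenders response shape concrete, here is a minimal sketch of the logic a handler might apply. The function names, the in-memory `stats_by_sender` mapping, and the tie-breaking by sender name are assumptions made for illustration; only the defaults (startTime 0, endTime now, limit 10) and the response layout come from the endpoint descriptions.

```python
import time


def resolve_time_range(start_time=None, end_time=None, now=None):
    """Apply the documented defaults: startTime=0 and endTime=now, in seconds."""
    if now is None:
        now = int(time.time())
    start = 0 if start_time is None else start_time
    end = now if end_time is None else end_time
    return start, end


def top_senders(stats_by_sender, message_type, limit=10):
    """Rank senders by their count for one message type.

    stats_by_sender maps sender -> {messageType: count}, mirroring the
    "stats" object of the per-sender endpoint. Ties are broken by sender
    name, which is an assumption.
    """
    counts = [
        (sender, stats[message_type])
        for sender, stats in stats_by_sender.items()
        if stats.get(message_type, 0) > 0
    ]
    # Highest count first, then alphabetical for a stable order.
    counts.sort(key=lambda item: (-item[1], item[0]))
    return [
        {"sender": sender, "stats": {message_type: count}}
        for sender, count in counts[:limit]
    ]
```

Senders with no messages of the requested type are left out of the ranking rather than reported with a zero count.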

 

GitHub Dataset

  • The dataset will contain two stores: a Table to hold the raw messages and a Cube to hold the metrics.
  • As the raw data is written to the Table store, the metrics in the Cube will be updated as needed.
  • The JSON message is first flattened, and then each value is inserted as a column in the Table. A final field called rawPayload is also written to capture the full payload.
  • The row key of the Table will be <fullRepoName>-<messageType>-<timestampInSeconds>-<X-GitHub-Delivery>. This will allow scanning by repository and message type, and then by time.
  • The Cube will have the following properties:
    • Resolutions (in seconds): 60, 3600, 86400, 604800 (minute, hour, day, week)
    • Dimensions:
      • repository
      • message_type
      • repository, message_type
      • sender
      • sender, message_type
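
The write path described above can be sketched as follows. This is a minimal illustration, not the actual dataset code: the dotted column names (suggested by metric names such as repository.watchers), the index-based handling of JSON arrays, and all function names are assumptions.

```python
import json

RESOLUTIONS = (60, 3600, 86400, 604800)  # seconds, as listed for the Cube


def flatten(message, prefix=""):
    """Flatten nested JSON into {dotted.path: scalar} pairs.

    Array elements are addressed by index (e.g. "commits.0.id"); this
    naming scheme is an assumption.
    """
    if isinstance(message, dict):
        items = message.items()
    elif isinstance(message, list):
        items = ((str(i), v) for i, v in enumerate(message))
    else:
        return {prefix: message}
    columns = {}
    for key, value in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        columns.update(flatten(value, path))
    return columns


def to_row(message):
    """Columns for one Table row: flattened values plus the full payload."""
    columns = flatten(message)
    columns["rawPayload"] = json.dumps(message)
    return columns


def row_key(full_repo_name, message_type, timestamp_seconds, delivery_id):
    """<fullRepoName>-<messageType>-<timestampInSeconds>-<X-GitHub-Delivery>"""
    return f"{full_repo_name}-{message_type}-{timestamp_seconds}-{delivery_id}"


def scan_prefix(full_repo_name, message_type):
    """Key prefix shared by all messages of one type for one repo."""
    return f"{full_repo_name}-{message_type}-"


def bucket_start(timestamp_seconds, resolution):
    """Start of the aggregation bucket that contains the timestamp."""
    return timestamp_seconds - timestamp_seconds % resolution
```

Two caveats follow from the key layout itself: string ordering of keys matches time ordering only while all timestamps have the same number of digits, and because repo names may contain hyphens, keys are better matched by prefix than split on "-".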

...