This page describes the redesign of the Web Analytics portion of Caskalytics.
Background
Much of the tracking done on the web today relies on beacons (pixels, tags, etc.) that the user's browser requests from a third-party server when the user loads a web page, as seen below.
Probably the most popular tracker used online is Google Analytics. The system consists of two parts. The first is a small piece of JavaScript that runs in the user's browser to gather page information such as the URL, page title, screen resolution, and other metrics. A full list of the collected parameters can be found here: https://developers.google.com/analytics/devguides/collection/protocol/v1/parameters#aip
And here is a sample request made to Google Analytics from the browser:
http://www.google-analytics.com/collect?v=1&_v=j41&a=2076467505&t=pageview&_s=1&dl=http%3A%2F%2Fdocs.cask.co%2Fcdap%2F3.4.0-SNAPSHOT%2Fen%2Fsearch.html%3Fq%3Dcdap%2Bconfiguration%26utm_campaign%3Dcampaign%26utm_source%3Dsource%26utm_medium%3Dmedium%26utm_content%3Dcontent%26utm_keyword%3Dkeyword&ul=en-us&de=UTF-8&dt=Search%20%E2%80%94%20Cask%20Data%20Application%20Platform%203.4.0-SNAPSHOT%20Documentation&sd=24-bit&sr=1280x800&vp=1265x235&je=0&fl=21.0%20r0&_u=QCCAgAAB~&jid=17949930&cid=1415295176.1456379066&tid=UA-XXXXXX-X&gtm=GTM-XXXXXXX&z=1909841856
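As an illustration of how such a request decodes, here is a minimal Python sketch that uses only the standard library's urllib.parse. It works on an abbreviated copy of the sample request (most parameters dropped), and the helper names parse_beacon and campaign_from_page_url are hypothetical. It shows that the page URL arrives percent-encoded inside the dl (document location) parameter, so campaign parameters such as utm_source can be recovered from it once decoded.

```python
from urllib.parse import urlsplit, parse_qs

# Abbreviated copy of the sample request above (only a few parameters kept).
SAMPLE = (
    "http://www.google-analytics.com/collect?v=1&t=pageview"
    "&dl=http%3A%2F%2Fdocs.cask.co%2Fcdap%2F3.4.0-SNAPSHOT%2Fen%2Fsearch.html"
    "%3Fq%3Dcdap%2Bconfiguration%26utm_campaign%3Dcampaign%26utm_source%3Dsource"
    "&sr=1280x800&cid=1415295176.1456379066&tid=UA-XXXXXX-X"
)


def parse_beacon(url):
    """Return the beacon's query parameters as a flat dict (first value wins).

    parse_qs splits on '&' before percent-decoding, so the encoded '&'
    characters inside dl stay part of that single value.
    """
    query = urlsplit(url).query
    return {key: values[0] for key, values in parse_qs(query).items()}


def campaign_from_page_url(page_url):
    """Pull the utm_* parameters out of an already decoded page URL."""
    query = parse_qs(urlsplit(page_url).query)
    return {key: values[0] for key, values in query.items() if key.startswith("utm_")}


params = parse_beacon(SAMPLE)
print(params["t"], params["tid"])
print(campaign_from_page_url(params["dl"]))
```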
The second part of the system is the endpoint that handles the request from the tracking code. The service collects the information passed in the URL's query string, along with request headers such as Referer, User-Agent, and Cookie, and the remote IP address of the connection. This information is stored in a datastore, and metrics are calculated from it both in real time and in batch.
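An endpoint of this shape can be sketched as a small WSGI application in Python. This is not the Google Analytics or Caskalytics implementation: the names build_event and make_collect_app are hypothetical, and a plain list stands in for the datastore. The sketch merges the query-string parameters with the Referer, User-Agent, and Cookie headers and the connection's remote address, then answers with a 1x1 transparent GIF so the browser's image request completes normally.

```python
import base64
from urllib.parse import parse_qs

# A 1x1 transparent GIF, returned as the body of every tracking request.
PIXEL = base64.b64decode("R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7")


def build_event(environ):
    """Combine the beacon's query parameters with request metadata."""
    params = {k: v[0] for k, v in parse_qs(environ.get("QUERY_STRING", "")).items()}
    return {
        "params": params,
        "referer": environ.get("HTTP_REFERER"),
        "user_agent": environ.get("HTTP_USER_AGENT"),
        "cookies": environ.get("HTTP_COOKIE"),
        # The client IP comes from the connection, not from a request header.
        "remote_ip": environ.get("REMOTE_ADDR"),
    }


def make_collect_app(store):
    """Return a WSGI app that appends each event to `store` (a stand-in datastore)."""

    def app(environ, start_response):
        store.append(build_event(environ))
        start_response("200 OK", [
            ("Content-Type", "image/gif"),
            ("Content-Length", str(len(PIXEL))),
            # Discourage caching so every page load produces a fresh request.
            ("Cache-Control", "no-cache, no-store, must-revalidate"),
        ])
        return [PIXEL]

    return app
```

To try it locally, the standard library's wsgiref.simple_server.make_server("", 8080, make_collect_app([])) followed by serve_forever() will serve the app. Keeping build_event separate from the response logic lets the real-time and batch paths share a single event shape.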
Important Features of a Tracking Pixel
- Versioned - Each request carries a protocol version (v=1 in the sample request above), so that if incompatible changes are made to the API, the endpoint can continue to support legacy tracking code.
- Property IDs - Each request identifies the property being tracked (tid in the sample request), which allows multiple websites to be tracked from the same endpoint.
- Client ID - This is a unique identifier for the user (cid in the sample request) and is stored in a cookie in the user's browser. If the user clears their cookies, a new client ID is generated. The client ID is used to distinguish new from returning visitors.
- Campaign/Source tracking - By adding URL parameters (such as utm_source and utm_campaign) to the links that point to their site, a site owner can record where a visitor came from. This information can sometimes be obtained from the referrer, but when traffic moves from HTTPS to HTTP pages or passes through redirects, the referrer can be missing or inaccurate. If a user lands on your page with specific campaign and source information, you can be relatively certain that's where they came from. NOTE: These tracking parameters "leak" as URLs are copied and shared, so switching campaigns regularly is advised.
- Multiple Activities - The most common activity is the pageview (t=pageview in the sample request), but other activities a person performs, such as transactions or events, can be useful to record as well.
- Timings - Allows analytics to track DNS lookup and page load times.
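Several of these features reduce to simple checks at the endpoint. The Python sketch below is one hypothetical way to express them. The supported versions, registered property IDs, accepted activity types, and the "(direct)"/"referral" labels are assumptions made for illustration, as is the choice to mint a client ID on the server when no cookie is present. Campaign parameters take precedence over the referrer, following the reasoning in the Campaign/Source item.

```python
import uuid
from urllib.parse import urlsplit

SUPPORTED_VERSIONS = {"1"}            # hypothetical: protocol versions still accepted
KNOWN_PROPERTIES = {"UA-XXXXXX-X"}    # hypothetical: registered property IDs
HIT_TYPES = {"pageview", "event", "transaction", "timing"}  # hypothetical subset


def validate(params):
    """Reject hits whose version, property, or activity type is unknown."""
    if params.get("v") not in SUPPORTED_VERSIONS:
        raise ValueError("unsupported protocol version")
    if params.get("tid") not in KNOWN_PROPERTIES:
        raise ValueError("unknown property id")
    if params.get("t") not in HIT_TYPES:
        raise ValueError("unknown activity type")


def client_id(cookie_value):
    """Reuse the client ID from the cookie, or mint one.

    Returns (client_id, is_new_visitor); a missing cookie counts as a new visitor.
    """
    if cookie_value:
        return cookie_value, False
    return str(uuid.uuid4()), True


def traffic_source(campaign, referer, own_host):
    """Prefer explicit campaign parameters; fall back to the referrer's host."""
    if campaign.get("utm_source"):
        return {"source": campaign["utm_source"],
                "medium": campaign.get("utm_medium", "(not set)")}
    host = urlsplit(referer).hostname if referer else None
    if host and host != own_host:
        return {"source": host, "medium": "referral"}
    # No campaign and no external referrer: treat the visit as direct.
    return {"source": "(direct)", "medium": "(none)"}
```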