Mixpanel Batch Source

Introduction

Mixpanel is a business analytics service that tracks user interactions with web and mobile applications and provides tools for targeted communication with those users. Its tool set includes in-app A/B tests and user survey forms. The collected data is used to build custom reports and to measure user engagement and retention. This plugin extracts raw event-level data from Mixpanel so that it can be further transformed and enriched with other data sources.

Use case(s)

  • Retrieve raw event data from the Mixpanel service

User Stories

  • As a data pipeline developer, I should be able to retrieve raw Mixpanel event data for a specified start and end date so that I can enrich and transform the raw data for further analysis
  • As a data pipeline developer, I should be able to specify the API secret used to authenticate the export request
  • As a data pipeline developer, I should be able to see all errors that occur while fetching raw event data from Mixpanel so that I can review and fix those export issues

Plugin Type

  • Batch Source
  • Batch Sink 
  • Real-time Source
  • Real-time Sink
  • Action
  • Post-Run Action
  • Aggregate
  • Join
  • Spark Model
  • Spark Compute

Configuration


| User Facing Name | Type | Description | Default value | Notes |
|---|---|---|---|---|
| API Secret | string | Mixpanel API secret | | https://developer.mixpanel.com/docs/exporting-raw-data#section-required-parameter |
| From date | string | Start date for reports data | | YYYY-MM-DD format |
| To date | string | End date for reports data | | YYYY-MM-DD format |
| Events | multi-select | Comma separated list of events you would like to get data on | | Optional |
| Filter | string | Expression to filter events by | | Optional |
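
As a rough illustration of how the properties above could map onto an export request, here is a minimal Python sketch. It only builds the request and does not send it. The endpoint URL, the query parameter names (from_date, to_date, event, where) and the use of the API secret as a basic-auth user name are assumptions based on Mixpanel's public raw-export documentation, not something this page specifies; build_export_request is a hypothetical helper name.

```python
import base64
import json
from datetime import datetime
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: endpoint as described in Mixpanel's public raw-export docs.
EXPORT_URL = "https://data.mixpanel.com/api/2.0/export"


def build_export_request(api_secret, from_date, to_date, events=None, where=None):
    """Build (but do not send) an export request from the plugin properties."""
    # Parse both dates as YYYY-MM-DD; strptime raises ValueError on bad input.
    start = datetime.strptime(from_date, "%Y-%m-%d").date()
    end = datetime.strptime(to_date, "%Y-%m-%d").date()
    if start > end:
        raise ValueError("From date must not be after To date")

    params = {"from_date": start.isoformat(), "to_date": end.isoformat()}
    if events:
        # "Events" is a comma separated list; assumed to be sent as a JSON array.
        names = [name.strip() for name in events.split(",") if name.strip()]
        params["event"] = json.dumps(names)
    if where:
        # "Filter" expression is passed through unchanged (assumed parameter name).
        params["where"] = where

    # Assumption: API secret is the basic-auth user name with an empty password.
    token = base64.b64encode(f"{api_secret}:".encode("utf-8")).decode("ascii")
    headers = {"Authorization": f"Basic {token}"}
    return Request(f"{EXPORT_URL}?{urlencode(params)}", headers=headers)
```

Validating the date range before building the URL lets a pipeline fail at configuration time rather than after an export call has been made.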


Design / Implementation Tips

  • Mixpanel data can only be retrieved in batches, so there will be a batch source only and no streaming source for Mixpanel
  • The data export API always returns data in JSON format
  • The API details describe best practices for API usage
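
Since the export API returns JSON, the source has to turn the response body into records while keeping track of anything that fails to parse. The sketch below is one possible approach, assuming the response contains one JSON object per line and that each object carries an "event" field (both assumptions drawn from Mixpanel's public documentation, not from this page). parse_export_lines is a hypothetical helper name.

```python
import json


def parse_export_lines(lines):
    """Split raw export output into parsed events and per-line errors."""
    events, errors = [], []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # ignore blank lines
        try:
            record = json.loads(text)
        except json.JSONDecodeError as exc:
            # Keep the line number so the issue can be reviewed later.
            errors.append((number, f"invalid JSON: {exc.msg}"))
            continue
        # Assumption: every exported record is an object with an "event" name.
        if not isinstance(record, dict) or "event" not in record:
            errors.append((number, "missing 'event' field"))
            continue
        events.append(record)
    return events, errors


sample = [
    '{"event": "Sign Up", "properties": {"distinct_id": "u1"}}',
    "not json",
    "",
    '{"properties": {}}',
]
parsed, problems = parse_export_lines(sample)
```

Collecting errors in a list instead of raising on the first bad line matches the user story about seeing all export errors in one place.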

Design

Approach(es)

Properties

Security

Limitation(s)

Future Work

Test Case(s)

  • Test case #1
  • Test case #2

Sample Pipeline

Please attach one or more sample pipeline(s) and associated data. 

Pipeline #1

Pipeline #2

References