Introduction

The plugin fetches issues from Jira using a JQL query, basic filtering properties, or a Jira filter id. Issues are fetched in parallel.

Plugin Type

Batch Source
Batch Sink
Real-time Source
Real-time Sink
Action
Post-Run Action
Aggregate
Join
Spark Model
Spark Compute

Configurables

Section	Name	Description	Default	Widget	Validations
Basic	Jira URL	URL of the Jira instance. Example: https://issues.cask.co		Text Box
	Track Updates	If true, the source tracks updates to issues, not only their creation.	False	Text Box
	Filter Mode	Possible values: Basic, JQL, Jira Filter Id	Basic	Select
	Projects (for mode Basic)	List of project names.		List
	Issue Types (for mode Basic)	List of issue types. E.g. Improvement, Bug, Task etc.		List
	Statuses (for mode Basic)	List of Issue statuses. E.g. Open, In Progress, Reopened, Resolved		List
	Priorities (for mode Basic)	List of Issue priorities. E.g. Critical		List
	Reporters (for mode Basic)	List of reporter user names, e.g. aonishuk		List
	Assignees (for mode Basic)	List of assignee user names, e.g. aonishuk		List
	Fix Versions (for mode Basic)	List of fix versions e.g. 6.1.0		List
	Affected Versions (for mode Basic)	List of affected versions e.g. 6.1.0		List
	Labels (for mode Basic)	List of labels on issues. e.g. urgent.		List
	updateDateFrom (for mode Basic)	Start of the update-date range. Can be used without an end date.		Text Box	Must be a valid date.
	updateDateTo (for mode Basic)	End of the update-date range. Can be used without a start date.		Text Box	Must be a valid date.
	JQL query (for mode JQL)	A query in Jira Query Language (JQL) used to fetch issues. Example: project = CDAP AND priority >= Critical AND (fixVersion = 6.0.0 OR fixVersion = 6.1.0)		Text Box	Check that it is a valid URI
	Jira Filter Id (for mode Jira Filter Id)	Id of the Jira filter used to fetch issues.		Number
Authentication	Username	Used for basic authentication. If neither username nor password is set, the plugin logs in as an anonymous user.		Text Box
Authentication	Password	Used for basic authentication. If neither username nor password is set, the plugin logs in as an anonymous user.		Password
Advanced	Max Split Size (only for batch source)	Maximum number of issues processed by a single request in a single split. If set to 0, all issues are processed in a single split.	50	Number
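The design does not say how Basic mode reaches Jira. One plausible approach (an assumption here, not the confirmed implementation) is to translate the Basic properties into JQL, so all three filter modes end up as a single query string. The sketch below joins each non-empty list into an `in (...)` clause keyed by standard JQL field names (`project`, `issuetype`, `status`, `priority`, `reporter`, `assignee`, `fixVersion`, `affectedVersion`, `labels`). It turns updateDateFrom/updateDateTo into `updated` bounds and reuses `LocalDate.parse` as the "valid date" check. The class and method names are hypothetical.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical translation of Basic-mode properties into a JQL string. */
final class BasicFilterJql {

  // Wraps a value in double quotes, escaping backslashes and quotes,
  // so values such as "In Progress" are safe inside JQL.
  static String quote(String value) {
    return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
  }

  // Builds `field in ("a", "b")`, or returns null for an absent or empty list.
  static String inClause(String field, List<String> values) {
    if (values == null || values.isEmpty()) {
      return null;
    }
    return field + " in ("
        + values.stream().map(BasicFilterJql::quote).collect(Collectors.joining(", ")) + ")";
  }

  /**
   * listFilters maps JQL field names to the configured values; dates are ISO yyyy-MM-dd.
   * LocalDate.parse throws DateTimeParseException on an invalid date.
   */
  static String toJql(Map<String, List<String>> listFilters, String updateDateFrom, String updateDateTo) {
    List<String> clauses = new ArrayList<>();
    for (Map.Entry<String, List<String>> entry : listFilters.entrySet()) {
      String clause = inClause(entry.getKey(), entry.getValue());
      if (clause != null) {
        clauses.add(clause);
      }
    }
    if (updateDateFrom != null && !updateDateFrom.isEmpty()) {
      clauses.add("updated >= " + quote(LocalDate.parse(updateDateFrom).toString()));
    }
    if (updateDateTo != null && !updateDateTo.isEmpty()) {
      // "< next day" keeps the whole end day inside the range.
      clauses.add("updated < " + quote(LocalDate.parse(updateDateTo).plusDays(1).toString()));
    }
    return String.join(" AND ", clauses);
  }
}
```

For example, project CDAP, statuses Open and In Progress, and the range 2019-01-01 to 2019-01-31 produce `project in ("CDAP") AND status in ("Open", "In Progress") AND updated >= "2019-01-01" AND updated < "2019-02-01"`. The end bound uses `<` on the following day because a bare JQL date means midnight at the start of that day.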

Design/Implementation

Structured Record Schema Structure

Note:

By default the schema contains all possible fields.
To exclude some fields, simply remove them from the output schema.
~~We will query only fields which are in the schema to increase efficiency.~~ Unfortunately, this does not work correctly in the Jira API, which omits some fields even when they are explicitly queried. So all fields have to be queried every time.

Schema field	Type	Example	Notes	Nullable
key	String	NETTY-15
summary	String	Netty caches race condition
id	Long	21371	API id of the issue in Jira
project	String	Netty-HTTP
status	String	Open
description	String	... description of issue ...		true
resolution	String	Fixed		true
reporter	Record	{ name=aonishuk, displayName=Andrew Onischuk, emailAddress=null (nullable), active=true, avatarUris={48x48=https://www.gravatar.com/avatar/...}, groups=null (nullable), timezone=America/Los_Angeles (nullable) }		true
assignee	Record	{ name=aonishuk, displayName=Andrew Onischuk, emailAddress=null, active=true, avatarUris={48x48=https://www.gravatar.com/avatar/...}, groups=null, timezone=America/Los_Angeles }
fields	array<record>	[{ 'id': 'customfield_10005', 'name': 'Epic Link', 'type': null (string, nullable), 'value': null (string, nullable) }, ...]	Custom fields
affectedVersions	array<string>	['NETTY-1.0']		true
fixVersions	array<string>	['NETTY-1.0-maint', 'NETTY-1.1']		true
components	array<string>	['NETTY-SERVER', 'NETTY-DOCS']
priority	string	Critical
issueType	string	Improvement
isSubtask	boolean	false
creationDate	LogicalType timestamp	2016-12-21T23:21:42.000+02:00
updateDate	LogicalType timestamp	2016-12-21T23:21:42.000+02:00
dueDate	LogicalType timestamp	2016-12-30T23:21:42.000+02:00
attachments	array<record>	[{ 'filename': 'image.png', 'author': 'aonishuk', 'creationDate': '2016-12-30T23:21:42.000+02:00', 'size': 21454, 'mimeType': 'image/png', 'contentUri': 'http://.../image.png' }, ...]
comments	array<record>	[{ 'author': 'aonishuk', 'updateAuthor': 'aonishuk', 'creationDate': '2016-12-30T23:21:42.000+02:00', 'updateDate': '2016-12-30T23:21:42.000+02:00', 'body': 'actual comment contents' }, ...]
issueLinks	array<record>	[{ 'type': 'is blocked by' (inward description), 'link': 'https://issues.cask.co/rest/api/2/issueLink/97018' }, ...]		true
votes	int	3
worklog	array<record>	[{ 'author': 'aonishuk', 'updateAuthor': 'aonishuk', 'startDate': '2016-12-30T23:21:42.000+02:00', 'creationDate': '2016-12-30T23:21:42.000+02:00', 'updateDate': '2016-12-30T23:21:42.000+02:00', 'comment': 'actual comment contents', 'minutesSpent': 3600 }, ...]
watchers	int	0		true
isWatching	boolean	false		true
timeTracking	record	{ 'originalEstimateMinutes': 3600 (nullable), 'remainingEstimateMinutes': 100 (nullable), 'timeSpentMinutes': 3500 (nullable) }		true
subtasks	array<record>	[{ 'key': 'NETTY-44', 'summary': 'Http connection is broken', 'issueType': 'BUG', 'status': 'Open' }, ...]		true
labels	array<string>	['urgent', 'ready_for_review']

Why no OAuth2 Authentication?

Jira does not let its users create OAuth2 applications. (This is not to be confused with the OpenID login that some instances set up through external providers such as Google; that login accepts Google's OAuth2, not Jira's own.)

Jira supports OAuth2 only for applications published to the Atlassian Marketplace (i.e. Jira plug-ins), which is not our case. Link: https://developer.atlassian.com/cloud/jira/platform/oauth-2-authorization-code-grants-3lo-for-apps/
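Without OAuth2, requests are authenticated with HTTP Basic authentication, or sent without credentials for anonymous access. JRJC offers a client factory method for this (`AsynchronousJiraRestClientFactory.createWithBasicHttpAuthentication`). The stdlib-only sketch below shows what the scheme amounts to: per RFC 7617, `Basic ` followed by the base64 of `user:password`. It also shows one way to apply the anonymous-login rule from the Authentication settings. Rejecting a username without a password (or vice versa) is an assumption, not something the design specifies. The helper name is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Optional;

final class JiraAuth {

  /**
   * Returns the value for an HTTP "Authorization" header, or empty for anonymous access.
   * Assumption: username and password must be set together.
   */
  static Optional<String> authorizationHeader(String username, String password) {
    boolean hasUser = username != null && !username.isEmpty();
    boolean hasPassword = password != null && !password.isEmpty();
    if (!hasUser && !hasPassword) {
      return Optional.empty(); // neither set: log in as anonymous user
    }
    if (hasUser != hasPassword) {
      throw new IllegalArgumentException("Username and password must be set together");
    }
    // RFC 7617: base64("user:password") prefixed with the scheme name.
    String token = Base64.getEncoder()
        .encodeToString((username + ":" + password).getBytes(StandardCharsets.UTF_8));
    return Optional.of("Basic " + token);
  }
}
```

With the RFC 7617 sample credentials `Aladdin` / `open sesame`, the header value is `Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==`. Base64 is an encoding, not encryption, so the Jira URL should use HTTPS.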

Implementation and Parallelization For Batch Source

The implementation will use the Jira REST Java Client (JRJC) library. Here is a generic example of code using it:

https://ecosystem.atlassian.net/wiki/spaces/JRJC/pages/27164680/Tutorial

The library can return the number of issues matched by a JQL query. It can also fetch only the issues from a certain position onward, for example from the 100th to the 150th issue.

This makes it well suited for parallelization.

Each MapReduce split processes at most maxSplitSize issues, and the transform method converts them from Issue objects into structured records.

STEP 1. Generating splits:

- Execute the JQL query, requesting a minimal set of fields, to get the total count of issues (skipped when maxSplitSize is 0).

- Create splits of at most maxSplitSize issues each.
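Once the total is known, Step 1 reduces to simple arithmetic. In JRJC the count comes back as `SearchResult.getTotal()`. In this sketch the total is a parameter, and the `Split` class is a hypothetical stand-in for the plugin's input split. Positions are half-open ranges `[startAt, endAt)`.

```java
import java.util.ArrayList;
import java.util.List;

final class JiraSplits {

  /** Hypothetical split: a half-open range [startAt, endAt) of positions in the JQL results. */
  static final class Split {
    final int startAt;
    final int endAt;

    Split(int startAt, int endAt) {
      this.startAt = startAt;
      this.endAt = endAt;
    }

    @Override
    public String toString() {
      return "[" + startAt + ", " + endAt + ")";
    }
  }

  /**
   * Cuts totalIssues results into splits of at most maxSplitSize issues.
   * maxSplitSize == 0 means one open-ended split; the count query is skipped, so totalIssues is ignored.
   */
  static List<Split> createSplits(int totalIssues, int maxSplitSize) {
    if (maxSplitSize < 0) {
      throw new IllegalArgumentException("maxSplitSize must be >= 0");
    }
    List<Split> splits = new ArrayList<>();
    if (maxSplitSize == 0) {
      splits.add(new Split(0, Integer.MAX_VALUE)); // read until Jira returns no more issues
      return splits;
    }
    for (int start = 0; start < totalIssues; start += maxSplitSize) {
      splits.add(new Split(start, Math.min(start + maxSplitSize, totalIssues)));
    }
    return splits;
  }
}
```

For 120 matching issues and maxSplitSize 50, this yields three splits: `[0, 50)`, `[50, 100)` and `[100, 120)`.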

STEP 2. RecordReader routine:

- Return issues one by one, from the starting position to the end position defined by the current split.
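Step 2 can be written as a plain Java iterator over one split. The `PageFetcher` interface is hypothetical. In the plugin it would wrap the JQL search; JRJC's `SearchRestClient.searchJql` accepts `maxResults` and `startAt` for exactly this kind of window. Jira truncates a request that asks for more results than its configured maximum. So the reader pages within the split and treats only an empty page as the end. That also covers maxSplitSize = 0, where the end position can be passed as `Integer.MAX_VALUE`.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Iterates over the issues of one split, [startAt, endAt), requesting them page by page. */
final class SplitIssueReader<T> implements Iterator<T> {

  /** Hypothetical search call: returns up to maxResults issues starting at position startAt. */
  interface PageFetcher<E> {
    List<E> fetch(int startAt, int maxResults);
  }

  private final PageFetcher<T> fetcher;
  private final int endAt;     // exclusive; Integer.MAX_VALUE when the split has no known end
  private final int pageSize;  // issues requested per call
  private int nextPosition;    // result position of the next request
  private List<T> buffer = Collections.emptyList();
  private int bufferIndex = 0;

  SplitIssueReader(PageFetcher<T> fetcher, int startAt, int endAt, int pageSize) {
    if (pageSize <= 0) {
      throw new IllegalArgumentException("pageSize must be positive");
    }
    this.fetcher = fetcher;
    this.nextPosition = startAt;
    this.endAt = endAt;
    this.pageSize = pageSize;
  }

  @Override
  public boolean hasNext() {
    if (bufferIndex < buffer.size()) {
      return true;
    }
    if (nextPosition >= endAt) {
      return false;
    }
    buffer = fetcher.fetch(nextPosition, Math.min(pageSize, endAt - nextPosition));
    bufferIndex = 0;
    if (buffer.isEmpty()) {
      // Jira may return fewer issues than requested, so only an empty page means "no more".
      nextPosition = endAt;
      return false;
    }
    nextPosition += buffer.size();
    return true;
  }

  @Override
  public T next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    return buffer.get(bufferIndex++);
  }
}
```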

STEP 3. Transform method:

- Transform each Issue object into a StructuredRecord.
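Step 3 is shown below for a handful of schema fields. `SimpleIssue` is a hypothetical stand-in for the JRJC `Issue` object, and a `LinkedHashMap` stands in for the CDAP `StructuredRecord`, so the sketch runs without either library. It also follows the schema note: all fields are still fetched, but fields the user removed from the output schema are skipped when the record is built.

```java
import java.time.ZonedDateTime;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class IssueTransform {

  /** Hypothetical, trimmed-down view of a fetched issue; not the JRJC Issue class. */
  static final class SimpleIssue {
    String key;
    String summary;
    long id;
    String description;          // nullable in the schema
    List<String> labels;         // may be null when the issue has no labels
    ZonedDateTime creationDate;
  }

  /** Builds a record keyed by schema field name, keeping only fields still in the output schema. */
  static Map<String, Object> toRecord(SimpleIssue issue, Set<String> outputFields) {
    Map<String, Object> record = new LinkedHashMap<>();
    putIfSelected(record, outputFields, "key", issue.key);
    putIfSelected(record, outputFields, "summary", issue.summary);
    putIfSelected(record, outputFields, "id", issue.id);
    putIfSelected(record, outputFields, "description", issue.description);
    // labels is not nullable in the schema, so a missing list becomes an empty one.
    List<String> labels = issue.labels == null
        ? new ArrayList<String>()
        : new ArrayList<String>(issue.labels);
    putIfSelected(record, outputFields, "labels", labels);
    putIfSelected(record, outputFields, "creationDate", issue.creationDate);
    return record;
  }

  private static void putIfSelected(Map<String, Object> record, Set<String> outputFields,
                                    String name, Object value) {
    if (outputFields.contains(name)) {
      record.put(name, value);
    }
  }
}
```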

Realtime Source Specifics

When the plugin runs for the first time, it loads all matching issues. After that, every X seconds (configurable via batchInterval), it fetches only newly created or updated issues.

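To do this, the date of the last issue found is added as a new condition to the next query. Using that date, which comes from Jira itself, rather than the current time avoids race conditions. For example, issues updated between running a query and reading the local clock would otherwise be skipped.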
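The sketch below shows one way to build the next query. It assumes two things: the base query is the configured JQL without an ORDER BY clause, and the caller knows the time zone in which Jira interprets query dates for the authenticated user (JQL date literals carry no zone). The `updated` or `created` field is chosen from the Track Updates setting. The class name is hypothetical.

```java
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

final class IncrementalJql {

  // JQL accepts dates such as "2016/12/21 23:21" (minute precision).
  private static final DateTimeFormatter JQL_DATE = DateTimeFormatter.ofPattern("yyyy/MM/dd HH:mm");

  /**
   * Query for the next batch: the base query restricted to issues at or after the date
   * of the last issue found. Assumptions: baseJql has no ORDER BY clause, and jiraZone
   * is the zone Jira uses to interpret dates in this user's queries.
   */
  static String nextQuery(String baseJql, ZonedDateTime lastFound, ZoneId jiraZone, boolean trackUpdates) {
    // With Track Updates off only new issues matter, so filter on the creation date.
    String field = trackUpdates ? "updated" : "created";
    // ">=" rather than ">": JQL dates have minute precision, so issues from the boundary
    // minute are fetched again and must be de-duplicated (e.g. by key and update date).
    String condition = field + " >= \"" + lastFound.withZoneSameInstant(jiraZone).format(JQL_DATE) + "\"";
    String base = (baseJql == null || baseJql.trim().isEmpty()) ? "" : "(" + baseJql + ") AND ";
    return base + condition + " ORDER BY " + field + " ASC";
  }
}
```

Because JQL dates only go down to the minute, a strict `>` would silently drop issues updated later within the same minute as the last one found. Re-reading that minute and de-duplicating avoids the gap.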

The plugin can also continue from where it was stopped. This can be done with Spark checkpointing: save the creation date of the newest fetched issue, then on pipeline restart add it as a condition to the filter.
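Anything kept in a Spark checkpoint has to be serializable, and here the state is small: just the newest issue date seen. Below is a minimal sketch of such an offset object, assuming dates arrive as `ZonedDateTime`. How it is wired into the streaming source is left open, and the class name is hypothetical.

```java
import java.io.Serializable;
import java.time.ZonedDateTime;
import java.util.Optional;

/** Hypothetical resume state: the newest issue date seen so far. */
final class JiraOffset implements Serializable {
  private static final long serialVersionUID = 1L;

  private ZonedDateTime newest; // null until the first batch has been read

  /** Advances the offset; older or equal dates leave it unchanged. */
  void observe(ZonedDateTime issueDate) {
    if (newest == null || issueDate.isAfter(newest)) {
      newest = issueDate;
    }
  }

  /** Empty means "first run": load all issues. */
  Optional<ZonedDateTime> resumeFrom() {
    return Optional.ofNullable(newest);
  }
}
```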


Checklist
User stories documented
User stories reviewed
Design documented
Design reviewed
Feature merged
Examples and guides
Integration tests
Documentation for feature
Short video demonstrating the feature

Jira Source