Jira Batch Source

General

Section	Name	Description	Default	Widget	Validations
Basic	Jira URL	URL of the Jira instance. Example: https://issues.cask.co		Text Box
	Filter Mode	Possible values: Basic, JQL.	Basic	Select
	project (for mode Basic)	List of project names.		List
	issueType (for mode Basic)	Type of issue, e.g. Improvement, Bug, Task.		List
	status (for mode Basic)	Status of issue, e.g. Open, In Progress, Done.		List
	reporter (for mode Basic)	User name (ID) of the reporter, e.g. aonishuk.		List
	assignee (for mode Basic)	User name (ID) of the assignee, e.g. aonishuk.		List
	fixVersions (for mode Basic)			List
	affectedVersions (for mode Basic)			List
	updateDateFrom (for mode Basic)			Text Box
	updateDateTo (for mode Basic)			Text Box
	labels (for mode Basic)			List
	JQL query (for mode JQL)	A query in Jira Query Language (JQL) used to fetch issues. Example: project = CDAP AND priority >= Critical AND (fixVersion = 6.0.0 OR fixVersion = 6.1.0)		Text Box	Check that it is a valid URI
Authentication	Username	Username for basic authentication.		Text Box	Validate that we can authenticate
Authentication	Password	Password for basic authentication.		Password	Validate that we can authenticate
Advanced	Max Split Size	Maximum number of issues processed in a single split. If set to 0, all issues are processed in a single split.	1000	Number
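In Basic mode the individual filters have to be combined into a query before any issues can be fetched. Below is a minimal sketch of one way to do that. It assumes that each non-empty list becomes an `in (...)` clause, that clauses are joined with AND, and that updateDateFrom/updateDateTo bound JQL's `updated` field. The class `BasicModeJql` and its methods are hypothetical. The map keys are standard JQL field names, for example `issuetype` for issueType, `fixVersion` for fixVersions and `affectedVersion` for affectedVersions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper (not from the original design or any library): combines
// Basic-mode filters into one JQL string. Map keys are JQL field names such as
// project, issuetype, status, reporter, assignee, fixVersion, affectedVersion, labels.
final class BasicModeJql {

  // Wraps a value in double quotes, escaping backslashes and embedded quotes.
  static String quote(String value) {
    return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
  }

  // Each non-empty list becomes "field in (...)"; the date bounds apply to
  // the "updated" field; all clauses are joined with AND, in map iteration order.
  static String build(Map<String, List<String>> listFilters,
                      String updateDateFrom, String updateDateTo) {
    List<String> clauses = new ArrayList<>();
    for (Map.Entry<String, List<String>> filter : listFilters.entrySet()) {
      List<String> values = filter.getValue();
      if (values == null || values.isEmpty()) {
        continue; // an empty list places no restriction on that field
      }
      String joined = values.stream()
          .map(BasicModeJql::quote)
          .collect(Collectors.joining(", "));
      clauses.add(filter.getKey() + " in (" + joined + ")");
    }
    if (updateDateFrom != null && !updateDateFrom.isEmpty()) {
      clauses.add("updated >= " + quote(updateDateFrom));
    }
    if (updateDateTo != null && !updateDateTo.isEmpty()) {
      clauses.add("updated <= " + quote(updateDateTo));
    }
    return String.join(" AND ", clauses);
  }
}
```

Take a `LinkedHashMap` holding project = [CDAP], status = [Open, In Progress] and an empty labels list, with updateDateFrom = 2019-01-01. It yields `project in ("CDAP") AND status in ("Open", "In Progress") AND updated >= "2019-01-01"`. If every filter is empty, the result is an empty string, which matches all issues the user can see.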

Why no OAuth2 Authentication?

Jira does not let its users create their own OAuth2 applications. This is not to be confused with the OpenID login that some instances set up through services such as Google: that flow accepts Google's OAuth2, not Jira's own.

Jira supports OAuth2 only for applications published to the Atlassian Marketplace (that is, Jira plug-ins), which is not our case. See: https://developer.atlassian.com/cloud/jira/platform/oauth-2-authorization-code-grants-3lo-for-apps/
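Without OAuth2, the source relies on the Username and Password properties for HTTP basic authentication. The sketch below only illustrates the header value that goes over the wire. The helper `JiraBasicAuth` is hypothetical. JRJC's `AsynchronousJiraRestClientFactory.createWithBasicHttpAuthentication` is the usual entry point for this, though it should be checked against the JRJC version in use.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: builds the value of the HTTP "Authorization" header for
// basic authentication (RFC 7617) from the Username and Password properties.
final class JiraBasicAuth {
  static String headerValue(String username, String password) {
    String credentials = username + ":" + password;
    byte[] bytes = credentials.getBytes(StandardCharsets.UTF_8);
    return "Basic " + Base64.getEncoder().encodeToString(bytes);
  }
}
```

The credentials are only Base64-encoded, not encrypted, so the Jira URL should use HTTPS.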

Structured Record Schema Structure

Note:

By default the schema contains all possible fields.
To exclude fields, the user simply removes them from the output schema.
Only fields present in the schema are queried, which makes fetching more efficient.
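Pruning the schema only saves work if the remaining schema field names are translated into the field list sent to Jira. Below is a minimal sketch built on a hand-written mapping from schema names to Jira's standard REST field IDs. The class name and the set of fields covered are illustrative assumptions, and custom fields are left out. `key` and `id` come back with every issue, so they need no entry.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical selector: maps output-schema field names to Jira REST field IDs.
final class SchemaFieldSelector {
  private static final Map<String, String> JIRA_FIELD_IDS = Map.ofEntries(
      Map.entry("summary", "summary"),
      Map.entry("project", "project"),
      Map.entry("status", "status"),
      Map.entry("description", "description"),
      Map.entry("resolution", "resolution"),
      Map.entry("reporter", "reporter"),
      Map.entry("assignee", "assignee"),
      Map.entry("affectedVersions", "versions"),
      Map.entry("fixVersions", "fixVersions"),
      Map.entry("components", "components"),
      Map.entry("issueType", "issuetype"),
      Map.entry("isSubtask", "issuetype"), // derived from the issue type
      Map.entry("creationDate", "created"),
      Map.entry("updateDate", "updated"),
      Map.entry("dueDate", "duedate"),
      Map.entry("attachments", "attachment"),
      Map.entry("comments", "comment"),
      Map.entry("issueLinks", "issuelinks"),
      Map.entry("votes", "votes"),
      Map.entry("worklog", "worklog"),
      Map.entry("watchers", "watches"),
      Map.entry("timeTracking", "timetracking"),
      Map.entry("subtasks", "subtasks"),
      Map.entry("labels", "labels"));

  // Returns the distinct Jira field IDs to request, in schema order.
  // Schema fields without a mapping (e.g. key, id) are skipped.
  static Set<String> jiraFieldsFor(List<String> schemaFields) {
    Set<String> result = new LinkedHashSet<>();
    for (String name : schemaFields) {
      String jiraId = JIRA_FIELD_IDS.get(name);
      if (jiraId != null) {
        result.add(jiraId);
      }
    }
    return result;
  }
}
```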

Schema field	Type	Example	Notes	Nullable
key	string	NETTY-15
summary	string	Netty caches race condition
id	long	21371	API ID of the issue in Jira
project	string	Netty-HTTP
status	string	Open
description	string	... description of issue ...		true
resolution	string	Fixed		true
reporter	record	{ name=aonishuk, displayName=Andrew Onischuk, emailAddress=null, active=true, avatarUris={48x48=https://www.gravatar.com/avatar/...}, groups=null, timezone=America/Los_Angeles }	emailAddress, groups and timezone are nullable	true
assignee	record	{ name=aonishuk, displayName=Andrew Onischuk, emailAddress=null, active=true, avatarUris={48x48=https://www.gravatar.com/avatar/...}, groups=null, timezone=America/Los_Angeles }	emailAddress, groups and timezone are nullable; null when the issue is unassigned	true
fields	array<record>	[{ 'id': 'customfield_10005', 'name': 'Epic Link', 'type': null, 'value': null }, ...]	Custom fields; type and value are nullable strings
affectedVersions	array<string>	['NETTY-1.0']		true
fixVersions	array<string>	['NETTY-1.0-maint', 'NETTY-1.1']		true
components	array<string>	['NETTY-SERVER', 'NETTY-DOCS']
issueType	string	Improvement
isSubtask	boolean	false
creationDate	timestamp (logical type)	2016-12-21T23:21:42.000+02:00
updateDate	timestamp (logical type)	2016-12-21T23:21:42.000+02:00
dueDate	timestamp (logical type)	2016-12-30T23:21:42.000+02:00		true
attachments	array<record>	[{ 'filename': 'image.png', 'author': 'aonishuk', 'creationDate': '2016-12-30T23:21:42.000+02:00', 'size': 21454, 'mimeType': 'image/png', 'contentUri': 'http://.../image.png' }, ...]
comments	array<record>	[{ 'author': 'aonishuk', 'updateAuthor': 'aonishuk', 'creationDate': '2016-12-30T23:21:42.000+02:00', 'updateDate': '2016-12-30T23:21:42.000+02:00', 'body': 'actual comment contents' }, ...]
issueLinks	array<record>	[{ 'type': 'is blocked by', 'link': 'https://issues.cask.co/rest/api/2/issueLink/97018' }, ...]	type holds the inward link description	true
votes	int	3
worklog	array<record>	[{ 'author': 'aonishuk', 'updateAuthor': 'aonishuk', 'startDate': '2016-12-30T23:21:42.000+02:00', 'creationDate': '2016-12-30T23:21:42.000+02:00', 'updateDate': '2016-12-30T23:21:42.000+02:00', 'comment': 'actual comment contents', 'minutesSpent': 3600 }, ...]
watchers	int	0		true
isWatching	boolean	false		true
timeTracking	record	{ 'originalEstimateMinutes': 3600, 'remainingEstimateMinutes': 100, 'timeSpentMinutes': 3500 }	All three fields are nullable	true
subtasks	array<record>	[{ 'key': 'NETTY-44', 'summary': 'Http connection is broken', 'issueType': 'BUG', 'status': 'Open' }, ...]		true
labels	array<string>	['urgent', 'ready_for_review']
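The date fields above are shown as ISO-8601 strings with an offset, but the schema stores them as timestamps. Below is a minimal sketch of the conversion. It assumes a microsecond-precision timestamp logical type (CDAP's Schema also offers a millisecond variant) and input in the format shown in the table. The class name is hypothetical. Jira's REST payloads may write the offset without a colon (for example +0200), which ISO parsing rejects, so a custom `DateTimeFormatter` could be needed there. If the client already returns parsed date-time objects, only the final epoch conversion applies.

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.temporal.ChronoUnit;

// Hypothetical helper: converts a timestamp in the table's format (ISO-8601 with
// a "+02:00"-style offset) into microseconds since the Unix epoch, in UTC.
final class JiraTimestamps {
  static long toEpochMicros(String isoTimestamp) {
    Instant instant = OffsetDateTime.parse(isoTimestamp).toInstant();
    return ChronoUnit.MICROS.between(Instant.EPOCH, instant);
  }
}
```

For example, `2016-12-21T23:21:42.000+02:00` becomes the same instant as `2016-12-21T21:21:42Z`, expressed in microseconds.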

Implementation and Parallelization

The implementation will use the Jira REST Java Client (JRJC) library. A generic tutorial with example code is available here:

https://ecosystem.atlassian.net/wiki/spaces/JRJC/pages/27164680/Tutorial

The client can report how many issues a JQL query matches, and it can fetch just a window of the results, for example from the 100th to the 150th issue.

This makes the source straightforward to parallelize.

A single MapReduce split processes at most maxSplitSize issues, and the transform method converts them from Issue objects into structured records.

STEP 1. Generating splits:

- Execute the JQL query, requesting a minimal set of fields, to get the total issue count (skipped when maxSplitSize is 0).

- Create splits of at most maxSplitSize issues each.
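Step 1 can be sketched as follows. A split is described by its starting position and the number of issues it covers. The count query is passed in as a callback, so it is never invoked when maxSplitSize is 0. `JiraSplit`, `SplitPlanner` and the `countIssues` callback are hypothetical names. In the plugin, the callback would run the JQL query with a minimal field set and return the total.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntSupplier;

// One split covers the issues at positions [startAt, startAt + count) of the
// JQL result; count == UNBOUNDED means "read until the result is exhausted".
record JiraSplit(int startAt, int count) {
  static final int UNBOUNDED = -1;
}

final class SplitPlanner {
  // countIssues is a hypothetical callback returning the total number of
  // issues matched by the JQL query.
  static List<JiraSplit> plan(int maxSplitSize, IntSupplier countIssues) {
    if (maxSplitSize < 0) {
      throw new IllegalArgumentException("maxSplitSize must be >= 0");
    }
    List<JiraSplit> splits = new ArrayList<>();
    if (maxSplitSize == 0) {
      // Everything goes into one split, so the count query is skipped.
      splits.add(new JiraSplit(0, JiraSplit.UNBOUNDED));
      return splits;
    }
    int total = countIssues.getAsInt();
    for (int start = 0; start < total; start += maxSplitSize) {
      splits.add(new JiraSplit(start, Math.min(maxSplitSize, total - start)));
    }
    return splits; // empty when the query matches no issues
  }
}
```

With maxSplitSize = 1000 and 2350 matching issues, this plans three splits: 0 to 999, 1000 to 1999, and 2000 to 2349 (350 issues).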

STEP 2. RecordReader routine:

- Return issues one by one, from startingPosition to endPosition as defined by the current split.
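Jira's search endpoint may cap how many issues a single request returns, so a reader would typically page through its split's window rather than fetch it in one call. Below is a minimal sketch of that loop as a plain `Iterator`. `PageFetcher` is a hypothetical callback wrapping the search call with startAt/maxResults, and the page size is an assumed parameter. A negative count reads until an empty page comes back, which would cover the single-split case when maxSplitSize is 0. In the plugin this logic would sit behind the MapReduce `RecordReader`'s `nextKeyValue()`/`getCurrentValue()` pair.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical callback: returns up to maxResults issues starting at startAt.
interface PageFetcher<T> {
  List<T> fetch(int startAt, int maxResults);
}

// Yields the issues of one split, [startAt, startAt + count), one at a time,
// requesting at most pageSize issues per call.
final class SplitIssueIterator<T> implements Iterator<T> {
  private final PageFetcher<T> fetcher;
  private final int end;          // exclusive end position of the split
  private final int pageSize;
  private final Deque<T> buffer = new ArrayDeque<>();
  private int nextPosition;       // position of the next issue to request
  private boolean exhausted;      // set once the server returns an empty page

  SplitIssueIterator(PageFetcher<T> fetcher, int startAt, int count, int pageSize) {
    this.fetcher = fetcher;
    this.nextPosition = startAt;
    this.end = count < 0 ? Integer.MAX_VALUE : startAt + count; // negative: read to the end
    this.pageSize = pageSize;
  }

  @Override
  public boolean hasNext() {
    if (buffer.isEmpty() && !exhausted && nextPosition < end) {
      int want = Math.min(pageSize, end - nextPosition);
      List<T> page = fetcher.fetch(nextPosition, want);
      if (page.isEmpty()) {
        exhausted = true;
      } else {
        // Never keep more than requested, even if the server returns extra.
        List<T> kept = page.subList(0, Math.min(want, page.size()));
        buffer.addAll(kept);
        nextPosition += kept.size();
      }
    }
    return !buffer.isEmpty();
  }

  @Override
  public T next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    return buffer.poll();
  }
}
```

If the server returns a shorter page than requested, the loop simply asks again from the new position, so a server-side cap on page size does not lose issues.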

STEP 3. Transform method:

- Transform each Issue object into a StructuredRecord.
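Step 3 can be sketched for a few scalar fields. The transform copies only the fields that remain in the output schema and leaves nullable values as null. `SimpleIssue` is a hypothetical, simplified stand-in for the client's Issue object. A `Map` stands in for CDAP's StructuredRecord so the sketch runs without CDAP on the classpath. In the plugin one would use `StructuredRecord.builder(schema).set(name, value).build()` instead. Nested fields such as reporter or comments would need their own sub-records.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for the client's Issue object.
record SimpleIssue(String key, long id, String summary, String status,
                   String resolution, List<String> labels) {}

final class IssueTransformer {
  // Copies only the fields named in the output schema, in schema order.
  // Nullable fields such as resolution are copied as-is, including null.
  static Map<String, Object> transform(SimpleIssue issue, List<String> schemaFields) {
    Map<String, Object> out = new LinkedHashMap<>();
    for (String field : schemaFields) {
      switch (field) {
        case "key" -> out.put(field, issue.key());
        case "id" -> out.put(field, issue.id());
        case "summary" -> out.put(field, issue.summary());
        case "status" -> out.put(field, issue.status());
        case "resolution" -> out.put(field, issue.resolution()); // null while unresolved
        case "labels" -> out.put(field, issue.labels());
        default -> throw new IllegalArgumentException("Unsupported schema field: " + field);
      }
    }
    return out;
  }
}
```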
