Perform date transformations in Wrangler
This article is posted on the CDAP wiki and will be maintained there: Performing simple date transformations in Wrangler.
The Wrangler allows you to interactively and visually specify transformations on a sample of data. It provides a list of built-in directives that you can use to transform your data. Additionally, it also provides a set of directives such as set-column
, where you can invoke JEXL expressions. One of the most common and powerful use cases of this is to perform date manipulation. This article will describe some of the available transformations.
Before you begin
Make sure you have a Data Fusion instance.
Download this sample CSV dataset. This is only a sample. This guide will apply to any dataset that has a date column.
Uploading a sample
In the Pipeline Studio, go to Wrangler by clicking the main menu using the Hamburger icon on the top left, and choosing Wrangler.
Navigate to the connections view in Wrangler, and choose Upload.
Upload the file you downloaded above, and click the upload button. You should then see the file in your Wrangler workspace as below:
Parsing the data
Since this dataset is CSV, let’s parse it by clicking the directive dropdown, then choosing Parse > CSV, with delimiter as comma and selecting Set first row as header.
Click OK.
Since the data is now parsed into multiple columns, remove the body column by clicking the directive dropdown, then choosing Delete Column.
Transforming dates
Parsing the Transaction_date column as a date
Let’s parse the Transaction_date column as a date by clicking the directive dropdown in that column, then choosing Parse > Simple Date.
Specify a custom format as MM/dd/yy HH:mm, since that’s the format of the dates in this column.
In the Transformation steps column, you should see the transformation registered as parse-as-simple-date :Transaction_date MM/dd/yy HH:mm.
Also observe that the data type of the column has now changed from String to ZonedDateTime, which exposes a lot of utility functions on dates that we will use in the next section.
Getting components
You can get various components of the date such as Month, Day, Year, Minute, Second, and Hour by applying functions such as:
set-column :month Transaction_date.getMonth()
set-column :year Transaction_date.getYear()
set-column :day_of_week Transaction_date.getDayOfWeek()
set-column :day_of_month Transaction_date.getDayOfMonth()
set-column :day_of_year Transaction_date.getDayOfYear()
set-column :minute Transaction_date.getMinute()
set-column :hour Transaction_date.getHour()
set-column :second Transaction_date.getSecond()
set-column :zone Transaction_date.getZone()
Adding time
You can add various units of time to the date by using the directives such as:
set-column :add_days Transaction_date.plusDays(1)
set-column :add_hours Transaction_date.plusHours(200)
set-column :add_minutes Transaction_date.plusMinutes(3600)
set-column :add_months Transaction_date.plusMonths(3)
set-column :add_weeks Transaction_date.plusWeeks(2)
set-column :add_years Transaction_date.plusYears(2)
set-column :add_seconds Transaction_date.plusSeconds(36000)
set-column :add_nonoseconds Transaction_date.plusNanos(3600000)
Subtracting time
You can subtract various units of time to the date by using the directives such as:
set-column :subtract_days Transaction_date.minusDays(1)
set-column :subtract_hours Transaction_date.minusHours(200)
set-column :subtract_minutes Transaction_date.minusMinutes(3600)
set-column :subtract_months Transaction_date.minusMonths(3)
set-column :subtract_weeks Transaction_date.minusWeeks(2)
set-column :subtract_years Transaction_date.minusYears(2)
set-column :subtract_seconds Transaction_date.minusSeconds(36000)
set-column :subtract_nonoseconds Transaction_date.minusNanos(3600000)
Related articles
Reference of all available date manipulation functions: ZonedDateTime.