Step 6: Orchestrate with jobs

Version: 9.3.x
Part Number: MK-95PDIA000-11

Jobs are used to coordinate ETL activities such as:
  • Defining the flow and dependencies that control the order in which transformations run.
  • Preparing for execution by checking conditions such as "Is my source file available?" or "Does a table exist?"
  • Performing bulk load database operations.
  • Assisting file management, such as posting or retrieving files using FTP, copying files, and deleting files.
  • Sending success or failure notifications through email.

For this part of the tutorial, imagine that an external system is responsible for placing your sales_data.csv input in its source location every Saturday night at 9 p.m. You want to create a job that will verify that the file has arrived and then run the transformation to load the records into the database. In a subsequent exercise, you will schedule the job to run every Sunday morning at 9 a.m.
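The control flow of this scenario can be sketched outside PDI to make the dependency explicit: the load runs only if the file check succeeds. The Python below is a minimal illustration, not PDI code. The names run_job, count_data_rows, and demo are hypothetical, the CSV rows are made-up sample data, and the row-counting function merely stands in for the Getting Started transformation.

```python
import csv
import tempfile
from pathlib import Path
from typing import Callable, Tuple


def count_data_rows(csv_path: Path) -> int:
    """Hypothetical stand-in for the transformation: count rows after the header."""
    with csv_path.open(newline="") as handle:
        return max(sum(1 for _ in csv.reader(handle)) - 1, 0)


def run_job(source_file: Path, transformation: Callable[[Path], int]) -> bool:
    """Mirror the job flow: Start -> File Exists -> Transformation."""
    if not source_file.is_file():
        # The file check failed, so the transformation is never started.
        print(f"{source_file.name} has not arrived; nothing loaded")
        return False
    rows = transformation(source_file)
    print(f"Processed {rows} data rows from {source_file.name}")
    return True


def demo() -> Tuple[bool, bool]:
    """Run the job once with the file present and once with it missing."""
    with tempfile.TemporaryDirectory() as tmp:
        present = Path(tmp) / "sales_data.csv"
        # Made-up sample rows, used only for this illustration.
        present.write_text("ORDERNUMBER,SALES\n10107,2871.00\n10121,2765.90\n")
        missing = Path(tmp) / "not_delivered.csv"
        return (
            run_job(present, count_data_rows),
            run_job(missing, count_data_rows),
        )
```

Calling demo() exercises both outcomes: the first run finds the file and processes it, and the second stops at the file check.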

The following steps assume that you have built a Getting Started transformation as described in Step 1: Extract and load data of the tutorial.

  1. Go to File > New > Job.

    PDI job window
  2. Expand the General folder and drag a Start job entry onto the canvas.
    The Start job entry defines where the execution will begin.
    Note: Job entries run sequentially, one after another, whereas the steps in a transformation can run in parallel.
  3. Expand the Conditions folder and add a File Exists job entry.
  4. Draw a hop from the Start job entry to the File Exists job entry.

    Draw hop from Start to File exists
  5. Double-click the File Exists job entry to open its properties dialog box. Click Browse and set the filter near the bottom of the window to All Files. Select sales_data.csv from the following directory: ...\design-tools\data-integration\samples\transformations\files.
  6. Click OK to exit the Open File window.
  7. Click OK to exit the Check if a file exists window.
  8. Expand the General folder and add a Transformation job entry.
  9. Draw a hop from the File Exists job entry to the Transformation job entry.
  10. Double-click the Transformation job entry to open its properties dialog box.
  11. Click Browse to open the Select repository object window. Browse to and select the Getting Started transformation.
  12. Click OK to close the Transformation window.
  13. Save your job as Sample Job.
  14. Click the Run icon in the toolbar. When the Run Options window appears, select the Local environment type and click Run. The Execution Results panel opens and shows the job metrics and log information for the job execution.

    Job sample
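
The difference between sequential job entries and parallel transformation steps, mentioned in the note in step 2, can be modeled in a few lines of Python. This is a simplified sketch and not how PDI is implemented. The function names are hypothetical, and the sequential runner assumes that no failure hop is defined, so the job stops at the first entry that fails.

```python
import queue
import threading
from typing import Callable, Iterable, List, Tuple

_DONE = object()  # sentinel that marks the end of a row stream


def run_entries_sequentially(entries: Iterable[Tuple[str, Callable[[], bool]]]) -> List[str]:
    """Job style: an entry starts only after the previous one has finished."""
    log = []
    for name, action in entries:
        succeeded = action()
        log.append(f"{name}: {'ok' if succeeded else 'failed'}")
        if not succeeded:
            break  # assumption: no failure hop, so later entries never run
    return log


def run_steps_in_parallel(rows: Iterable[int]) -> List[int]:
    """Transformation style: all steps start together and stream rows via queues."""
    to_double: queue.Queue = queue.Queue()
    to_output: queue.Queue = queue.Queue()
    results: List[int] = []

    def read_step() -> None:
        for row in rows:
            to_double.put(row)
        to_double.put(_DONE)

    def double_step() -> None:
        # iter(callable, sentinel) keeps reading until the sentinel arrives.
        for row in iter(to_double.get, _DONE):
            to_output.put(row * 2)
        to_output.put(_DONE)

    def output_step() -> None:
        for row in iter(to_output.get, _DONE):
            results.append(row)

    threads = [threading.Thread(target=step) for step in (read_step, double_step, output_step)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

In the sequential runner, a failed file check prevents the transformation entry from ever starting, which matches the job built in this step. In the parallel runner, every step is active at the same time and handles rows as soon as they arrive.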