Workspace 6.21.5
Batch execution

Introduction

Workflows can be used in a variety of ways. We can interact with them via the Workspace editor (as we have done in the previous tutorials), embed them in custom applications (beyond the scope of these tutorials), or run them as a stand-alone process from the command line. In this tutorial, we are going to learn how to do the last of these by using the workspace-batch utility. Using this utility, we can run workflows as though they were command-line utilities, allowing us to run them on remote machines, schedule them as jobs, or share them with others who are unfamiliar with Workspace. By the end of the tutorial, you will:

  • Understand how to run the workspace-batch utility
  • Understand how to assign values to inputs using the workspace-batch utility
  • Understand what a global name is and how it can be useful

Just in case you get stuck, a sample workflow has been provided for you.

Note
This tutorial builds upon the workflow constructed in the Modifying and writing data tutorial. If you have not completed this tutorial, you can use this sample workflow.

Creating the workflow

As stated above, we're going to begin by opening the workflow we created in the Modifying and writing data tutorial:

  1. Open up the workflow using the Workspace editor
  2. Make sure it executes successfully by clicking the execute button
The workflow we're going to execute

This workflow takes a CSV file, calculates some values for a new column, and writes out both the new CSV data and an image of a chart of the data.

Running the workflow from the command line

Now we know we have a workflow that we can run in one step. Let's learn how to run it from the command line:

Note
Before performing the upcoming steps, it is important that the Workspace bin directory is added to your PATH environment variable. For example, if the workspace-batch executable is located somewhere such as:
  • /opt/csiro.au/workspace/bin/workspace-batch (Linux) or
  • C:\Program Files\csiro.au\workspace\bin\workspace-batch.exe (Windows) or
  • /usr/local/csiro.au/workspace/bin/workspace-batch (OSX)
then the directory:
  • /opt/csiro.au/workspace/bin (Linux) or
  • C:\Program Files\csiro.au\workspace\bin (Windows; do not surround the path with double quotes) or
  • /usr/local/csiro.au/workspace/bin (OSX)
should be added to your PATH. If you do not wish to modify your PATH environment variable, you will need to execute the workspace-batch commands listed below using the full path to workspace-batch (enclosed in double quotes on the command line if it contains spaces), such as:
C:\Program Files\csiro.au\workspace\bin\workspace-batch.exe
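For a single terminal session, the directory can also be added from the shell itself. The following is a minimal sketch for a POSIX shell, assuming the Linux example location above; the WORKSPACE_BIN variable is a name introduced here for illustration and should be changed to match your installation:

```shell
# Assumed install location (the Linux example above); change to match your system.
WORKSPACE_BIN="${WORKSPACE_BIN:-/opt/csiro.au/workspace/bin}"

# Append the directory to PATH for this session only, unless it is already listed.
case ":$PATH:" in
    *":$WORKSPACE_BIN:"*) ;;
    *) PATH="$PATH:$WORKSPACE_BIN"; export PATH ;;
esac

# Report whether the shell can now find the utility.
if command -v workspace-batch >/dev/null 2>&1; then
    echo "workspace-batch found at: $(command -v workspace-batch)"
else
    echo "workspace-batch not found; check WORKSPACE_BIN" >&2
fi
```

In a Windows cmd session, the equivalent is set PATH=%PATH%;C:\Program Files\csiro.au\workspace\bin (again without double quotes). Either change lasts only until the terminal window is closed.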
  1. Open up a terminal window on your machine.
    • Under Windows, do this by clicking Start, then Run, typing cmd and clicking OK.
    • Under Linux, load up your favourite terminal app. You probably already have one open.
    • On the Mac, launch the Terminal application from the Applications / Utilities folder in the dock.
  2. Enter the command
    workspace-batch --help
    
  3. You will see something like the following output displayed:
    Usage: workspace-batch.exe [OPTIONS] [workspaceFilename]
    
     --help                         Show help summary
     --version                      Show version information
     --no-plugins                   Don't load plug-ins
     --no-user-plugins              Only load the default plugins that ship with Workspace
     --no-settings                  Don't load user settings
     --no-rendering-plugin          Don't load the rendering plugin
     --config [filename]            Specify configuration filename
     --globalName [globalName] [value]
                                    Specify an initial value for a global name in batch mode
     --globalNameFile [globalNameFile]
                                    Specify initial values for global name from a global name variable file, file could be exported in Workspace editor by: 'File'->'Export global names'
     --input [inputName] [value]    Specify an initial value for a top level workspace input in batch mode
     --maxThreads [maxThreads]      Specify the maximum number of local threads to use
     --runAsServer                  Presence of this switch indicates the batch utility should execute in server mode
     --saveOnExit                   Specify whether to save the workspace to file post execution
     --reportWorkspaceProgress      Specify whether to report the internal workspace progress messages
     --provenance [reportingLevel]  Specify whether to capture and report provenance (basic|external|internal)
     --usedPluginsOnly              Specify whether to load used plugins only
     --requiredFeature [requiredFeature]
                                    Specify a feature required of a scheduler before it can be used to queue a workflow.
     --exitWithProcess [processId]  Specify whether to exit once a given process exits.
  4. This shows us how to run the workspace-batch utility. Since our workspace is already set up to run without any additional settings, we don't need to provide any arguments just yet. On the command line, navigate to the directory containing your workspace. For example, if my workspace is located at ~/Documents/sample_batch.wsx, I would enter the following on the command line:
    cd ~/Documents
    
  5. Now that we're in the same directory as our workspace, we can run it using the following command (replace "sample_batch.wsx" with the name of your workspace if necessary):
    workspace-batch sample_batch.wsx
    
    The output will look something like the following:
    Starting workspace-batch on Thu Aug 17 17:37:58 2017
    Adding Built-in version 4.3.0
    Searching for Workspace plugin directories according to WORKSPACE_INSTALL_AREAS environment and C:\CSIRO\WorkspaceDocs\Workspace_reldeb\Install\installAreas
    Searching for Workspace plugins (4.3.0.MSVC1900.x64.2) in:
    C:\CSIRO\WorkspaceDocs\Workspace_reldeb\Install\lib\Plugins
    Adding Application Support version 4.3.0
    Adding Data analysis version 4.3.0
    Adding HDF5Plugin version 4.3.0
    Adding Mesh version 4.3.0
    Adding Package version 4.3.0
    Adding Python35 version 4.3.0
    Adding C:\CSIRO\WorkspaceDocs\Workspace\Tools\ThirdParty\Python35 to PYTHONPATH
    Adding Rendering version 4.3.0
    Adding Parallel & remote execution version 4.3.0
    Adding Help implementation version 4.3.0
    Loading of workspace plugins complete
    Running workspace file C:/CSIRO/WorkspaceDocs/Workspace/Source/Workspace/Examples/sample_batch.wsx from working directory C:/CSIRO/WorkspaceDocs/Workspace_reldeb/Install/bin
    Starting workspace update
    Image of size 1024x768 written to file ws:outputimage.png
    Workspace update completed
    Execution time = 11.303 seconds
    Shutting down application
    
    This shows us that our workflow has run successfully. If we view the contents of the directory, we will see two new files: outputimage.png and newcsvfile.csv. We can open these files to view the results, for example:
    The 1024x768 output image
Note
As the workflow writes two new files into the current directory, you must have write access to this directory. By default, the sample files that ship with Workspace may be installed in a directory where users do not have write access. If you are using these sample files directly, you may need to copy them to another location first.
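The copy-then-run step described in the note can be sketched as a small shell helper. This is an illustration only: run_from_copy, the BATCH variable and the default working directory are names invented here, not part of Workspace, and the example paths are placeholders.

```shell
# Hypothetical helper: copy a workflow directory somewhere writable and run it there.
#   $1 = directory containing the workflow and its input files
#   $2 = workflow file name, e.g. sample_batch.wsx
#   $3 = writable working directory (the default below is an assumption)
# BATCH can be set to the full path of workspace-batch if it is not on the PATH.
# The body runs in a subshell, so the caller's current directory is unchanged.
run_from_copy() (
    src_dir=$1
    workflow=$2
    work_dir=${3:-$HOME/workspace_batch_tutorial}

    mkdir -p "$work_dir" || exit 1
    cp -R "$src_dir"/. "$work_dir"/ || exit 1
    cd "$work_dir" || exit 1

    # Stop early if the output files could not be written here.
    if [ ! -w . ]; then
        echo "No write access to $work_dir" >&2
        exit 1
    fi

    "${BATCH:-workspace-batch}" "$workflow"
)

# Example (paths are placeholders):
# run_from_copy /path/to/samples sample_batch.wsx "$HOME/batch_test"
```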

Setting input values from the command line

Now that we've successfully executed our workflow from the command line, we need to learn how to change inputs from the command line. This will allow us to pass data into the workflow in the form of command line arguments. In this example, we're going to specify the size of the chart image that we want.

  1. Using the Workspace editor, open up the workflow for editing.
  2. Find the operation named Render Chart to Image (of type ChartToImage) on the canvas and display its inputs:
    The ChartToImage operation
  3. Right-click on the Width input and, in the context menu that appears, select Assign global name.
    Edit the global name
    A global name is a name that is unique across an entire workflow, including any nested workflows it contains. Because the names are globally unique, the values of inputs that have global names assigned to them can be easily set by external utilities, such as custom user interfaces or the workspace-batch utility.
  4. In the dialog displayed, enter the text "width". This assigns the global name "width" to this input.
    Editing the global name
  5. Assign the global name "height" to the Height input of the ChartToImage operation.
  6. You can see that an input has a global name by the G icon on it:
    Identifying an input with a global name
  7. You can also look in the global name table to see all global names in your workflow:
    Global Name table
  8. Save the workflow

Running the workflow from the command line with global name input values

We've now successfully exposed two inputs as part of our workflow's "global interface". We're now going to use the workspace-batch utility to assign values to these globally exposed inputs.

  1. From the command-line, execute the following command (replace "sample_batch.wsx" with the name of your workflow):
    workspace-batch --globalName width 400 --globalName height 300 sample_batch.wsx
    
  2. The following will be printed to the command line:
    Starting workspace-batch on Thu Aug 17 17:51:00 2017
    Adding Built-in version 4.3.0
    Searching for Workspace plugin directories according to WORKSPACE_INSTALL_AREAS environment and C:\CSIRO\WorkspaceDocs\Workspace_reldeb\Install\installAreas
    Searching for Workspace plugins (4.3.0.MSVC1900.x64.2) in:
    C:\CSIRO\WorkspaceDocs\Workspace_reldeb\Install\lib\Plugins
    Adding Application Support version 4.3.0
    Adding Data analysis version 4.3.0
    Adding HDF5Plugin version 4.3.0
    Adding Mesh version 4.3.0
    Adding Package version 4.3.0
    Adding Python35 version 4.3.0
    Adding C:\CSIRO\WorkspaceDocs\Workspace\Tools\ThirdParty\Python35 to PYTHONPATH
    Adding Rendering version 4.3.0
    Adding Parallel & remote execution version 4.3.0
    Adding Help implementation version 4.3.0
    Loading of workspace plugins complete
    Running workspace file C:/CSIRO/WorkspaceDocs/Workspace/Source/Workspace/Examples/sample_batch.wsx from working directory C:/CSIRO/WorkspaceDocs/Workspace_reldeb/Install/bin
    Update SQL table with DataSeries.
    Set global name 'height' to '300' from command line
    Set global name 'width' to '400' from command line
    Starting workspace update
    Image of size 400x300 written to file ws:outputimage.png
    Workspace update completed
    Execution time = 11.533 seconds
    Shutting down application
    
  3. Note how the image size is now 400x300 rather than 1024x768. If you open the outputimage.png file in a viewer, it should look like this:
    The 400x300 output image
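When the utility is run from a script, the same check can be made against a saved log rather than by eye. The sketch below assumes the "Set global name" message text shown in the sample output above, which may differ between Workspace versions; check_global_names and run.log are names made up for this example.

```shell
# Hypothetical helper: confirm from a saved log that global names were set.
#   $1 = log file captured from workspace-batch; remaining args = global names
# The message text is copied from the sample output in this tutorial.
check_global_names() {
    log_file=$1
    shift
    for name in "$@"; do
        if ! grep -q "Set global name '$name' to" "$log_file"; then
            echo "global name '$name' was not set" >&2
            return 1
        fi
    done
}

# Example: capture both output streams, then check the log.
# workspace-batch --globalName width 400 --globalName height 300 sample_batch.wsx > run.log 2>&1
# check_global_names run.log width height && grep "Image of size" run.log
```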

Nicely done. You've now learned how to modify input values via the workspace-batch utility.
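Because the inputs are now ordinary command-line arguments, the workflow can be driven from a loop. The following sketch renders the chart at several sizes; it assumes the global names "width" and "height" assigned in this tutorial and that outputimage.png is written to the current directory. The render_sizes function, the BATCH variable and the chart_WIDTHxHEIGHT.png file names are invented for illustration.

```shell
# Hypothetical helper: run the workflow once per requested chart size.
#   $1 = workflow file; remaining args = sizes written as WIDTHxHEIGHT
# BATCH can be set to the full path of workspace-batch if it is not on the PATH.
render_sizes() {
    workflow=$1
    shift
    for size in "$@"; do
        width=${size%x*}     # text before the "x"
        height=${size#*x}    # text after the "x"
        "${BATCH:-workspace-batch}" \
            --globalName width "$width" \
            --globalName height "$height" \
            "$workflow" || return 1
        # Keep each result, since every run overwrites outputimage.png.
        cp outputimage.png "chart_${width}x${height}.png" || return 1
    done
}

# Example:
# render_sizes sample_batch.wsx 400x300 800x600 1024x768
```

Each run starts a fresh workspace-batch process, so the total time is roughly the single-run time multiplied by the number of sizes.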


Summary

This concludes the tutorial on batch execution. You should now know how to:

  • Execute a workflow from the command line
  • Assign global names to inputs and set them from the command line

A sample workflow for this tutorial can be found here.