Workspace 7.0.2
This OpenCL tutorial will step through a slightly more complicated example that populates an image using a Julia set fractal (http://mathworld.wolfram.com/JuliaSet.html). This time we'll show how to use an OpenCL kernel defined in a file, allowing modifications to be made without needing to recompile your plugin.
In this tutorial, you will learn:
This tutorial assumes you have successfully completed the first OpenCL tutorial Getting Started with OpenCL in Workspace.
As with the first tutorial, the first step is to create our new operation. We will call this operation "OpenCLJuliaSet"; the inputs and outputs to enter in the operation wizard are shown below. We assume this operation is being added to the plugin used in the first tutorial, so the necessary OpenCL dependencies have already been taken care of in your plugin's CMakeLists.txt.
We will explain each of the inputs and outputs as we use them throughout the tutorial. With the basic operation created by the wizard we can now add the exciting stuff into the execute() method. In this tutorial we are going to create an operation that can run in both OpenCL and non-OpenCL mode, allowing us to compare the two methods. To start with we shall look at the traditional (non-OpenCL) implementation.
Our Julia set uses complex numbers so the first thing we need to do is define a struct to represent complex numbers. If complex numbers were going to be used by multiple operations this struct would be a prime candidate for turning into a new Workspace data type. For simplicity in this tutorial we'll simply define it as a local class at the top of the operation's cpp file.
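A minimal sketch of such a struct is given here. The name Complex and its members are illustrative assumptions rather than the tutorial's own code; it only provides the operations the Julia set iteration needs (multiply, add and squared magnitude).

```cpp
// Minimal complex-number type for the Julia set iteration.
// The name and members are illustrative, not taken from the tutorial's code.
struct Complex
{
    float r; // real part
    float i; // imaginary part

    Complex(float real, float imaginary) : r(real), i(imaginary) {}

    // Squared magnitude; avoids a square root when testing for divergence.
    float magnitudeSquared() const { return r * r + i * i; }

    // (a + bi)(c + di) = (ac - bd) + (ad + bc)i
    Complex operator*(const Complex& other) const
    {
        return Complex(r * other.r - i * other.i,
                       i * other.r + r * other.i);
    }

    Complex operator+(const Complex& other) const
    {
        return Complex(r + other.r, i + other.i);
    }
};
```

With these operators one step of the iteration can be written as z = z * z + c, which keeps the per-pixel code close to the mathematical definition.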
At the start and end of our operation’s execute() method we’re going to add the following QTime code. This will allow us to track and trace how long our operation takes to execute so we can compare our OpenCL and non-OpenCL implementations at various output image sizes. The start of our execute method will also assign a new QImage of the correct size into our output so it’s ready to be populated. We use the dimension_ input as the size of the output image.
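The timing pattern can be sketched without Qt as follows. Note that this uses std::chrono::steady_clock as a stand-in for the QTime calls the tutorial relies on, and the helper name timeInSeconds is hypothetical; the idea (record a start time, do the work, report the difference) is the same.

```cpp
#include <chrono>
#include <functional>

// Runs `work` and returns the elapsed wall-clock time in seconds.
// std::chrono::steady_clock stands in for the QTime calls used by the tutorial.
double timeInSeconds(const std::function<void()>& work)
{
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(end - start).count();
}
```

In the operation itself the start time would be taken at the top of execute() and the elapsed time written to the log just before returning.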
The pattern of the Julia set is determined by a complex number we’re calling juliaConstant. The value of this complex number is an input into the operation so users can modify it at runtime. To default to an interesting pattern we'll set the default values for dataJuliaReal_ and dataJuliaImaginary_ in our operation’s constructor as shown below. We'll also set a default size for the output image dimension.
We evaluate the value of the Julia set for each pixel of the output image using the code below in our execute() function.
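The per-pixel evaluation can be sketched in plain C++ as follows. The function names, the view scale (1.5), the iteration limit (200) and the escape threshold (1000) are all illustrative assumptions, and the complex arithmetic is written out by hand so the block stands alone; the result is a 0/1 mask rather than a QImage.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true when pixel (x, y) of a dimension x dimension image is treated
// as part of the Julia set for the constant (cReal, cImag).
// Scale, iteration limit and escape threshold are illustrative assumptions.
bool inJuliaSet(int x, int y, int dimension, float cReal, float cImag)
{
    const float scale = 1.5f;
    const float halfDim = dimension / 2.0f;

    // Map the pixel to the complex plane, centred on the image.
    float zr = scale * (halfDim - x) / halfDim;
    float zi = scale * (halfDim - y) / halfDim;

    for (int iteration = 0; iteration < 200; ++iteration)
    {
        // z = z * z + c
        const float nextR = zr * zr - zi * zi + cReal;
        const float nextI = 2.0f * zr * zi + cImag;
        zr = nextR;
        zi = nextI;

        if (zr * zr + zi * zi > 1000.0f)
            return false; // the sequence diverged, so the pixel is outside the set
    }
    return true;
}

// Evaluates every pixel row by row; 1 marks a pixel inside the set.
std::vector<std::uint8_t> computeJuliaMask(int dimension, float cReal, float cImag)
{
    std::vector<std::uint8_t> mask(static_cast<std::size_t>(dimension) * dimension);
    for (int y = 0; y < dimension; ++y)
    {
        for (int x = 0; x < dimension; ++x)
        {
            mask[static_cast<std::size_t>(y) * dimension + x] =
                inJuliaSet(x, y, dimension, cReal, cImag) ? 1 : 0;
        }
    }
    return mask;
}
```

The work grows with the square of the dimension, and every pixel is computed independently of its neighbours, which is what makes this calculation a good fit for OpenCL.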
At this point you should be able to compile and run your operation. Drag your new OpenCLJuliaSet operation onto a blank Workspace canvas, click on the Image output and select Display with ImageWidget. Initialise its values using the Operation editor if they aren't there by default.
Run the workflow to produce an image like the one below. You should also see the timing output we added in the log window: Julia set calculated in 0.281s. By increasing the Dimension input you can see that the operation takes longer to run since there are more pixels to calculate.
Now we're going to extend our operation to calculate the same image using OpenCL. We will use the Use OpenCL input on the operation to control which method we use (OpenCL or non-OpenCL), allowing the user to compare the timing. First let's set up our execute() method to use this conditional.
At this point we're ready to implement our OpenCL version of the Julia set calculation.
As with the first tutorial, the first step is to check there is at least one OpenCL device available on the system and to acquire the first available one.
This calculation does not require any input buffers so we only need to allocate a buffer to write the output image into. OpenCL supports the concept of image buffers which we could use in this example (cl::Image2D and cl::Image3D). Since we're only calculating true or false for each pixel, don't need any filtering and don't want to directly inter-op with OpenGL, we'll stick with standard cl::Buffer for simplicity. For the output buffer we use the CL_MEM_WRITE_ONLY flag to help OpenCL use the best memory.
Note the use of checkOpenCLResult() to check for any errors. This and getOpenCLErrorString() are defined in hpcdevicemanager.h.
In the first OpenCL tutorial we defined the kernel as a string inside the operation. This means our operation needs to be recompiled whenever we want to change the kernel. Another option is to take a file name as an input to the operation and load the kernel from file. This means the kernel can be changed at runtime without recompiling your plugin (as long as the host code stays the same). In this example we're going to define the kernel in a file called juliaset.cl and pass that to the operation as an input. Before we look at the host code in our operation, let's look at the OpenCL kernel in juliaset.cl.
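As a rough idea of what such a kernel file might contain, the sketch below holds a hypothetical juliaset.cl in a C++ raw string literal so it can be shown alongside the host-side C++. The kernel name juliaSet, its argument list, the uchar output layout and the numeric constants (scale 1.5, 200 iterations, threshold 1000) are assumptions for illustration, not the tutorial's actual kernel.

```cpp
#include <string>

// Hypothetical contents for juliaset.cl. The kernel name, arguments and
// constants are illustrative assumptions; they must match whatever the host
// code passes when it sets the kernel arguments.
const std::string kJuliaSetKernelSource = R"CLC(
__kernel void juliaSet(__global uchar* output,
                       const int dimension,
                       const float juliaReal,
                       const float juliaImaginary)
{
    // One work-item per pixel of a 2D range.
    const int x = get_global_id(0);
    const int y = get_global_id(1);
    if (x >= dimension || y >= dimension)
        return;

    // Map the pixel to the complex plane, centred on the image.
    const float scale = 1.5f;
    const float halfDim = dimension / 2.0f;
    float zr = scale * (halfDim - x) / halfDim;
    float zi = scale * (halfDim - y) / halfDim;

    uchar inSet = 1;
    for (int i = 0; i < 200; ++i)
    {
        // z = z * z + c
        const float nextR = zr * zr - zi * zi + juliaReal;
        const float nextI = 2.0f * zr * zi + juliaImaginary;
        zr = nextR;
        zi = nextI;
        if (zr * zr + zi * zi > 1000.0f)
        {
            inSet = 0;
            break;
        }
    }
    output[y * dimension + x] = inSet;
}
)CLC";
```

Saving the text between the raw-string delimiters as juliaset.cl gives a file that can be edited between runs. Note that the loop over x and y from the C++ version disappears: each work-item reads its own pixel coordinates from get_global_id().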
With the kernel defined in our source file, let's look at the code we need in our operation to use it.
You'll notice it's very similar to what we used in the first OpenCL tutorial. The difference is that we use openCLdevice.buildProgramFromFile() with the openCLSourceFileName passed in as an input. Workspace will cache a copy of the compiled kernel for the given device, so on subsequent executes of your operation it only needs to recompile the kernel if the file's time stamp changes or you're using a different device.
We’re now ready to execute the kernel which is done using enqueueNDRangeKernel on the command queue as shown below.
As with the first tutorial, we use enqueueMapBuffer() on the command queue to map the output buffer used by the kernel back to a pointer that we can access on the host. We’ve passed CL_TRUE as the blocking parameter for this call so that the method will only return once the kernel has finished executing and the results are ready. We can then use the pointer returned from this call to populate the output QImage. We use a different color for this image, qRgb(0,100,200), to highlight that this image was created with OpenCL.
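The copy from the mapped pointer into image pixels can be sketched without Qt or OpenCL as follows. Here a plain byte array stands in for the pointer returned by enqueueMapBuffer(), a vector of packed pixels stands in for the QImage, and the helper names and the black background colour are assumptions; packRgb() reproduces the 0xAARRGGBB layout that qRgb() returns.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Packs an opaque colour as 0xAARRGGBB, the layout Qt's qRgb() produces.
std::uint32_t packRgb(int r, int g, int b)
{
    return 0xFF000000u
         | (static_cast<std::uint32_t>(r & 0xFF) << 16)
         | (static_cast<std::uint32_t>(g & 0xFF) << 8)
         |  static_cast<std::uint32_t>(b & 0xFF);
}

// Converts a per-pixel 0/1 mask (standing in for the mapped output buffer)
// into packed pixels: setColor where the mask is non-zero, backgroundColor
// elsewhere.
std::vector<std::uint32_t> maskToPixels(const std::uint8_t* mask,
                                        std::size_t pixelCount,
                                        std::uint32_t setColor,
                                        std::uint32_t backgroundColor)
{
    std::vector<std::uint32_t> pixels(pixelCount);
    for (std::size_t i = 0; i < pixelCount; ++i)
        pixels[i] = (mask[i] != 0) ? setColor : backgroundColor;
    return pixels;
}
```

For the OpenCL path, the set colour would be packRgb(0, 100, 200), matching the qRgb(0,100,200) value mentioned above.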
Finally we need to unmap the output buffer as shown below. All other resources (such as the device buffers and kernel object) are automatically released because we’re using the OpenCL C++ Wrapper API. If using the straight C API a number of release calls need to be made manually.
You should now be able to compile your operation and test out the performance improvement from your OpenCL implementation of the Julia set calculation. You should be able to toggle the "Use OpenCL" input on the operation and see the image change from green to blue depending on the method used. As you increase the "Scale" input you should be able to see the vast improvement in execute time the OpenCL version has over the traditional C++ version in the log window.
That concludes the tutorial. We have now learned:
For further references and tutorials on OpenCL, see the Khronos Group OpenCL Resources.