Workspace 6.21.5
Serialization

Tutorial goals

Some applications load and save workflows from/to files. These files are formatted as ordinary XML files and include information such as the operations and connections in a workflow, and the value of each operation input and view-related information, if available. Converting a workflow to or from this XML form is called serialization. Workspace provides easy-to-use serialization capabilities and it handles most of the work for you. For your own custom data types, however, you need to provide a couple of the pieces so that your data type can be serialized.

There are two general approaches to enabling serialization of your own custom data types. The first uses the standard streaming capabilities that are part of the C++ language and does not require you to modify the class itself. The second approach requires you to add a base class to your custom data type and to implement three functions. This tutorial demonstrates both methods and compares their strengths and weaknesses. A third method exists using template specialization, but that is a somewhat advanced technique and is beyond the scope of this tutorial.

It is also worth mentioning that serialization is used in other parts of Workspace functionality. For instance, the undo/redo framework and the cut-and-paste functionality of the Workspace editor application use serialization to load and save the workflow items being edited. There may be other novel uses of serialization in the future too, so where possible you should endeavour to make your custom data types serializable.


Serializing by providing streaming operators

Let us suppose that we want to add serialization capabilities to MyClass from the Writing a Simple Workspace Plugin tutorial. We will call it MyClassStreamed here to avoid confusion later. We can do so by adding C++ stream operators as shown in the following header file for the class:

#ifndef CSIRO_MYNAMESPACE_MYCLASSSTREAMED_H
#define CSIRO_MYNAMESPACE_MYCLASSSTREAMED_H
#include <iosfwd>
#include "serializeplugin_api.h"
namespace CSIRO
{
namespace MyNamespace
{
class CSIRO_SERIALIZEPLUGIN_API MyClassStreamed
{
int value1_;
int value2_;
public:
MyClassStreamed() :
value1_(0), value2_(0) {}
void setValue1(int i) { value1_ = i; }
int getValue1() const { return value1_; }
void setValue2(int i) { value2_ = i; }
int getValue2() const { return value2_; }
};
// Declare streaming operators
std::ostream& operator<<(std::ostream& os, const MyClassStreamed& myClass);
std::istream& operator>>(std::istream& is, MyClassStreamed& myClass);
} // namespace MyNamespace
} // namespace CSIRO
DECLARE_WORKSPACE_DATA_FACTORY(CSIRO::MyNamespace::MyClassStreamed, CSIRO_SERIALIZEPLUGIN_API)
#endif

The C++ language defines the insertion and extraction stream operators so that you can easily serialize your own data types along with those provided by the language. If you define these for your own data type, Workspace will detect them and use them for serialization without you having to do anything else. This is often the simplest approach if your custom data type is very basic and has only a small amount of data to serialize. The header which declares the std::ostream and std::istream classes in the above is iosfwd. The implementation of the two streaming operators for the class example above is as follows:

#include <istream>
#include <ostream>
#include "serializeplugin.h"
namespace CSIRO
{
namespace MyNamespace
{
std::ostream& operator<<(std::ostream& os, const MyClassStreamed& myClass)
{
os << myClass.getValue1() << " " << myClass.getValue2();
return os;
}
std::istream& operator>>(std::istream& is, MyClassStreamed& myClass)
{
int value1;
int value2;
is >> value1 >> value2;
myClass.setValue1(value1);
myClass.setValue2(value2);
return is;
}
} // namespace MyNamespace
} // namespace CSIRO
DEFINE_WORKSPACE_DATA_FACTORY(CSIRO::MyNamespace::MyClassStreamed,
CSIRO::MyNamespace::SerializePlugin::getInstance())

Note that the above implementation performs no error checking on the streams. In practice, you should check that the data being supplied is of the correct format and verify the state of the streams, etc.
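
As a minimal sketch of the kind of checking the note above refers to, the extraction operator below reads into temporaries and only updates the object if both values were read successfully. It uses a cut-down stand-in for MyClassStreamed (no export macro and no data factory) so that it compiles with only the standard library. The readFromString() and streamingExample() helpers are hypothetical and exist purely to exercise the operators.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Cut-down stand-in for MyClassStreamed (no Workspace macros).
class MyClassStreamed
{
    int value1_ = 0;
    int value2_ = 0;
public:
    void setValue1(int i) { value1_ = i; }
    int  getValue1() const { return value1_; }
    void setValue2(int i) { value2_ = i; }
    int  getValue2() const { return value2_; }
};

std::ostream& operator<<(std::ostream& os, const MyClassStreamed& myClass)
{
    return os << myClass.getValue1() << ' ' << myClass.getValue2();
}

std::istream& operator>>(std::istream& is, MyClassStreamed& myClass)
{
    int value1 = 0;
    int value2 = 0;
    // Commit the values only if both extractions succeeded. On failure the
    // stream's failbit is already set, so the caller can test the stream.
    if (is >> value1 >> value2)
    {
        myClass.setValue1(value1);
        myClass.setValue2(value2);
    }
    return is;
}

// Hypothetical helper: parse text into obj and report whether it worked.
bool readFromString(const std::string& text, MyClassStreamed& obj)
{
    std::istringstream is(text);
    return static_cast<bool>(is >> obj);
}

// Usage: a round trip through a string, then a malformed input.
void streamingExample()
{
    MyClassStreamed original;
    original.setValue1(3);
    original.setValue2(-7);

    std::ostringstream os;
    os << original;

    MyClassStreamed copy;
    assert(readFromString(os.str(), copy) && copy.getValue2() == -7);

    // The second field is not a number, so the read fails and copy is untouched.
    assert(!readFromString("5 oops", copy) && copy.getValue1() == 3);
}
```

Reading into temporaries first means a failed read never leaves the object half-updated, which is usually what you want when loading a workflow from a file that may have been edited by hand.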


Serializing by inheriting the Serialize base class

A second approach to serialization is to make your custom data type inherit from the Serialize class. You then need to implement three functions to support serialization for the class. The class declaration would look something like the following:

#ifndef CSIRO_MYNAMESPACE_MYCLASSINHERIT_H
#define CSIRO_MYNAMESPACE_MYCLASSINHERIT_H
#include "serializeplugin_api.h"
namespace CSIRO
{
namespace MyNamespace
{
class CSIRO_SERIALIZEPLUGIN_API MyClassInherit : public DataExecution::Serialize
{
int value1_;
int value2_;
public:
MyClassInherit() :
value1_(0), value2_(0) {}
void setValue1(int i) { value1_ = i; }
int getValue1() const { return value1_; }
void setValue2(int i) { value2_ = i; }
int getValue2() const { return value2_; }
// Reimplemented functions from the Serialize base class
bool canSerialize() const override;
bool save(DataExecution::SerializedItem& item) const override;
bool load(const DataExecution::SerializedItem& item) override;
};
} // namespace MyNamespace
} // namespace CSIRO
DECLARE_WORKSPACE_DATA_FACTORY(CSIRO::MyNamespace::MyClassInherit, CSIRO_SERIALIZEPLUGIN_API)
#endif

The two changes to the class are to add the base class:

class CSIRO_SERIALIZEPLUGIN_API MyClassInherit : public DataExecution::Serialize

and to add the three required functions:

bool canSerialize() const override;
bool save(DataExecution::SerializedItem& item) const override;
bool load(const DataExecution::SerializedItem& item) override;

An implementation of the three functions could be something like the following:

#include "serializeplugin.h"
namespace CSIRO
{
namespace MyNamespace
{
bool MyClassInherit::canSerialize() const
{
return true;
}
bool MyClassInherit::save(DataExecution::SerializedItem& item) const
{
// Store the data in attributes because it is less sensitive to
// formatting errors. This is unsuitable for large blocks of data
// or where there are lots of things to save.
item.setAttribute("value1", value1_);
item.setAttribute("value2", value2_);
return true;
}
bool MyClassInherit::load(const DataExecution::SerializedItem& item)
{
// This gets the two values and returns true if successful. If either
// attribute is missing, the return value will be false.
return item.getAttribute("value1", value1_) &&
item.getAttribute("value2", value2_);
}
} // namespace MyNamespace
} // namespace CSIRO
DEFINE_WORKSPACE_DATA_FACTORY(CSIRO::MyNamespace::MyClassInherit,
CSIRO::MyNamespace::SerializePlugin::getInstance())

This implementation uses the getAttribute() and setAttribute() functions of the SerializedItem class. These make it easy to robustly serialize any number of built-in data types that the class uses. The getAttribute() function returns true if the named attribute was present and could be converted to the data type of the second parameter. In such cases, the value is stored in the second parameter. If getAttribute() is unable to do this for some reason (e.g. the attribute is missing or of the wrong data type), it returns false.
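
The attribute contract described above can be illustrated without Workspace itself. The sketch below uses a toy stand-in, FakeItem, which is not the real SerializedItem class: it only mimics the setAttribute()/getAttribute() behaviour described in this section for int values, and its internals (including setRawAttribute()) are assumptions made for illustration. MyClassSketch is likewise a hypothetical cut-down class. Its load() reads into temporaries so that a failed load leaves the object unchanged.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Toy stand-in for SerializedItem (hypothetical; int attributes only).
class FakeItem
{
    std::map<std::string, std::string> attributes_;
public:
    void setAttribute(const std::string& name, int value)
    {
        attributes_[name] = std::to_string(value);
    }
    // Raw setter, used here only to simulate a malformed file.
    void setRawAttribute(const std::string& name, const std::string& text)
    {
        attributes_[name] = text;
    }
    // Returns true and fills 'value' only if the attribute exists and its
    // whole text converts to an int. Otherwise 'value' is left untouched.
    bool getAttribute(const std::string& name, int& value) const
    {
        auto it = attributes_.find(name);
        if (it == attributes_.end())
            return false;                 // attribute missing
        std::istringstream is(it->second);
        int parsed = 0;
        if (!(is >> parsed))
            return false;                 // not a number
        char extra;
        if (is >> extra)
            return false;                 // trailing characters
        value = parsed;
        return true;
    }
};

// Hypothetical cut-down class with attribute-based save() and load().
class MyClassSketch
{
    int value1_ = 0;
    int value2_ = 0;
public:
    MyClassSketch() = default;
    MyClassSketch(int v1, int v2) : value1_(v1), value2_(v2) {}
    int getValue1() const { return value1_; }
    int getValue2() const { return value2_; }

    bool save(FakeItem& item) const
    {
        item.setAttribute("value1", value1_);
        item.setAttribute("value2", value2_);
        return true;
    }
    bool load(const FakeItem& item)
    {
        int v1 = 0;
        int v2 = 0;
        if (!item.getAttribute("value1", v1) || !item.getAttribute("value2", v2))
            return false;
        value1_ = v1;
        value2_ = v2;
        return true;
    }
};

// Usage: a save/load round trip, then a load from an incomplete item.
void attributeExample()
{
    FakeItem item;
    MyClassSketch(3, 4).save(item);
    MyClassSketch loaded;
    assert(loaded.load(item) && loaded.getValue1() == 3);

    FakeItem incomplete;
    incomplete.setAttribute("value1", 9);   // "value2" is missing
    assert(!loaded.load(incomplete) && loaded.getValue1() == 3);
}
```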

An alternative would be to use the setText() and getText() functions to convert the class into a string form, but this is less robust. Another possibility if your class is a bit more complicated might be to add children to the serialized item passed as the parameter to save() and load(). Normally though, there are better ways to define your class in those circumstances such that Workspace does all the child handling for you. This will be covered in the More about object groups tutorial.

In the scenario where your class has streaming operators and it is also derived from Serialize, Workspace favours implementing serialization through the Serialize base class because this is likely to be more robust.


Serializing an enum type

Because an enum is essentially just a special case of an int, Workspace provides some special handling for enum types when it comes to serialization. In particular, when you define a data factory for an enum, the templates detect that the custom type is an enum and require that you provide an explicit template specialization for one function. This might sound complicated, but in reality it is very straightforward. The best way to illustrate how this works is with an example. Let's say you have a class which defines an enum called MyEnum and your definition for it in the header file looks something like this (this is a stripped down example, but you will get the idea):

#include "serializeplugin_api.h"
namespace CSIRO
{
namespace MyNamespace
{
class CSIRO_SERIALIZEPLUGIN_API MyClass
{
public:
enum MyEnum
{
SomeItem,
AnotherItem,
AndAnotherItem
};
// Constructor and other functions omitted for clarity.....
};
}}

You want the MyClass::MyEnum enum to be made available as a Workspace data type, which you do with the usual DECLARE_WORKSPACE_DATA_FACTORY macro. Because MyEnum is an enum type, however, we also have to provide a template specialization for getEnumNames() and this specialization must appear before the DECLARE_WORKSPACE_DATA_FACTORY macro. Note also that this function template specialization must be in the CSIRO::DataExecution namespace:

namespace CSIRO
{
namespace DataExecution
{
template<> void DataFactoryTraits<CSIRO::MyNamespace::MyClass::MyEnum>::getEnumNames(QStringList& names)
{
names.push_back("Some item");
names.push_back("Another item");
names.push_back("And another item");
}
}}
DECLARE_WORKSPACE_DATA_FACTORY(CSIRO::MyNamespace::MyClass::MyEnum, CSIRO_SERIALIZEPLUGIN_API)
DECLARE_WORKSPACE_ENUMTOINTADAPTOR(CSIRO::MyNamespace::MyClass::MyEnum, CSIRO_SERIALIZEPLUGIN_API)

When adapting the above example code to your particular enum type, all you need to change is the name of the enum type (CSIRO::MyNamespace::MyClass::MyEnum in the above example) and then the lines inside the function which define the names for each of the enum values. These names do not have to be identical to the names used in the enum itself, but they would normally be similar and should not be excessively long. As a guide, consider that these names will appear in combo boxes in GUI applications, so very long names will be truncated.

Note that the above also uses the DECLARE_WORKSPACE_ENUMTOINTADAPTOR macro. This is optional, but including it allows Workspace to convert automatically between integers and your enum type when connecting inputs and outputs, so it is highly recommended. If you use it, you need to include the enumtointadaptor.h header.

To complete the support for your enum type, you simply need to put the matching DEFINE_WORKSPACE_DATA_FACTORY macro in your implementation file as you would for any other workspace data type. If you included the optional DECLARE_WORKSPACE_ENUMTOINTADAPTOR macro in your header, then you also need to include the matching DEFINE_WORKSPACE_ENUMTOINTADAPTOR macro in your implementation file. The relevant lines at the end of the implementation file for the above header might look something like the following:

// Include whatever you normally would here for the rest of MyClass
// ....
DEFINE_WORKSPACE_DATA_FACTORY(CSIRO::MyNamespace::MyClass::MyEnum,
CSIRO::MyNamespace::MyPlugin::getInstance())
DEFINE_WORKSPACE_ENUMTOINTADAPTOR(CSIRO::MyNamespace::MyClass::MyEnum,
CSIRO::MyNamespace::MyPlugin::getInstance())

By following the above pattern for your own enum types, the macros will automatically take care of serializing, cloning, etc. without you having to provide code for those things. A note of caution applies, however, if your enum type does not have sequentially numbered values beginning from zero. The default serialization code provided for enum types assumes sequential numbering beginning from zero, so if your enum type does not satisfy this assumption, you will need to override the default serialization selection mechanisms. The easiest way to do this is to provide streaming operators for the enum type, since these are selected ahead of the automatic enum handling. Alternatively, you can explicitly specialize the save() and load() functions in TypedDataFactory.
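
As a sketch of the streaming-operator route, the following defines insertion and extraction operators for a hypothetical enum, Priority, whose values are not sequential. The enum, its enumerators and the choice to stream the underlying integer value are all assumptions made for illustration; a different text format may suit your workflows better. The extraction operator rejects integers that do not correspond to an enumerator by setting the stream's failbit.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>

// Hypothetical enum whose values are not 0, 1, 2, ...
enum Priority
{
    Low    = 1,
    Medium = 5,
    High   = 10
};

// Write the underlying integer value.
std::ostream& operator<<(std::ostream& os, Priority p)
{
    return os << static_cast<int>(p);
}

// Read an integer and accept it only if it matches an enumerator.
std::istream& operator>>(std::istream& is, Priority& p)
{
    int raw = 0;
    if (!(is >> raw))
        return is;                            // not a number; failbit already set
    switch (raw)
    {
    case Low:
    case Medium:
    case High:
        p = static_cast<Priority>(raw);
        break;
    default:
        is.setstate(std::ios_base::failbit);  // number, but not a valid value
        break;
    }
    return is;
}

// Usage: round trip a value through a string stream.
void priorityExample()
{
    std::stringstream ss;
    ss << High;
    Priority p = Low;
    ss >> p;
    assert(ss && p == High);
}
```

In a real plugin the two operators would normally be declared in the same namespace as the enum so that they are found by argument-dependent lookup.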


Selecting the best serialization strategy

If your class already has C++ insertion and extraction streaming operators defined, then that strategy is probably your best choice, because you already have everything you need in order to support serialization. If not, then which strategy you should use essentially comes down to whether or not you are able to modify the class. If you cannot modify the class, then the only simple choice you have is to define insertion and extraction streaming operators. If, however, you are able to modify the class, then making it inherit from the Serialize base class will usually give you a more robust serialization, since you can make use of the functions in the SerializedItem class to load and save individual items of data without ever having to parse or construct a string.

If your custom type is an enum and its values are all sequentially numbered from zero, you don't need to do anything to get serialization support except provide the getEnumNames() specialization (you will have to do this whether you want serialization support or not). It is advisable to use this default handling if your enum type satisfies these criteria.


Summary of important points

This tutorial has introduced you to the main ways of adding serialization capabilities to your custom data types. The main points to remember are the following:

  • To add serialization support for classes that you cannot modify, define insertion and extraction streaming operators.
  • To add serialization support for classes that you can modify, add Serialize as a base class and implement the canSerialize(), save() and load() functions.
  • When inheriting from Serialize, prefer to use setAttribute() and getAttribute() instead of constructing or parsing a string containing all the data at once.
  • Serialization support is mostly pre-implemented for enum types, provided that the enum values are sequentially numbered from zero. You just need to provide an explicit specialization of the CSIRO::DataExecution::DataFactoryTraits::getEnumNames() function.