Serialization format changes

Changes between serialization format versions 0, 1 and 2

With release 1.0.0 (February 2020), significant improvements were made to the serialization framework and to the formats of serialized data, in order to overcome limitations that had become apparent in the previous implementation. The documentation above already refers to the current (newer) version.

This section briefly describes the intentions behind the change, the changes that were made, and in particular how compatibility with old serialization formats is (or can be) ensured where needed. For users who do not deal with pre-existing installations or with serialized data stored before this change, this section is of merely historical interest.

Limitations of the previous serialization format (now referred to as serialization format version 0) occurred mainly in conjunction with class versioning. Different kinds of problems showed up in binary serialization and in XML/JSON serialization:

Version information in serialized data is meant to ensure that data can still be read after the serialization of a class changes (e.g. when new members/properties are added). For binary serialization, however, this only held if a version number was already present before such a change. Adding versioning to a class that had not maintained a version before would break deserialization: older serialized data contains no version number in the place where one is now expected, and there was no robust way to recognize its absence (binary data being just a stream of bytes, without identifying context such as the tag names in XML). Each class would therefore have had to declare a version (e.g. version(1)) in its reflect() method from the start, in order to be able to declare a different version at any later point in time (and still read older serialized data). This is impractical and was in general not done. Most classes did not declare a version at all and were thus locked out from changing their serialized content and declaring a different version in the future.
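The binary problem can be reproduced with a toy byte stream. The following is a minimal sketch only: ToyStream, writeUnversioned, writeVersioned and readVersioned are hypothetical names invented for this illustration and are not part of the MIRA serialization API, and the real version 0 byte layout may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Toy byte stream (hypothetical, NOT the MIRA binary serializer).
struct ToyStream {
    std::vector<std::uint8_t> bytes;
    std::size_t pos = 0;

    void write(std::int32_t v) {
        std::uint8_t raw[sizeof v];
        std::memcpy(raw, &v, sizeof v);
        bytes.insert(bytes.end(), raw, raw + sizeof v);
    }

    // Returns false when the stream holds too few bytes for another value.
    bool read(std::int32_t& v) {
        if (pos + sizeof v > bytes.size()) return false;
        std::memcpy(&v, bytes.data() + pos, sizeof v);
        pos += sizeof v;
        return true;
    }
};

// Old class layout: no version, only the payload.
void writeUnversioned(ToyStream& s, std::int32_t value) {
    s.write(value);
}

// New class layout: a version number precedes the payload.
void writeVersioned(ToyStream& s, std::int32_t version, std::int32_t value) {
    s.write(version);
    s.write(value);
}

struct ReadResult {
    bool ok;
    std::int32_t version;
    std::int32_t value;
};

// Reader for the new layout: it always expects a version first,
// because nothing in the byte stream says whether one is present.
ReadResult readVersioned(ToyStream& s) {
    ReadResult r{false, 0, 0};
    r.ok = s.read(r.version) && s.read(r.value);
    return r;
}
```

Feeding data written by writeUnversioned into readVersioned makes the reader take the payload for the version number and then run out of bytes, which is the failure mode described above.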

Another limitation concerned XML and JSON serialization. These formats store version information in dedicated tags or elements, so they could trivially detect the presence or absence of version information and assume e.g. version "0" (or, in general, a version older than any other known version) if none was found in the serialized data. Here, the issue was the restriction to one version number per object. This is not sufficient in some cases, e.g. when a class is implemented as a hierarchy of base and subclasses, which can implement their reflection independently and therefore might each need to declare a version of their own. The resulting object might thus need to carry multiple versions (one for each class in its inheritance hierarchy). (As a side note, this was not a problem for binary serialization: there, version numbers could appear at any position in the byte stream, any number of times, as long as serialization and deserialization wrote/read them in sync.)

Serialization format version 2 is meant to fix these issues and to provide an implementation that is flexible enough to allow more or less arbitrary future changes of a class' serialization. (What happened to version 1? See below.)

The main changes to the serialization format are twofold. XML and JSON serialized data now store type info (the class name) with each version entry and can thus support independent versioning for an arbitrary number of base classes (or e.g. external helper functions). The binary serializer is aware of an object's internal segmentation (which data is created by one reflect() method) and reserves space for version information, whether the class reflection actually declares a version or not. During deserialization, any call to version() within that reflect() retrieves the stored version (which is 0 if never set). This works even when the position where the version is queried changes (in relation to other serialized elements in the same reflection method). It additionally makes it possible to warn when a class tries to set a version multiple times during serialization, or when it ignores an existing version (> 0) during deserialization.
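The idea of keeping one version entry per class can be sketched as a small lookup table keyed by class name. This is an illustration under assumptions, not the MIRA implementation: VersionTable and its methods are hypothetical names, and the sketch borrows two behaviours from the description of the binary serializer (an unset version reads as 0, and a second attempt to set a version is detectable) purely to show them in one place.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the version entries stored with ONE serialized
// object; each class in the object's hierarchy gets its own entry.
class VersionTable {
public:
    // Stores the version declared by 'className'.
    // Returns false if that class already declared a version
    // (a situation in which a serializer could emit a warning).
    bool set(const std::string& className, int version) {
        return mEntries.emplace(className, version).second;
    }

    // Returns the stored version, or 0 if the class never declared one.
    int get(const std::string& className) const {
        auto it = mEntries.find(className);
        return it == mEntries.end() ? 0 : it->second;
    }

private:
    std::map<std::string, int> mEntries;
};
```

With such a table, a base class and a derived class can each declare and query their own version without interfering with one another, and a class that never declared a version simply reads 0.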

In both cases, the version 2 implementation automatically recognizes serialized data in version 0 format and reads it correctly (in some cases showing warnings, e.g. to update stored XML config files). However, data serialized by the newer implementation cannot be deserialized by the version 0 implementation (XML/JSON data may be, if none of the stored objects uses version information).

To distinguish between serialized binary data of version 0 and version 2 (and even newer versions, if required), the Serializer now adds a format version marker to the start of the serialized data (buffer). Version 0 data, by definition, does not have this marker. In order to persistently mark data that has been recognized as version 0, a format version 1 has been defined. Version 1 differs from version 0 only in the existence of the format version marker (declaring serialization format version 1). Apart from that, format versions 0 and 1 are exactly identical. This also means that version 0 data can be 'converted' to version 1 by just adding the format version marker. Version 1 is just an intermediate format that is used to distinctly mark version 0 data as such when it is read by a version 2 (or possibly higher) framework, but it is not used natively by any framework.
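The relation between versions 0 and 1 can be illustrated with a toy marker. The marker layout used here (two magic bytes followed by one version byte) is purely an assumption made for this sketch, as are the names kMagic, detectFormatVersion and addFormatVersion; the actual MIRA marker layout is not described in this section.

```cpp
#include <cstdint>
#include <vector>

using Buffer = std::vector<std::uint8_t>;

// HYPOTHETICAL marker: two magic bytes followed by one version byte.
// The real MIRA format version marker may look entirely different.
const std::uint8_t kMagic[2] = {0xFA, 0xCE};

// Returns the format version declared by the marker,
// or 0 if the buffer does not start with a marker.
int detectFormatVersion(const Buffer& data) {
    if (data.size() >= 3 && data[0] == kMagic[0] && data[1] == kMagic[1])
        return data[2];
    return 0;
}

// 'Converts' unmarked data by prepending a marker;
// the payload bytes themselves are copied unchanged.
Buffer addFormatVersion(const Buffer& payload, std::uint8_t version) {
    Buffer out = {kMagic[0], kMagic[1], version};
    out.insert(out.end(), payload.begin(), payload.end());
    return out;
}
```

In this toy model, an unmarked buffer whose payload happens to begin with the magic bytes would be misdetected, so a real marker has to be chosen such that it cannot be confused with the start of valid version 0 data.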

For tapes recorded in format version 0, there is a tool to convert them to version 1 (which, as explained, amounts to nothing more than marking them as version 1):

miratape copy tapev0.tape tapev1.tape --addformatversion 1

The tool does not try to check the format version of the serialized content, and it allows an arbitrary version to be set for the created output, but there is no reasonable combination other than v0 input and v1 output.

If serialized data must be exchanged between version 0 and version 2 installations, serialization to the older format can and must be explicitly enforced in the version 2 framework.

MIRA framework connection to a framework using v0 serialization format

A connection between frameworks of different serialization versions can only be actively established by the new framework. In this case, the version 2 framework needs to encode all binary serialized data sent to the remote (version 0) framework (channel data, RPC parameters and result values) in version 0 serialization format. For this purpose, the respective version parameter must be provided when adding the remote framework address to the list of known (connected) hosts. When requesting the connection via the '-k' command line option, this means appending '@v0' to the address (e.g. -k 192.168.1.1:1234@v0). When using the framework's 'connectTo' RPC method, there is a variant that takes this version as a parameter. A limitation of such a connection is that no metadata will be sent with the binary data (there is no backward compatible meta serialization).
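How an address with a version suffix might be split can be sketched as follows. RemoteSpec and parseRemote are hypothetical helpers written for this illustration only, not MIRA functions. The fallback to format version 2 when no suffix is given, and the generalization of '@v0' to '@v' followed by any number, are assumptions of the sketch.

```cpp
#include <string>

// Hypothetical result type: the plain address and the requested format version.
struct RemoteSpec {
    std::string address;
    int formatVersion;
};

// Splits "host:port@v<N>" into address and version.
// If no well-formed "@v<N>" suffix is present, the argument is returned
// unchanged together with 'defaultVersion' (ASSUMED to be 2 here).
RemoteSpec parseRemote(const std::string& arg, int defaultVersion = 2) {
    RemoteSpec spec{arg, defaultVersion};
    const std::string::size_type at = arg.rfind("@v");
    if (at == std::string::npos || at + 2 >= arg.size())
        return spec;
    const std::string digits = arg.substr(at + 2);
    if (digits.find_first_not_of("0123456789") != std::string::npos)
        return spec;
    spec.address = arg.substr(0, at);
    spec.formatVersion = std::stoi(digits);
    return spec;
}
```

For the example address above, parseRemote("192.168.1.1:1234@v0") yields the address 192.168.1.1:1234 and format version 0.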

Globally force v0 serialization (Binary, XML and JSON)

It is also possible to globally force a framework to use a specific serialization format version for ALL serialization or deserialization. This is achieved by setting the environment variables MIRA_FORCE_SERIALIZE_VERSION/MIRA_FORCE_DESERIALIZE_VERSION to the version number, e.g. export MIRA_FORCE_SERIALIZE_VERSION=0. Note that for binary serialization, this setting is only checked in debug builds (to avoid the inherent performance hit in release builds). For XML/JSON serialization, the setting only affects the version information included with objects, and it is effective in both debug and release builds.
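Reading such an override from the environment could look roughly like the sketch below. Only the variable name MIRA_FORCE_SERIALIZE_VERSION comes from the text above. The helper names, the use of -1 for "not forced", and the rejection of malformed or out-of-range values are assumptions of this sketch and do not describe how MIRA actually evaluates the variable.

```cpp
#include <cstdlib>

// Parses the raw value of a "force version" variable.
// Returns the version, or -1 if the value is missing or is not a plain
// non-negative integer (this -1 convention is an assumption of the sketch).
int parseForcedVersion(const char* raw) {
    if (raw == nullptr || *raw == '\0') return -1;
    char* end = nullptr;
    const long v = std::strtol(raw, &end, 10);
    if (*end != '\0' || v < 0 || v > 255) return -1;
    return static_cast<int>(v);
}

// Looks up the variable named in the text above.
int forcedSerializeVersion() {
    return parseForcedVersion(std::getenv("MIRA_FORCE_SERIALIZE_VERSION"));
}
```

Keeping the parsing separate from the std::getenv call makes the logic easy to check without touching the process environment.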

There is another environment variable, MIRA_RPC_METHODS_REFLECT_NO_PARAMS_DOCUMENTATION, that can be set (the value does not matter) to suppress reflection of RPC parameter documentation. This is required in order to communicate with a framework before MIRABase 0.44.0 (older than MIRA release 2019-02-11); otherwise, deserialization of binary serialized data will break. The reflected content was extended, and in serialization format version 0 it was not possible to just add an object version to accommodate that change (see the initial motivation for the change of serialization format).


Generated on 18 Feb 2020 for MIRA by  doxygen 1.6.1