serialization

Qt and ZIP files

Submitted by mimec on 2012-11-20

This is a follow up to the series of articles about serialization in Qt that I wrote a few months ago. Sometimes it's necessary to create complex files which contain lots of various information. Serializing everything into one stream of data is not always a good idea. Also sometimes it may be a good idea to compress serialized data, because it's usually quite verbose. Instead of coming up with a custom file format, the best idea is to use the solution which is implemented in may applications and document formats, including both OpenOffice and MS Office documents; that is to wrap data in a ZIP file.

There are many advantages of using the ZIP format. We can save data in multiple smaller files which are zipped together. We can include additional files and attachments, for example JPG images. Finally, we can compress selected files to make the resulting file more compact.

There are a few libraries for Qt which handle ZIP files, including OSDaB and QuaZIP. There are also the QZipReader and QZipWriter classes which are part of Qt. They are internal classes that you can find in src/gui/text subdirectory in Qt sources. There are plans to include them in official Qt API (see QTBUG-20896), but they will not make it into Qt 5.0 and it's not clear whether they will be included in later versions. However, if the license permits, you can simply copy them into your own application (but see the notes about static linking below).

A problem for Windows developers is that all these libraries and classes depend on zlib.h. This is fine on Linux, however zlib is (unfortunately) not part of Windows API. The good news is that QtCore not only includes zlib statically when built on Windows, but it also marks its functions as DLL exports. Thanks to this, the QZipReader and QZipWriter classes, which are part of QtGui, can still work on Windows. In my applications, including Descend which uses ZIP file format for its projects, I simply include the zlib.h and zconf.h files from Qt in the source package and use them when system zlib library is not available. Then I use the following simple .pri file to include either system zlib headers or the custom ones:

contains( QT_CONFIG, system-zlib ) {
  if( unix|win32-g++* ): LIBS += -lz
  else: LIBS += zdll.lib
} else {
  INCLUDEPATH += $$PWD
}

Thanks to this, #include "zlib.h" works no matter if zlib is a system library or not. When Qt is built without system-zlib (which is usually the case on Windows), it will include all the necessary exports in QtCore, so the application will link and work correctly.

Including internal Qt classes as part of the application is potentially dangerous, because there can be conflicts between the classes provided by Qt and by the application itself, especially when the application is linked with a different version of Qt. This is even more dangerous when the application is statically linked with the Qt libraries, which I usually do in Windows release builds to prevent DLL dependencies problems.

To avoid problems, I always remove the 'Q' prefix from such classes. This way they are seen as separate entities from the ones provided by Qt. However, I encountered a strange problem with Descend. In dynamically linked debug builds it worked fine, but when compiled in static release mode, it crashed when copying text to the clipboard. At first I suspected a strange bug in Qt or the compiler itself, and the problem was hard to debug because it only happened in release builds. However, after some analysis, I discovered that QPlainTextEdit copies text into the clipboard not only as plain text, but also in HTML and ODT formats. Coincidentally, creating an ODT document is exactly what Qt uses the internal QZipWriter class for...

It finally turned out that there was an innocent looking structure in qzip.cpp called FileHeader, which I slightly modified, but I didn't rename it. A plain structure would be fine, but this one had an implicit constructor and destructor, because it contained a QByteArray member. Unfortunately the linker doesn't detects such conflicts (it would not be possible anyway, because of how C++ works), but happily overrides the symbols from Qt library with identically named symbols defined in my application.

This can lead to unexpected problems not only when copying code from Qt, but also in many other situations. After all, many libraries can include a class named FileHeader. That's why static linking must be used carefully. Qt generally uses the Q prefix everywhere, even in private classes and functions, to prevent this type of problems, but this particular one didn't have it. If you ever experience mysterious crashes in release builds, this can be one of the possible reasons.

Serialization in Qt - part 4

Submitted by mimec on 2012-08-22

I the last few posts I wrote about serializing data in an extensible and effective binary format using QDataStream. So far it was focused on value types and simple structures that can be easily converted to QVariant and QVariantMap. In the last post I mentioned that creating objects dynamically based on class name requires implementing some kind of an object factory. Now let's analyse what is needed to serialize an entire hierarchy of abstract objects that refer to one another. Note that this is a very complex topic and there is no single, universal solution, so I won't provide the full code. Instead I will discuss what is necessary to craft such solution depending on the exact requirements.

Let's assume that we're serializing a project which consists of shapes of various types - circles, squares, etc. There are also some complex shapes, like groups or layers, which consist of other shapes. The first difficulty is that shape is an abstract type, so we need to store the actual class name along with the object data in order to be able to re-create the object upon deserialization. This was more or less covered in the last post.

Another difficulty is that objects refer to one another by pointers, forming a graph of relations, in which one object may be accessed from many other objects. We need to ensure that the object is only serialized and deserialized once, and all other references must be correctly maintained. There even can be cyclic dependencies; for example a parent object can have a pointer to a child, and the child can have a pointer to the parent.

For sake of simplicity I will assume that each serializable class inherits QObject; that's not really necessary, but having a single common base class makes things easier. The class should also be registered in the object factory discussed before. Finally, it should implement the following interface, which provides methods for serializing and deserializing the object:

class Serializable
{
public:
    virtual void serialize( QVariantMap& data, SerializationContext* context ) const = 0;
    virtual void deserialize( const QVariantMap& data, SerializationContext* context ) = 0;
};

The data is stored in a QVariantMap for reasons that were also discussed in one of the previous articles, so that the file format is extensible and backward compatible. The context object is responsible for performing the serialization and deserialization. We will get to it in a moment.

Note that the Serializable class could be an abstract class which inherits QObject. All concrete classes could then inherit it and implement the serialization methods. However, in this case it wouldn't be possible to add serialization capabilities to existing subclasses of QObject, for example widgets. Using a separate interface gives us more flexibility. Although multiple inheritance in C++ is a very complex subject, it's very common in most object oriented languages for a class to inherit behavior and implementation from a single base class, and implement a number of additional interfaces.

An incomplete example of a serializable class containing a pointer might look like this:

class Shape : public QObject, public Serializable
{
    Q_OBJECT
public:
    void serialize( QVariantMap& data, SerializationContext* context ) const
    {
        data[ "Name" ] << m_name;
        data[ "Other" ] = context->serialize( m_other );
    }

    void deserialize( const QVariantMap& data, SerializationContext* context )
    {
        data[ "Name" ] >> m_name;
        m_other = context->deserialize<Shape>( data[ "Other" ] );
    }

private:
    QString m_name;
    Shape* m_other;
};

The serialize method of the SerializationContext first checks if the given object was already serialized. If not, it appends it to the internal list of objects and calls the serialize method on this object to store its data in a QVariantMap. Then it returns a handle to the object, which is a QVariant. Internally it contains an integer value identifying the object in the given context.

The deserialize method checks if the object with the given handle was already deserialized. If not, it creates a new instance of the appropriate class using the object factory and calls the deserialize method. Note that the object is not necessarily a Shape; it might actually be a subclass of it like Square or Circle.

Note that the context doesn't actually read or write any data from/to a stream. Instead, it stores a list of records, which include the pointer to the object, its class name and serialized data. So the handle is simply the position of the object in the list. The entire context can be written into the stream once all objects are serialized. Conversely, when deserializing, the context is first read from the stream, and then individual objects are deserialized.

We may serialize as many objects as we need using the same context, but we need to store the handles in the stream along with the context data, because we will need them when deserializing. Alternatively, we may serialize an entire hierarchy of objects by serializing the "root" object and ensuring that all children are serialized recursively:

QDataStream stream;
SerializationContext contex;

context.serialize( root );

stream << context;

In that case we don't need to store the handle, because we know that the handle of the first serialized object is always integer zero (not to be confused with invalid variant, which represents a NULL pointer):

QDataStream stream;
SerializationContext contex;

stream >> context;

Shape* root = context.deserialize<Shape>( QVariant::fromValue<int>( 0 ) );

So what does the SerializationContext class look like? This is an incomplete definition:

class SerializationContext
{
public:
    template<typename T>
    QVariant serialize( T* ptr );

    template<typename T>
    T* deserialize( const QVariant& handle );

    friend QDataStream& operator <<( QDataStream& stream, const SerializationContext& context );
    friend QDataStream& operator >>( QDataStream& stream, SerializationContext& context );

private:
    struct Record
    {
        QObject* m_object;
        QByteArray m_type;
        QVariantMap m_data;
    };

private:
    QList<Record> m_records;
    QHash<QObject*, int> m_map;
};

The serialize and deserialize methods are discussed below. The shift operators make it possible to read and write the entire context from/to the stream. The list of records stores information about objects, including their type and data. The map is optional; it simply makes lookup slightly faster for a large number of objects.

You can notice that both the serialize and deserialize methods are templates. Why not simply cast everything to void*? Also why the record and the map stores a QObject*, instead of a void*?

This is because of how multiple inheritance works in C++. Let's assume that you have a pointer to a Shape object. When you cast it to QObject*, and to Serializable*, you will receive two different pointers, that may be different from the original one. That's because in memory, the Shape object consists of a QObject, followed by Serializable, so an offset must be added or subtracted to convert one pointer to another.

You can safely cast pointers up and down the hierarchy of classes using the static_cast operator, and the compiler will ensure behid the scenes that the pointers are adjusted accordingly. But when you cast something to void*, you lose all the information, so casting it back to some other pointer may produce wrong results!

Let's take a look at the serialize method:

template<typename T>
QVariant SerializationContext::serialize( T* ptr )
{
    if ( ptr == NULL )
        return QVariant();

    QObject* object = static_cast<QObject*>( ptr );

    QHash<QObject*, int>::iterator it = m_map.find( object );

    if ( it != m_map.end() )
        return QVariant( it.value() );

    int index = m_records.count();

    Record record;
    record.m_object = object;
    record.m_type = object->metaObject()->className();
    m_records.append( record );

    m_map.insert( object, index );

    Serializable* serializable = static_cast<Serializable*>( ptr );

    QVariantMap data;
    serializable->serialize( data, this );

    m_records[ index ].m_data = data;

    return QVariant( index );
}

Notice how the pointer is explicitly casted to QObject*, and later it's casted to Serializable*? It's not possible to cast a QObject* to Serializable*, because they are unrelated classes, and forcing the cast by using reinterpret_cast or casting to void* would certainly crash the application. This is even more apparent in the deserialize method:

template<typename T>
T* SerializationContext::deserialize( const QVariant& handle )
{
    if ( !handle.isValid() )
        return NULL;

    int index = handle.toInt();

    Record& record = m_records[ index ];

    if ( record.m_object != NULL )
        return static_cast<T*>( record.m_object );

    QObject* object = ObjectFactory::createObject( record.m_type );

    record.m_object = object;

    m_map.insert( object, index );

    T* ptr = static_cast<T*>( object );
    Serializable* serializable = static_cast<Serializable*>( ptr );

    serializable->deserialize( record.m_data, this );

    return ptr;
}

Here the QObject* is first casted "up" to the actual type, T*, and then "down" to Serializable*. It doesn't matter if T is a Shape and the object is actually a Square or Circle, because all subclasses of Shape have the same layout of base classes.

Other than that, the code is quite straightforward, though there is another gotcha when dealing with circular references between objects. The record must be appended to the list before the serialize method is called on the object. This way, if the same pointer is encountered while serializing the object, it is not serialized again, which would lead to infinite recursion.

Serialization in Qt - part 3

Submitted by mimec on 2012-07-16

In the previous post I already wrote about backward and forward compatibility when serializing and deserializing data into a binary stream. Let's summarize:

  • backward compatibility - the ability to deserialize data serialized by and older version of the application
  • forward compatibility - the ability to deserialize data serialized by a newer version of the application

I said that backward compatibility can be achieve by storing a version tag in the data stream and conditionally changing the deserialization routine based on the version of data. However forward compatibility cannot be achieved this way because we can't predict what changes will be made in the future. This is fine for configuration data, but in case of documents it's not always acceptable.

The best solution would be to allow the application to skip and ignore information it doesn't understand, and extract as much information as it can. Note that it's not always possible. In case of a text document, the content can be preserved even if some fancy formatting is lost. However, let's recall the example in which we added child bookmarks to the Bookmark class. Even if we could skip loading the child bookmarks, we would still lose a lot of information, as only the top level bookmarks would be available in the old version. So before we start thinking about a fancy solution, we should first ask ourserlves if it's really worth the effort.

There is also a relatively simple workaround available. The new version of the application can be forced to save data in format compatible with an older version. This simply means that we have to add similar conditional code in serialization routines. Many applications work in this way, including MS Office applications. For example, the application could save all bookmarks in a linear fashion, losing the parent-child relationship, but still preserving all bookmarks.

But for true compatibility we need to design the data format in such way, that when the application encounters data that it doesn't understand, it can at least skip it and continue processing. But without additional metadata the application doesn't even know how many bytes it should skip.

A simple solution is to wrap all data in a QVariant before serializing, because QVariant writes a tag which identifies the type of the data before the actual data. Let's start with the following code:

template<typename T>
void operator <<( QVariant& data, const T& target )
{
    data = QVariant::fromValue<T>( target );
}

template<typename T>
void operator >>( const QVariant& data, T& target )
{
    target = data.value<T>();
}

These are generic function templates that convert any data to and from a variant. Now let's specialize these functions for our Bookmark type from the previous post. We will use a map to convert a bookmark to a variant and vice versa:

void operator <<( QVariant& data, const Bookmark& target )
{
    QVariantMap map;
    map[ "Name" ] << target.m_name;
    map[ "URL" ] << target.m_url;
    map[ "Children" ] << target.m_children;
    data << map;
}

void operator >>( const QVariant& data, Bookmark& target )
{
    QVariantMap map;
    data >> map;
    map[ "Name" ] >> target.m_name;
    map[ "URL" ] >> target.m_url;
    map[ "Children" ] >> target.m_children;
}

Note that because the bookmark object is converted to a QVariantMap before serializing, it can be successfully deserialized even if the application that reads the data doesn't know anything about the Bookmark type. What's more, we can add more elements to the map in the future without affecting either backward or forward compatibility. When reading a newer version of the file, the elements which are not understood are simply ignored. When reading an older version, missing elements are automatically replaced with default values for the given type.

When we try to compile the above code, we will receive a cryptic error similar to 'qt_metatype_id' : is not a member of 'QMetaTypeId<T>'. That's because a QList<Bookmark> cannot be converted into a QVariant. Since we know how to convert a Bookmark into a QVariant, we can easily convert a QList<Bookmark> into a QVariantList. This can even be done in a generic way:

template<typename T>
void operator <<( QVariant& data, const QList<T>& target )
{
    QVariantList list;
    list.reserve( target.count() );
    for ( int i = 0; i < target.count(); i++ ) {
        QVariant item;
        item << target[ i ];
        list.append( item );
    }
    data = list;
}

template<typename T>
void operator >>( const QVariant& data, QList<T>& target )
{
    QVariantList list = data.toList();
    target.reserve( list.count() );
    for ( int i = 0; i < list.count(); i++ ) {
        T item;
        list[ i ] >> item;
        target.append( item );
    }
}

This way any QList<T> can be converted from/to a QVariant as long as T can be converted from/to a QVariant. Note that we may also want to create additional specializations for QStringList and QVariantList, so that they are not unnecessarily converted, and to add similar conversion functions for maps and other containers.

To summarize, the following conversions are used before data is serialized:

  • Primitive types (numbers, strings and many other built-in types in Qt) are stored as QVariant
  • Objects are stored as QVariantMap that maps properties to values
  • List of various types are stored as QVariantList

The actual serialization consists of two steps: converting the serialized object into a QVariant and serializing the converted data into the stream. Deserialization is analogous and works in the opposite way.

You can notice that the converted data is somewhat similar to the DOM tree of XML document. A variant of a primitive type is analogous to a leaf XML node, and a map of variants is similar to an XML element with child nodes. However, this approach is more compact, faster and easier to use than XML.

Note that simple custom types don't necessarily have to be stored as a QVariantMap. For example, in a financial application, there may be a Money class, which is really a wrapper over some simple numeric value. We can directly place this numeric value in a variant (e.g. as a qlonglong) without wrapping it in a map.

We can also combine the method of serialization based on QVariantMap with the traditional approach, as described in the previous article, for certain types that highly unlikely to change, and can be treated as primitive types. For example, in a graphic application we might define a Circle class which consists of a central QPoint and a radius. We can use the Q_DECLARE_METATYPE macro and the qRegisterMetaTypeStreamOperators function, so that the Circle object can be directly wrapped into a variant and serialized without any conversions.

Just remember that when the Circle type is introduced in a later version of the application, previous versions will not be able to load a file that contains it, so we must remeber about the version tag, as described in the previous post. Also the version of the data format used by built-in Qt types is important to maintain compatibility across different environments.

Serialization in Qt - part 2

Submitted by mimec on 2012-07-09

In the previous post I wrote about support for different file formats in Qt and the pros and cons of using QDataStreama and a binary format. As I promised, today I will provide some more code. I will also start discussing various issues related to backward and forward compatibility of data files.

For now I will focus on simple cases like storing application settings or simple data like a list of bookmarks. I'm assuming that all serialized data have value-type semantics; i.e. they are stored and copied by value, not by pointer. A lot of classes in Qt are value types, including strings and all kinds of containers (as long as they store value types, not pointers). Also Qt makes it easy to create complex and efficient value types by using QSharedData and the copy on write mechanism, but that's an entirely different story.

In our example I will use the following simple Bookmark class:

class Bookmark
{
public:
    Bookmark();
    Bookmark( const Bookmark& other );
    ~Bookmark();

    const QString& name() const { return m_name; }
    // ... other getters/setters

    Bookmark& operator =( const Bookmark& other );

    friend QDataStream& operator <<( QDataStream& stream, const Bookmark& bookmark );
    friend QDataStream& operator >>( QDataStream& stream, Bookmark& bookmark );

private:
    QString m_name;
    QUrl m_url;
};

There is a default constructor, copy constructor and assignment operator; all that's needed for a value type. Thanks to this, we can store our objects in a container like QList. The two overloaded shift operators provide support for serialization. There's even no need to use Q_DECLARE_METATYPE, unless we need to put the bookmark in a QVariant or use it with asynchronous signal/slot connections.

The implementation of the shift operators is straightforward:

QDataStream& operator <<( QDataStream& stream, const Bookmark& bookmark )
{
    return stream << bookmark.m_name << bookmark.m_url;
}

QDataStream& operator >>( QDataStream& stream, Bookmark& bookmark )
{
    return stream >> bookmark.m_name >> bookmark.m_url;
}

This is serialization in its true sense: the bytes of the name string are directly followed by the bytes of the URL in the data stream. Without knowing the exact sequence of data, it's not possible to determine if the next byte is part of a string or an integer, and whether the string is part of the bookmark or some other structure. This means that extra care must be taken to read the data in exactly the same order as it was written.

While this is certainly efficient and makes the code extremely simple, there is a big problem when something needs to be changed or added. Let's suppose that a newer version of our application needs to support hierarchical bookmarks. We add a QList<Bookmark> m_children member to the Bookmark class and modify the implementation of the operators:

QDataStream& operator <<( QDataStream& stream, const Bookmark& bookmark )
{
    return stream << bookmark.m_name << bookmark.m_url << bookmark.m_children;
}

QDataStream& operator >>( QDataStream& stream, Bookmark& bookmark )
{
    return stream >> bookmark.m_name >> bookmark.m_url >> bookmark.m_children;
}

The QList automatically takes care of serializing all the items it contains by recursively calling the shift operator, so it might appear that nothing else needs to be done. But what happens if the user upgrades the application from the older version, and the new version tries to read the bookmarks file created by that previous version? It will expect the list of child bookmarks just after the URL, but actually it's some entirely different, random data. Attempting to interpret it as something it's not will give unexpected results and might even result in a crash.

The solution is to include a version tag in the stream, so that we can conditionally skip some fields when reading a file created by an older version of our application. For example:

QDataStream& operator <<( QDataStream& stream, const Bookmark& bookmark )
{
    return stream << (quint8)2 << bookmark.m_name << bookmark.m_url << bookmark.m_children;
}

QDataStream& operator >>( QDataStream& stream, Bookmark& bookmark )
{
    quint8 version;
    stream >> version >> bookmark.m_name >> bookmark.m_url;
    if ( version >= 2 )
        stream >> bookmark.m_children;
    return stream;
}

Note, however, that this would only work if the first version of the application also included the version tag in the stream! Otherwise we would attempt to read the first byte of the string as the version, again leading to unexpected results and potential crash. The lesson from this excercise is to think about this up front and always plan for the change.

Using a version tag is a good and universal solution, but it only provides backward compatibility: we can use it to correctly read files created by an older version. What happens if our application attempts to read a file created by a newer version of itself? We cannot predict what changes will be made in the future, so there's not much we can do to handle such situation. We can just close the stream and perhaps throw an exception to prevent the application from crashing.

Forward compatibility usually doesn't matter when it comes to simple configuration files. But what if the file is actually an important document that we need to send to someone else, who might have a slightly older version of the application? One solution would be to use a different format, like XML, but forward compatibility can also be achieved when using the QDataStream. I will write more about it in the next post.

Usually it doesn't make sense to include the version tag with each object, but just once at the beginning of the file. It's also a good idea to write a random "magic" value in the file header, to ensure that the file is really what we think it is. I use a class similar to the following one in my own applications to handle all this automatically:

class DataSerializer
{
public:
    DataSerializer( const QString& path ) :
        m_file( path )
    {
    }

    ~DataSerializer()
    {
    }

    bool openForReading()
    {
        if ( !m_file.open( QIODevice::ReadOnly ) )
            return false;

        m_stream.setDevice( &m_file );
        m_stream.setVersion( QDataStream::Qt_4_6 );

        qint32 header;
        m_stream >> header;

        if ( header != MagicHeader )
            return false;

        qint32 version;
        m_stream >> version;

        if ( version < MinimumVersion || version > CurrentVersion )
            return false;

        m_dataVersion = version;

        return true;
    }

    bool openForWriting()
    {
        if ( !m_file.open( QIODevice::WriteOnly | QIODevice::Truncate ) )
            return false;

        m_stream.setDevice( &m_file );
        m_stream.setVersion( QDataStream::Qt_4_6 );

        m_stream << (qint32)MagicHeader;
        m_stream << (qint32)CurrentVersion;

        m_dataVersion = CurrentVersion;

        return true;
    }

    QDataStream& stream() { return m_stream; }

    static int dataVersion() { return m_dataVersion; }

private:
    QFile m_file;
    QDataStream m_stream;

    static int m_dataVersion;

    static const int MagicHeader = 0xF517DA8D;

    static const int CurrentVersion = 1;
    static const int MinimumVersion = 1;
};

It's basically a wrapper over a file with an associated data stream. It ensures that both the magic header and the version are correct when opening the file for reading, and writes those values when opening it for writing. The CurrentVersion constant should be incremented every time something is added or changed in the serialization code of any class. The MinimumVersion constant allows us to skip support for some really old versions, especially when data format changed too much. The dataVersion static method makes it easy to check the actual version when reading data from the stream:

QDataStream& operator >>( QDataStream& stream, Bookmark& bookmark )
{
    stream >> bookmark.m_name >> bookmark.m_url;
    if ( DataSerializer::dataVersion() >= 2 )
        stream >> bookmark.m_children;
    return stream;
}

Note that the version is stored in a global variable, so this code is not thread safe or re-entrant, but so far it's been enough for me in all situations. More elaborate solutions may be created if necessary.

You can also notice that the DataSerializer explicitly sets the version of the data format to Qt_4_6. This is important, because Qt also has a similar versioning mechanism for it's own serialization format. This may cause problems when data is read and written using different versions of Qt libraries. Here we just enforce compatibility with the minimum supported version of Qt, in this case 4.6. Alternatively, we could store the Qt version in the header along with our internal version and verify both when opening the file.

In the next post I will write more about the forward compatibility problem and about serializing complex hierarchies of objects with cross references.

Serialization in Qt - part 1

Submitted by mimec on 2012-07-05

Almost all applications need to store some data and be able to read it later, whether it's a document file or just some application settings. The data can be anything from a few integers to a complex hierarchy of objects. Although the Qt framework doesn't have a built-in serialization support in the same sense as, for example, .NET or Java, it provides at least three mechanisms that can make storing and reading data easier:

  • QSettings - the standard Qt way of storing application settings. It supports both a variation of INI file format and platform specific storage, e.g Windows Registry.
  • QDomDocument - along with other classes from the QtXml module, it provides support for XML files.
  • QDataStream - can be used to read and write binary files.

Each solution has it's advantages and disadvantages. The XML format is sometimes considered as the only "right" way to store any kind of data. While it certainly has many advantages, the markup adds a lot of overhead, and being text based, it's not very suitable for storing data that is binary in it's nature. The INI format is perhaps more compact, but it's still text based and (arguably) human readable. Although it is possible to store anything that can be wrapped in a QVariant, for example a QImage, reading and writing such data is not very efficient (it has to be serialized in binary format and then converted to escaped textual representation). Also such INI file is no longer human readable, not to mention editable. That makes the benefit of using a INI file over a plain binary file questionable.

Personally I use QSettings only in two situations:

  • For manipulating Registry settings in a more comfortable way than by directly using the Windows API (for example to register a custom file extension).
  • For reading auxiliary configuration files that are rarely changed, but can be altered by the user in certain situations. For example, I store the list of available languages in an INI file. Because the list is not hard-coded, new translations can be created or installed without having to recompile the whole application.

Support for XML files is nice if we need to handle one of the numerous existing file formats which is based on XML, for example SVG, RSS or OpenDocument. However I personally don't see much point for a new, custom file format to be based on XML. Unless it needs to be embedded or mixed with other XML based file formats, or processed with a XSLT processor, using a binary format is usually a better idea. Sometimes XML based formats are seen as more "open", whatever that means, but from a technical point of view that's compeletely irrelevant. There are numerous examples of open, well documented binary formats.

A more reasonable argument is that XML based formats are more flexible, because new attributes and tags can be added without affecting compatibility with older and newer versions of the software. With some additional effort, this can also be achieved when using a binary format. I will write more about this topic in one of the next posts.

Another concern is the binary compatibility of data on various platform. QDataStream nicely takes care of it by ensuring proper endianness. We just have to use types like qint32 instead of the standard C++ types when reading from/writing to the stream, to ensure that data always has the same size. On the other hand, in case of XML it would be necessary to take ensure that numeric precision is not lost when converting values to/from text.

The advantage of binary serialization is that it's very simple, fast and memory efficient. There is no addional overhead of parsing the XML markup, storing the entire DOM tree in memory, etc. It also requires much less code that needs to be written. Manipulating the DOM tree is cumbersome and not very elegant, and using the more efficient SAX-style interface is even more difficult.

In the simplest case, the application settings can be represented by a single QVariantMap object (equivalent to QMap<QString, QVariant>). This is basically the same as what QSettings provides, except that the latter uses additional prefixes to emulate a hierarchy of groups. Note that almost anything can be a variant, including custom types, and even another QVariantMap. This makes it easy to create complex, nested data structures that can be saved and loaded back using a few lines of code:

QVariantMap settings;

MyClass instance;
settings.insert( "Key", QVariant::fromValue( instance ) );

QFile file( "settings.dat" );
file.open( QIODevice::ReadOnly );

QDataStream stream( &file );
stream << settings;

In order for a custom type to be serializable, it only has to implement the << and >> operators taking the data stream object. In addition, to be able to embed the custom type in a QVariant, it must be declared as a metatype using the Q_DECLARE_METATYPE macro and registered using the qRegisterMetaTypeStreamOperators function. I will post an example in the next article.

When reading settings back, it's important to remember about default values. Although defaults can be used when reading the values, it's often better to initialize default values which are missing from the map at startup, just after reading the configuration file. This way the default value is only provided once, and not everywhere it's used.

Note that we don't always have to use QVariant to serialize data. If we want to have a file which stores just a list of bookmarks, we can simply serialize a QList<Bookmark>. All we need is the pair of << and >> operators. There is no need to declare a metatype; the type is static, so it doesn't have to be dynamically resolved upon deserialization. Also note that the Bookmark could even contain a nested list of child bookmarks.