Qt, SQLite and locale aware collation

Submitted by mimec on 2011-04-17

One of the principles of the WebIssues Client is that it stores a local cache of data retrieved from the server and performs many operations (including sorting, filtering, etc.) locally. This saves a lot of network bandwidth and server workload, but it also makes implementing the client more difficult. So far all data has been simply stored in memory using multi-dimensional hash tables (available as RDB classes) and that solution has been working fine for hundreds or even thousands of issues, but it is not scalable for larger amounts of data.

I've been trying to find a solution where unused data is saved to disk and loaded on demand, but it would be difficult to implement it efficiently and reliably. Finally I decided that instead of trying to reinvent the wheel, I should use a technology which solves all these problems and works fine for many applications, including two major web browsers, Firefox and Chrome. I am currently migrating the entire data storage of WebIssues Client to use the SQLite database engine.

Qt comes with built-in support for SQLite which seems enough for most purposes. However after some investigation it turned out that SQLite lacks some important features that can be easily implemented by extending it through its API. One of such features is support for a case sensitive, locale aware collation supporting Unicode characters, which is required by WebIssues to correctly sort string using accented and non-latin characters.

In order to create such collation we need to write a simple function which takes advantage of built-in Unicode support in the Qt framework:

static int localeCompare( void* /*arg*/, int len1, const void* data1,
    int len2, const void* data2 )
{
    QString string1 = QString::fromRawData( reinterpret_cast<const QChar*>( data1 ),
        len1 / sizeof( QChar ) );
    QString string2 = QString::fromRawData( reinterpret_cast<const QChar*>( data2 ),
        len2 / sizeof( QChar ) );

    return QString::localeAwareCompare( string1, string2 );
}

The use of fromRawData ensures that the string is not copied in memory and makes the function very efficient. In order to register the collation we need to call the following function:

sqlite3_create_collation( db, "LOCALE", SQLITE_UTF16, NULL, &localeCompare );

The first three parameters are: the pointer to the sqlite3 structure which is essentially the handle of the database, the name of the collation and the format of text which is passed to the compare function - we ensure that the text is converted to UTF-16 encoding which is expected by QString. The handle of the database can be retrieved from the QSqlDriver object:

QVariant handle = driver->handle();
sqlite3* db = *static_cast<sqlite3* const*>( handle.data() );

We might think that this is enough for our custom collation to work. However, depending of the configuration of the Qt libraries, we may get an error saying that the sqlite3_create_collation function does not exist.

There are at least three possible ways of how Qt can be integrated with SQLite:

  • Qt may use the shared SQLite library provided by the operating system. This can be achieved by configuring Qt with the -system-sqlite switch and it is often how Qt libraries which are part of Linux distributions are configured. We can simply include the system sqlite3.h header and link to the libsqlite3.so library and we're done.
  • Qt may include it's own copy of SQLite compiled as part of the QtSql library. It is the default configuration for static builds of Qt. When our program is statically linked to QtSql it can access the SQLite functions, but we still need to include the copy of the sqlite3.h file with our program or declare the functions manually.
  • Qt may also include it's own copy of SQLite compiled as part of the qsqlite plugin, which is the usual configuration when Qt is compiled as shared libraries. In that case our program may not be able to access the SQlite functions at all (on Windows, functions are not exported from a DLL unless explicitly declared in code). The only solution in that case is to include our own copy of SQLite in our program.

When writing a cross-platform program which does not rely on Qt being built in any particular configuration, our only solution is to handle the third, most generic case. Fortunately the authors of SQLite make it easy to integrate it with any application by publishing the source code "amalgamated" into one source file and one header and the only disadvantage of such solution is the large size of the SQLite code.

When we include our own copy of SQLite, the program will compile fine, but it still may crash when started. The reason is that the QSQLiteDriver, which is part of Qt, will use its own copy of SQLite and our program will refer to a different copy. Each copy will use a separate set of global variables which are needed for synchronization and our copy will try to access some uninitialized objects resulting in the runtime error.

Unfortunately to make this work we need to not only use our own copy of SQLite but also our own copy of the QSQLiteDriver. We should copy the files containing QSQLiteDriver (which can be found in src/sql/drivers/sqlite folder of Qt sources) and also the sources of the internal QSqlCachedResult class (from src/sql/kernel) which is its base class to our program and change the names of these classes by removing the 'Q' prefix to prevent potential conflicts. Then, instead of loading the default driver, we should instantiate our copy using the following code:

SQLiteDriver* driver = new SQLiteDriver();
QSqlDatabase database = QSqlDatabase::addDatabase( driver );

The whole process of extending SQLite would be much easier if the standard QSQLiteDriver provided a pointer to the sqlite3_api_routines structure which includes pointers to all SQLite functions, regardless of which version it is and how it is linked. I suggested this to Nokia and perhaps it will be implemented in a future version of Qt, but until then we have to use tricks such as the one I described above to make it work.

Note that on systems where SQLite is available as a shared library, we can provide an option to link our program with that library instead of using the internal copy of SQLite. This way our program will be slightly smaller on those systems.