SQLite

Debugging the SQLite driver

Submitted by mimec on 2011-05-09

Some time ago I wrote about creating a custom collation for SQLite, integrating SQLite in a Qt application and creating a custom SQLite driver. Although it's a bit of inconvenience, there are also additional benefits of having a custom SQLite driver, for example better support for debugging.

Tracing executed queries is very useful for debugging complex applications because it gives an idea what's happening during its execution without need to debug it line by line. Most server based database systems include tracing functionalities, but in case of a server-less database such as SQLite the only option is to build in tracing into the client.

There are at least two ways of doing that. The simplest one is to modify the SQLiteDriver::prepare method and include code which prints out the query:

qDebug() << "SQLite:" << query;

The disadvantage of that solution is that if the query contains parameters, their values will not be printed. Furthermore, if the query is prepared once and executed multiple times with various parameters, it will still be printed only once.

Luckily SQLite gives us an API function which can be used to trace executed queries. First we need to define a callback function:

static void trace( void* /*arg*/, const char* query )
{
    qDebug() << "SQLite:" << QString::fromUtf8( query );
}

Now we can register the function using the following code:

sqlite3_trace( db, trace, NULL );

Voilà! All executed queries are printed and SQLite automatically replaces the parameter placeholders with appropriate values.

Another thing that is very useful for debugging is information about errors. Printing out all error messages makes debugging a lot easier, especially that it's quite difficult to handle QSqlQuery errors properly in all places.

SQLite does not have a callback for reporting errors, but QSqlResult, which is a base class of the SQLiteResult has a virtual method called setLastError. It is called with appropriate error details whenever some SQLite function return an error. Normally it only stores the error so it is available by using the lastError mehod. All we need to do is override it so that is prints out the error:

void SQLiteResult::setLastError(const QSqlError& e)
{
    if (e.isValid())
        qDebug() << "SQLite error:" << e.driverText() << e.databaseText();
    SqlCachedResult::setLastError(e);
}

The driverText is the description of the operation that was performed and databaseText is the error message returned by SQLite.

When the error occurs while preparing the statement, for example because the query has invalid syntax, the trace method is not called and no information about the query that failed is printed. The simplest solution is to manually call the trace method when sqlite3_prepare16 returns an error:

trace( NULL, query.toUtf8().constData() );

The QSqlDriver class also has a virtual setLastError method which can be overloaded is order to report errors related to opening the database and handling transactions.

Tracing should obviously be disabled in release builds; it's generally a good idea to disable it by default and only enable it conditionally, for example when a special preprocessor macro is defined.

SQLite: UTF-8 or UTF-16?

Submitted by mimec on 2011-04-22

SQLite stores all text as Unicode characters encoded using either UTF-8 or UTF-16 encoding. The API contains functions which allow passing and retrieving text using either UTF-8 or UTF-16 encoding. These functions can be freely mixed and proper conversions are performed transparently when necessary.

The SQLite driver for Qt uses the UTF-16 version of most functions, because that encoding is used internally by the QString class. However SQLite uses the default UTF-8 encoding internally, so it needs to convert all text back and forth when reading and writing data. Usually such conversions does not affect performance very much. The benefit of using UTF-8 encoding internally is that it requires less storage than UTF-16. According to this email from the SQLite mailing list, the performance gain resulting from reduced size of the database is much bigger than the cost of the conversion.

However there is one case when the cost of conversion may become significant. In the previous post I showed how to implement custom collation based on QString::localeAwareCompare. I mentioned that the function expects text to be encoded using UTF-16 encoding because that allows passing it directly to QString. The use of fromRawData ensures that the data is not even copied to a separate buffer. However SQLite still needs to copy both compared strings to a temporary buffer and convert them from UTF-8 to UTF-16 and this happens every time the strings are compared, which is O(N·logN) when sorting query results by a text column.

I made a little experiment to see which storage format is better in such case and the results are a bit surprising. It turns out that UTF-16 is 4% faster than UTF-8 when sorting 1000 rows and 7% faster when sorting 10,000 rows. However UTF-8 becomes faster with large amounts of data - it is 27% faster than UTF-16 when sorting 100,000 rows. At this point the size of the database (which is 4.3 MB for UTF-8 and 5.7 MB for UTF-16) becomes more significant than the cost of conversion.

So the answer to the question which encoding to use is, as usual, "it depends". In my case the database will be rather small and there will rarely be more than a few thousand rows per query, so I will use UTF-16 encoding to avoid the conversions. There are two ways to achieve that. We can execute the following statement directly after opening the database, before any tables are created:

database.exec( "PRAGMA encoding = \"UTF-16\"" );

Another way is to modify the SQLite driver (since we're already using a copy of it) and use sqlite3_open16 instead of sqlite3_open_v2 to open the database - it enables UTF-16 encoding automatically. In that case we won't be able to open the database in read only mode, but if we're not using the advanced connection options then it's not a problem.

Qt, SQLite and locale aware collation

Submitted by mimec on 2011-04-17

One of the principles of the WebIssues Client is that it stores a local cache of data retrieved from the server and performs many operations (including sorting, filtering, etc.) locally. This saves a lot of network bandwidth and server workload, but it also makes implementing the client more difficult. So far all data has been simply stored in memory using multi-dimensional hash tables (available as RDB classes) and that solution has been working fine for hundreds or even thousands of issues, but it is not scalable for larger amounts of data.

I've been trying to find a solution where unused data is saved to disk and loaded on demand, but it would be difficult to implement it efficiently and reliably. Finally I decided that instead of trying to reinvent the wheel, I should use a technology which solves all these problems and works fine for many applications, including two major web browsers, Firefox and Chrome. I am currently migrating the entire data storage of WebIssues Client to use the SQLite database engine.

Qt comes with built-in support for SQLite which seems enough for most purposes. However after some investigation it turned out that SQLite lacks some important features that can be easily implemented by extending it through its API. One of such features is support for a case sensitive, locale aware collation supporting Unicode characters, which is required by WebIssues to correctly sort string using accented and non-latin characters.

In order to create such collation we need to write a simple function which takes advantage of built-in Unicode support in the Qt framework:

static int localeCompare( void* /*arg*/, int len1, const void* data1,
    int len2, const void* data2 )
{
    QString string1 = QString::fromRawData( reinterpret_cast<const QChar*>( data1 ),
        len1 / sizeof( QChar ) );
    QString string2 = QString::fromRawData( reinterpret_cast<const QChar*>( data2 ),
        len2 / sizeof( QChar ) );

    return QString::localeAwareCompare( string1, string2 );
}

The use of fromRawData ensures that the string is not copied in memory and makes the function very efficient. In order to register the collation we need to call the following function:

sqlite3_create_collation( db, "LOCALE", SQLITE_UTF16, NULL, &localeCompare );

The first three parameters are: the pointer to the sqlite3 structure which is essentially the handle of the database, the name of the collation and the format of text which is passed to the compare function - we ensure that the text is converted to UTF-16 encoding which is expected by QString. The handle of the database can be retrieved from the QSqlDriver object:

QVariant handle = driver->handle();
sqlite3* db = *static_cast<sqlite3* const*>( handle.data() );

We might think that this is enough for our custom collation to work. However, depending of the configuration of the Qt libraries, we may get an error saying that the sqlite3_create_collation function does not exist.

There are at least three possible ways of how Qt can be integrated with SQLite:

  • Qt may use the shared SQLite library provided by the operating system. This can be achieved by configuring Qt with the -system-sqlite switch and it is often how Qt libraries which are part of Linux distributions are configured. We can simply include the system sqlite3.h header and link to the libsqlite3.so library and we're done.
  • Qt may include it's own copy of SQLite compiled as part of the QtSql library. It is the default configuration for static builds of Qt. When our program is statically linked to QtSql it can access the SQLite functions, but we still need to include the copy of the sqlite3.h file with our program or declare the functions manually.
  • Qt may also include it's own copy of SQLite compiled as part of the qsqlite plugin, which is the usual configuration when Qt is compiled as shared libraries. In that case our program may not be able to access the SQlite functions at all (on Windows, functions are not exported from a DLL unless explicitly declared in code). The only solution in that case is to include our own copy of SQLite in our program.

When writing a cross-platform program which does not rely on Qt being built in any particular configuration, our only solution is to handle the third, most generic case. Fortunately the authors of SQLite make it easy to integrate it with any application by publishing the source code "amalgamated" into one source file and one header and the only disadvantage of such solution is the large size of the SQLite code.

When we include our own copy of SQLite, the program will compile fine, but it still may crash when started. The reason is that the QSQLiteDriver, which is part of Qt, will use its own copy of SQLite and our program will refer to a different copy. Each copy will use a separate set of global variables which are needed for synchronization and our copy will try to access some uninitialized objects resulting in the runtime error.

Unfortunately to make this work we need to not only use our own copy of SQLite but also our own copy of the QSQLiteDriver. We should copy the files containing QSQLiteDriver (which can be found in src/sql/drivers/sqlite folder of Qt sources) and also the sources of the internal QSqlCachedResult class (from src/sql/kernel) which is its base class to our program and change the names of these classes by removing the 'Q' prefix to prevent potential conflicts. Then, instead of loading the default driver, we should instantiate our copy using the following code:

SQLiteDriver* driver = new SQLiteDriver();
QSqlDatabase database = QSqlDatabase::addDatabase( driver );

The whole process of extending SQLite would be much easier if the standard QSQLiteDriver provided a pointer to the sqlite3_api_routines structure which includes pointers to all SQLite functions, regardless of which version it is and how it is linked. I suggested this to Nokia and perhaps it will be implemented in a future version of Qt, but until then we have to use tricks such as the one I described above to make it work.

Note that on systems where SQLite is available as a shared library, we can provide an option to link our program with that library instead of using the internal copy of SQLite. This way our program will be slightly smaller on those systems.