WebIssues

Issues and Tins

Today I released version 1.1 of WebIssues, a major release which introduces lots of very nice and useful features. I must admit that I'm relieved that this project is finally over. It's also perfect timing, because I'm getting seriously involved in the indie game which I announced some time ago. The game is now officially called Mister Tins and it even already has a Facebook page and a blog (where I will probably post more often than here in the nearest future).

What started as a quick test project, has now become a playable and quite enjoyable game (at least for me). It's still very far from the first official release, but at least now I'm convinced that this is really what I want to do. It's what I always wanted to do and what I probably should have done a long time ago.

I'm not saying that I regret what I've been doing for the last few years. I definitely wouldn't be half as good programmer if it wasn't for WebIssues. I would even go as far as saying that I wouldn't be half the person I am today if it wasn't for all the open source projects I've been involved in. All this technical and non-technical experience should now pay off with this new project.

Of course, there's no guarantee that I will succeed. I know that there is a lot of potential in what I'm doing, and the whole idea of the game, while simple, seems quite innovative. On the other hand, there are many factors involved, and not all of them depend on me. A lot of luck is needed to provide what people need exactly in the right moment. Also, creating games is a team sport, and experience taught me that finding the right team members is not easy an task. But the best thing is that I've reached one very major goal, which was releasing WebIssues 1.1, and I can immediately concentrate on the next goal, without losing the momentum that I have.

Filed under: Blog

The markItUp text editor

As I already explained, version 1.1 of WebIssues will allow using a special markup language when editing comments and descriptions. The syntax will be a hybrid of BBCode, Wiki and various other markup languages. Obviously it's hard to expect that the users will remember Yet Another Markup Language. Instead, they will use a familiar toolbar and key shortcuts to make selected text bold or italic, create a hyperlink or insert a block of code. I decided to use markItUp because it's simple, lightweight (the original uncompressed script is just 20 KB long) and fully customizable. Unlike some other editors, it's not designed for any particular markup language (like markdown or wiki syntax), but lets you design a custom toolbar with whatever markup you need.

After playing with markItUp for a while, I decided to customize it a bit more by modifying the script. It could already generate a preview using AJAX and has multiple ways of showing it - in a popup window, embedded iframe or a custom HTML element. I decided to use the custom element, but I wanted it to be shown dynamically when the preview is first invoked, just like in the two other modes. I also integrated it with prettify, about which I wrote last time, so that syntax highlighting works in the preview.

I also slightly changed the way the markup is added. First, in my version, the openWith/closeWith text is not added to empty lines (or lines with nothing but whitespace). Second, the closeBlockWith text is added before any trailing newlines and other whitespace. It works better this way, especially if you want to apply bold or italic to multiple lines (each line is treated as a separate block, so it must be wrapped in separate bold/italic tags). Finally I removed the special handling for Ctrl and Shift keys when clicking on the toolbar. It's hard to remember and can be confusing, so I decided to simply remove it.

Just like with Prettify, I minified the whole thing using the Closure Compiler, this time in simple mode, because advanced doesn't work too well with jQuery plug-ins. I also had to replace an eval() with direct call to the preview() function, because eval() wouldn't work with minified code. The final script is just 10 KB long. The unminified version of both this script and my version of Prettify are available in the trunk/tools subdirectory of the WebIssues SVN repository, in case you're interested.

Note that obviously I had to implement a similar editor in the Desktop Client, which uses a native QPlainTextEdit to edit the comments and descriptions. JavaScript was not an option here, but adding a few toolbuttons and reimplementing the relevant function in C++ was very straightforward. By the way, I recently also rewrote the entire markup processor, which converts the markup to HTML and also exists in two versions, PHP and C++. It was the third or forth time I wrote it from scratch, but this time I decided to do it "the right way". Instead of a very long, convoluted loop with a state machine, I decided to use a recursive descent parser, which is very simple, because the markup language has a LL(1) grammar with just a few production rules. The code is now slightly longer, but incomparably easier to understand, plus I fixed a few remaining bugs.

Some of you might wonder why I didn't decide to implement a WYSIWYG editor in WebIssues. They are large, complex beasts, which may be useful for large CMS systems designed to be used by non-technical people who like to edit their articles as if they were using MS Word. For a relatively small project like WebIssues, it would be an overkill to include a word processing package in it. Besides, these editors don't always produce valid HTML, and they don't work consistently across various browsers (not to mention the Desktop Client). What's worse, despite the tempting naive implementation (which uses htmlspecialchars_decode to circumvent WebIssues' built-in XSS protection!), it's actually very difficult to sanitize and validate the resulting HTML. Instead, WebIssues will still support the old style plain text format, with no special processing (except for turning URLs into links), which indeed is truly WYSIWYG. Depending on the level of technical skills of the majority your users, you will be able to choose the either plain text or text with markup as the default format.

Filed under: Blog

Syntax highlighting with Prettify

In version 1.1 of WebIssues it will be possible to use the [code] tag in comments and descriptions. Text included in this tag will be displayd using monospace font, with all formatting disabled. This is useful for including fragments of output, log files, etc., but it can also be used for code snippets; after all it's an issue tracking software. Developers generally like their code colored, so all kinds of editors and other development tools support syntax highlighting for various languages.

Of course creating a syntax hightlighter is a very complex task, especially given the vast number of programming languages with very different syntax. No wonder than one of the popular tools, SyntaxHighlighter, contains about 100 kilobytes of (partially minified) JavaScript code, slightly more than jQuery. Another example is GeSHi, a 200 kilobyte PHP class with 3 megabytes (!) of language definitions. But syntax highlighting is just decoration, not a key future, so I want to avoid having to download tons of .js and .css files just to achieve this.

The problem is that these tools try to be much too thorough. I don't care if every single PHP function is highlighted, as long as the most important keywords are, along with comments, strings and fragments of HTML that are embedded into the PHP file (which in turn can contain embedded CSS and JavaScript). That's exactly what Google Code Prettify does. It is actively maintained by folks from Google, and it's used by Google Code itself and Stack Overflow, among others. I decided to use it as well.

The version which is included in the current development version of WebIssues is just 16 kilobytes of code. I removed a few unnecessary features and incorporated some of the additional languages into the main file. Currently supported languages include HTML and XML, C and C++, C#, Java, Bash, Python, Perl, Ruby, JavaScript, CSS, SQL, Visual Basic and PHP. I also packed the final script using the Closure Compiler (also from Google) which decreased the file almost four times.

When I was looking for a syntax highlighter, initially I was thinking about doing it server side, using PHP code. It didn't occur to me that this can be done on the client using JavaScript. At first the idea seemed strange to me. However it's actually great and can significantly reduce the server load. From the user's perspective it doesn't really matter. After all, have you ever noticed that Stack Overflow highlights code snippets on the fly using JavaScript?

There is yet another benefit of using client script instead of PHP: it is possible to highlight code also in the Desktop Client. Otherwise the entire mechanism would have to be reimplemented in C++. Version 1.0 of the Desktop Client displays issue details using QTextBrowser, which doesn't support JavaScript and has very limited support for HTML and CSS. But version 1.1 will use QtWebKit, the Qt port of the same engine which powers Chrome and Safari. The advantage is that issue details will have the same look and feel in both the Web Client and the Desktop Client, and obviously it's possible to embed Prettify. I found some minor issues with QtWebKit, probably worth a separate post, but generally, everything works very well.

Filed under: Blog

Link locator and regular expressions

Remember the old joke about solving problems using regular expressions? It turns out it never gets out of date. I'm just putting together the markup processor for WebIssues, and since it also uses the link locator, I decided to take a closer look at it. The "link locator" is basically a small utility function which takes a piece of plain text, detects any URLs which appear in it and converts everything to HTML with links.

The heart of the link locator is the call to preg_split with an appropriate regular expression which matches any valid links. I've been using the simplest thing that I could come up with. It recognizes emails, URLs and issue identifiers. And identifier is straightforward; it consists of a "#" and one or more digits. But what makes an email address or URLs is much more difficult to define.

Initially I defined an email address as a sequence of non-whitespace characters starting and ending with a letter or digit and containing exactly one "@". It works, but gives false positives for meaningless strings like "a!@#$%^b". Looking for a better alternative I found this article. I decided to use a slightly modified version of the first regex, which allows the mailto: prefix and non-ASCII characters:

\b(?:mailto:)?[\w.%+-]+@[\w.-]+\.[a-z]{2,4}\b

Finding the start of an URL is easy if we assume that it can only start with one of the following prefixes: http://, https://, ftp://, www. or ftp. The last two make it possible to skip the protocol for common addresses like www.mimec.org. But where exactly does the URL end? In the previous sentence, the final dot is clearly punctuation, not part of the URL, even though dot can also be a part of the URL. My original regex assumed that the URL must end with a letter, digit, or slash.

This also works in most cases, but it's not perfect. We can allow more characters at the end of the URL, but the really interesting case is handling parentheses. Consider those two examples:

  • Visit my website (www.mimec.org).
  • For more information, visit http://en.wikipedia.org/wiki/Tool_(band).

In the first sentence, the closing parenthesis is not part of the URL, but in the second it is. That's obvious to a human reader, but what about a machine? Fortunately someone already invented a regex which solves this problem. The final regular expression which I'm going to use looks like this (split into three lines for readability):

(?:\b(?:(?:https?|ftp|file):\/\/|www\.|ftp\.)|\\\\)
(?:\([\w+&@#\/\\%=~|$?!:,.-]*\)|[\w+&@#\/\\%=~|$?!:,.-])*
(?:\([\w+&@#\/\\%=~|$?!:,.-]*\)|[\w+&@#\/\\%=~|$])

I added file:// and \\ prefixes (the latter is for UNC paths, like \\server\folder\file.doc) and added backslash as valid character. They are already recognized by the Desktop Client as requested by one of the users. There is no reason not to handle them in the Web Client as well. Even though most browsers block access to such URLs, they can still be copied and pasted more easily.

While testing the regular expressions I made another interesting observation. When using character classes such as "\w" to match against a UTF-8 string, make sure to include the "u" modifier in the expression, for example "/(\w+)/u". Otherwise the result may break the UTF-8 encoding. For example, the Polish letter "ć" is represented in UTF-8 encoding as two bytes, equivalent to ASCII characters "ć". The first one is a "word" character, and the second is not, so the regular expression running in ASCII mode would break the string in the middle of the multi-byte character. Even the innocent "\s" pattern matches the "\xA0" character which can be part of a multi-byte character, so be careful.

Note that it took a bit of googling until I found information about that "u" modifier. The PHP manual should be more specific about it. What's worse, it seems that it's not always supported, even in recent versions of PHP. Just search for "this version of PCRE is not compiled with PCRE_UTF8 support" and you will see what I mean. Well, nothing is perfect, and PHP certainly isn't...

Filed under: Blog

Text formatting in WebIssues

Recently I wrote about version 1.1 of WebIssues and my plans to introduce issue descriptions and formatting of comments and descriptions. I also listed various markup languages which I considered for using in WebIssues. But first, let's look at how text is handled in the current version of WebIssues.

WebIssues currently uses plain text for comments. All whitespace characters, including indentation and line breaks, are preserved, making it easy to paste fragments of code directly into comments without breaking their formatting. At the same time, WebIssues wraps long lines, making it possible to write long paragraphs of text which are displayed correctly regardless of the width of the window. This basically corresponds to the "white-space: pre-wrap" CSS style. In addition, external URLs issue identifiers are automatically converted to links.

The key idea behind adding extended formatting options is to preserve compatibility with this "plain text" mode. It should be possible to edit an existing comment, enable formatting and add some markup to the existing text without breaking existing formatting. Now the problem is that most existing markup languages either ignore whitespace (for example HTML, unless wrapped in a <pre> block) or handle it in a specific way (for example, double line breaks are converted to paragraphs, indentation indicates a block of code, etc.). I'm not saying that they are wrong; this often makes sense when copying text from text files or plain text emails. However, I don't want to break habits of existing users of WebIssues. I would like to treat spaces and line breaks identically, whether formatting is enabled or not. Thanks to this, pieces of code will not break, even if they are not marked using special tags. These tags will only be used for decorative purposes, for example, by using different background color and enabling syntax highlighting.

There are generally two kinds of markup used in existing languages: punctuation (brackets, quotes, asterisks, etc.) and tags. Punctuation is used for inline formatting in various flavors of Wiki and languages such as Markdown or Textile, for example to indicate bold or italic text, though there is no single standard. It is also used for block formatting, for example trailing '>' may indicate a quote. HTML tags are commonly used by various languages, in addition to other format specifiers, though they are useful mostly for advanced formatting. Finally, various flavors of BBCode use custom tags, which are similar, but simpler than HTML. I decided to use a combination of punctuation for inline formatting and custom tags for block formatting. It's questionable whether yet another language should be invented when there's so many already, but I think it's going to be intuitive for everyone, and thanks to the embedded markItUp editor, there will be no need to remember it.

The following inline formatting tags will be supported: **bold**, __italic__, `monospace text` and [URL custom links]. The * and _ characters appear commonly in a technical text, so they need to be doubled to avoid false positives. Link syntax is quite similar to Wiki external links, however internal links can be created the same way, for example: [#123 some issue]. In the future it will be possible to introduce real Wiki functionality (where names can will be used instead of numeric identifiers).

Three different block level formatting tags will be suported. A [code][/code] pair will indicate a block of code, with optional syntax highlighing based on Google's prettify. A [quote][/quote] pair will indicate a quote with an optional header. Finally, a [list][/list] pair indicates a bullet list, where each item starts with one or more * (multiple asterisks indicate nested levels). Unlike automatic lists used by many markup languages, explicit tags will make it easy to clearly indicate where the list starts and ends. Also if will be possible to freely mix and nest all three kinds of tags.

I'm now playing with the prototype of the converter, and I may still do some minor changes, but so far I'm rather satisfied with the result. So when version 1.1 is going to be released? By the end of this year - that's all I can promise for now. I will probably release some beta version in a few months. But I still have my book and various other things to do, so don't expect miracles.

Filed under: Blog
Tags: WebIssues
Syndicate content