Today I released version 1.1 of WebIssues, a major release which introduces lots of very nice and useful features. I must admit that I'm relieved that this project is finally over. It's also perfect timing, because I'm getting seriously involved in the indie game which I announced some time ago. The game is now officially called Mister Tins and it already has a Facebook page and a blog (where I will probably post more often than here in the near future).
What started as a quick test project has now become a playable and quite enjoyable game (at least for me). It's still very far from the first official release, but at least now I'm convinced that this is really what I want to do. It's what I always wanted to do and what I probably should have done a long time ago.
I'm not saying that I regret what I've been doing for the last few years. I definitely wouldn't be half as good a programmer if it wasn't for WebIssues. I would even go as far as saying that I wouldn't be half the person I am today if it wasn't for all the open source projects I've been involved in. All this technical and non-technical experience should now pay off with this new project.
Of course, there's no guarantee that I will succeed. I know that there is a lot of potential in what I'm doing, and the whole idea of the game, while simple, seems quite innovative. On the other hand, there are many factors involved, and not all of them depend on me. A lot of luck is needed to provide what people need at exactly the right moment. Also, creating games is a team sport, and experience has taught me that finding the right team members is not an easy task. But the best thing is that I've reached one very major goal, which was releasing WebIssues 1.1, and I can immediately concentrate on the next goal, without losing the momentum that I have.
As I already explained, version 1.1 of WebIssues will allow using a special markup language when editing comments and descriptions. The syntax will be a hybrid of BBCode, Wiki and various other markup languages. Obviously, it's hard to expect users to remember Yet Another Markup Language. Instead, they will use a familiar toolbar and keyboard shortcuts to make selected text bold or italic, create a hyperlink or insert a block of code. I decided to use markItUp because it's simple, lightweight (the original uncompressed script is just 20 KB long) and fully customizable. Unlike some other editors, it's not designed for any particular markup language (like Markdown or wiki syntax), but lets you design a custom toolbar with whatever markup you need.
After playing with markItUp for a while, I decided to customize it a bit more by modifying the script. It can already generate a preview using AJAX and has multiple ways of showing it: in a popup window, an embedded iframe or a custom HTML element. I decided to use the custom element, but I wanted it to be shown dynamically when the preview is first invoked, just like in the two other modes. I also integrated it with prettify, about which I wrote last time, so that syntax highlighting works in the preview.
I also slightly changed the way the markup is added. First, in my version, the openWith/closeWith text is not added to empty lines (or lines with nothing but whitespace). Second, the closeBlockWith text is added before any trailing newlines and other whitespace. It works better this way, especially if you want to apply bold or italic to multiple lines (each line is treated as a separate block, so it must be wrapped in separate bold/italic tags). Finally, I removed the special handling for the Ctrl and Shift keys when clicking on the toolbar. It's hard to remember and can be confusing, so I decided to simply remove it.
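The per-line wrapping rule described above can be illustrated with a small sketch. It is written in Python rather than the JavaScript of the actual plug-in, it is not markItUp's code, and the function name `wrap_lines` is hypothetical. It only shows the two rules: blank lines are skipped, and the closing text goes before trailing whitespace.

```python
def wrap_lines(text: str, open_with: str, close_with: str) -> str:
    """Wrap every non-blank line of `text` in open_with/close_with.

    Hypothetical helper illustrating the rule, not markItUp's implementation.
    """
    wrapped = []
    for line in text.split("\n"):
        if not line.strip():
            # Empty or whitespace-only line: leave it untouched.
            wrapped.append(line)
            continue
        body = line.rstrip()
        trailing = line[len(body):]  # whitespace that followed the text
        # The closing text is placed before the trailing whitespace.
        wrapped.append(f"{open_with}{body}{close_with}{trailing}")
    return "\n".join(wrapped)
```

For example, `wrap_lines("first line\n\nsecond line  \n", "**", "**")` returns `"**first line**\n\n**second line**  \n"`: the empty line stays empty and the two trailing spaces stay outside the bold markers.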
Just like with Prettify, I minified the whole thing using the Closure Compiler, this time in simple mode, because advanced mode doesn't work too well with jQuery plug-ins. I also had to replace an eval() with a direct call to the preview() function, because eval() wouldn't work with minified code. The final script is just 10 KB long. The unminified versions of both this script and my version of Prettify are available in the trunk/tools subdirectory of the WebIssues SVN repository, in case you're interested.
Some of you might wonder why I didn't decide to implement a WYSIWYG editor in WebIssues. They are large, complex beasts, which may be useful for large CMS systems designed to be used by non-technical people who like to edit their articles as if they were using MS Word. For a relatively small project like WebIssues, it would be overkill to include a word processing package in it. Besides, these editors don't always produce valid HTML, and they don't work consistently across various browsers (not to mention the Desktop Client). What's worse, despite the tempting naive implementation (which uses htmlspecialchars_decode to circumvent WebIssues' built-in XSS protection!), it's actually very difficult to sanitize and validate the resulting HTML. Instead, WebIssues will still support the old style plain text format, with no special processing (except for turning URLs into links), which indeed is truly WYSIWYG. Depending on the level of technical skills of the majority of your users, you will be able to choose either plain text or text with markup as the default format.
In version 1.1 of WebIssues it will be possible to use the [code] tag in comments and descriptions. Text included in this tag will be displayed using a monospace font, with all formatting disabled. This is useful for including fragments of output, log files, etc., but it can also be used for code snippets; after all, it's issue tracking software. Developers generally like their code colored, so all kinds of editors and other development tools support syntax highlighting for various languages.
Remember the old joke about solving problems using regular expressions? It turns out it never gets out of date. I'm just putting together the markup processor for WebIssues, and since it also uses the link locator, I decided to take a closer look at it. The "link locator" is basically a small utility function which takes a piece of plain text, detects any URLs which appear in it and converts everything to HTML with links.
The heart of the link locator is the call to preg_split with an appropriate regular expression which matches any valid links. I've been using the simplest thing that I could come up with. It recognizes emails, URLs and issue identifiers. An identifier is straightforward; it consists of a "#" and one or more digits. But what makes an email address or a URL is much more difficult to define.
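To make the split-and-rebuild idea concrete, here is a minimal sketch in Python, using `re.split` as a rough analogue of PHP's preg_split with captured delimiters. It handles only issue identifiers, the one pattern that is fully defined above. The function name and the `issue.php?id=...` URL template are assumptions made for illustration, not the actual WebIssues code.

```python
import html
import re

# "#" followed by one or more digits; the capturing group makes re.split
# keep the matched identifiers in the result list.
ISSUE_RE = re.compile(r"(#\d+)")


def locate_issue_links(text: str, url_template: str = "issue.php?id={id}") -> str:
    """Convert plain text to HTML, turning "#123" into a link.

    The URL template is a hypothetical placeholder.
    """
    parts = ISSUE_RE.split(text)
    out = []
    for index, part in enumerate(parts):
        if index % 2 == 1:
            # Odd positions hold the captured identifiers.
            href = url_template.format(id=part[1:])
            out.append(f'<a href="{html.escape(href)}">{html.escape(part)}</a>')
        else:
            # Even positions hold ordinary text, which must be escaped.
            out.append(html.escape(part))
    return "".join(out)
```

Calling `locate_issue_links("See #123 & #45.")` gives `See <a href="issue.php?id=123">#123</a> &amp; <a href="issue.php?id=45">#45</a>.` Escaping the plain-text fragments separately from the links is what keeps the output safe while still allowing generated markup.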
Initially I defined an email address as a sequence of non-whitespace characters starting and ending with a letter or digit and containing exactly one "@". It works, but gives false positives for meaningless strings like "a!@#$%^b". Looking for a better alternative I found this article. I decided to use a slightly modified version of the first regex, which allows the mailto: prefix and non-ASCII characters:
Finding the start of a URL is easy if we assume that it can only start with one of the following prefixes: http://, https://, ftp://, www. or ftp. The last two make it possible to skip the protocol for common addresses like www.mimec.org. But where exactly does the URL end? In the previous sentence, the final dot is clearly punctuation, not part of the URL, even though a dot can also be a part of the URL. My original regex assumed that the URL must end with a letter, digit, or slash.
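The rule just described (a known prefix, then anything up to the last letter, digit or slash) can be sketched as follows. This is a Python approximation written for illustration, not the original PHP regex; it assumes ASCII letters only, and the name `find_urls` is hypothetical.

```python
import re

# One of the known prefixes, then any run of non-whitespace characters,
# as long as the match ends with an ASCII letter, a digit or a slash.
URL_RE = re.compile(r"\b(?:https?://|ftp://|www\.|ftp\.)\S*[A-Za-z0-9/]")


def find_urls(text: str) -> list:
    """Return all URL-like substrings found in `text`."""
    return URL_RE.findall(text)
```

With the input `"Visit www.mimec.org. Or http://example.com/a/b/, ok?"` it returns `['www.mimec.org', 'http://example.com/a/b/']`: the greedy `\S*` first swallows the trailing punctuation, and backtracking then gives it back until the final character is a letter, digit or slash.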
This also works in most cases, but it's not perfect. We can allow more characters at the end of the URL, but the really interesting case is handling parentheses. Consider these two examples:
In the first sentence, the closing parenthesis is not part of the URL, but in the second it is. That's obvious to a human reader, but what about a machine? Fortunately someone already invented a regex which solves this problem. The final regular expression which I'm going to use looks like this (split into three lines for readability):
I added the file:// and \\ prefixes (the latter is for UNC paths, like \\server\folder\file.doc) and added the backslash as a valid character. They are already recognized by the Desktop Client, as requested by one of the users. There is no reason not to handle them in the Web Client as well. Even though most browsers block access to such URLs, they can still be copied and pasted more easily.
While testing the regular expressions I made another interesting observation. When using character classes such as "\w" to match against a UTF-8 string, make sure to include the "u" modifier in the expression, for example "/(\w+)/u". Otherwise the result may break the UTF-8 encoding. For example, the Polish letter "ć" is represented in UTF-8 encoding as two bytes, which correspond to the single-byte (Windows-1252) characters "Ä‡". The first one is a "word" character, and the second is not, so a regular expression running in single-byte mode would break the string in the middle of the multi-byte character. Even the innocent "\s" pattern matches the "\xA0" character, which can be part of a multi-byte character, so be careful.
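The effect can be reproduced outside PHP. The sketch below is a Python analogue, not PCRE itself: it imitates byte-oriented matching by reinterpreting the UTF-8 bytes as Windows-1252 characters before applying the regex. The function names are hypothetical, and PCRE's actual behavior without the "u" modifier may also depend on the locale.

```python
import re


def words_bytewise(text: str) -> list:
    """Imitate a byte-oriented regex: treat each UTF-8 byte as one character."""
    # Bytes that Windows-1252 leaves undefined are replaced rather than
    # raising an error.
    as_single_bytes = text.encode("utf-8").decode("cp1252", errors="replace")
    return re.findall(r"\w+", as_single_bytes)


def words_unicode(text: str) -> list:
    """Match on whole code points, as the "u" modifier does in PCRE."""
    return re.findall(r"\w+", text)
```

Here `words_unicode("ć")` returns `['ć']`, while `words_bytewise("ć")` returns `['Ä']`: only the first of the two bytes counts as a word character, so the letter is cut in half. The same trick shows the "\s" problem, because the letter "à" is encoded as the bytes 0xC3 0xA0 and the second byte reads as a non-breaking space.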
Note that it took a bit of googling until I found information about that "u" modifier. The PHP manual should be more specific about it. What's worse, it seems that it's not always supported, even in recent versions of PHP. Just search for "this version of PCRE is not compiled with PCRE_UTF8 support" and you will see what I mean. Well, nothing is perfect, and PHP certainly isn't...
Recently I wrote about version 1.1 of WebIssues and my plans to introduce issue descriptions and formatting of comments and descriptions. I also listed various markup languages which I considered using in WebIssues. But first, let's look at how text is handled in the current version of WebIssues.
WebIssues currently uses plain text for comments. All whitespace characters, including indentation and line breaks, are preserved, making it easy to paste fragments of code directly into comments without breaking their formatting. At the same time, WebIssues wraps long lines, making it possible to write long paragraphs of text which are displayed correctly regardless of the width of the window. This basically corresponds to the "white-space: pre-wrap" CSS style. In addition, external URLs and issue identifiers are automatically converted to links.
The key idea behind adding extended formatting options is to preserve compatibility with this "plain text" mode. It should be possible to edit an existing comment, enable formatting and add some markup to the existing text without breaking existing formatting. Now the problem is that most existing markup languages either ignore whitespace (for example HTML, unless wrapped in a <pre> block) or handle it in a specific way (for example, double line breaks are converted to paragraphs, indentation indicates a block of code, etc.). I'm not saying that they are wrong; this often makes sense when copying text from text files or plain text emails. However, I don't want to break the habits of existing users of WebIssues. I would like to treat spaces and line breaks identically, whether formatting is enabled or not. Thanks to this, pieces of code will not break, even if they are not marked using special tags. These tags will only be used for decorative purposes, for example, by using a different background color and enabling syntax highlighting.
There are generally two kinds of markup used in existing languages: punctuation (brackets, quotes, asterisks, etc.) and tags. Punctuation is used for inline formatting in various flavors of Wiki and languages such as Markdown or Textile, for example to indicate bold or italic text, though there is no single standard. It is also used for block formatting, for example a leading '>' may indicate a quote. HTML tags are commonly used by various languages, in addition to other format specifiers, though they are useful mostly for advanced formatting. Finally, various flavors of BBCode use custom tags, which are similar to, but simpler than, HTML. I decided to use a combination of punctuation for inline formatting and custom tags for block formatting. It's questionable whether yet another language should be invented when there are so many already, but I think it's going to be intuitive for everyone, and thanks to the embedded markItUp editor, there will be no need to remember it.
The following inline formatting tags will be supported: **bold**, __italic__, `monospace text` and [URL custom links]. The * and _ characters appear commonly in technical text, so they need to be doubled to avoid false positives. Link syntax is quite similar to Wiki external links; however, internal links can be created the same way, for example: [#123 some issue]. In the future it will be possible to introduce real Wiki functionality (where names will be used instead of numeric identifiers).
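As an illustration of how these inline rules could be processed, here is a rough Python sketch. It is not the WebIssues converter: the function name, the choice of HTML tags, the restriction of external links to http/https and the `issue.php?id=...` URL are all assumptions. It uses a single alternation so that each piece of text is matched once, and it escapes everything that is not markup.

```python
import html
import re

INLINE_RE = re.compile(
    r"`(?P<mono>[^`\n]+)`"            # `monospace text`
    r"|\*\*(?P<bold>.+?)\*\*"         # **bold**
    r"|__(?P<italic>.+?)__"           # __italic__
    r"|\[(?P<target>#\d+|https?://[^\s\]]+)\s+(?P<label>[^\]\n]+)\]"  # [URL text]
)


def render_inline(text: str) -> str:
    """Convert one line of inline markup to HTML (hypothetical sketch)."""
    out = []
    pos = 0
    for match in INLINE_RE.finditer(text):
        out.append(html.escape(text[pos:match.start()]))
        if match.group("mono") is not None:
            # No further markup is processed inside monospace text.
            out.append("<code>" + html.escape(match.group("mono")) + "</code>")
        elif match.group("bold") is not None:
            out.append("<strong>" + render_inline(match.group("bold")) + "</strong>")
        elif match.group("italic") is not None:
            out.append("<em>" + render_inline(match.group("italic")) + "</em>")
        else:
            target = match.group("target")
            # Assumed URL scheme for internal links such as [#123 some issue].
            href = "issue.php?id=" + target[1:] if target.startswith("#") else target
            label = html.escape(match.group("label"))
            out.append('<a href="' + html.escape(href) + '">' + label + "</a>")
        pos = match.end()
    out.append(html.escape(text[pos:]))
    return "".join(out)
```

A line such as `x * y_z < 1` passes through as `x * y_z &lt; 1`, because single `*` and `_` characters are not treated as markup, which is the point of doubling them.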
Three different block level formatting tags will be supported. A [code][/code] pair will indicate a block of code, with optional syntax highlighting based on Google's prettify. A [quote][/quote] pair will indicate a quote with an optional header. Finally, a [list][/list] pair will indicate a bullet list, where each item starts with one or more * (multiple asterisks indicate nested levels). Unlike automatic lists used by many markup languages, explicit tags will make it easy to clearly indicate where the list starts and ends. Also, it will be possible to freely mix and nest all three kinds of tags.
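To show how the explicit [list] tags simplify parsing, here is a small Python sketch that extracts the items of each list together with their nesting level. The names are hypothetical and it is deliberately limited: it does not handle a [list] nested inside another block tag, which the real converter is meant to allow.

```python
import re

# Everything between [list] and the nearest [/list].
LIST_BLOCK_RE = re.compile(r"\[list\]\n?(.*?)\[/list\]", re.DOTALL)
# Leading asterisks give the nesting level; the rest is the item text.
ITEM_RE = re.compile(r"^\s*(\*+)\s*(.*?)\s*$")


def parse_lists(text: str) -> list:
    """Return one list of (level, item_text) pairs per [list] block."""
    result = []
    for block in LIST_BLOCK_RE.finditer(text):
        items = []
        for line in block.group(1).splitlines():
            match = ITEM_RE.match(line)
            if match:
                items.append((len(match.group(1)), match.group(2)))
        result.append(items)
    return result
```

For the input `"intro\n[list]\n* one\n** nested\n* two\n[/list]\noutro"` it returns `[[(1, 'one'), (2, 'nested'), (1, 'two')]]`. Because the start and end of the list are explicit, lines beginning with `*` outside the tags are never mistaken for list items.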
I'm now playing with the prototype of the converter, and I may still make some minor changes, but so far I'm rather satisfied with the result. So when is version 1.1 going to be released? By the end of this year - that's all I can promise for now. I will probably release some beta version in a few months. But I still have my book and various other things to do, so don't expect miracles.