Seravo blog: Linux and open source – technology and strategy

Localization, internationalization and the Finnish language

Open source might make you think of geeks who mostly talk to computers in programming languages. In fact, human language skills are just as important. Translators are needed in every international open source project.

Localization becomes a necessary task in almost every open source project in Finland. We have been translating software interfaces into Finnish in projects where the main task has been documentation, software development, or just installation and configuration of a web content management system. Of course there are also pure L10n or localization projects.

During the past year we have participated in the localization of several different WordPress plugins, the book-production platform Booktype, the digital asset management platform ResourceSpace, the microblogging server StatusNet, the collaborative mapping platform OpenStreetMap, and so on. The scope of participation ranges from improving a few badly translated strings to translating the entire file.

The big difference in 2012 was that almost all translations were for web applications. In earlier years desktop applications were far more common. This might be caused by the general shift to online services, or maybe the Finnish localization of desktop software is already at a relatively mature level. It is also possible that we just happened to be more involved with web technologies.

The method of downloading a localization file to your own computer is already outdated for a variety of reasons. It is often hard to commit the translated localization file to the code repository if you are not actually involved in the software project as a programmer. The sight of thousands of untranslated strings makes the task seem harder than it actually is. Offline translation also limits collaboration: you cannot be sure whether somebody else is translating the same software at the same time, so a lot of work might be duplicated.

Localization files

Then there is the issue of dealing with the different file formats used by different projects. GNU gettext is a relatively generic open source localization technology. WordPress is the most common online platform and uses gettext. Joomla is another popular open source Content Management System, but uses INI files. Magento is an e-commerce platform for online shopping and uses CSV localization files. Prestashop is a similar web shop and uses XML files. There are platforms with localization files in PHP and various other formats. This can be confusing for a potential contributor.
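To make the format differences concrete, here is a minimal Python sketch that reads the same translation from three of the formats mentioned above. The sample strings and the parser function names are made up for illustration, and the parsers only handle the simplest single-line case; real gettext, Joomla and Magento files have comments, plural forms, escaping and multi-line entries that this does not cover.

```python
import csv
import io
import re

# Hypothetical one-entry samples: the English string "Save" and its
# Finnish translation "Tallenna", stored in three different formats.
PO_SAMPLE = 'msgid "Save"\nmsgstr "Tallenna"\n'   # gettext PO style
INI_SAMPLE = 'SAVE="Tallenna"\n'                   # INI style (key = value)
CSV_SAMPLE = '"Save","Tallenna"\n'                 # CSV style (source, translation)


def parse_po(text):
    """Collect single-line msgid/msgstr pairs into a dict."""
    return dict(re.findall(r'msgid "(.*)"\nmsgstr "(.*)"', text))


def parse_ini(text):
    """Collect KEY="value" lines into a dict, skipping blanks and ; comments."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue
        key, _, value = line.partition("=")
        result[key.strip()] = value.strip().strip('"')
    return result


def parse_csv(text):
    """Collect (source, translation) rows into a dict."""
    rows = csv.reader(io.StringIO(text))
    return {row[0]: row[1] for row in rows if len(row) >= 2}


print(parse_po(PO_SAMPLE))
print(parse_ini(INI_SAMPLE))
print(parse_csv(CSV_SAMPLE))
```

The translated string is identical in all three cases. Only the container changes, along with whether the lookup key is the English source text or a separate identifier, and that is exactly what a translator should not have to care about.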

The old-fashioned way of localizing open source software certainly makes it less tempting for potential contributors to get involved in software translation. Localization is one of the areas where you can contribute to open source projects without knowing programming languages or having similar technical skills, so it really should not require contributors to use source code management tools like Git.

Translation platforms

The solution to this has been available for a long time, but is unfortunately not yet used in all open source software projects. Collaborative translation using an online platform is the proper way to handle open source localization. It solves all of the problems highlighted above and makes it a lot easier to participate in localization projects. Ubuntu’s Launchpad and MediaWiki’s Translatewiki have been around for a long time. Pootle is used by The Document Foundation and other major open source projects. GNOME uses the Damned Lies web application for l10n. GlotPress is the collaborative translation tool for WordPress and is likely to become very important, since WordPress is extremely common and has a huge number of plugins and themes to translate. Transifex is a crowdsourcing localization platform used by several major open source projects.

Update Sept. 2013: there seems to be a whole range of translation platforms, both open and closed source that are free to use in open source projects, e.g. (not affiliated with poEdit the program).

Finnish specialities

The Finnish language is spoken by approximately five million people. Given the number of speakers, Finnish open source localization projects rank quite well compared with other languages. Still, the open source localization community always has more work to do. The Finnish translation of Ubuntu 13.04 on Launchpad is currently approximately 50% complete, and Finnish ranks 21st among languages by localization progress. Czech and Hungarian Ubuntu translations have approximately the same level of completion. While some of the most important programs have been localized very well, you will constantly encounter software with incomplete or outdated Finnish localization, and in many cases nobody has even started the localization yet. WordPress plugins are a good example, since the localization of a complicated plugin might require the same amount of work as the localization of a desktop application, and there are too many plugins to localize. Open source projects evolve and multiply, which makes localization an open-ended and infinite task.

Since open source software projects universally use English as the interface language, the localization of software goes only one way, from English into Finnish. The translation of web content goes both ways, and there is often a need to translate Finnish into English in order to reach the international audience. Members of Seravo are heavily involved in Finnish open source projects that are being internationalized as well as global projects that are being localized. Perhaps the most important examples are the English version of VALO-CD and the Finnish version of FLOSS Manuals.

The feeling you get when thinking about i18n or internationalization is that the Finnish language forms an online bubble. Many Finnish people know English quite well and can easily use international web sites, but international audiences cannot use Finnish-language sites. The Finnish language still resists machine translation extremely well, and the language is not commonly known outside of Finland. It is a common joke to translate a Finnish web page into some other language with Bing or Google Translate, since the results are usually hilarious and the message becomes something really weird. Finnish translators have to be humans instead of machines.

It is easy to understand why machine translations are horrible. Finnish belongs to the Finno-Ugric language family, which makes it unrelated to most of the widely used languages. Grammar and vocabulary are completely different. Finnish has 14 cases and uses suffixes heavily, so the same word appears in many different forms. This is probably the main cause of bad computer translations. Finnish uses the Latin character set with only three extra characters, but it still needs a special keyboard.
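The effect of case suffixes can be illustrated with a small Python sketch. It is only a toy: the one-word glossary and the exact-match lookup are assumptions made for this example, and real machine translation systems do not work by plain dictionary lookup. It does show how a single noun turns into several distinct surface forms that an exact match never finds.

```python
# Hypothetical one-entry glossary that only knows the base form of a noun.
GLOSSARY = {"talo": "house"}

# A few case forms of the same noun, with rough English glosses.
INFLECTED = {
    "talo": "house",
    "talossa": "in the house",
    "talosta": "out of the house",
    "taloon": "into the house",
    "talolla": "at the house",
}


def naive_lookup(word):
    """Exact-match lookup; returns None for any form not in the glossary."""
    return GLOSSARY.get(word)


# Every inflected form except the base form is missed.
misses = [word for word in INFLECTED if naive_lookup(word) is None]
print(misses)
```

Where English uses a separate preposition, Finnish attaches a suffix to the word itself, so a translation tool has to analyze the word form before it can even look the word up.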

There are other interesting aspects as well. Even computer interfaces seem too friendly if they are translated directly from English. Everything has to be rewritten in a slightly more serious tone. Computer jargon that is used in real-world conversations consists of Finnish versions of English words, and the proper Finnish terminology is actually quite awkward. Every open source localization project is a golden opportunity to invent new words, since there is no central committee for Finnish open source terminology.
