Pandoc GPL

Pandoc, the universal document converter, is probably one of the best-known programs written in Haskell. It is immensely useful for many format conversion tasks, including the conversion of Markdown to HTML, a common feature of web software. Many Haskell web applications make use of Pandoc, but there is an issue that is often overlooked: Pandoc is licensed under the GPL (version 2 or later).

GPL

The GPL is a copyleft license. When a program has a copyleft license, any program that is derived from it must have the same distribution terms. The motivation is to ensure that the software remains free.

There are many open source licenses that are not copyleft, such as the BSD licenses and the (poorly named) MIT license. These licenses give users more freedom, including the freedom to distribute the software in proprietary applications and to use a different license for derivative software. Software under a BSD or MIT license can be re-licensed as GPL, which is necessary when integrating with GPL software. Software with a GPL license cannot be re-licensed under a BSD or MIT license, however, because somebody could then do things that are fine under a BSD or MIT license but violate the GPL.

What about software that uses GPL-licensed software? There are a lot of details, and IANAL and TINLA, but the rules can be summarized as follows:

  • If the software includes GPL software, then it must also be GPL. This is the case for Haskell library dependencies, due to the way that Haskell programs are built. If your Haskell program has Pandoc as a dependency, for example, the program includes Pandoc in the compiled binary. (Notice the huge file size of that compiled binary…)
  • If the software links to libraries of GPL software, then it must also be GPL. A separate LGPL license is available for copyleft software that allows non-copyleft software to link to it.
  • If the software just runs the GPL-licensed software in a separate process, perhaps communicating with pipes, then the software does not need to be GPL.

One can develop non-open-source software that uses GPL-licensed software if that software is never distributed. This is fine for personal use, but such dependencies could become a liability for companies, so many companies forbid use of GPL-licensed dependencies in software development.

By using a copyleft license such as GPL, you significantly limit where that software can be used. For this reason, many people use BSD and MIT licenses instead of copyleft licenses. Most of the software in the open-source Haskell ecosystem uses BSD licenses. I suspect that many people avoid using Pandoc as a dependency because of the license issue; I know that I have.

Mitigation

What do you do if you really want to use Pandoc with your software but do not want to use the GPL?

First, it is important to point out that the pandoc-types library uses a BSD license. If you are developing a Pandoc reader, writer, or filter, then you do not need to license it under the GPL. This is necessary to allow creation of readers and writers that integrate with software that is incompatible with the GPL.

One way to use Pandoc in your own software without copyleft contamination is to run Pandoc in a separate process. Your software should not include Pandoc as a dependency or call any Pandoc APIs. Instead, execute Pandoc using a library such as process or typed-process and communicate with it using the filesystem or pipes. Unfortunately, this hurts performance and increases resource usage.
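As a rough sketch of the separate-process approach, the following uses only the process library and assumes that a pandoc executable is available on the PATH. The names runFilter and markdownToHtml are hypothetical helpers made up for this illustration; the program itself has no Pandoc library dependency.

```haskell
import System.Exit (ExitCode (..))
import System.Process (readProcessWithExitCode)

-- | Run an external program, writing the input to its stdin.
-- Returns stdout on success, or the exit code and stderr on failure.
-- Note: an IOException is thrown if the executable cannot be found.
runFilter :: FilePath -> [String] -> String -> IO (Either String String)
runFilter exe args input = do
  (code, out, err) <- readProcessWithExitCode exe args input
  pure $ case code of
    ExitSuccess   -> Right out
    ExitFailure n -> Left ("exit code " ++ show n ++ ": " ++ err)

-- | Convert Markdown to HTML by piping it through the pandoc executable.
-- Assumption: "pandoc" is installed and on the PATH.
markdownToHtml :: String -> IO (Either String String)
markdownToHtml = runFilter "pandoc" ["--from=markdown", "--to=html"]

main :: IO ()
main = do
  result <- markdownToHtml "Hello, *world*!"
  either (putStrLn . ("pandoc failed: " ++)) putStr result
```

Each call spawns a new process, which is where the performance cost comes from. Keeping the generic runFilter separate from the Pandoc-specific wrapper also makes it easy to swap in a different converter.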

Another way to use Pandoc in your own software is to design an abstract interface that does not depend on Pandoc and implement Pandoc support in a separate library. The main software can use whatever license you prefer, and only the separate library must be GPL-licensed. Users of your software then have the choice to avoid the GPL by not using the Pandoc support. Unfortunately, this method cannot be used in companies that forbid using GPL-licensed software.
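The shape of such an interface might look something like the following sketch. Everything here is hypothetical: Renderer, plainRenderer, and renderPage are names invented for illustration, and the Pandoc-backed implementation is only described in a comment because it would live in a separate, GPL-licensed package.

```haskell
-- Core package (any license): it knows nothing about Pandoc.

-- | An abstract Markdown renderer: source text in, HTML or an error out.
newtype Renderer = Renderer
  { renderMarkdown :: String -> Either String String }

-- | A fallback backend with no Pandoc dependency.
-- It escapes HTML metacharacters and wraps the text in a <pre> element.
plainRenderer :: Renderer
plainRenderer = Renderer $ \src ->
  Right ("<pre>" ++ concatMap escape src ++ "</pre>")
  where
    escape '<' = "&lt;"
    escape '>' = "&gt;"
    escape '&' = "&amp;"
    escape c   = [c]

-- | Application code depends only on the interface, not on a backend.
renderPage :: Renderer -> String -> String
renderPage r src =
  either (\e -> "<p>render error: " ++ e ++ "</p>") id (renderMarkdown r src)

-- A separate GPL-licensed package (say, a hypothetical "myapp-pandoc")
-- would export its own value of type Renderer implemented with Pandoc.

main :: IO ()
main = putStrLn (renderPage plainRenderer "1 < 2")
```

Users who want Pandoc support would pass the Pandoc-backed Renderer to renderPage, and everyone else could use a backend that carries no GPL obligations.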

Hakyll

I found an interesting conversation about this issue with respect to the Hakyll static site generator. Some Hakyll developers claim that the Hakyll source code does not need to use a GPL license because Pandoc is only included in the compiled executables. They say that Hakyll users should take care to use an appropriate license (GPL). I think that this claim is invalid.

Looking into the implementation details, I discovered that the Hakyll library uses a Cabal build flag to optionally include Pandoc support. Perhaps such software could be dual-licensed, so that it is GPL when Pandoc is included and BSD otherwise… Pandoc is included by default, however, so I think that the BSD license is disingenuous at best. Also, the hakyll-website executable includes Pandoc regardless of the build flag.

I imagine that nobody is pressing the issue because it would be very unlikely for John MacFarlane, the author of Pandoc, to litigate such an open-source violation. I think that it would only become an issue if there were litigation against a proprietary violation, in which case all violations could be brought to light.

Other Packages?

I wonder what other packages in Hackage have a similar issue. There are currently 849 packages with the gpl tag. If people cared about the issue, it might be worth writing a program that traverses the dependency hierarchy and identifies all non-GPL-licensed packages that (transitively) depend on a GPL-licensed package.
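The core of such a program could be sketched as follows. This is only an illustration over a toy in-memory index: the package names and licenses are made up, and a real tool would have to populate the index from Hackage metadata, which is not shown here.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

type Package = String

data License = GPL | Permissive
  deriving (Eq, Show)

-- | License and direct dependencies of a package.
data Info = Info { license :: License, deps :: [Package] }

type Index = Map.Map Package Info

-- | All packages reachable from the given package through its dependencies.
-- The visited set guards against revisiting shared or cyclic dependencies.
transitiveDeps :: Index -> Package -> Set.Set Package
transitiveDeps idx pkg = go Set.empty (directDeps pkg)
  where
    directDeps p = maybe [] deps (Map.lookup p idx)
    go seen [] = seen
    go seen (p : ps)
      | p `Set.member` seen = go seen ps
      | otherwise           = go (Set.insert p seen) (directDeps p ++ ps)

isGPL :: Index -> Package -> Bool
isGPL idx p = fmap license (Map.lookup p idx) == Just GPL

-- | Non-GPL packages that (transitively) depend on a GPL package.
affected :: Index -> [Package]
affected idx =
  [ p
  | (p, info) <- Map.toList idx
  , license info /= GPL
  , any (isGPL idx) (Set.toList (transitiveDeps idx p))
  ]

-- | Toy index with hypothetical packages, for illustration only.
example :: Index
example = Map.fromList
  [ ("gpl-lib",    Info GPL        ["base-lib"])
  , ("base-lib",   Info Permissive [])
  , ("wrapper",    Info Permissive ["gpl-lib"])
  , ("app",        Info Permissive ["wrapper"])
  , ("standalone", Info Permissive ["base-lib"])
  ]

main :: IO ()
main = mapM_ putStrLn (affected example)
```

On the toy index, this reports app and wrapper: wrapper depends on gpl-lib directly, and app depends on it through wrapper.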

Note that I find this topic thoroughly unenjoyable. I am writing about it only because I think that it is a serious issue that needs to be addressed as Haskell is used in more and more corporate environments. As for Pandoc, my secret wish is that the author would just switch to a BSD license.

Author

Travis Cardwell
