RSS Part 3: Validation Issues
I am currently using Thunderbird as my RSS client for personal feeds, and the only problem that I have had with it is that it sometimes refuses to load feeds that do not pass validation. I have had this issue with two different feeds. Many other RSS clients do not seem to have the same issue, so I assume that those clients have lenient parsers that do not require feeds to be valid. This is one thing that I want to check when evaluating various RSS clients, so this blog post looks into the cause of the validation errors.
Specifications, XML, Validation
RSS was first designed in the late 1990s, so it is pretty old in the context of the internet. The first revision of the RSS specification was published in 1999, and it has not changed since the RSS 2.0 Specification was published in 2009. Various other formats have been created over the years in an attempt to improve upon RSS. Most notably, the Atom format was standardized in RFC 4287 in 2005. Other formats include WebSub (originally released with the unfortunate name PubSubHubbub), h-feed, and JSON Feed.
Almost all sites use RSS and/or Atom. Both are based on XML. XML was quite popular in the late 1990s and 2000s but has since lost favor. It tends to be a bit verbose, and it has been overused by the Java community, but the unfortunate fact that software development technology is highly influenced by fads is probably the primary reason why many people avoid XML today.
A positive aspect of using XML for RSS and Atom is that it provides a way to define the format of feeds using document type definitions (DTDs) as well as a way to extend the standard using namespaces. The RSS standard precisely defines what is required for a valid RSS feed, and various namespaces are used to provide additional information in a feed. For example, the podcast namespace defines standard elements for adding podcast information to feeds.
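As a rough illustration of how a namespaced element is read, the following Python sketch uses the standard library's xml.etree.ElementTree module. The sample feed and the function name are hypothetical and exist only for this example; the namespace URI is the itunes one discussed later in this post.

```python
import xml.etree.ElementTree as ET

# Map a local prefix to the namespace URI. The prefix used in queries is
# chosen by this script; only the URI has to match the feed.
NAMESPACES = {"itunes": "http://www.itunes.com/dtds/podcast-1.0.dtd"}

# Hypothetical feed fragment, for illustration only.
SAMPLE = """<rss version="2.0"
     xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">
  <channel>
    <title>Example Podcast</title>
    <link>https://example.com/</link>
    <description>An example feed.</description>
    <itunes:explicit>no</itunes:explicit>
  </channel>
</rss>
"""


def itunes_explicit(feed_xml):
    """Return the text of channel/itunes:explicit, or None if absent."""
    root = ET.fromstring(feed_xml)
    element = root.find("channel/itunes:explicit", NAMESPACES)
    return None if element is None else element.text


if __name__ == "__main__":
    print(itunes_explicit(SAMPLE))
```

Elements without a prefix, such as title and link, are queried by their plain names, which is why extensions in other namespaces do not collide with the core RSS elements.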
XML documents can be “validated” (checked for correctness) against the DTDs used. Anybody can validate feeds on their own computer, and the W3C provides a feed validation service for checking RSS and Atom feeds online. When Thunderbird loads feeds, it refuses to load feeds that are not valid.
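To give a feel for what a local check might look like, here is a minimal Python sketch that tests only one rule from the RSS 2.0 specification: a channel must contain title, link, and description. It is not a full validator and is not how the W3C service or Thunderbird is implemented. The function name and sample feed are hypothetical, and treating an empty element as missing is my own assumption.

```python
import xml.etree.ElementTree as ET

# RSS 2.0 requires these three elements in every channel.
REQUIRED_CHANNEL_ELEMENTS = ("title", "link", "description")

# Hypothetical feed with an empty link, for illustration only.
SAMPLE = """<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <link></link>
    <description>An example feed.</description>
  </channel>
</rss>
"""


def missing_channel_elements(feed_xml):
    """Return the names of required channel elements that are absent or empty."""
    # ET.fromstring raises ET.ParseError if the XML is not well-formed.
    root = ET.fromstring(feed_xml)
    channel = root.find("channel")
    if channel is None:
        return ["channel"]
    missing = []
    for name in REQUIRED_CHANNEL_ELEMENTS:
        element = channel.find(name)
        # Assumption: an element with no text is treated like a missing one.
        if element is None or not (element.text or "").strip():
            missing.append(name)
    return missing


if __name__ == "__main__":
    print(missing_channel_elements(SAMPLE))
```

A strict client could refuse a feed whenever this list is non-empty, while a lenient one could simply fall back to default values.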
船外人のくだったり、くだらなかったり
The first feed that I had problems with is a meta-feed for the Japan By River Cruise podcast, the best podcast for keeping up with the river cruise industry in Japan. The feed still fails validation and therefore does not load in Thunderbird. There are four problems in the feed.
- The channel specifies an image, but it does not specify a link to a website for the channel. There is no website for the meta-feed, so an easy fix would be to link to the RSS feed here.
Missing image element: link
</image>
- An element in the itunes namespace is unknown. (I will talk about iTunes below.)
Undefined channel element: itunes:complete
<itunes:complete>No</itunes:complete>
- Items specify an author, but this element must contain an email address and names are given instead. This element is optional, so an easy fix would be to not specify an author.
Invalid email address: ボビー・ジュードとオリー・ホーン
<author>ボビー・ジュードとオリー・ホーン</author>
- The channel does not specify a link to the website for the channel. There is no website for the meta-feed, so an easy fix would be to link to the RSS feed here.
Missing channel element: link
</channel>
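The image and author problems above can be approximated with a short Python sketch. The messages mimic the validator output quoted in the list, but the logic is my own guess at the rules, not the validator's code. In particular, the email pattern is a deliberately loose assumption, and the function name and sample feed are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: a loose pattern for "user@host.tld", optionally followed by a
# name in parentheses. Real validators may apply stricter rules.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+(\s+\(.*\))?$")

# Hypothetical feed, for illustration only.
SAMPLE = """<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <description>An example feed.</description>
    <image>
      <url>https://example.com/logo.png</url>
      <title>Example Feed</title>
    </image>
    <item>
      <title>Episode 1</title>
      <author>Jane Doe</author>
    </item>
  </channel>
</rss>
"""


def image_and_author_problems(feed_xml):
    """Report a channel image without a link and item authors that are not emails."""
    problems = []
    channel = ET.fromstring(feed_xml).find("channel")
    if channel is None:
        return ["Missing channel element"]
    image = channel.find("image")
    if image is not None and image.find("link") is None:
        problems.append("Missing image element: link")
    for item in channel.findall("item"):
        author = item.find("author")
        if author is None:
            continue  # author is optional, so its absence is not a problem
        text = (author.text or "").strip()
        if not EMAIL_PATTERN.match(text):
            problems.append(f"Invalid email address: {text}")
    return problems


if __name__ == "__main__":
    for problem in image_and_author_problems(SAMPLE):
        print(problem)
```

Dropping the author element from each item, as suggested above, makes the second check pass without needing to publish an email address.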
I talked with the podcasting company about the issues, but they do not seem very interested in solving them.
The Haskell Interlude
The second feed that I had problems with is the feed for The Haskell Interlude podcast, an excellent podcast produced by the Haskell Foundation. There were three problems in the feed:
- The channel link was empty. This was fixed by specifying the website URL.
link must be a full and valid URL:
<link></link>
- The value for an element in the itunes namespace did not validate. (I will talk about iTunes below.)
itunes:explicit must be "yes", "no", or "clean": false
<itunes:explicit>false</itunes:explicit>
- The episode element in the itunes namespace apparently requires an integer greater than 0. This is unfortunate because some podcasts like to start with a 0 episode. This issue was fixed by not specifying an episode number and changing the episode type to trailer.
itunes:episode must be a positive integer: 0
<itunes:episode>0</itunes:episode>
The podcasting company investigated these errors and fixed all but the second one. This was sufficient: Thunderbird now loads the feed without issue!
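The two itunes checks reported for this feed can be sketched in Python as follows. The rules are taken from the validator messages quoted above, not from any formal specification, so they are assumptions about what the validator enforces. The function name and sample feed are hypothetical.

```python
import xml.etree.ElementTree as ET

ITUNES = {"itunes": "http://www.itunes.com/dtds/podcast-1.0.dtd"}

# Allowed values according to the validator message quoted above.
EXPLICIT_VALUES = {"yes", "no", "clean"}

# Hypothetical feed reproducing both errors, for illustration only.
SAMPLE = """<rss version="2.0"
     xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">
  <channel>
    <title>Example Podcast</title>
    <link>https://example.com/</link>
    <description>An example feed.</description>
    <itunes:explicit>false</itunes:explicit>
    <item>
      <title>Episode 0</title>
      <itunes:episode>0</itunes:episode>
    </item>
  </channel>
</rss>
"""


def itunes_problems(feed_xml):
    """Report itunes:explicit and itunes:episode values that the validator rejects."""
    problems = []
    channel = ET.fromstring(feed_xml).find("channel")
    if channel is None:
        return problems
    # ".//" also matches elements nested inside items.
    for element in channel.iterfind(".//itunes:explicit", ITUNES):
        value = (element.text or "").strip()
        if value not in EXPLICIT_VALUES:
            problems.append(
                f'itunes:explicit must be "yes", "no", or "clean": {value}'
            )
    for element in channel.iterfind(".//itunes:episode", ITUNES):
        value = (element.text or "").strip()
        is_positive = value.isascii() and value.isdigit() and int(value) > 0
        if not is_positive:
            problems.append(f"itunes:episode must be a positive integer: {value}")
    return problems


if __name__ == "__main__":
    for problem in itunes_problems(SAMPLE):
        print(problem)
```

Removing the itunes:episode element from the item, as the podcasting company did, leaves only the itunes:explicit message.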
iTunes
I investigated the iTunes errors and was disappointed to discover that the situation is not very straightforward. The namespace is defined as follows:
xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd"
Attempts to access this URL result in an HTTP 302 (“Found”) response code, which is used to indicate a URL redirection:
$ curl -v http://www.itunes.com/dtds/podcast-1.0.dtd
* Trying 17.172.224.35:80...
* Connected to www.itunes.com (17.172.224.35) port 80 (#0)
> GET /dtds/podcast-1.0.dtd HTTP/1.1
> Host: www.itunes.com
> User-Agent: curl/7.79.1
> Accept: */*
>
< HTTP/1.1 302 Found
< Server: Apache
< Date: Tue Jun 1 12:48:03 PDT 1999 PDT
< Referer: http://itunes.com/
< Location: https://search.itunes.apple.com/WebObjects/MZContentLink.woa/wa/link?path=dtds%2fpodcast-1.0.dtd
< Content-type: text/html
< Content-length: 387
...
Attempts to access the URL given in the Location header result in a webpage instead of a DTD. The webpage just attempts to connect to locally installed proprietary software and does nothing when the software is not found. Somebody asked on Stack Overflow where the DTD can be found, and it sounds like Apple does not publish one. This is disappointing but not surprising.
The best itunes namespace reference that I could find is A Podcaster’s Guide to RSS, which is not a formal specification. This guide does not match the validation errors, so perhaps the feed validation code is just out of date.
Thunderbird is open-source software, so it is easy to see what it is doing! The code that is used to parse feeds is in the FeedParser.jsm file. It turns out that it is just checking some specific things manually and is not doing XML validation at all. It does not check elements in the itunes namespace, so iTunes issues should not block a feed from loading in Thunderbird.
- RSS Part 1
- RSS Part 2: My Client Requirements
- RSS Part 4: Thunderbird
- RSS Part 5: Newsboat
- RSS Part 6: GORSS
- RSS Part 7: QuiteRSS
- RSS Part 8: Liferea
- RSS Part 9: Akregator
- RSS Part 10: Tiny Tiny RSS
- RSS Part 11: FreshRSS
- RSS Part 12: yarr
- RSS Part 13: RSS Guard
- RSS Part 14: Feature Reflections
- RSS Part 15: Client Reflections
- RSS Part 16: QuiteRSS Day 1
- RSS Part 17: QuiteRSS Update
- RSS Part 18: QuiteRSS is Dead
- RSS Part 19: Current Thoughts
- RSS Part 20: RSS Guard Revisited