Comment and URL Tags with Mutagen

I am using the Mutagen library for reading and writing ID3 tags in PodRat. ID3 is the de facto standard for specifying metadata in MP3 files. It is widely used because it is the format that a lot of hardware and software supports, but it is often considered to be quite a mess.

There are several versions of the ID3 specification. ID3v2.4 is the most recent, but some hardware and software do not support it, so ID3v2.3 remains the most widely used. I strictly use ID3v2.3 because I own hardware that does not support ID3v2.4.

The Mutagen library supports both ID3v2.3 and ID3v2.4. It is quite complicated because it has to deal with the many idiosyncrasies of the specifications. It provides a low-level API (ID3) that works directly with frames, as well as a high-level API (EasyID3) that is easier to use. The high-level API supports many tag types, and it provides a way to add support for additional tag types.

There are two tag types that are very commonly used but are not supported by the high-level API: comments (using COMM frames) and URLs (using WXXX frames). If you use the RegisterTextKey class method to register these tags, the API writes the tags successfully but cannot read or delete them, because it looks them up using the wrong frame ID. I could not find any good solutions to this problem online, so I am sharing my solution in this blog entry in case it helps others who run into the same issue.

Comment Tags

The frame ID of comment frames includes a three-letter ISO language code and a description. When using RegisterTextKey, language XXX and an empty description are used when setting tags, so the frame is written with frame ID COMM::XXX. Attempts to get or delete the frame use frame ID COMM, which does not work.

The following function registers a comment tag using correct frame IDs:

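import mutagen.id3
from mutagen.easyid3 import EasyID3

def register_comment(lang='\0\0\0', desc=''):
    "Register the comment tag"
    # Mutagen keys COMM frames by COMM:<desc>:<lang>
    frameid = ':'.join(('COMM', desc, lang))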

    def getter(id3, _key):
        frame = id3.get(frameid)
        return None if frame is None else list(frame)

    def setter(id3, _key, value):
        id3.add(mutagen.id3.COMM(
            encoding=3, lang=lang, desc=desc, text=value))

    def deleter(id3, _key):
        del id3[frameid]

    EasyID3.RegisterKey('comment', getter, setter, deleter)

The language and description can be specified as arguments. Note that the default language of \0\0\0 (three null characters) is used to match that of EasyTag, the GUI tag editor that I use.

URL Tags

The frame ID of URL frames includes a description but, unlike that of comment frames, no language code. The URL itself is always encoded as ISO/IEC 8859-1 (latin1). When using RegisterTextKey, an empty description is used when setting tags, so the frame is written with frame ID WXXX:. Attempts to get or delete the frame use frame ID WXXX, which does not work.

The following function registers a URL tag using correct frame IDs:

import mutagen.id3
from mutagen.easyid3 import EasyID3

def register_url(desc=''):
    "Register the url tag"
    # Mutagen keys WXXX frames by WXXX:<desc>
    frameid = ':'.join(('WXXX', desc))

    def getter(id3, _key):
        frame = id3.get(frameid)
        return None if frame is None else [frame.url]

    def setter(id3, _key, value):
        id3.add(mutagen.id3.WXXX(encoding=3, desc=desc, url=value[0]))

    def deleter(id3, _key):
        del id3[frameid]

    EasyID3.RegisterKey('url', getter, setter, deleter)

The description can be specified as an argument.

Type Annotations

I (of course) use type annotations whenever possible when I write Python. (Types provide many benefits, such as helping reduce bugs!) I stripped type annotations from the above code, however, because I bet that most people who find this blog entry are not using type annotations with Mutagen, as Mutagen does not use type annotations itself.

My actual code includes type annotations, making use of stubs that I wrote for the Mutagen API. While the type annotations in my code are faithful, the type annotations in my stubs are not. The reason is that Mutagen has an object-oriented design that makes heavy use of inheritance, and the Python type system is not powerful enough to represent the constraints.

For example, the concrete type of the value returned by the get method of an ID3 object depends on the frame ID argument. Consider the getter function for comments:

def getter(id3: ID3, _key: str) -> Optional[List[str]]:
    frame = id3.get(frameid)
    return None if frame is None else list(frame)

The get method returns an instance of class Frame (or None), but the many subclasses of Frame have different interfaces. In this case, the frame ID starts with COMM:, so the value is of type Optional[TextFrame] (COMM is a subclass of TextFrame). The condition handles the None case, so the final line can call list on the TextFrame value, which is an iterable of strings; the result therefore matches the return type of the function.

Consider the getter function for URLs:

def getter(id3: ID3, _key: str) -> Optional[List[str]]:
    frame = id3.get(frameid)
    return None if frame is None else [frame.url]

In this case, the frame ID starts with WXXX:, so the value is of type Optional[UrlFrame] (WXXX is a subclass of UrlFrame). The condition handles the None case, so the final line can access the url attribute. That string is wrapped in a list, so the result matches the return type of the function.

Note that a TextFrame value does not have a url attribute, and a UrlFrame value is not an iterable of strings. The Python type system is not powerful enough to represent this kind of constraint, so the type checker cannot confirm that such code is correct. It is therefore up to unit tests and humans to maintain the constraints manually as the code evolves. Object-oriented design makes it easy (and concise) to represent complex relations, but it does so at great cost.

Author

Travis Cardwell

Published

August 4, 2021

Revised

August 6, 2021