Comment and URL Tags with Mutagen
I am using the Mutagen library for reading and writing ID3 tags in PodRat. ID3 is the de facto standard for specifying metadata in MP3 files. It is widely used because it is the format that a lot of hardware and software supports, but it is often considered to be quite a mess.
There are various versions of the ID3 specification. ID3v2.4 is the most recent, but some hardware and software does not support it, which is why ID3v2.3 is still the most widely used. I strictly use ID3v2.3 because I own hardware that does not support ID3v2.4.
The Mutagen library supports both ID3v2.3 and ID3v2.4. It is quite complicated, as it has to deal with many idiosyncrasies of the specifications. It provides a low-level API (ID3) that deals directly with frames, as well as a high-level API (EasyID3) that is easier to use. The high-level API includes support for many tag types, and it provides a way to add support for additional tag types.
There are two tag types that are very commonly used but are not supported by the high-level API: comments (using COMM frames) and URLs (using WXXX frames). If you use the RegisterTextKey class method to register these tags, the API will successfully write the tags but will not be able to read them, because it does not look them up with the correct frame ID. I could not find any good solutions to this problem online, so I am sharing my solution in this blog entry in case it helps others who run into the same issue.
Comment Tags
The frame ID of a comment frame includes a description and a three-letter ISO language code, in the form COMM:description:language. When using RegisterTextKey, language XXX and an empty description are used when setting tags, so the frame is written with frame ID COMM::XXX. Attempts to get or delete the frame use frame ID COMM, which does not work.
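To make the mismatch concrete, here is a minimal sketch that models it without Mutagen. A plain dictionary stands in for the tag container, and comm_frame_id is a hypothetical helper, so this only illustrates the key lookup and is not Mutagen's own code.

```python
def comm_frame_id(desc='', lang='XXX'):
    """Build the full frame ID under which a comment frame is stored."""
    return ':'.join(('COMM', desc, lang))


# Assumption: a plain dict models the tag container (frame ID -> text).
tags = {}

# Writing stores the frame under its full frame ID.
tags[comm_frame_id()] = ['An example comment']

print(list(tags))             # ['COMM::XXX']
print(tags.get('COMM'))       # None, the bare frame ID does not match
print(tags.get('COMM::XXX'))  # ['An example comment']
```

The write succeeds because the frame carries its own language and description, while the read fails because the bare COMM key never matches the stored key.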
The following function registers a comment tag using correct frame IDs:
    import mutagen.id3
    from mutagen.easyid3 import EasyID3


    def register_comment(lang='\0\0\0', desc=''):
        "Register the comment tag"
        # Full frame ID in the form COMM:desc:lang
        frameid = ':'.join(('COMM', desc, lang))

        def getter(id3, _key):
            frame = id3.get(frameid)
            return None if frame is None else list(frame)

        def setter(id3, _key, value):
            # encoding=3 selects UTF-8
            id3.add(mutagen.id3.COMM(
                encoding=3, lang=lang, desc=desc, text=value))

        def deleter(id3, _key):
            del id3[frameid]

        EasyID3.RegisterKey('comment', getter, setter, deleter)
The language and description can be specified as arguments. Note that the default language of \0\0\0 is used to match that of EasyTag, the GUI tagging software that I use.
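If you are not sure which language and description your own tagging software writes, you can inspect the frame IDs already present in a file (for instance, the keys of a Mutagen ID3 object) and split the comment ones apart. The following is a minimal sketch of that split. The parse_comm_frame_id helper and the sample frame IDs are made up for illustration.

```python
def parse_comm_frame_id(frameid):
    """Split a comment frame ID of the form COMM:desc:lang into (desc, lang)."""
    prefix, sep, rest = frameid.partition(':')
    if prefix != 'COMM' or not sep:
        raise ValueError('not a full comment frame ID: %r' % (frameid,))
    # The language code is the last field; a description may contain colons.
    desc, sep, lang = rest.rpartition(':')
    if not sep:
        raise ValueError('missing language field: %r' % (frameid,))
    return desc, lang


# Hypothetical sample of frame IDs found in a file.
sample_ids = ['TIT2', 'COMM::\x00\x00\x00', 'COMM:iTunNORM:eng', 'WXXX:']

for frameid in sample_ids:
    if frameid.startswith('COMM:'):
        print(repr(parse_comm_frame_id(frameid)))
```

Each resulting pair gives the desc and lang values that would need to be passed to register_comment in order to read that particular comment frame.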
URL Tags
The frame ID of a URL frame includes a description. A language code is not used because only the ISO/IEC 8859-1 (latin1) encoding is supported. When using RegisterTextKey, an empty description is used when setting tags, so the frame is written with frame ID WXXX: (note the trailing colon). Attempts to get or delete the frame use frame ID WXXX, which does not work.
The following function registers a url tag using correct frame IDs:
    import mutagen.id3
    from mutagen.easyid3 import EasyID3


    def register_url(desc=''):
        "Register the url tag"
        # Full frame ID in the form WXXX:desc
        frameid = ':'.join(('WXXX', desc))

        def getter(id3, _key):
            frame = id3.get(frameid)
            return None if frame is None else [frame.url]

        def setter(id3, _key, value):
            # A URL frame holds a single URL, so only the first value is used
            id3.add(mutagen.id3.WXXX(encoding=3, desc=desc, url=value[0]))

        def deleter(id3, _key):
            del id3[frameid]

        EasyID3.RegisterKey('url', getter, setter, deleter)
The description can be specified as an argument.
Type Annotations
I (of course) use type annotations whenever possible when I write Python. (Types provide many benefits, such as helping reduce bugs!) I stripped type annotations from the above code, however, because I bet that most people who find this blog entry are not using type annotations with Mutagen, as Mutagen does not use type annotations itself.
My actual code includes type annotations, making use of stubs that I wrote for the Mutagen API. While the type annotations in my code are faithful, the type annotations in my stubs are not. The reason is that Mutagen has an object-oriented design that makes heavy use of inheritance, and the Python type system is not powerful enough to represent the constraints.
For example, the concrete type of the return value of calling the get method of an ID3 object depends on the frame ID argument. Consider the getter function for comments:
    def getter(id3: ID3, _key: str) -> Optional[List[str]]:
        frame = id3.get(frameid)
        return None if frame is None else list(frame)
The get method returns a value that is an instance of class Frame, but the many subclasses of Frame have different interfaces. In this case, the value of the frame ID starts with COMM:, so it returns a value of type Optional[TextFrame]. The condition takes care of the None possibility, so the final line can call list on the TextFrame value, which acts as an iterable of strings, therefore matching the return type of the function.
Consider the getter function for URLs:
    def getter(id3: ID3, _key: str) -> Optional[List[str]]:
        frame = id3.get(frameid)
        return None if frame is None else [frame.url]
In this case, the value of the frame ID starts with WXXX, so it returns a value of type Optional[UrlFrame]. The condition takes care of the None possibility, so the final line can access the url attribute. That string is inserted into a list, therefore matching the return type of the function.
Note that a TextFrame value does not have a url attribute, and a UrlFrame value does not act as an iterable of strings. The Python type system is not powerful enough to represent this type of constraint, so the type checker is unable to confirm that such code is correct. It is therefore up to unit tests and humans to maintain the constraints manually as the code evolves. Object-oriented design is used to easily (and concisely) represent complex relations, but it does so at great cost.
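One possible workaround is to narrow the type at run time with isinstance, so that the type checker can at least verify each branch. The following is a minimal sketch using stand-in classes: Frame, TextFrame, UrlFrame, get_text, and get_url here are hypothetical and only mimic the shape of the Mutagen API. It trades static verification for a run-time check, so it does not remove the underlying limitation.

```python
from typing import Dict, Iterator, List, Optional


class Frame:
    """Stand-in base class (hypothetical, not Mutagen's)."""


class TextFrame(Frame):
    def __init__(self, text: List[str]) -> None:
        self.text = text

    def __iter__(self) -> Iterator[str]:
        return iter(self.text)


class UrlFrame(Frame):
    def __init__(self, url: str) -> None:
        self.url = url


def get_text(frames: Dict[str, Frame], frameid: str) -> Optional[List[str]]:
    frame = frames.get(frameid)
    if frame is None:
        return None
    # Run-time check; after it, the checker treats frame as a TextFrame.
    if not isinstance(frame, TextFrame):
        raise TypeError('%r is not a text frame' % (frameid,))
    return list(frame)


def get_url(frames: Dict[str, Frame], frameid: str) -> Optional[List[str]]:
    frame = frames.get(frameid)
    if frame is None:
        return None
    # Run-time check; after it, the checker treats frame as a UrlFrame.
    if not isinstance(frame, UrlFrame):
        raise TypeError('%r is not a URL frame' % (frameid,))
    return [frame.url]


# Assumption: a plain dict models the tag container.
frames: Dict[str, Frame] = {
    'COMM::XXX': TextFrame(['An example comment']),
    'WXXX:': UrlFrame('https://example.com/'),
}

print(get_text(frames, 'COMM::XXX'))  # ['An example comment']
print(get_url(frames, 'WXXX:'))       # ['https://example.com/']
print(get_text(frames, 'TIT2'))       # None
```

A mismatch between frame ID and frame class now surfaces as a TypeError when the code runs, not as a type error when it is checked.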