Music schema refactoring

As of 2010-04-09, this page is IMPLEMENTED.

Overview

As we "reboot" the effort to keep music data up-to-date in Freebase by synchronizing with MusicBrainz and other sources, we want to make sure that our data model is easy to use, expressive and accurate. (Or at least the best balance of those that we can manage. ;-)

This page discusses quirks, omissions, and frustrations with the current data model. By comparing how other music sources like MusicBrainz, last.fm, Discogs and AMG structure their data, we hope to refine our model to be the best it can be.

This page is fleshed out by discussions in the Music Use Cases.

Artists and Groups

Artists vs Groups

Currently there is an asymmetry in the way we specify Musical Artists and Musical Groups. The Musical Artist type is mainly used to connect people or groups to the albums & tracks that they have recorded. However, it contains many properties like vocal_range and instruments_played that only make sense for an individual performer, as discussed in this thread.

This setup leads to three problems:

  1. Musical Groups, which must be cotyped as Musical Artists to connect to albums & tracks, end up with nonsensical properties like vocal_range.
  2. People like Sting, Alice Cooper and Ozzy Osbourne have no good way to record the standard lineup of their backing bands. Usually, a single topic gets cotyped as a Person, Musical Artist, and Musical Group, which is nonsensical.
  3. There is no good way to distinguish between one of these artists performing under their stage name and performing as "themselves".

Proposed: Three types --

  • Musical Artist, for recording and performance data;
  • Musician, for people who perform in bands or as solo acts; and
  • Musical Group, for groups.

The Musician and Musical Group types would be classified as incompatible types. Musical Group would always have Musical Artist as an included type. Solo acts should be co-typed as Musical Artist and Musician. Supporting musicians and members of groups should only be typed as Musician (unless they have a solo career, of course).

Tasks:

Examples:

  • John Lennon would be co-typed as a Musician as well as a Musical Artist. As a Musician, he would be listed as a Member of the Musical Group The Beatles.
  • Santana would be co-typed as a Musical Group / Musical Artist. Carlos Santana, who has a solo career separate from the eponymous band, would be co-typed as a Musician / Musical Artist, and would be a Member of the Musical Group Santana.
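The typing rules and examples above can be sketched in a few lines of Python. This is only an illustration, not Freebase code: the `Topic` class, the `validate` function and the string constants are hypothetical names invented for this sketch, and the rules checked are just the two constraints stated in the proposal (Musician and Musical Group are incompatible; Musical Group always includes Musical Artist).

```python
from dataclasses import dataclass, field

# Hypothetical type labels; the real types would be Freebase schema types.
MUSICAL_ARTIST = "Musical Artist"
MUSICIAN = "Musician"
MUSICAL_GROUP = "Musical Group"


@dataclass
class Topic:
    name: str
    types: set = field(default_factory=set)
    member_of: list = field(default_factory=list)  # groups this topic belongs to


def validate(topic):
    """Return a list of violations of the proposed typing rules."""
    problems = []
    # Musician and Musical Group are proposed as incompatible types.
    if MUSICIAN in topic.types and MUSICAL_GROUP in topic.types:
        problems.append("Musician and Musical Group are incompatible types")
    # Musical Group would always have Musical Artist as an included type.
    if MUSICAL_GROUP in topic.types and MUSICAL_ARTIST not in topic.types:
        problems.append("Musical Group must include Musical Artist")
    return problems


# The examples from this section.
beatles = Topic("The Beatles", {MUSICAL_GROUP, MUSICAL_ARTIST})
lennon = Topic("John Lennon", {MUSICIAN, MUSICAL_ARTIST}, member_of=[beatles])
santana_band = Topic("Santana", {MUSICAL_GROUP, MUSICAL_ARTIST})
carlos = Topic("Carlos Santana", {MUSICIAN, MUSICAL_ARTIST}, member_of=[santana_band])

# A topic typed the old way: one node that is both a person and a group.
overloaded = Topic(
    "Hypothetical overloaded topic",
    {"Person", MUSICAL_ARTIST, MUSICIAN, MUSICAL_GROUP},
)

for topic in (beatles, lennon, santana_band, carlos, overloaded):
    print(topic.name, validate(topic))
```

Under these assumptions, the four example topics pass cleanly, while the overloaded topic is flagged because it mixes Musician and Musical Group.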

Open Issues:

  1. For solo acts, how do we want to handle the difference between their stage name and their birth name -- especially if they have multiple birth names? (Note that the wikipedia infobox has "Birth Name" as a property of Musical Artist.)
    • I believe the /common/topic/alias property should do fine. (and I'm sure someone will come up with a names base & schema if it is of interest). Sprocketonline 07:02, 3 December 2009 (UTC)
      • Alias will work, but it does not differentiate between birth name and multiple performance names. Interestingly, the Wikipedia Infobox:Musical_artist template has separate "properties" for birth_name and aliases, as well as the article title. Do we see value in breaking these out as separate properties? --Zenkat 22:36, 3 December 2009 (UTC)
        • This is not so much a music issue as a general issue -- all kinds of artists (writers, actors, fine artists, etc.) have this issue, as do many other people. A music-specific solution would be to add a "credited as" property to the Artist-Track and Artist-Release relationships (although note that the latter would involve adding a CVT). Jeff 23:45, 15 December 2009 (UTC)

Collaborations

The original MusicBrainz schema modelled collaborations as an actual artist; for instance, there was an artist called Queen & David Bowie, which had an annotation specifying that it was a collaboration between the two artists.

Freebase currently follows this convention, but often loses the collaboration link to the original artists. This leads to a lot of crufty pseudo-artist Musical Artists.

This could be solved by:

  1. Making artist a non-unique property on album, release & track. However, we may lose the opportunity to handle the MusicBrainz keys for these collaborations (although the artists & albums would still have keys).
  2. If we implement Musical Act (see above), we could model collaborations as separate properties.

Proposed: The /music/track/artist, /music/album/artist and /music/release/artist properties will all be non-unique. If still required in the NGS, MusicBrainz identifiers for collaborations may be handled by special /dataworld/gardening_hint/last_referenced_by nodes.
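As a rough illustration of what a non-unique artist property buys us, the sketch below links one track to both collaborating artists instead of to a pseudo-artist. The `Track` class and `tracks_by_artist` function are hypothetical names for this example only; they do not correspond to any Freebase API.

```python
from dataclasses import dataclass, field


@dataclass
class Track:
    name: str
    # Non-unique property: a track may link to any number of artists.
    artists: list = field(default_factory=list)


def tracks_by_artist(tracks, artist):
    """Return the names of all tracks that list the given artist."""
    return [t.name for t in tracks if artist in t.artists]


tracks = [
    # The collaboration links to both real artists, not to "Queen & David Bowie".
    Track("Under Pressure", ["Queen", "David Bowie"]),
    Track("Bohemian Rhapsody", ["Queen"]),
    Track("Heroes", ["David Bowie"]),
]

print(tracks_by_artist(tracks, "Queen"))
print(tracks_by_artist(tracks, "David Bowie"))
```

With this shape, the collaboration shows up in the track list of each participating artist, so no separate pseudo-artist topic is needed to hold it.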

Tasks:

Supporting Artists

MusicBrainz allows for a link_artist_artist relationship to assert that a particular person "is a supporting musician" of an act. For instance, Adrian Belew is a supporting musician for David Bowie. (MB link)

Freebase currently has no way of specifying this relationship, although it is valuable linking data.

Proposed: Adding a "supporting_musician" property (with an expected type of Musician) to Musical Artist would resolve this problem.

Tasks:

Open Issues:

  • What exactly is the definition of a supporting artist? Is it a backing singer, a session musician? A band that tours with another, more famous/popular, band? Does it cover live jam sessions by musicians (i.e. someone picks up an instrument and forms an ad-hoc group with a lead musician)? Sprocketonline 07:17, 3 December 2009 (UTC)
    • I think we should use something close to the definition MusicBrainz uses: "... artists which have played on their albums and/or in their live bands. This effectively replaces band membership data for solo musicians, because really a 'person' cannot have any members." The focus should be on "replacing band membership for solo acts". First off, this data is currently a mess in Freebase, and secondly, it differentiates between studio musicians and supporting artists -- studio musicians may only be featured on a track, while supporting artists have a longstanding relationship with the main solo act. --Zenkat 22:41, 3 December 2009 (UTC)

Albums, Releases and Tracks

Album / Release / Release Event

UPDATED!

Freebase represents data about music albums at three levels: Musical Album, Musical Release, and Release Event. The Release type is somewhat arbitrary -- it combines all Release Events with identical track listings into a topic. We're considering merging Release and Release Event. The following mapping suggests which properties should be on which type. If we take the complete union of properties on all of the Freebase types, we get:

Property               | Current Type(s)         | Suggested Type
artist                 | Album / Release         | Album
release_date (date)    | Album / Release / Event | Album / Release
genre                  | Album                   | Album
release_type           | Album                   | Album
length                 | Album / Release         | Release
label                  | Album / Release / Event | Release
track                  | Album / Release         | Release
producer               | Album                   | Release
engineer               | Album                   | Release
buy_or_acquire_webpage | Album                   | Album (Release)
compositions           | Album                   | Album
supporting_tours       | Album                   | Album
credited_as            | Release                 | Release
catalog_number         | Release Event           | Release
region                 | Release Event           | Release
format                 | Release Event           | Release

Items in bold are places where there is a potential discrepancy and/or denormalization.

It's been noted that many of the properties that are on both Album and Release were deliberately denormalized because often "people just want to see the tracks on the album". However, as with any denormalization, it's difficult to keep items in sync once data is duplicated in multiple places.

Proposed: Define a (unique) primary_release property on Album that lets us link directly to a Release. This property would support queries like "what are the tracks on the album" without limiting our ability to fully express the complexity of the track/release/album relationships.

Tasks:

  • DA-1011
  • DM-609
  • DA-1012 (This is the big refactoring task for deleting/moving the properties specified in the above table.)

Track Overloading

A similar problem occurs because Tracks can either attach to Musical Albums or Musical Releases. However, the semantics are different in each of the two cases.

When we link an Album to a Track, we mix these levels. Consider the properties on Track. The following seem to be things that should be linked to an Album, because they are unlikely to vary from Release to Release:

  • album (duh)
  • artist
  • song
  • acquire_webpage
  • lyrics_website

While these properties are specific to a Release:

  • length
  • producer
  • engineer
  • contributions
  • date
  • place

If we just link Releases to Tracks, then this ambiguity is acceptable, since we are just placing generals on specifics. When Albums are linked to Tracks, however, we get into the situation where specifics are placed on the general case.

Proposed: Link tracks directly to Releases only. Album tracks can be inferred either from the primary_release property (for the short answer to "what tracks are on this album") or by the union of tracks on all releases, depending on the users' needs.
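A minimal sketch of the two ways to infer an album's tracks, assuming tracks hang only off Releases. The `Release` and `Album` classes, the `primary_tracks` and `all_tracks` functions, and the sample data are all hypothetical and exist only to illustrate the idea.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Release:
    name: str
    tracks: list = field(default_factory=list)


@dataclass
class Album:
    name: str
    releases: list = field(default_factory=list)
    # Unique property: at most one primary release per album.
    primary_release: Optional[Release] = None


def primary_tracks(album):
    """Short answer: the track list of the album's primary release."""
    if album.primary_release is None:
        return []
    return list(album.primary_release.tracks)


def all_tracks(album):
    """Long answer: the union of tracks across every release, in first-seen order."""
    seen = []
    for release in album.releases:
        for track in release.tracks:
            if track not in seen:
                seen.append(track)
    return seen


# Placeholder data: an original release and a reissue with a bonus track.
original = Release("Original release", ["Track A", "Track B"])
reissue = Release("Reissue", ["Track A", "Track B", "Bonus Track"])
album = Album("Example Album", releases=[original, reissue], primary_release=original)

print(primary_tracks(album))
print(all_tracks(album))
```

The primary release answers the common question quickly, while the union keeps release-specific extras such as bonus tracks visible to users who want the full picture.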

Tasks:

Open Issues: I'm getting confused with the definition of a release compared to a track. e.g. is the Offspring's Pretty Fly (Fatboy Slim Remix) a different release of Pretty Fly with an additional contribution from Fatboy Slim? Or is it a whole new track? The same problem occurs with any Live at... track: is that a new track or a new release? Sprocketonline 07:37, 3 December 2009 (UTC)

Tracks and releases are different types of things. Tracks represent specific recordings of a composition; releases are instances of albums that have the same tracks. (Release has a similar relationship to Album that Book Edition has to Book.) So "Pretty Fly (Fatboy Slim Remix)" is a track, and it appears on the release "The Untold Remixes of Fatboy Slim". It is not the same track as appears on the album Americana. Further remixes, live recordings, remasterings, etc. would also be their own tracks. Take a look at the current Musical Track documentation for more info. Jeff 19:02, 3 December 2009 (UTC)

Tracks and Compositions

In the original load of MusicBrainz data, tracks were often cotyped as compositions. This is somewhat confusing, and is also counter to the usual Freebase practice of separating an abstraction (such as a composition) from a realization of that abstraction (such as a recording of the composition). However, creating a separate Composition topic for every Track (or cluster of tracks that represent recordings of the same composition) would create an enormous number of extra topics that add little value.

Proposed 1: Split topics that are currently typed as both Track and Composition, and define the two as incompatible types.

Tasks:

Proposed 2: For data imports, only create a Composition topic if one of these is true: A) the datasource has data related to the composition (such as composer, lyricist, etc.), or B) there are multiple Tracks that are recordings of the same composition. Singleton tracks with no composition data would therefore not be linked to a Composition topic via a dataload; however, users can create a Composition if they have the data for it.
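The import rule in Proposed 2 reduces to a simple predicate. The sketch below is one possible reading of it; the function name `should_create_composition` and the shape of its arguments (a list of track identifiers and a dict of composition-level fields from the datasource) are assumptions made for this example.

```python
def should_create_composition(track_ids, composition_data):
    """Decide whether a dataload should create a Composition topic.

    track_ids: tracks that are recordings of the same composition.
    composition_data: composition-level fields from the datasource,
        e.g. {"composer": ..., "lyricist": ...}; empty values count as missing.
    """
    # Condition A: the datasource has data about the composition itself.
    has_composition_data = any(value for value in composition_data.values())
    # Condition B: more than one track records the same composition.
    has_multiple_tracks = len(track_ids) > 1
    return has_composition_data or has_multiple_tracks


# A singleton track with no composition data: no Composition topic is created.
print(should_create_composition(["track-1"], {}))
# A singleton track with a known composer: create one.
print(should_create_composition(["track-1"], {"composer": "Some Composer"}))
# Several recordings of the same composition: create one.
print(should_create_composition(["track-1", "track-2"], {}))
```

This only covers the dataload path; as noted above, users could still create a Composition by hand for a singleton track.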

Canonical Tracks

One use case that we would like to support is providing common strong identifiers that will allow people to note their favorite artists, albums, and tracks. In the case of Tracks, however, we have multiple distinct Freebase topics that can refer to what most people would consider the "same song".

Consider the song "(I Can't Get No) Satisfaction" by the Rolling Stones. They have recorded several different versions of the song, so we have multiple Tracks within Freebase. Conversely, the song has been covered by numerous other bands, so the Composition may refer to the song as performed by Britney Spears, Devo, or The Residents. Neither serves the purpose of providing a single strong canonical ID for "Satisfaction by the Rolling Stones".

Proposed: Create a new type called /music/single. Co-type every "canonical" Track with Single. "Canonical" will be defined as "the Track that appears on the Album with the earliest release date".
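The "canonical" rule above amounts to picking the track whose album has the earliest release date. A minimal sketch, assuming each candidate is given as a (track id, album release date) pair; the function name `canonical_track`, the track identifiers and the dates are placeholders, not real Freebase data.

```python
from datetime import date


def canonical_track(appearances):
    """Return the track id whose album has the earliest release date.

    appearances: list of (track_id, album_release_date) pairs.
    Pairs with an unknown date (None) are ignored; returns None if no
    pair has a date.
    """
    dated = [(track_id, released) for track_id, released in appearances
             if released is not None]
    if not dated:
        return None
    # min() returns the first pair encountered when dates tie.
    return min(dated, key=lambda pair: pair[1])[0]


# Placeholder identifiers and dates, for illustration only.
appearances = [
    ("live_version", date(1966, 1, 1)),
    ("studio_version", date(1965, 1, 1)),
    ("undated_version", None),
]

print(canonical_track(appearances))
```

Note that this sketch leaves two questions open that the definition does not settle: how to break ties between albums released on the same date, and what to do when no album has a known release date.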

Tasks:

Songs vs. Compositions

The Song type currently has only one property, lyrics_website. It has Composition as an included type. The song type has little value, and having the two types is somewhat confusing.

Proposed: Eliminate the Song type, and move the key /music/song and the property /music/song/lyrics_website to Composition.

Tasks:

Genres

The current schema is at: Musical genre

Genre tree organization

Genres are currently organized in a Phylogeny pattern; each genre has one or more "parent genres", and can in turn have multiple "child genres". Albums and Artists can be attached to any node in the tree.

While a phylogeny pattern seems appropriate, the data itself in this structure has become a mess. Consider, for example, the myriad parents and children of Punk rock. Is Funk metal really a subgenre of Punk Rock? Is it really derived from Rockabilly?

One issue is that some of the data in the music genre hierarchy in Freebase seems to attempt to show a genealogy of genres, rather than family groupings, which is counter to the way that parent and child Media genres are defined. (An example in Freebase is that punk is listed as a subgenre of glam. It is accurate to say that punk descended in part from glam, but it is a subgenre of rock.) It's possible that replacing the current data with data structured appropriately for the type would solve a lot of problems. Jeff 18:17, 24 November 2009 (UTC)

Also, the music industry has historically promoted the idea of genre as a broad, shallow tree to categorize music sales. Our messy, tangled genre web doesn't serve this need -- although it could be argued that this is an old-fashioned way of looking at the industry that no longer has much relevance.

Or, conversely, "genre" may be an outdated and overloaded concept. Sites like last.fm and MusicBrainz do not have a concept of genre; they use tags instead. Perhaps we would be better served if we ditched the whole genre phylogeny and went with a collection of flat, user-specified tags.

As a counterpoint, the current Wikipedia music genre tree (Rock example) has become very rich and full-featured. In addition to parent/sub-genres, it also models "stylistic origins", "derivative forms", "fusion genres", "regional scenes", and "typical instruments". It appears to be very well maintained.

Proposal: Wipe the current genre tree. Model the Musical Genre type to match Wikipedia, and reload the tree from current Wikipedia infobox data.

How much do we lose by doing this? What's covered in FB and not in WP? --Rlyeh 20:32, 23 November 2009 (UTC)
None of the genre nodes will be deleted, nor will artist/album/track relations to genre nodes. Only the organizing tree (parent genre / subgenre) links will be deleted.
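To make the distinction between family grouping and genealogy concrete, here is a small sketch that keeps subgenre links and stylistic-origin links as separate properties, using the punk/glam/rock example from the discussion above. The `Genre` class, its property names and the `ancestors` function are hypothetical; they only loosely mirror the Wikipedia infobox fields and are not a proposed schema.

```python
from dataclasses import dataclass, field


@dataclass
class Genre:
    name: str
    parent_genres: list = field(default_factory=list)      # family grouping
    stylistic_origins: list = field(default_factory=list)  # genealogy


def ancestors(genre):
    """Walk parent_genres only, so genealogy links never leak into the tree."""
    found = []
    stack = list(genre.parent_genres)
    while stack:
        current = stack.pop()
        if current.name not in found:
            found.append(current.name)
            stack.extend(current.parent_genres)
    return found


rock = Genre("Rock")
glam = Genre("Glam", parent_genres=[rock])
# Punk descended in part from glam, but it is a subgenre of rock.
punk = Genre("Punk", parent_genres=[rock], stylistic_origins=[glam])

print(ancestors(punk))
print([g.name for g in punk.stylistic_origins])
```

Keeping the two kinds of link apart means the subgenre tree stays shallow, while the "descended from" information is still recorded rather than thrown away.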

Classical Music

Classical Composers shown as Musical Artists

Classical music is something of a mess in Freebase currently. This is due largely to the fact that MusicBrainz treats classical composers as the Artist of an album/release/track, which leads to the assertion in Freebase that the relationship of, say, Ludwig van Beethoven to the Berlin Philharmonic's recording of the Ninth Symphony is the same as that of Marvin Gaye to his recording of (Norman Whitfield and Barrett Strong's song) "I Heard It Through the Grapevine". The Freebase model currently handles composers and lyricists separately from performing/recording artists, but because of the MusicBrainz import, the data itself is a real problem.
