Compound Value Type

From Freebase

Overview

A Compound Value Type (CVT) is a Type within Freebase that is used to represent data where each entry consists of multiple fields. CVTs may be a little confusing at first, but they are a very important part of the Freebase schema, and one of the things that makes it unique and able to represent so much.

Consider the following example: the population of a city changes over time. That means that whenever you query Freebase for a population, you are at least implicitly asking for the population at a certain date. Two values are involved: a number of people and a date. This is a situation where a CVT becomes extremely useful. Without one, to model population data you would need to create a topic, name it something like 'Vancouver's population in 1997', and enter the information there.

A CVT can be thought of as a topic that does not require a display name. CVTs, like normal topics, have a GUID that can be referenced independently. However, the Freebase client treats them very differently from topics. In most cases, every property of the CVT should be a disambiguation property.
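
As a rough illustration of the population example, the sketch below models a CVT as a small record with no name of its own, attached to a named topic. The class names, field names (`DatedInteger`, `number`, `year`, `City`) and the figures are illustrative assumptions, not Freebase's actual schema or data.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class DatedInteger:
    """A CVT-like record: it has no display name, only its fields."""
    number: int  # e.g. a count of people
    year: int    # the date the number applies to


@dataclass
class City:
    """A topic: it has a name, and its population property holds CVT records."""
    name: str
    population: List[DatedInteger] = field(default_factory=list)


def population_in(city: City, year: int) -> Optional[int]:
    """Return the population recorded for the given year, or None if absent."""
    for entry in city.population:
        if entry.year == year:
            return entry.number
    return None


# Placeholder figures, not real census data.
city = City("Example City", [DatedInteger(1000, 1997), DatedInteger(1200, 2007)])
print(population_in(city, 1997))
```

Each `DatedInteger` is meaningful only through its fields and the topic it hangs from, which is why no separately named "population in 1997" topic is needed.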

Marking CVTs

You can tell whether a type is a CVT using MQL by checking that it has

/freebase/type_hints/mediator == True

You can see this by looking at the schema in the Explore view: http://www.freebase.com/tools/explore/measurement_unit/dated_integer

There's a similar property on '/freebase/property' in the schema, which determines which properties get displayed in the standard CVT display:

/freebase/property_hints/disambiguator == True
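
A minimal sketch of checking the mediator hint from a script, assuming the usual MQL convention that a null value asks the service to fill in the property. The query is only built and printed here; how it is sent to the service is left out, and the helper name `is_mediator` is hypothetical.

```python
import json

# Type to inspect; this id is taken from the Explore URL above.
type_id = "/measurement_unit/dated_integer"

# MQL read query: the null (None) value is a request to return the hint.
mediator_query = {
    "id": type_id,
    "/freebase/type_hints/mediator": None,
}


def is_mediator(result):
    """Return True when a query result marks the type as a CVT."""
    return result.get("/freebase/type_hints/mediator") is True


print(json.dumps(mediator_query, indent=2))
```

Applying `is_mediator` to the decoded result of such a query would give True for a CVT and False when the hint is missing or false.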

Display Logic

There are two special aspects relating to the way that compound value types are displayed. First, the properties of a compound value type are intended to be displayed on a single line, so that when a user clicks edit next to a property that uses a CVT as its expected type, they will see all the properties of that type displayed next to one another. If you edit the Performances property for a film, for example, which has the CVT of Film Performances as its expected type, you'll see the empty fields for Actor, Character and Special Performance Type appear on a single line for that property. One way to think of a CVT is that it's a method for providing multiple information fields for a single property when that property uses the CVT as its expected type.

Another special aspect of a CVT is that, while each of its properties can be a type that has its own list of topics (for example, the Actor property in Film Performances has an expected type of Film Actor, which has its own list of Film Actor topics), there are no topics associated with the CVT type itself. If you look at the Film Performance type, for example, you'll see topics listed for each of its properties, but no individual Film Performance topics.

Creating CVTs

Creating a CVT is relatively simple.

1. On either the base or commons schema page ("freebase.com/schema/<commons name>" or "freebase.com/schema/base/<base name>") there is a sub-section titled 'Mediators', where all CVTs are listed and where new ones are created.

2. Click on "Add New" to create a new CVT, or on one of the existing types listed under "Mediators" to edit it. Edit the name and/or description fields and save to complete the creation or editing of the CVT. (Alert: be very careful about editing the key field after the initial creation of a type; it's best to allow the client to auto-create the key for you.)

3. Use "Add New" to create the properties for your CVT just as you would the properties for any other type. Click on the triangle icon next to the existing property name to edit it.

4. For each property be sure to select the Disambiguator preference.

Note that you cannot use a CVT as the expected type for a property within another CVT. For example, if you are creating a CVT that should display money values, you would need separate properties for currency and amount, rather than using the existing Dated Money Value CVT.

In MQL

{
 "id": "/en/the_dark_knight",
 "/film/film/starring": {
   "create": "unconditional",
   "type": "/film/performance",
   "actor": {
     "id": "/en/heath_ledger",
     "connect": "insert"
   },
   "character": {
     "name": "The Joker",
     "type": "/film/film_character",
     "create": "unless_connected"
   }
 }
}
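
The write query above creates a /film/performance CVT linking an actor and a character to the film. As a hedged sketch, a corresponding read might look like the following, assuming that null values ask the service to return each property and that "actor" and "character" resolve relative to the CVT type. The helper `cast_list` and the sample result are hypothetical and hand-written, not actual service output.

```python
import json

# MQL read query: None (null) values ask the service to fill them in.
# The list brackets request every performance CVT attached to the film.
performances_query = {
    "id": "/en/the_dark_knight",
    "/film/film/starring": [{
        "actor": None,
        "character": None,
    }],
}


def cast_list(result):
    """Flatten a query result into (actor, character) pairs."""
    return [(p["actor"], p["character"]) for p in result["/film/film/starring"]]


# Hand-written stand-in for a service response, used only to exercise cast_list.
sample_result = {
    "id": "/en/the_dark_knight",
    "/film/film/starring": [{"actor": "Heath Ledger", "character": "The Joker"}],
}

print(json.dumps(performances_query, indent=2))
print(cast_list(sample_result))
```

Note that the CVT itself never appears by name in either query; it is reached only through the property that points at it.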

CVT Merge Logic

When merging two topics that have CVTs, problems with duplication can occur. Suppose we had two "Syd Barrett" topics that were flagged for merge, and that both of these topics had a CVT representing that he was a guitarist in Pink Floyd:


/en/syd_barrett_1 --(member)--> /guid/9201..c453 --(group)--> /en/pink_floyd
                                       |
                                       +-----------(role)---> /en/guitarist

/en/syd_barrett_2 --(member)--> /guid/9201..1fa3 --(group)--> /en/pink_floyd
                                       |
                                       +-----------(role)---> /en/guitarist

When merging the two topics, the merge code would naively see just two separate /music/group_member/member properties pointing to two separate CVTs. The result would look like duplicate information on the merged topic:


/en/syd_barrett_1 --(member)--> /guid/9201..c453 --(group)--> /en/pink_floyd
         |                             |
         |                             +-----------(role)---> /en/guitarist
         |
         +----------(member)--> /guid/9201..1fa3 --(group)--> /en/pink_floyd
                                       |
                                       +-----------(role)---> /en/guitarist

To solve this, CVT deduplication logic is used. The "identity" of a CVT is unimportant; a CVT is really identified by its links. If two CVTs have the exact same set of links, they can be considered duplicates. No extra information is encoded by the second CVT, and it can be safely deleted:

/en/syd_barrett_1 --(member)--> /guid/9201..c453 --(group)--> /en/pink_floyd
                                       |
                                       +-----------(role)---> /en/guitarist

This principle can be extended. If one CVT contains only a strict subset of the links of another CVT, it too can be deleted, since it provides no extra information. So, if in the above example one of the CVTs only specified that Syd Barrett was a member of Pink Floyd (no role), then it too could safely be deleted. The first diagram below shows the topic before deduplication, the second after:

/en/syd_barrett_1 --(member)--> /guid/9201..c453 --(group)--> /en/pink_floyd
         |
         |
         +----------(member)--> /guid/9201..1fa3 --(group)--> /en/pink_floyd
                                       |
                                       +-----------(role)---> /en/guitarist

/en/syd_barrett_1 
         |
         |
         +----------(member)--> /guid/9201..1fa3 --(group)--> /en/pink_floyd
                                       |
                                       +-----------(role)---> /en/guitarist

However, if two CVTs have overlapping links but neither set of links is a subset of the other, then deduplication cannot occur.

This logic is carried out in a Constant Gardener script that looks for duplicate CVTs on a nightly basis.
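
The deduplication rule described above can be sketched as follows. This is a minimal illustration, not the Constant Gardener script itself: it assumes each CVT is reduced to a set of (property, target) links, and the function name `deduplicate` and the tie-break between identical CVTs are arbitrary choices made for the example.

```python
from typing import Dict, FrozenSet, Tuple

Link = Tuple[str, str]  # (property, id of the linked topic)


def deduplicate(cvts: Dict[str, FrozenSet[Link]]) -> Dict[str, FrozenSet[Link]]:
    """Drop every CVT whose links equal, or are a subset of, another CVT's links."""
    kept: Dict[str, FrozenSet[Link]] = {}
    # Visit larger link sets first so a superset is always kept before its subsets.
    # Ties (identical sets) are broken by GUID, an arbitrary choice.
    ordered = sorted(cvts.items(), key=lambda item: (-len(item[1]), item[0]))
    for guid, links in ordered:
        if any(links <= other for other in kept.values()):
            continue  # adds no information beyond a CVT already kept
        kept[guid] = links
    return kept


# The subset case from the diagrams: one CVT lacks the role link.
merged = {
    "/guid/9201..c453": frozenset({
        ("member", "/en/syd_barrett_1"),
        ("group", "/en/pink_floyd"),
    }),
    "/guid/9201..1fa3": frozenset({
        ("member", "/en/syd_barrett_1"),
        ("group", "/en/pink_floyd"),
        ("role", "/en/guitarist"),
    }),
}
print(sorted(deduplicate(merged)))
```

In the subset case only the CVT carrying the role link survives. Two CVTs that differ in any link, for example in their role, are both kept because neither is a subset of the other.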
