Reconciliation
Reconciliation is the process of determining how closely a set of data matches existing Freebase topics. Before adding new data to Freebase, we encourage you to reconcile it against Freebase first, as this reduces the likelihood of adding duplicates. A number of tools are available to help you carry out reconciliation.
Reconciliation with Google Refine
Google Refine has built-in reconciliation capability. If you have loaded data into Refine, you can reconcile any column against Freebase (or any other data repository for which a reconciliation plugin is available). See the following links for more information:
- http://code.google.com/p/google-refine/wiki/Reconciliation
- http://code.google.com/p/google-refine/wiki/ReconcilableDataSources
- http://code.google.com/p/google-refine/wiki/ReconciliationServiceApi
Reconciliation with Matchmaker
Matchmaker is an Acre application which can be used to reconcile an external web page with Freebase topics. Given a number of initial candidates for each web page, it collects human judgements on the correct candidate to reconcile with.
API
Overview
For programmers, Freebase provides the Freebase Reconciliation Service, an API that carries out fuzzy matching of datasets against the Freebase graph. The service can handle partial or conflicting data and still return a relevance score for it. It takes a query and attempts to return the records that best match it, using string distances and probabilistic matching.
| Name | Reconciliation API |
| Description | Reconciles a subgraph and returns a sorted list of candidates with confidence values |
| URL | http://data.labs.freebase.com/recon/query |
| Required Arguments | ?q= - A simplified MQL query. |
| Optional Arguments | start - The offset within the results for a query, for paging through results. |
| | limit - The maximum number of candidates returned. |
| | jsonp - JSONP support: the name of the JavaScript function to wrap the JSON results in. |
| Response | application/json |
Request
Requests resemble simplified MQL queries: a dictionary that maps canonical property ids to literal values. Any property can be used as long as it is referred to by its canonical id. If the property's expected value is a literal, the value in the request is matched against that literal. If the property's expected value is a topic, the value is matched against the topic's name. CVT (compound value type) properties are flattened, so a property of the CVT is addressed directly, as in /film/film/starring/actor.
The id property can be used to specify the id for an entity in the reconciliation. Property values can be expanded to dictionaries to allow for an id; in this case the name is specified using the name property. If a guid is specified, the service won't question the mapping, and will either return candidates connected to that guid or assume that the connection to that guid is missing from OTG.
Here's an example:
{
  "/type/object/name":"Blade Runner",
  "/type/object/type":"/film/film",
  "/film/film/starring/actor":["Harrison Ford", "Rutger Hauer"],
  "/film/film/starring/character":["Rick Deckard", "Roy Batty"],
  "/film/film/director":
  {
    "name":"Ridley Scott",
    "id":"/guid/9202a8c04000641f8000000000032ded"
  },
  "/film/film/release_date_s":"1981"
}
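As a rough illustration of how such a request might be sent, the sketch below builds a GET URL for the service in Python. It assumes that the query dictionary is serialized to JSON and passed URL-encoded in the q argument; the page only states that q holds a simplified MQL query, so treat the encoding as an assumption. The function name build_recon_url is hypothetical, and the sketch only constructs the URL without contacting the service.

```python
import json
from urllib.parse import urlencode

# Service URL as listed in the API overview.
RECON_URL = "http://data.labs.freebase.com/recon/query"


def build_recon_url(query, start=None, limit=None, jsonp=None):
    """Build a GET URL for a reconciliation request (hypothetical helper).

    Assumption: the query dictionary travels as JSON text in the `q` argument.
    """
    params = {"q": json.dumps(query)}
    # Optional arguments are only added when the caller supplies them.
    if start is not None:
        params["start"] = start
    if limit is not None:
        params["limit"] = limit
    if jsonp is not None:
        params["jsonp"] = jsonp
    return RECON_URL + "?" + urlencode(params)


# The example query from this page, written as a Python dictionary.
query = {
    "/type/object/name": "Blade Runner",
    "/type/object/type": "/film/film",
    "/film/film/starring/actor": ["Harrison Ford", "Rutger Hauer"],
    "/film/film/starring/character": ["Rick Deckard", "Roy Batty"],
    "/film/film/director": {
        "name": "Ridley Scott",
        "id": "/guid/9202a8c04000641f8000000000032ded",
    },
    "/film/film/release_date_s": "1981",
}

# Ask for at most three candidates.
url = build_recon_url(query, limit=3)
```

Keeping URL construction separate from the HTTP call makes it easy to inspect the encoded query before sending it, and to page through results by passing increasing start values.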
Response
If the first record is a match, it will have the property "match":true in the result. If the service fails to reconcile the root record, it will return suggested matches ranked by likelihood. The name, types, id, and match score of each record are also included.
Here's an example of a reconciliation response:
[{
  "id":"/guid/9202a8c04000641f8000000000009e89",
  "name":["Bladerunner", "Blade Runner"],
  "score":2.1810298,
  "match":true,
  "type":["/award/award_winning_work", "/award/award_nominated_work", "/common/topic", "/base/dystopia/topic", "/film/film", "/base/greatfilms/topic", "/base/greatfilms/ranked_item", "/media_common/quotation_source", "/fictional_universe/work_of_fiction", "/media_common/adaptation", "/media_common/adapted_work"]
},
{
  "id":"/guid/9202a8c04000641f8000000000446749",
  "name":["Blade: Trinity"],
  "score":0.2877137,
  "match":false,
  "type":["/film/film", "/common/topic"]
},
{
  "id":"/guid/9202a8c04000641f8000000006b4624f",
  "name":["A Blade in the Dark"],
  "score":0.2664117,
  "match":false,
  "type":["/film/film", "/common/topic"]
}, ...
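A client would typically check whether the first candidate is flagged as a match and otherwise fall back to the ranked suggestions. The sketch below shows one way to do that in Python. The function name split_candidates is hypothetical, and the sample text is an abridged copy of the response above (type lists shortened, list closed after three records), not real service output.

```python
import json


def split_candidates(candidates):
    """Return (match, suggestions) from a parsed response (hypothetical helper).

    `match` is the first record when it carries "match": true, otherwise None.
    `suggestions` holds the remaining candidates in the order received.
    """
    if candidates and candidates[0].get("match") is True:
        return candidates[0], candidates[1:]
    return None, list(candidates)


# Abridged sample based on the example response on this page.
response_text = """
[
  {"id": "/guid/9202a8c04000641f8000000000009e89",
   "name": ["Bladerunner", "Blade Runner"],
   "score": 2.1810298, "match": true,
   "type": ["/film/film", "/common/topic"]},
  {"id": "/guid/9202a8c04000641f8000000000446749",
   "name": ["Blade: Trinity"],
   "score": 0.2877137, "match": false,
   "type": ["/film/film", "/common/topic"]},
  {"id": "/guid/9202a8c04000641f8000000006b4624f",
   "name": ["A Blade in the Dark"],
   "score": 0.2664117, "match": false,
   "type": ["/film/film", "/common/topic"]}
]
"""

match, suggestions = split_candidates(json.loads(response_text))
```

When match is None, the suggestions can be shown to a human reviewer or filtered by a score threshold of your choosing; the page does not define any particular cutoff.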