Data dumps
Full data dumps of every fact and assertion in Freebase are available in a variety of formats and are updated every week. Deltas are not available.
Download
You may also be interested in the Freebase Wikipedia Extraction.
Formats
Quad dump
A full dump of Freebase assertions (the quad dump) as tab-separated UTF-8 text. This is a complete "low level" dump of the data, suitable for post-processing into RDF or XML datasets. The format of the link export is a series of lines, one assertion per line. Each line is a tab-separated quadruple: <source>, <property>, <destination>, <value>. An assertion is a statement of fact about the <source> object. Every assertion has a <destination>, a <value>, or both. Lines are grouped by <source> and <property> and are ordered by a sort index when one is available, so all assertions about a particular topic with a particular relationship are contiguous and sorted roughly by importance.
- A sample of this output is available.
- The link export is approximately 3.5 GB compressed with bzip2 (35 GB uncompressed).
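The line format described above can be read with a few lines of Python. This is a minimal sketch, not an official reader: the names `Quad`, `parse_quad`, `group_assertions` and `read_dump` are made up for illustration, and the padding of short lines assumes that trailing empty columns may be missing from a line.

```python
import bz2
from itertools import groupby
from typing import Iterable, Iterator, List, NamedTuple, Tuple


class Quad(NamedTuple):
    source: str
    prop: str
    destination: str  # empty string when the column is empty
    value: str        # empty string when the column is empty


def parse_quad(line: str) -> Quad:
    """Split one tab-separated line into its four columns."""
    # maxsplit=3 keeps any further tabs inside the value column.
    fields = line.rstrip("\r\n").split("\t", 3)
    # Assumption: trailing empty columns may be absent, so pad to four.
    fields += [""] * (4 - len(fields))
    return Quad(*fields)


def group_assertions(
    lines: Iterable[str],
) -> Iterator[Tuple[Tuple[str, str], List[Quad]]]:
    """Yield ((source, property), quads) for each contiguous run of lines."""
    quads = (parse_quad(line) for line in lines if line.strip())
    # groupby only merges adjacent items, which matches the dump's ordering:
    # lines are already grouped by source and property.
    for key, group in groupby(quads, key=lambda q: (q.source, q.prop)):
        yield key, list(group)


def read_dump(path: str) -> Iterator[str]:
    """Stream lines from a bzip2-compressed dump without unpacking it to disk."""
    with bz2.open(path, mode="rt", encoding="utf-8") as handle:
        yield from handle
```

Because the dump is already ordered, `group_assertions(read_dump(path))` can process the file as a stream, holding only one source/property group in memory at a time.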
Description
In the quad dump, every line is a Freebase node. Think of a node as an atomic fact. A fact in this context is a single proposition that relates an entity (an object) to a predicate.
The entity is represented by a mid, a machine-generated id, which is in the first column (the "source" column). The source column always contains a value.
The predicate is represented by at least two, and sometimes all three, of the remaining columns.
The second column is the "property". Values in this column are names like "/type/object/name". These represent a particular kind of quality of the entity mentioned in the "source" column. Like the source column, the property column always contains a value.
The value of the property is held in the remaining columns ("destination" and "value"). Depending on the kind of property, either or both the destination and value columns have a value. Also depending on the kind of property, a single property name can appear multiple times for a particular mid in the source column. In this latter case, the property is multivalued or represents a 1:m relationship with a set of other entities.
case #1: destination column holds a mid
The destination column can hold a mid, in which case the property refers to another entity. In these cases, at least one line in the quad dump has that same mid in its source column.
If the destination column holds a mid, the value column is typically empty.
Examples from the sample:
/m/04p4tzs /base/saturdaynightlive/snl_episode/host /m/0p_47
/m/0p_47 /book/author/book_editions_published /m/04v4y7s
case #2: destination column empty, value column holds a value
If the destination column is empty, there will be a value in the value column. This represents a scalar value, or a single item in a multi-valued property.
Examples from the sample:
/m/0p_47 /base/saturdaynightlive/snl_actor/best_of_snl true
case #3: both destination and value columns hold values
If both the destination and value columns hold a value, the destination column typically holds the name of a namespace, and the value column a key within that namespace.
A typical example is a property of the type /type/text, in which case the destination holds a value that identifies a namespace corresponding to a particular language. Examples from the sample (http://wiki.freebase.com/images/e/eb/Steve-martin-quad-sample.txt):
/m/0p_47 /type/object/name /lang/en Steve Martin
/m/0p_47 /type/object/name /lang/id Steve Martin
/m/0p_47 /type/object/name /lang/ja スティーヴ・マーティン
/m/0p_47 /type/object/name /lang/tr Steve Martin
A filled destination/value pair also occurs for object keys, for example:
/m/0p_47 /type/object/key /wikipedia/en Steve_Martin
/m/0p_47 /type/object/key /wikipedia/de_id 107261
/m/0p_47 /type/object/key /source/nytimes top$002Freference$002Ftimestopics$002Fpeople$002Fm$002Fsteve_martin
In these cases, the destination column indicates a particular key namespace, and the value column the identifier within that namespace.
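The three cases can be told apart by checking which of the last two columns is filled. The sketch below is one way to do that; the function name `classify_quad` and the labels it returns are hypothetical and not part of the dump format. It does not check whether a filled destination is actually a mid, since the text only says the value column is "typically" empty in case #1.

```python
def classify_quad(destination: str, value: str) -> str:
    """Label an assertion by which of its last two columns are filled.

    The labels are illustrative names for the three cases:
      "reference"  - case #1: destination filled, value empty
      "scalar"     - case #2: destination empty, value filled
      "namespaced" - case #3: both filled (namespace plus key or text)
    """
    if destination and value:
        return "namespaced"
    if destination:
        return "reference"
    if value:
        return "scalar"
    # The format description says at least one of the two is always present.
    raise ValueError("assertion has neither a destination nor a value")
```

For the sample lines above, the `/type/object/name` and `/type/object/key` rows fall under "namespaced", the `best_of_snl` row under "scalar", and the `snl_episode/host` row under "reference".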
Simple Topic Dump
A tab-separated file containing basic identifying data about every topic in Freebase.
The columns are:
- mid
- English display name
- Freebase /en keys (comma-separated)
- numeric English Wikipedia keys (comma-separated)
- Freebase types (comma-separated) from the commons (not base types)
- a short text description from Wikipedia (when available).
The Simple Topic Dump is approximately 1.2 GB compressed with bzip2 (5 GB uncompressed). In June 2011, it held over 22 million rows, of which 7.9 million were musical tracks and millions more were unnamed CVTs (compound value types), including 2.2 million paginations, 2.7 million ISBNs, 1 million dated integers and 850K geocodes.
Tabs and newlines are backslash-escaped, and null fields are represented by "\N".
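A row of the Simple Topic Dump could be decoded along the following lines. This is a sketch under stated assumptions: the text above says only that tabs and newlines are backslash-escaped, so the exact escape sequences handled here (`\t`, `\n` and `\\`) are a guess, and the names `unescape`, `split_list` and `parse_topic_row` and the dictionary keys are invented for the example.

```python
import re
from typing import Dict, List, Optional, Union

# Assumed escape sequences; the source only states that tabs and
# newlines are backslash-escaped.
_ESCAPES = {"t": "\t", "n": "\n", "\\": "\\"}


def unescape(field: str) -> Optional[str]:
    """Decode one raw column; the marker \\N stands for a null field."""
    if field == r"\N":
        return None
    # Unknown escape sequences are left untouched.
    return re.sub(r"\\(.)", lambda m: _ESCAPES.get(m.group(1), m.group(0)), field)


def split_list(field: Optional[str]) -> List[str]:
    """Split a comma-separated column; null or empty becomes an empty list."""
    return field.split(",") if field else []


def parse_topic_row(line: str) -> Dict[str, Union[None, str, List[str]]]:
    """Map the six documented columns of one row to a dictionary."""
    raw = line.rstrip("\r\n").split("\t")
    # Assumption: treat missing trailing columns as null.
    raw += [r"\N"] * (6 - len(raw))
    mid, name, en_keys, wp_keys, types, description = (
        unescape(f) for f in raw[:6]
    )
    return {
        "mid": mid,
        "name": name,
        "en_keys": split_list(en_keys),
        "wikipedia_keys": split_list(wp_keys),
        "types": split_list(types),
        "description": description,
    }
```

Splitting on raw tabs before unescaping is the important ordering here: an escaped tab inside a description must not be mistaken for a column separator.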
TSV per Freebase type
A tab-separated file for each type in Freebase, suitable for loading into spreadsheets. Each line represents an instance of a Freebase type and columns represent the available properties for the type. You may download the full set, or browse Freebase domains and types to find specific data sets. While the TSV files are useful for quick inspection of Freebase data, they're usually not the best option for using Freebase data in a production application and may even be discontinued in the future.
- The full download is approximately 1.3 Gbytes compressed with bzip2.
- The browseable set contains approximately 7500 TSV files in 100 domains.
License
Google provides the Freebase Data Dumps free of charge for any purpose and updates them regularly. They are distributed, like Freebase itself, under the Creative Commons Attribution (CC-BY) license, and use is subject to the Freebase Terms of Service. If you include the data from these data dumps in a website or application, you must attribute us as described in the Freebase Licensing Policy.
Citing
If you'd like to cite these data dumps in a publication, you may use:
Google, Freebase Data Dumps, http://download.freebase.com/datadumps/, <month> <day>, <year>
Or as BibTeX:
@misc{freebase:datadumps,
title = "Freebase Data Dumps",
author = "Google",
howpublished = "\url{http://download.freebase.com/datadumps/}",
edition = "<month> <day>, <year>",
year = "<year>"
}