Data dumps

From Freebase


Revision as of 14:20, 15 October 2010

Full data dumps of every fact and assertion in Freebase are available in a variety of formats and are updated every week.

Download

Download Freebase Data Dumps

You may also be interested in the Freebase Wikipedia Extraction.

Formats

TSV

A tab-separated file for each type in Freebase, suitable for loading into spreadsheets or database systems. Each line represents an instance of a Freebase type and columns represent the available properties for the type. You may download the full set, or browse Freebase domains and types to find specific data sets.

  • The full download is approximately 1300 Mbytes compressed with Bzip2.
  • The browseable set contains approximately 7500 TSV files in 100 domains.
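As a rough illustration of loading one of these per-type files, the sketch below reads tab-separated lines into dictionaries keyed by column name. It assumes the first line of each file is a header row naming the properties and that fields are not quoted; neither is stated above, and the function name and sample columns are hypothetical.

```python
import csv
from typing import Dict, Iterable, List


def read_type_tsv(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Read a per-type TSV into one dict per instance.

    Assumption: the first line is a header row of property names.
    """
    # QUOTE_NONE treats quote characters as ordinary data (assumption:
    # the dump does not use CSV-style quoting).
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [dict(row) for row in reader]


# Hypothetical two-line sample; real files have type-specific columns.
rows = read_type_tsv(["name\tid\n", "Ada\t/en/ada\n"])
```

An open file object can be passed in place of the list, since the reader accepts any iterable of lines.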

Link Export

A full dump of Freebase assertions (the "quad dump") as tab-separated UTF-8 text. This is a complete, low-level dump of the data, suitable for post-processing into RDF or XML datasets. The link export is a series of lines, one assertion per line. Each line is a tab-separated quadruple: <source>, <property>, <destination>, <value>. An assertion is a statement of fact about the <source> object. In any assertion, at least one of <destination> and <value> is present, and both may be.

  • A sample of this output is available.
  • The Link Export is approximately 4000 Mbytes compressed with Bzip2.
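The quadruple format described above can be read with a few lines of Python. This is a minimal sketch, not an official parser: it assumes an absent <destination> or <value> appears as an empty column, and the names Quad, parse_quad and iter_quads are hypothetical.

```python
import bz2
from typing import Iterator, NamedTuple, Optional


class Quad(NamedTuple):
    source: str
    prop: str                    # the <property> column
    destination: Optional[str]   # None when the column is empty (assumption)
    value: Optional[str]         # None when the column is empty (assumption)


def parse_quad(line: str) -> Quad:
    """Split one line of the link export into its four columns."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 4:
        raise ValueError(f"expected 4 tab-separated fields, got {len(fields)}")
    source, prop, destination, value = fields
    if not destination and not value:
        raise ValueError("neither <destination> nor <value> is present")
    return Quad(source, prop, destination or None, value or None)


def iter_quads(path: str) -> Iterator[Quad]:
    """Stream quads from a Bzip2-compressed dump without unpacking it to disk."""
    with bz2.open(path, "rt", encoding="utf-8") as dump:
        for line in dump:
            yield parse_quad(line)
```

Streaming line by line keeps memory use flat, which matters for a file of this size. The path passed to iter_quads is whatever local name the downloaded dump was saved under.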

Simple Topic Dump

A tab-separated file containing basic identifying data about every topic in Freebase. The columns are: GUID, English display name, Freebase /en keys (comma-separated), numeric English Wikipedia keys (comma-separated), Freebase types (comma-separated), and a short text description from Wikipedia (when available). Tabs and newlines are backslash-escaped, and null fields are represented by "\N".

  • The Simple Topic Dump is approximately 1000 Mbytes compressed with Bzip2.
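The column layout and escaping rules above might be handled as in the following sketch. It assumes the escapes are written as \t, \n and \\ (the text only says tabs and newlines are backslash-escaped) and that a null list column means an empty list; Topic, unescape and parse_topic are hypothetical names.

```python
from typing import List, NamedTuple, Optional


class Topic(NamedTuple):
    guid: str
    name: Optional[str]
    en_keys: List[str]
    wikipedia_keys: List[str]
    types: List[str]
    description: Optional[str]


# Assumed escape sequences; unknown ones are left untouched.
_ESCAPES = {"t": "\t", "n": "\n", "\\": "\\"}


def unescape(field: str) -> Optional[str]:
    """Return None for the null marker, else undo backslash escapes."""
    if field == "\\N":
        return None
    out = []
    chars = iter(field)
    for ch in chars:
        if ch == "\\":
            nxt = next(chars, "")
            out.append(_ESCAPES.get(nxt, "\\" + nxt))
        else:
            out.append(ch)
    return "".join(out)


def split_list(field: str) -> List[str]:
    """Split a comma-separated column; a null column becomes an empty list."""
    text = unescape(field)
    return text.split(",") if text else []


def parse_topic(line: str) -> Topic:
    """Parse one line of the Simple Topic Dump into its six columns."""
    # Real tabs only occur as separators, because tabs inside fields are escaped.
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 6:
        raise ValueError(f"expected 6 tab-separated fields, got {len(fields)}")
    guid, name, en_keys, wp_keys, types, description = fields
    return Topic(guid, unescape(name), split_list(en_keys),
                 split_list(wp_keys), split_list(types), unescape(description))
```

Splitting on tabs before unescaping is the important ordering: unescaping first would reintroduce tabs inside fields and corrupt the column boundaries.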

License

Google provides the Freebase Data Dumps free of charge, for any purpose, with regular updates. They are distributed, like Freebase itself, under the Creative Commons Attribution (CC-BY) license, and use is subject to the Freebase Terms of Service. If you include data from these dumps in a website or application, you must attribute us as described in the Freebase Licensing Policy.

Citing

If you'd like to cite these data dumps in a publication, you may use:

Google, Freebase Data Dumps, http://download.freebase.com/datadumps/, <month> <day>, <year>

Or as BibTeX:

@misc{freebase:datadumps,
  title = "Freebase Data Dumps",
  author = "Google",
  howpublished = "\url{http://download.freebase.com/datadumps/}",
  edition = "<month> <day>, <year>",
  year = "<year>"
}

See also
