Data sources
From Freebase
Licensing
The Freebase Terms of Service require contributors to contribute only data that is compatible with the Freebase licensing terms. It is up to each contributor to ensure that content (Data, Schema, Descriptions, or Media Files) may be uploaded. See License compatibility for more information. For external data sets that are not license compatible, another option is to link to them using a strong identifier, so that users can follow the link from Freebase to the other database.
Current sources
This is a partial list of sources from which data has been imported to Freebase.
- Wikipedia - Wikipedia articles provide the core set of topics for Freebase
- Wikimedia Commons - images associated with Wikipedia articles
- EDGAR - Securities and Exchange Commission (SEC) data
- Open Library Project - books, lots of books (and their authors)
- Stanford University Library
- TVRage
- ISFDB
- MusicBrainz
- National Register of Historic Places
- OurAirports
- NFDC FAA
- ITIS - Taxonomy of plants and animals
- World of Spectrum
- WordNet
You can find a fuller list at http://sources.freebaseapps.com/, although this lists only sources uploaded by the data team rather than by the community.
Automated sources (via Data pipeline)
Test sources
Linked Data
Linked Open Data (LOD) and the Semantic Web are terms that describe data that is meaningfully connected across different websites and is accessible under an open license, so that it can be combined and otherwise manipulated.
Terms and concepts:
Related semweb projects:
Proposed sources
See also LinkedData.org's cloud.
- See database upload candidates
- WordNet
- WiserEarth
- US Census Gazetteer data
- Infochimps
- AboutUs.org
- Geonames
- ABN Register (data dumps available under certain licensing conditions, would need careful clearance)
- Barcodepedia
- upcdatabase.com
- data.australia.gov.au
- English Heritage GIS data downloads
- Inducks - a database of Disney comics
- http://world-nuclear.org/NuclearDatabase/Advanced.aspx?id=27246 (namespace suggestion)
- I started a scraper here: http://scraperwiki.com/scrapers/world-nuclear/ (right now it just pulls names and keys; I'll add another one to pull all the data).
- EM-DAT: The International Disaster Database
Proposed Key Sources
Many online databases may not have data dumps or appropriate licensing for import to Freebase. In this case we can still provide a link from Freebase back to the appropriate webpage. This is useful because it gives a further data point for reconciling other items: e.g. if we know that imported topic A is the same as external website topic B, and that Freebase topic C is also the same as B, we can deduce that A should be reconciled with C. The linking relies on keys, so each external webpage should ideally correspond one-to-one with a semantic entity, i.e. a topic on Freebase. The following is a list of possible data sources:
- Art collections
- People
- Commonwealth War Graves Commission
- Cemeteries
- Casualties (It isn't possible to return all 1.7 million casualties from a single search as paged results, so searches have to be split. I recommend splitting by the first two letters of the surname, e.g. Aa, Ab, Ac, Ad, etc.)
- Internment.net
- Mapping Our Anzacs - a database of WW1 Anzacs.
- Architecture
- Structurae
- Dictionary of Scottish Architects. It is not possible to get all results on a single page and find links to individual entities (the site uses some unusual JavaScript), but it is possible to save search results as a text file, parse it, and generate links. Iain is working on this.
- Water Technology projects
- Historic Scotland
- English Heritage
- ArchInform - architecture database
- Rate Your Music
- European Cultivated Potato Database
- Indian Railways Fan Club - some 11,000 locomotives operated on India's railways
- Biz Shark
- Flags
- Cricket Archive
- Incunabula Database - a database of works printed before the year 1501
- Marvel Database
- Fancy A Pint - a British pub website
- CrossRef.org - 46 million citations for academic publications, available as RDF
- Museum Collection Catalogues
- TV
- Film databases
- Complete Guide to World Film - 461,000 films in the database
- Cine Nacional - Argentinian Cinema
- Jinni
- Rotten Tomatoes
- Criticker
- Hong Kong Movie Database
- Bollywood database
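The key-based reconciliation described under Proposed Key Sources (imported topic A matches external topic B, Freebase topic C also matches B, so A should be reconciled with C) can be sketched in Python as a join on the shared external key. This is a minimal sketch, not Freebase's actual reconciliation code: the function name, the plain-dict inputs, and the example topic IDs and keys are all hypothetical. Keys that point to more than one Freebase topic are skipped, reflecting the requirement that an external page should correspond one-to-one with a single topic.

```python
from collections import defaultdict


def reconcile_via_keys(imported_keys, freebase_keys):
    """Pair imported topics with existing Freebase topics that share an external key.

    imported_keys: dict of imported topic id -> external key (hypothetical format)
    freebase_keys: dict of Freebase topic id -> external key (hypothetical format)
    Returns a list of (imported_id, freebase_id) pairs.
    """
    # Invert the Freebase side: external key -> every Freebase topic carrying it.
    topics_by_key = defaultdict(list)
    for fb_id, key in freebase_keys.items():
        topics_by_key[key].append(fb_id)

    matches = []
    for imported_id, key in imported_keys.items():
        candidates = topics_by_key.get(key, [])
        # Only trust the link when the key identifies exactly one Freebase topic:
        # A == B and C == B, so A should be reconciled with C.
        if len(candidates) == 1:
            matches.append((imported_id, candidates[0]))
    return matches


# Hypothetical example: one imported record shares a key with one Freebase topic.
imported = {"import/1": "cwgc:123", "import/2": "cwgc:456"}
freebase = {"/m/0abc": "cwgc:123", "/m/0def": "cwgc:999"}
print(reconcile_via_keys(imported, freebase))  # [('import/1', '/m/0abc')]
```

Skipping ambiguous keys rather than guessing keeps the deduction safe: a key shared by several Freebase topics suggests those topics are themselves duplicates, which is a separate merge question.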
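For the Commonwealth War Graves Commission casualties, the suggested split by the first two letters of the surname can be generated rather than typed out. The sketch below assumes a hypothetical `fetch_page(prefix, page)` callable that performs one search and returns the records on that page (an empty list once the results are exhausted); it does not reflect any real CWGC interface.

```python
from itertools import product
from string import ascii_lowercase, ascii_uppercase


def surname_prefixes():
    """Yield two-letter surname prefixes Aa, Ab, ..., Zz (26 * 26 = 676 searches)."""
    for first, second in product(ascii_uppercase, ascii_lowercase):
        yield first + second


def harvest(fetch_page):
    """Run one paged search per prefix and collect every record.

    fetch_page(prefix, page) is a hypothetical callable: it should return the
    list of records on that results page, or an empty list when none remain.
    """
    records = []
    for prefix in surname_prefixes():
        page = 1
        while True:
            batch = fetch_page(prefix, page)
            if not batch:
                break  # this prefix is exhausted; move on to the next one
            records.extend(batch)
            page += 1
    return records


print(list(surname_prefixes())[:4])  # ['Aa', 'Ab', 'Ac', 'Ad']
```

Two-letter alphabetic prefixes give 676 searches. Depending on how the site matches surnames, names whose second character is not a letter (such as O'Brien) or one-letter surnames may need extra searches.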
Further reading:
- Semantic Web on Wikipedia
- W3Schools tutorial
- Programming the Semantic Web (book) by Jamie Taylor, Colin Evans and Toby Segaran
