Commons schema development
From Freebase
Commons schemas are intended to be as stable as possible.
Guidelines for Commons schema
- Types in the Commons should be of broad, general interest
- Sufficient agreement among the community that the type would be a valuable addition to the Commons
- Commons data models should be designed to be applicable and understandable to everyone interested in that field, including educated laypersons.
- Commons types should not rely on non-Commons types. No type in the Commons should include a type from outside the Commons, nor should the expected type of any property be a non-Commons type.
- All Commons types and properties should be documented.
- Well designed: makes appropriate use of links (not raw text), included types, CVTs, etc.
- Follows naming conventions
- Stable: not going to change in any ways that will break queries
Impact of changes
Some schema changes will have effects on people using your Commons types. Users may include API developers, people with saved views, and people whose bases rely on your Commons types. The following list of changes will give you some idea of the amount of impact caused:
- Adding a property: low impact
- Deleting a property: high impact
- Changing a property from a plain property to a CVT: high impact
- Changing the expected type of a property: medium impact
- Changing a property from unique to non-unique: high impact
- Changing a property from non-unique to unique: low impact
- Adding or removing an included type: low impact
For high-impact Commons changes, you must attempt to notify as many users as possible. At minimum, you should give notice on the data-modeling and developers' mailing lists and in the discussion on your base homepage, at least two weeks in advance. You should repeat the warnings leading up to the change, then make the change on sandbox and leave it there for a week so people can test their code/schemas before you make your changes on the live site. This is especially important for core types like person, location, etc.
For medium impact changes, you should perform the same notification but need not give as much warning.
For low impact changes, you can do them straight away without notice, but you should at least let people know you've done it by posting a discussion post on your base homepage. Ideally you would already have discussed the idea of the change with the community before doing it.
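As a rough illustration, the impact list and notification rules above can be written down as a lookup table. This is only a sketch: the change names (`add_property`, `plain_to_cvt`, and so on) and the `notification_plan` helper are hypothetical labels, not part of Freebase. The text gives no figures for medium-impact changes, so those fields are left as `None`.

```python
# Impact levels copied from the list above. The keys are hypothetical labels.
IMPACT = {
    "add_property": "low",
    "delete_property": "high",
    "plain_to_cvt": "high",
    "change_expected_type": "medium",
    "unique_to_non_unique": "high",
    "non_unique_to_unique": "low",
    "add_or_remove_included_type": "low",
}


def notification_plan(change):
    """Return the notice suggested by the guidelines for a kind of change."""
    impact = IMPACT[change]
    if impact == "high":
        # At least two weeks of notice, then one week on sandbox.
        return {"impact": impact, "notify_before": True,
                "min_notice_days": 14, "sandbox_days": 7}
    if impact == "medium":
        # Same notification, less warning; the guidelines give no numbers.
        return {"impact": impact, "notify_before": True,
                "min_notice_days": None, "sandbox_days": None}
    # Low impact: no advance notice needed, but post about it afterwards.
    return {"impact": impact, "notify_before": False,
            "min_notice_days": 0, "sandbox_days": 0}


print(notification_plan("delete_property"))
```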
Tips for mitigating impact
Don't delete properties straight away. Hide them first, document them as deprecated, and leave them that way for some time before removing them. API users will find that their apps fall out of date, but don't break outright.
Backwards compatibility
Adding properties is always backwards compatible; changing properties is sometimes backwards compatible.
Let's define some terms: With respect to schema, backwards compatible means that MQL queries formulated with version N of a schema still function with version N + 1. The goal of a backwards compatible schema change is to allow existing applications to continue to run. One obvious invariant is property names. If a property, /somedomain/sometype/foo, is present in schema version N, then it has to be present in version N + 1. Furthermore, the type of that property has to be "compatible". For object types, "compatible" means it has a superset of property names where the type of each property in the old set is compatible: If /somedomain/sometype/foo once got you to an object with a property named "bar", it should continue to do so in future versions of the schema.
For value types, "compatible" is a bit slipperier, as the reaction of a JSON consumer to a change from:
"some_property" : 1.0
to:
"some_property" : 1
is unspecified. Mostly, you can get away with the obvious set of changes.
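The definition above can be sketched as a recursive check. This is a minimal illustration, not a Freebase tool: it assumes a schema is represented as a plain dict mapping a type id to a dict of property name to expected type id, and that any type id without an entry is a value type. Because the behaviour of consumers on value type changes is unspecified, the sketch conservatively treats value types as compatible only when they are identical.

```python
def is_compatible(old_schema, new_schema, old_type, new_type, _seen=None):
    """Check that new_type offers everything old_type did, recursively."""
    if _seen is None:
        _seen = set()
    if (old_type, new_type) in _seen:
        return True  # pair already under examination; stops cycles
    _seen.add((old_type, new_type))

    old_props = old_schema.get(old_type)
    if old_props is None:
        # Value type (assumption): only an identical type is considered safe.
        return old_type == new_type

    new_props = new_schema.get(new_type)
    if new_props is None:
        return False  # an object type was replaced by a value type

    for name, old_expected in old_props.items():
        if name not in new_props:
            return False  # a property name disappeared
        if not is_compatible(old_schema, new_schema,
                             old_expected, new_props[name], _seen):
            return False
    return True


# Hypothetical schemas using the example ids from the text.
old = {
    "/somedomain/sometype": {"foo": "/somedomain/other"},
    "/somedomain/other": {"bar": "/type/text"},
}
added = {
    "/somedomain/sometype": {"foo": "/somedomain/other"},
    "/somedomain/other": {"bar": "/type/text", "baz": "/type/int"},
}
removed = {
    "/somedomain/sometype": {"foo": "/somedomain/other"},
    "/somedomain/other": {"baz": "/type/int"},
}

t = "/somedomain/sometype"
print(is_compatible(old, added, t, t))
print(is_compatible(old, removed, t, t))
```

Here `added` only gains a property, so the check passes; `removed` loses "bar" behind "foo", so it fails.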
Subtleties
In some cases, for example when there is no data for a property, just removing the property is probably the right thing to do. But you will break queries containing:
"deletedproperty" : null
Arguably, MQL should silently ignore this case, but it doesn't. Avoid creating properties that you don't have actual data for. Adding them later is easy; deleting them, not so much.
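Before deleting a property, it can help to scan any saved queries you know about for references to it. The sketch below is an illustration only: it assumes queries are held as Python dicts and lists (with `None` standing in for JSON `null`) and that properties are referenced by their full ids. Short property names, prefixes and operators are not handled, and the function names are hypothetical.

```python
def referenced_properties(query):
    """Yield every key that looks like a full property id, at any depth."""
    if isinstance(query, dict):
        for key, value in query.items():
            if key.startswith("/"):
                yield key
            yield from referenced_properties(value)
    elif isinstance(query, list):
        for item in query:
            yield from referenced_properties(item)


def broken_references(query, live_properties):
    """Return the property ids used in the query that no longer exist."""
    return sorted(set(referenced_properties(query)) - set(live_properties))


# Hypothetical saved query that asks for a property about to be deleted.
saved_query = [{
    "type": "/somedomain/sometype",
    "/somedomain/sometype/foo": None,
    "/somedomain/sometype/deletedproperty": None,
}]
live = {"/somedomain/sometype/foo"}

print(broken_references(saved_query, live))
```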
A very common schema change is converting a simple value to a CVT, for example changing a country's population from an integer to a dated integer. Such a change is not backwards compatible so we would have to create a new property for the dated population. OK, now we have /mycountry/population, an integer, and /mycountry/dated_population, a CVT. We have some existing data for the former, none for the latter. How to move forward:
- Add population data in the new format only. The existing data in the old format will continue to exist but will become less and less useful over time. This toes the schema compatibility line, but as existing applications will gradually cease to be useful due to lack of data, the fulfilment of the schema compatibility promise is a bit empty.
- Preserve both types of population, setting the simple integer based on the most recent dated population. This keeps applications using the old schema fully functional at the expense of some ongoing data gardening activity. Whether this counts as "denormalization" (evil duplication of data) depends on the contents of the property and how it gets used. For example, if we have reliable population data for Elbonia dated 2000, and someone hears a news report on Elbonia giving the current population as "23 million", using 23,000,000 for the simple value is arguably better than waiting another two years to get the official figure, 23,275,381, or worse, having a user fabricate a dated value (23,000,000 as of 2009) when the date is, in fact, unknown. A simple schema makes the data much easier to use: grabbing "population" is much easier than grabbing the most recent population value sorted by date. In the relatively near future, you'll be able to write an "extended MQL" property which can, among other things, compute a simple property, such as "age", on the basis of more complex data (in that case, date of birth and date of death).
- Actively delete the simple population property. This comes off as a bit hostile, but given the way that things play out in scenario #1, it might just be honest.
http://www.freebase.com/view/guid/9202a8c04000641f800000000b75f213 speaks mostly about the Commons, but if you want to maintain schema compatibility in your bases, the same guidelines apply!
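For the second option above (keeping the simple integer in step with the dated values), the "data gardening" step might look roughly like this. It is a sketch under stated assumptions: the record fields `number` and `date` are hypothetical names for the CVT's contents, dates are ISO 8601 strings with four-digit years (so plain string comparison orders them), and the figures are made up for illustration.

```python
def latest_population(dated_populations):
    """Return the number from the most recently dated record, or None."""
    dated = [r for r in dated_populations if r.get("date")]
    if not dated:
        return None
    # ISO 8601 strings with four-digit years sort chronologically as text.
    return max(dated, key=lambda r: r["date"])["number"]


# Illustrative records only; not real data.
records = [
    {"number": 21000000, "date": "2000"},
    {"number": 22500000, "date": "2005-07"},
]

print(latest_population(records))
```

The returned value is what you would write to the simple population property. As the Elbonia example suggests, you may still prefer to keep a hand-entered simple value when it is more current than any dated record.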