Dime

"MQL ain't worth a dime anymore." --Yogi Berra


Dime is the codename for a new ruggedized MQL implementation. It's 5¢ the same old MQL you know and love, and 5¢ MQL 2.0. It's faster, more consistent, and has a number of new features that we think you're going to like. This document describes those new features, and the differences from the current MQL implementation (hereafter referred to as MQL 1.0).

How can you try out Dime? Simply use the following Query Editor URL:

http://www.freebase.com/app/queryeditor?service=http://dime.labs.freebase.com/api/service/mqlread

or you can use the http service directly via:

http://dime.labs.freebase.com/api/service/mqlread

Note that Dime only implements the read portions of the MQL language, described in chapters 3 and 4 of the MQL Language Reference. If you need to do MQL write queries, you must continue to use the production MQL write services. Also, as a preview release, the Dime service found at dime.labs.freebase.com may be subject to intermittent availability as we make changes and improvements to the service (i.e. you're free to play with it as much as you like, but you should refrain from relying on it in a production application).

Give Dime a try, and let us know what you think. You can send your feedback to the freebase-discuss@freebase.com mailing list, or report bugs directly using the Freebase Issue Tracking System (use Project: "eng-mql", Component: "dime").

New Language Features

Here are the new MQL language features and extensions implemented by Dime. They are broken into those that are fully backwards-compatible, and those that either take some liberties with the current MQL Language Reference, are fixes to long-standing MQL problems, or are not quite altogether there yet.

Since Dime is a preview (beta) release, there is no guarantee that these features will remain in the final product release. Your feedback will help us determine the feasibility of adding these features or making the changes permanent.

Backwards-compatible extensions

performance

Dime has been optimized for improved MQL response times. Whereas the MQL 1.0 implementation spent about half of the transaction time compiling queries and processing results, Dime reduces this overhead to a minimum so that the overall response time is faster. You can compare Dime's response times with MQL 1.0 by comparing the tm (time in MQL) values in the status tab of the Query Editor. Note that for smaller queries, the dominant portion of the time may be spent in network latency, and for difficult queries, it may be spent in our underlying graphd query solver.

Another feature that may facilitate improved performance in MQL applications is that when multiple queries are placed in the same MQL envelope, Dime will work on them concurrently, in many cases even distributing the load across multiple graphd servers working in parallel. There are some constraints on the number of concurrent queries Dime will work on at a given time, and the total processing time is subject to how heavily loaded the server may be. However, it's very likely that packaging multiple queries into a single envelope will increase overall performance.
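As a rough sketch of what packaging several queries into one request might look like from a client, the Python below wraps each query in its own named envelope and encodes the result as a queries= parameter. The envelope shape ({"query": ...} keyed by a caller-chosen name) is an assumption based on the named-envelope format mentioned under New Service Features, and the helper name and sample queries are hypothetical. Nothing is sent over the network.

```python
import json
from urllib.parse import urlencode

DIME_MQLREAD = "http://dime.labs.freebase.com/api/service/mqlread"

def build_queries_url(named_queries, service=DIME_MQLREAD):
    """Build one GET URL that carries several named queries at once."""
    # Assumed format: each name maps to its own envelope holding a "query" key.
    envelopes = {name: {"query": q} for name, q in named_queries.items()}
    return service + "?" + urlencode({"queries": json.dumps(envelopes)})

url = build_queries_url({
    "dylan": {"id": "/en/bob_dylan", "name": None},
    "police": {"id": "/en/the_police", "name": None},
})
print(url)
```

One request with two named queries replaces two round trips, which is where the concurrency described above would have a chance to help.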

directives only

MQL queries specified by JSON object structures ({...}) may now contain directives only. This will cause the default properties to be used automatically, just as they are when {} or [{}] is specified.

  • eliminates the error "Clause must not be empty excluding directives"
  • e.g. {"id":"/en/the_police","/music/artist/album":[{"limit":2}]} is equivalent to {"id":"/en/the_police","/music/artist/album":[{"limit":2, "id":null, "name":null, "type":[]}]}
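The equivalence above can be pictured as a small rewrite step. The following Python is only an illustration of that rule, not Dime's actual code: the set of directive names is an assumed, non-exhaustive list, and the function names are hypothetical.

```python
import copy

# Assumed, non-exhaustive set of MQL directive names (illustrative only).
DIRECTIVES = {"limit", "sort", "optional", "return", "count", "estimate-count", "cursor"}

# Default properties, taken from the equivalence shown above.
DEFAULTS = {"id": None, "name": None, "type": []}

def is_directive(key):
    """True for a bare or labelled directive such as "limit" or "my:limit"."""
    return key.split(":")[-1] in DIRECTIVES

def expand_directives_only(clause):
    """Add the default properties when a clause holds nothing but directives."""
    if all(is_directive(key) for key in clause):
        # An empty clause also lands here, matching the behaviour of {}.
        return {**clause, **copy.deepcopy(DEFAULTS)}
    return dict(clause)

print(expand_directives_only({"limit": 2}))
```

A clause that already names a property, such as {"limit": 2, "name": null}, is returned unchanged.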

expected type inference for "/type/link/source", "/type/link/target" and "/type/link/target_value"

Dime will infer the expected type for "source", "target" and "target_value" properties whenever the property containing the "link" clauses itself has a known expected type.

  • eliminates errors like: "Type /type/object does not have property contains"
  • e.g. {"id":"/en/bob_dylan","/people/person/place_of_birth":{"link":{"target":{"contains":[]}}}} is equivalent to {"id":"/en/bob_dylan","/people/person/place_of_birth":{"link":{"target":{"/location/location/contains":[]}}}}

bare /type/link properties

Dime allows properties of /type/link to be used as fully-qualified properties anywhere that the corresponding "link" clause can be used.

  • e.g. {"id":"/en/bob_dylan","/people/person/place_of_birth":{"/type/link/master_property":null}} is equivalent to {"id":"/en/bob_dylan","/people/person/place_of_birth":{"link":{"master_property":null}}}

ability to use type:/type/link anywhere

Dime allows the type constraint "type":"/type/link" to be used anywhere within a query to find links that refer to other links (a seldom-used feature of our underlying graph representation). In MQL 1.0, "type":"/type/link" could only go at the top level of a query. See MQL-437 for more details.

improved key handling

  • Dime provides more consistency when dealing with fully-qualified property names as keys.
    • MQL 1.0 has many places where "/type/object/type" is not synonymous with "type", "/type/value/value" or "foo:value" is not synonymous with "value", etc
  • ability to label directives
    • e.g. "my:limit":3
  • use of ! to reverse a property need not use a fully-qualified property name
    • so long as MQL can determine the property from the expected type
    • e.g. {"id":"/en/bob_dylan","type":"/people/person", "children": [{"!children": []}]} -- here "!children" is ok because the expected type is still "/people/person"

extended /type/reflect properties

Added additional /type/reflect properties to allow reflection queries along master_property and attribution links.

  • /type/reflect/any_attribution -- for any object, return all attribution objects associated with its properties
  • /type/reflect/any_attribution_of -- for any attribution object, return all objects having properties with this attribution
  • /type/reflect/any_attributed -- for any attribution object, return all objects which attributed it
  • /type/reflect/any_attributed_of -- for any object, return all attribution objects which attributed it
  • /type/reflect/any_source_of -- for any object, return all property objects for which it is a source
  • /type/reflect/any_source -- for any property object, return all objects which use it as a source (note that the property object should be a master)
  • /type/reflect/any_target_of -- for any object, return all property objects for which it is a target
  • /type/reflect/any_target -- for any property object, return all objects which use it as a target (note that the property object should be a master)
  • /type/reflect/any_value_of -- for any object, return all objects which refer to it through a value (e.g. namespaces or languages)

reversible /type/link properties

Several /type/link properties now have proper reverse properties, or are reverse properties of each other:

  • /type/link/source and /type/link/target
  • /type/link/master_property and /type/property/links
  • /type/link/attribution and /type/attribution/links
  • any of these can be prefixed with !
  • eliminates errors like "Can't reverse artificial property /type/link/master_property"

ability to constrain by count and estimate-count

  • e.g. { "id": "/en/bob_dylan", "/people/person/children": [{ "count": 6, "id": null }]}
  • e.g. [{"id":null,"key":[{"value":null,"namespace":"/en","count>":2,"return":"value"}]}]

generalized comparison operators

Dime allows a few comparison operators in contexts where MQL 1.0 did not support them, e.g. inequality operators for "count". These are described in Dime Comparison Operators.

generalized handling of sort

Dime infers that the presence of a property mentioned in a "sort" directive implies that it must be returned.

  • e.g. { "id":"/en/bob_dylan", "/people/person/children":[{"sort":["profession","date_of_birth"]}]} is equivalent to { "id":"/en/bob_dylan", "/people/person/children":[{"sort":["profession","date_of_birth"], "profession":null, "date_of_birth":null}]}

Also, Dime deals more consistently with default properties such as "value" mentioned in a sort clause:

  • e.g. { "id": "/en/bob_dylan", "/people/person/children": [{ "date_of_birth": {"value": null}, "sort": "date_of_birth" }]} is equivalent to { "id": "/en/bob_dylan", "/people/person/children": [{ "date_of_birth": {"value": null}, "sort": "date_of_birth.value" }]}
    • eliminates errors like: "Must sort on a single value, not at date_of_birth"
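The first inference in this section (a property named in "sort" is implicitly requested) can be sketched as a rewrite of the clause. This Python is an illustration under assumptions, not Dime's implementation: the function name is hypothetical, a leading "-" is assumed to mark descending order, and dotted sort paths are deliberately left alone.

```python
def add_sorted_properties(clause):
    """Return a copy of `clause` that also requests every property it sorts on."""
    sort = clause.get("sort", [])
    keys = [sort] if isinstance(sort, str) else list(sort)
    result = dict(clause)
    for key in keys:
        # Assumption: a leading "-" only changes the sort direction.
        name = key[1:] if key.startswith("-") else key
        if "." in name:
            continue  # nested sort paths are out of scope for this sketch
        # Never overwrite a property the query already asks for.
        result.setdefault(name, None)
    return result

print(add_sorted_properties({"sort": ["profession", "date_of_birth"]}))
```

Using setdefault means a clause such as {"date_of_birth": {"value": null}, "sort": "date_of_birth"} keeps its existing subquery.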

generalized handling of return

Dime allows other properties to be returned besides "count" and "estimate-count", e.g.

{ "id": "/en/bob_dylan",
  "/people/person/children": {
    "date_of_birth>": "1965",
    "return": "id" }}
==>
{ "id": "/en/bob_dylan",
  "/people/person/children": "/en/jakob_dylan" }

sort by guid

Dime allows sorting by guid.

  • e.g. "sort":"guid" or "sort":"foo.bar.guid"
  • can also sort by "id" or "mid", but the guid order is used (so perhaps not quite what's expected)

generalized cursors

Dime allows a new MQL directive "cursor" to appear in array queries. This allows not only cursors at the top-level of queries (as is done by the "cursor" envelope parameter), but also allows nested cursors within query results.

For example, {"id":"/en/bob_dylan","/people/person/children":[{"cursor":null,"id":null,"limit":3}]} yields (ugly, but hopefully useful):

{
 "id": "/en/bob_dylan",
 "/people/person/children": [
   {
     "cursor": "eNptjUsKAjEQBU8TUGRIf_...",
     "id": "/en/jakob_dylan"
   },
   {
     "cursor": "eNptjUsKAjEQBU8TUGRIf...",
     "id": "/en/jesse_dylan"
   },
   {
     "cursor": "eNptjUsKAjEQBU8TUGRIf...",
     "id": "/en/desiree_gabrielle_dennis_dylan"
   }
 ]
}

(The fact that cursor values are repeated in the result may be somewhat problematic due to the size of the cursor, but this is consistent with MQL's usual handling of "array-centric" properties like "count" and "estimate-count". Perhaps this should be changed so that only the first array element contains the array-centric information.)

multiple identity constraints

Dime allows multiple identity constraints ("id", "guid", or "mid") to be at the same level within a query, e.g.:

{
 "id":   "/en/bob_dylan",
 "guid": "#9202a8c04000641f8000000003abd178"
}

MQL 1.0 currently declares this an error. Dime will run the query and succeed if the identifiers refer to the same object. This is particularly useful when queries are automatically generated by code, since that code no longer needs to check for identity agreement. It is also useful when one of the identity constraints uses the |= operator with a list of ids.

auto-expand mediator CVTs

Dime will automatically expand {} to contain all the properties of a mediator CVT, e.g.:

{"id":"/en/bob_dylan","/people/person/place_of_birth":{"geolocation":{}}}
==>
{
 "id": "/en/bob_dylan",
 "/people/person/place_of_birth": {
   "geolocation": {
     "datum": "NAD83",
     "latitude": 46.7833,
     "longitude": -92.1064,
     "elevation": null
   }
 }
}

This was MQL-618.

Not-quite backwards-compatible changes

improved id resolution rules

Dime corrects a number of idiosyncrasies in MQL 1.0's id resolution rules (ids in MQL that are not Machine IDs are determined dynamically from links between namespaces in the graph). These changes make ids more manageable and stable over time. The rules, listed in order of precedence, are:

  1. ids prefixed by blacklisted namespaces are rejected (see MQL-448 for more details)
  2. ids closer to the root namespace are preferred
  3. /en ids are preferred
  4. newer ids are preferred (where newer refers to the final link connecting the namespace to the topic)

This last rule is particularly important as it allows corrections to be made, replacing undesired ids with newer corrected ones.
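One way to read the four rules is as a filter followed by a ranking. The Python below is a sketch of that reading only: the Candidate record, the blacklist entry, the sample ids, and the use of the number of path segments as "distance from the root" are all assumptions made for illustration. Timestamps are assumed to be ISO 8601 strings in the same format, so that string comparison orders them by age.

```python
from dataclasses import dataclass

# Hypothetical blacklist entry; the real list is discussed in MQL-448.
BLACKLISTED_PREFIXES = ("/example_blacklisted/",)

@dataclass
class Candidate:
    id: str         # a candidate id for the topic, e.g. "/en/bob_dylan"
    linked_at: str  # assumed: ISO 8601 time of the final namespace link

def preferred_id(candidates, blacklist=BLACKLISTED_PREFIXES):
    """Pick one id by applying the four rules in order of precedence."""
    # Rule 1: reject ids under blacklisted namespaces.
    allowed = [c for c in candidates if not c.id.startswith(tuple(blacklist))]
    if not allowed:
        return None

    def rank(c):
        depth = c.id.count("/")               # rule 2: fewer segments is closer to root
        not_en = not c.id.startswith("/en/")  # rule 3: /en ids sort first
        return (depth, not_en)

    best = min(rank(c) for c in allowed)
    tied = [c for c in allowed if rank(c) == best]
    # Rule 4: among the remaining ties, the newest link wins.
    return max(tied, key=lambda c: c.linked_at).id

candidates = [
    Candidate("/example_blacklisted/x", "2010-06-01T00:00:00Z"),
    Candidate("/user/someone/topic", "2010-05-01T00:00:00Z"),
    Candidate("/en/old_name", "2007-01-01T00:00:00Z"),
    Candidate("/en/new_name", "2009-01-01T00:00:00Z"),
]
print(preferred_id(candidates))
```

Here the two /en ids tie on the first three rules, so the more recently linked one is chosen, which is the correction mechanism described above.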

soft uniqueness

Non-unique properties need not be queried with a JSON list ([...]). When such a property is queried with a JSON object ({...}) instead, only one (arbitrary) matching element will be returned.

  • eliminates errors like: "Unique query may have at most one result. Got 23"
  • e.g. {"id":"/en/the_police","/music/artist/album":{}} is equivalent to {"id":"/en/the_police","/music/artist/album":[{"limit":1, "id":null, "name":null, "type":[]}]}

Although MQL has always allowed uniqueness failures to be controlled by specifying the query envelope parameter "uniqueness_failure", Dime rejects any value of this parameter other than "soft"; that is, asking for a single instance of a non-unique property never produces a uniqueness failure. For non-unique properties, Dime treats {...} as if the user had specified {..., "limit":1}.

count independent of limit

Currently MQL 1.0 only returns counts up to the number of results requested, e.g.

[{
 "type" : "/music/track",
 "artist" : "The Police",
 "name" : null,
 "count" : null
}]

Here, each result includes "count": 100 because only 100 results were (implicitly) requested.

Dime does not limit the count in this way, and will return the actual count of tracks regardless of how many results are returned (for the above query, "count": 135 is returned).

values from graphd are typed according to the schema

Dime interprets (parses) values from graphd according to the expected type specified in the schema rather than the primitive datatype field associated with the value (which is what MQL 1.0 used). This means that if the expected type of a property is changed, the existing values will be reinterpreted according to the new expected type.

  • e.g. changing the expected type of /people/person/height_meters from /type/rawstring to /type/int will cause Dime to attempt to coerce the existing rawstring values to integers
  • non-conforming values will not be returned (null is returned instead), and are also logged for future cleanup
    • MQL 1.0 would make all old values disappear rather than attempting to reinterpret existing values. Values would need to be completely updated.
  • one exception to this is when querying by /type/reflect/any_value -- in this case, the graphd primitive datatype field is used to return integer, float and boolean values (not sure how to do this otherwise)
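The reinterpretation described in this section can be pictured with a minimal Python sketch. It is not Dime's code: the parser table covers only two types, and the function and logger names are hypothetical. The point it shows is that a stored string is parsed according to the schema's expected type, and that a value which does not conform comes back as null (None here) and is logged.

```python
import logging

log = logging.getLogger("dime_sketch")  # hypothetical logger name

# Assumed mapping from expected type to a parser; deliberately tiny.
PARSERS = {"/type/int": int, "/type/rawstring": str}

def read_value(raw, expected_type):
    """Interpret a stored string according to the schema's expected type."""
    try:
        return PARSERS[expected_type](raw)
    except ValueError:
        # Non-conforming value: report null and leave a trace for cleanup.
        log.warning("value %r does not conform to %s", raw, expected_type)
        return None

print(read_value("2", "/type/int"))     # parsed as an integer
print(read_value("1.88", "/type/int"))  # does not parse, so None
```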

values from graphd are not munged

When Dime retrieves data values from graphd, it validates them against the expected type, but does not otherwise change their string representation. This means that exact date/time precision is preserved as well as leading and trailing zeros, significant digits and precision for floating point values.

A couple of caveats however:

  • MQL 1.0 may have already caused imprecise information to be stored in graphd, and now Dime will expose that data,
  • floating point comparisons are performed by graphd, and still subject to epsilon edge cases.

constraints with same property name are independent

Currently, when a query specifies two or more properties that only differ by the operator suffix, these properties are implicitly folded together into a single property access. For example:

[{"type":"/music/artist", "limit":2, "id":null, "album~=":"^qu*", "album": []}]

currently means "tell me 2 artists that have albums beginning with 'qu', and tell me the names of those albums." However, we could achieve effectively the same results with this query:

[{"type":"/music/artist", "limit":2, "id":null, "album": [{"name~=":"^qu*", "name":null}]}]

An alternate interpretation of the original query could be "tell me 2 artists that have albums beginning with 'qu', and tell me the names of all of their albums." Had we wanted only information on the 'qu' albums we could do that with the second query, but this new interpretation of the first query opens up new possibilities.

For instance, MQL 1.0 currently chokes on this query:

[{"type":"/music/artist", "limit":2, "id":null, "album~=":"^qu*", "album": [{"id":null}]}]
... "message": "Can't use a comparison operator with a non-terminal"

but if we decouple the two album constraints so that they work independently, we don't have this problem.

Furthermore, if we make more liberal use of default properties than we do today (e.g. allowing them to be defined for different types of CVTs), then we can interpret constraints as applying to default properties of the expected type of non-terminal properties. For instance, if we define "spouse" as the default property of /people/marriage, then we could write a query such as:

[{"type": "/people/person", "spouse_s~=": "Sara", "spouse_s": []}]

to effectively mean:

[{"type": "/people/person", "spouse_s": [{ "spouse": [{"name~=": "Sara"}] }], "spouse_s": []}]

then it becomes even less clear whether the "spouse_s":[] clause intends to return information about the CVTs or information about the names of the spouses that are matched by the first "spouse_s~=" clause. This is a further argument for decoupling the interpretation of similarly named property constraints.

New Service Features

UTF-8 encoded response

Unlike MQL 1.0, which converts everything in the response to ASCII with non-ASCII characters encoded as \uXXXX sequences, Dime returns full UTF-8 encoded responses.

  • eliminates need for additional encoding/decoding which degrades performance (JavaScript already decodes UTF-8 text just fine)
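The difference between the two response styles can be reproduced with Python's standard json module. This only illustrates the two encodings and says nothing about how either service is implemented; the sample value is made up.

```python
import json

result = {"name": "Björk"}  # sample value containing one non-ASCII character

# ASCII-only style: non-ASCII characters become \uXXXX escapes.
ascii_body = json.dumps(result)

# UTF-8 style: characters are kept as-is and the text is encoded as UTF-8 bytes.
utf8_body = json.dumps(result, ensure_ascii=False).encode("utf-8")

print(ascii_body)
print(utf8_body)

# Both bodies decode to the same data.
assert json.loads(ascii_body) == json.loads(utf8_body.decode("utf-8"))
```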

unified query/queries handling

The /api/service/mqlread url has been extended to allow query= or queries= to be used interchangeably. This works by inspecting the top-level keys in the JSON structure to determine whether they are envelope parameters, which is decided by the presence of the "query" key. If that key is absent, Dime tests the assumption that the JSON contains a list of named envelopes (per the old queries= format) and processes them accordingly.

This is primarily a convenience for developers, allowing them to avoid using different url parameters depending on the form of the query or queries.
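The detection rule described in this section can be sketched as a small classifier. The Python below is a hypothetical reading of that rule, not the service's code: the function name and return labels are invented for the example, and the named-envelope form is assumed to be a JSON object mapping names to envelopes.

```python
def classify_payload(payload):
    """Decide whether decoded JSON is one envelope or a set of named envelopes."""
    # A top-level "query" key marks a single envelope.
    if isinstance(payload, dict) and "query" in payload:
        return "envelope"
    # Otherwise, assume named envelopes: every value must itself be an envelope.
    if isinstance(payload, dict) and payload and all(
        isinstance(value, dict) and "query" in value for value in payload.values()
    ):
        return "named_envelopes"
    raise ValueError("neither an envelope nor a set of named envelopes")

print(classify_payload({"query": {"id": "/en/bob_dylan", "name": None}}))
print(classify_payload({"q1": {"query": {"id": "/en/the_police", "name": None}}}))
```

One consequence of checking for "query" first is that a named envelope that happens to be called "query" would be read as a single envelope, so that name is best avoided.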

'q' parameter

A simplified form of /api/service/mqlread may be invoked by passing a single query without an envelope using the q parameter.
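As a rough illustration of the two request forms, the Python below builds one URL that wraps a query in an envelope under query= and one that passes the bare query under q=. The envelope shape ({"query": ...}) and the sample query are assumptions based on the descriptions in this document, and nothing is sent over the network.

```python
import json
from urllib.parse import urlencode

SERVICE = "http://dime.labs.freebase.com/api/service/mqlread"
query = {"id": "/en/bob_dylan", "name": None}  # sample query for illustration

# Envelope form: the query sits under a "query" key inside the envelope.
with_envelope = SERVICE + "?" + urlencode({"query": json.dumps({"query": query})})

# Simplified form: the bare query is passed directly as q.
without_envelope = SERVICE + "?" + urlencode({"q": json.dumps(query)})

print(with_envelope)
print(without_envelope)
```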

'encoding' parameter

The /api/service/mqlread url may be passed an encoding url parameter to specify the output encoding format for strings contained within the response. When absent, the default encoding is UTF-8. Valid values of the encoding parameter are described in Dime Text Encodings.

response errno values

MQL errors fall into several general categories described by the "code" parameter in the response envelope. However, it was requested that distinct error numbers (or errno values) be assigned to specific MQL errors so that they are more easily distinguishable to applications (see MQL-277). Errno values have been modeled after http status codes. The actual http status returned by an MQL request is not the errno value, but is instead a simplified http status code returned for legacy reasons. Dime errno values are described in Dime Error Codes.

Dime allows the errno value to be returned as the http status code from /api/service/mqlread requests when the envelope parameter "errno_status":true is supplied. This is done for applications that require more consistent handling of errors, or a RESTful API.

Known Issues

See the Dime component on the Freebase Issue Tracking System for the complete list of issues.

Release Notes

  • 2010-06-30T18:36:52Z - dime/trunk:97715
    • MQL-678: fixed malformed json for mid:[{}]
    • Fixed 500 error when no query parameter is supplied.
    • MQL-679: Fixed problem where checking for existence of 'index' property at top-level of query inferred wrong expected type.
    • MQL-676: Fixed problem where [{}] expansion for mediator properties was not optional.
  • 2010-06-25T23:06:36Z - dime/trunk:97515 - initial Labs release