MQL Manual/mqlread

From Freebase


This article contains content originally taken from the MQL Manual, which is no longer maintained. It may need cleanup to make sense as a standalone document. Please go ahead and help us do this!

Unfortunately, this section of the manual is too big to put in the wiki as a single page. Therefore many parts of it have been split up into various other pages and redistributed elsewhere throughout the wiki. The resulting pages will probably need editing to make them independent.

See Special:WhatLinksHere/Template:Exmanual for wiki pages that were originally part of the MQL manual and are still not fully integrated into the normal wiki docs.

You can also download the MQL Reference guide as PDF if you want to see it in its original form.

What follows are the sections that were left after the parts were ripped out.



Metaweb Read Services

MQL Manual/Metaweb Query Language explained how to express Metaweb queries using MQL. This chapter explains how to deliver those queries to Metaweb servers and retrieve their response using the mqlread service. It also explains how to search Metaweb with the search service and how to retrieve chunks of data (such as images and HTML documents) using the trans service. The chapter includes example applications and libraries written in Perl, Python, PHP, and JavaScript and concludes with a sophisticated Python library for interacting with Metaweb's read services.

The mqlread Service

This section explains more formally how mqlread works. Like all Metaweb services, mqlread is a web-based service: it takes an HTTP request as input and returns an HTTP response as its output.

The path to the mqlread service on a Metaweb server is /api/service/mqlread. To send a mqlread query to the Metaweb server running at api.freebase.com, for example, you'd use the following URL:

http://api.freebase.com/api/service/mqlread

mqlread works with both GET and POST request methods. GET requests are preferred unless the query is so long that POST must be used instead.

There are two sources of input to mqlread in an HTTP request. The first is the request parameters. For GET requests, these parameters are encoded in the URL itself, following a ? character. For POST requests, the request parameters appear in the body of the request. In both cases, the parameters are URI encoded in the standard way that web browsers encode HTML form submissions. mqlread recognizes request parameters query, queries and callback, and these are documented in sub-sections below. Every mqlread request must include either the query or queries request parameters (but not both). The value of these parameters is a JSON-serialized object known as a query envelope. In addition to holding the actual MQL query that is being submitted, this envelope object may also hold additional mqlread input in the form of "envelope parameters". Envelopes and envelope parameters are covered in detail below.

The second source of mqlread input in an HTTP request is HTTP cookies. mqlread looks for a cookie named mwLastWriteTime. This cookie is only necessary in applications that perform MQL writes as well as MQL reads, and ensures that recent writes are always visible to subsequent read requests performed by the same application (or by the same web browser). The mwLastWriteTime cookie is covered in MQL Manual/mqlwrite rather than in this chapter. In general, Metaweb-enabled applications need not track individual cookies. Instead, they can behave like web browsers do: any cookies returned as output by a Metaweb service should be included as input to subsequent requests.

The output of the mqlread service is the HTTP response body. This body is always (even when errors occur because of bad input) a JSON-serialized object served as text/plain. This JSON-serialized object is known as a response envelope, and it is explained in the section The Response Envelope below.


The query Request Parameter

The simplest mqlread request includes a single request parameter named query. The value of this parameter is a JSON-serialized object known as a query envelope. This envelope object must have a property named query, and may also have additional properties that specify "envelope parameters" (see the section Envelope Parameters below). The value of the query property in the envelope is your JSON-serialized MQL query. Thus a simple mqlread invocation uses a URL like this:

http://api.freebase.com/api/service/mqlread?query={"query":[{Your MQL here}]}

Notice that the word "query" appears twice in this URL. The first (without quotes) is the request parameter, and the second (with quotes) is a property of the envelope object. In this example, the square brackets are part of the MQL query itself, not part of the envelope syntax. Note also that everything after the equals sign should, in practice, be URI-encoded, which transforms quotation marks into %22, and so forth.
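
As a minimal sketch of the encoding step described above, the following Python wraps an MQL query in a query envelope and URI-encodes it into a GET URL. The helper name build_mqlread_url and the sample query are illustrative assumptions, not part of any Metaweb library; the endpoint is the one given earlier in this section.

```python
import json
from urllib.parse import urlencode

# Endpoint taken from the text above.
MQLREAD_URL = "http://api.freebase.com/api/service/mqlread"

def build_mqlread_url(mql_query, base_url=MQLREAD_URL):
    """Wrap an MQL query in a query envelope and URI-encode it (hypothetical helper)."""
    envelope = {"query": mql_query}            # the quoted "query" envelope property
    params = {"query": json.dumps(envelope)}   # the unquoted query request parameter
    return base_url + "?" + urlencode(params)  # urlencode turns " into %22, and so on

url = build_mqlread_url(
    [{"type": "/music/album", "artist": "The Police", "name": None}]
)
print(url)
```

Python's None is serialized as the JSON null that MQL uses to ask for a value.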



The queries Request Parameter

mqlread allows you to submit more than one MQL query at a time. To do this, you must use the queries request parameter instead of query. (Every invocation of mqlread must include one or the other, but not both.) The value of the queries parameter is not a simple query envelope as it is for the query parameter. Instead, it is a JSON-serialized object known as an outer envelope. The outer envelope contains named query envelopes. To submit two queries at once the outer envelope would have two properties. The names of the properties might be q1 and q2, and their values would be the two query envelopes that describe the two queries to be run:


{                                       # Start the outer envelope
  "q1": {                               # Query envelope for query named q1
    "query":{First MQL query here}      # Query property of query envelope
  },                                    # End of first query envelope
  "q2": {                               # Start query envelope for query q2
    "query":[{Second MQL query here}]   # Query property of q2
  }                                     # End of second query envelope
}                                       # End of outer envelope.

For example:

http://api.freebase.com/api/service/mqlread?queries={

	"q0":{"extended": 1, "query":[{ "search": { "query": "larry", "id": null, "score": null }, "type": "/people/person", "name": null, "limit": 5, "sort": "-search.score", "age": null }] 
	},

	"q1":{"extended": 1, "query":[{ "search": { "query": "moe", "id": null, "score": null }, "type": "/people/person", "name": null, "limit": 5, "sort": "-search.score", "age": null }] 
	},

	"q2":{"extended": 1, "query":[{ "search": { "query": "curly", "id": null, "score": null }, "type": "/people/person", "name": null, "limit": 5, "sort": "-search.score", "age": null }] 
	}

}

The property names used within an outer envelope are arbitrary, but they appear again in the mqlread response.
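
The outer envelope can be assembled mechanically from a mapping of names to queries. The sketch below assumes a hypothetical helper, build_queries_param, and uses made-up sample queries; it shows only the structure described above, not an official client.

```python
import json
from urllib.parse import urlencode

def build_queries_param(named_queries):
    """Build the value of the queries request parameter (hypothetical helper).

    named_queries maps arbitrary names to MQL queries; each query is wrapped
    in its own query envelope inside the outer envelope.
    """
    outer = {name: {"query": mql} for name, mql in named_queries.items()}
    return json.dumps(outer)

queries = build_queries_param({
    "q1": {"id": "/lang/fr", "name": None},
    "q2": [{"type": "/music/album", "artist": "The Police", "name": None}],
})

# URI-encoded form, usable after "?" in a GET URL or as a POST body.
request_params = urlencode({"queries": queries})
print(request_params)
```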

The Response Envelope

The output of the mqlread service is an HTTP response, which consists of a set of HTTP response headers and a response body. The headers are not typically interesting (though Metaweb engineers might be interested in the X-Metaweb-TID header if you're submitting a bug report). In particular mqlread is not expected to return cookies as part of its output.

The body is the interesting part of the mqlread response. It is a UTF-8 encoded JSON-serialized object. This object is known as the response envelope, and its structure mirrors that of the query envelope with the query property replaced with a result property. If the request used the query parameter to submit a single query envelope, then the result is a single response envelope. The following shows a query envelope and the corresponding response envelope:

Query Envelope:

{
  "query": [{ MQL Query Here }]
}

Response Envelope:

{
  "result": [{ MQL Response Here }],
  "status": "200 OK",
  "code": "/api/status/ok",
  "transaction_id":[opaque string value]
}

If the request used the queries parameter to submit multiple named query envelopes within an outer envelope, then the response is an outer envelope that uses the same names to refer to multiple response envelopes:

Query Envelopes:

{
  "q1": {
    "query":{First MQL query here}
  },
  "q2": {
    "query":[{Second MQL query here}]
  }
}

Response Envelopes:

{
  "q1": {
    "result":{First MQL result here},
    "code": "/api/status/ok"
  },
  "q2": {
    "result":[{Second MQL result here}],
    "code": "/api/status/ok"
  },
  "status": "200 OK",
  "code": "/api/status/ok",
  "transaction_id":[opaque string value]
}

Notice that response envelopes include code, status, and transaction_id properties. The code property is the most important: it specifies a success or failure status code (as a string) for the query. If the query was successful then the value of this property will be "/api/status/ok". If code does not equal "/api/status/ok", then there was an error of some sort, and the response envelope will include additional details in a messages array. See the section mqlread Error Codes below for further details about status codes and error messages, including details on the status and transaction_id properties.

Warnings

The mqlread/mqlwrite envelopes might also contain a warning key that is a dictionary of potential issues with your query. The currently known keys are:

deprecated_host
data type: string
This warning means that you are using a deprecated hostname for your API request.

Envelope Parameters

If you use the query request parameter you specify a single query envelope as its value. If you use queries instead, you specify one or more query envelopes within an outer envelope. In either case, each query envelope must include a property named query that specifies the MQL query to be executed. Each envelope may also include additional properties, known as "envelope parameters", that provide additional input to mqlread and specify how the query should be run. mqlread supports envelope parameters named cursor, escape, lang, as_of_time, and uniqueness_failure. These parameters are described below.


Fetching Large Result Sets with Cursors

Recall that MQL queries are implicitly limited to returning 100 results. You can use the MQL limit directive to specify a different limit, but when there is a very large result set, specifying a very large limit may cause your query to time out. When you expect that your query will have many results, and you want to retrieve all of those results, you should use a cursor. (A Metaweb cursor is related to, but not the same as, a cursor used in SQL with a relational database.) A cursor is simply a way of keeping track of your position within a large set of results, and it enables you to retrieve the results of a query batch by batch with multiple sequential mqlread invocations. Cursors are demonstrated later in this chapter in <xref linkend="metawebpython"/>.

To begin a new query that uses a cursor, include a cursor property with the value true in the query envelope. The response envelope (see the section The Response Envelope above) will then contain a cursor property. If the value of the cursor in the response is false, it means that all results have been delivered. Otherwise, that property will be a long string of opaque data. Use this string as the value of the cursor property in the query envelope, and submit that query envelope again (leaving the query itself unchanged). This time the response envelope will contain the second batch of results and a new value for the cursor property. Repeat these steps until the cursor property of the response envelope is false.

It is important to understand that cursors only work when multiple results are expected at the top-level of the query. The cursor property is part of the mqlread query envelope syntax, not part of MQL itself, and it cannot be applied to sub-queries of a query. Another way to say this is that it only makes sense to include "cursor":true in an envelope if the first character following "query": in the envelope is [. The query must be expressed as an array in order for a cursor to be meaningful. It is legal, but never useful, for example, to use a cursor in this query envelope:

{
  "cursor":true,
  "query": {
    "type":"/music/artist",
    "name":"The Police",
    "album":[]
  }
}

If you want to use a cursor to retrieve a list of albums in batches, the array of albums must be at the top level of the query:

{
  "cursor":true,
  "query": [{
    "type":"/music/album",
    "artist":"The Police",
    "name":null,
    "limit":10
  }]
}

Note the addition of the limit directive to the query to specify the size of each batch we want returned.
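
The cursor protocol described above amounts to a simple loop. The sketch below is one way to write it in Python; fetch_all and send_envelope are hypothetical names, and send_envelope stands for whatever function your application uses to submit a query envelope to mqlread and parse the response envelope into a dictionary. A stand-in function is used here so that the loop can be exercised without a network connection.

```python
def fetch_all(mql_query, send_envelope):
    """Yield every result of a top-level array query, batch by batch.

    send_envelope is a caller-supplied function (an assumption of this
    sketch) that submits a query envelope and returns the parsed response
    envelope as a dict.
    """
    cursor = True                   # True starts a new cursor-based query
    while cursor is not False:      # False means all results were delivered
        response = send_envelope({"cursor": cursor, "query": mql_query})
        if response.get("code") != "/api/status/ok":
            raise RuntimeError(response.get("messages"))
        yield from response["result"]
        cursor = response.get("cursor", False)

# Stand-in for the network call: two batches, then a false cursor.
def fake_send(envelope):
    pages = {True: (["a", "b"], "c1"), "c1": (["c"], False)}
    result, next_cursor = pages[envelope["cursor"]]
    return {"code": "/api/status/ok", "result": result, "cursor": next_cursor}

albums = list(fetch_all([{"name": None, "limit": 2}], fake_send))
print(albums)
```

Note that the query itself is resubmitted unchanged on every iteration; only the cursor value in the envelope changes.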

Cursor values remain valid after they are used. Once you have downloaded result batches sequentially, you can reuse saved cursor values to download those batches again in whatever order you like (except for the first batch, which does not have a cursor). Results retrieved with cursors are based on the state of the database as it existed when the first query (with "cursor":true) was issued. (See the discussion of the as_of_time envelope parameter in the section Making Queries in the Past for an explanation of how past results can be retrieved. The as_of_time parameter also allows you to re-retrieve the first batch of query results even though the first batch does not have a cursor value.) So results retrieved with a given cursor will always be the same, and will ignore any insertions or deletions that occurred after the original query. Cursors can be assumed to have a lifetime at least as long as that of your application. But updates to Metaweb's backend software can invalidate cursors, so you should not assume that they live forever. Cursors are not intended to be stored in databases or files or encoded into long-lived URLs, for example. They should not be considered "permalinks" or persistent bookmarks to a past state of the database.



Disabling HTML Escapes

By default, mqlread uses HTML entities &lt;, &gt;, and &amp; in its responses in place of the characters <, >, and &, and this means that text returned by mqlread is safe for display in web browsers. To disable this escaping, add an escape parameter to the query envelope, and set its value to false. (You can also explicitly request HTML escaping with "escape":"html", but this is the default behavior and is not required.) If you do disable HTML escaping, you should be careful never to display the mqlread output in a web browser, since it could contain <script> tags that execute arbitrary JavaScript code, for example.
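
The following sketch shows a query envelope with escaping disabled, together with one way an application might escape text itself if it later needs to display it. The sample query and the helper name safe_for_html are assumptions for illustration. Python's html.escape with quote=False replaces the same three characters listed above, but this sketch does not claim that its output is identical to mqlread's own escaping in every case.

```python
import html
import json

# Query envelope that turns off mqlread's HTML escaping.
envelope = {
    "escape": False,   # serialized as the JSON value false
    "query": [{"type": "/music/album", "artist": "The Police", "name": None}],
}
envelope_json = json.dumps(envelope)

def safe_for_html(text):
    """Escape &, < and > before placing unescaped text in a web page."""
    return html.escape(text, quote=False)

print(safe_for_html("<script>alert('x') & more</script>"))
```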

The escape envelope parameter was demonstrated in <xref linkend="albumlist2.pl"/> and we'll see it again in <xref linkend="albumlist.py"/>.



Specifying Your Preferred Language

As you know, Metaweb objects can have more than one value for the name property, but can have only one value in any given language. When you request the name of an object, it returns the name in your preferred language. The default language is English, but you can specify a different preference with the envelope parameter lang. The value of this parameter should be a language id in the /lang namespace. The following query envelope, for example, asks for the Spanish name of the French language:

{
  "lang":"/lang/es",
  "query": {
    "id":"/lang/fr",
    "name":null
  }
}

At the time of this writing (September 2008), mqlread supports only a single preferred language. If no name exists in the specified language, then null is returned. In the future, the lang envelope parameter is likely to evolve to support language fallbacks, and you should be able to request, for example, a name in Spanish, or in English if no Spanish name exists.



Making Queries in the Past

Use the as_of_time envelope parameter in a mqlread query to specify that the query should be performed historically, against the Metaweb database as it existed at the specified moment in the past. The Metaweb database has a journaled structure, making this kind of historical query relatively simple and efficient to perform. Note, however, that type and schema information used in processing the query is current rather than historical.

The value of the as_of_time parameter should be a timestamp in the ISO 8601 format (see <xref linkend="typedatetime"/>) used by /type/datetime values in MQL. For example, the value "2007-02-03" represents midnight on February 3rd, 2007, and the value "2008-01-01T17:00Z" represents 5PM on January 1st, 2008. Metaweb timestamps are always stored in UTC (or GMT) time, and the as_of_time parameter assumes that your timestamp is specified in UTC. You may append Z to your timestamp to make this timezone explicit but you may not explicitly specify any other timezone. That is, you cannot add -08:00 to specify US Pacific time, for example.
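
Because as_of_time accepts only UTC timestamps, a local time must be converted before it is placed in the envelope. The sketch below assumes a hypothetical helper, as_of_time_value, and formats to minute precision to match the "2008-01-01T17:00Z" example above; whether finer precision is accepted is not covered here.

```python
from datetime import datetime, timedelta, timezone

def as_of_time_value(moment):
    """Format an aware datetime as a UTC timestamp for as_of_time (hypothetical helper)."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%MZ")

pacific = timezone(timedelta(hours=-8))             # fixed UTC-08:00 offset
local = datetime(2008, 1, 1, 9, 0, tzinfo=pacific)  # 9AM US Pacific time

envelope = {
    "query": {"return": "count", "type": "/type/type"},
    "as_of_time": as_of_time_value(local),
}
print(envelope["as_of_time"])
```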

Here are two queries (in their query envelopes, and named "now" and "then" in an outer envelope) that allow us to see how the number of defined types has grown over time:

{
  "now": {
    "query": {
      "return":"count",
      "type":"/type/type"
    }
  },
  "then": {
    "query": {
      "return":"count",
      "type":"/type/type"
    },
    "as_of_time":"2008-01-01"
  }
}

On 2008-04-25, freebase.com returned the following outer response envelope, showing the addition of over 900 types in under 5 months:

{
  "status" : "200 OK",
  "code" : "/api/status/ok",
  "now" : {
    "code" : "/api/status/ok",
    "result" : 5548
  },
  "then" : {
    "code" : "/api/status/ok",
    "result" : 4623
  }
}

In addition to the ability to run queries "in the past", Metaweb allows you to query the modification history of any object. See <xref linkend="history"/> for details.



Preventing Uniqueness Errors

When a MQL query or sub-query returns more than one result, but is not enclosed in square brackets to indicate that an array of results is expected, Metaweb normally returns an error code indicating that a uniqueness error has occurred. We saw in <xref linkend="uniqueerror"/>, for example, that the following query causes a uniqueness error because the object representing The Police has more than one type:

{"id":"/en/the_police", "type":null}

You can prevent this kind of error by setting the envelope parameter uniqueness_failure to "soft". (The default value is "hard".) With this parameter set to "soft", Metaweb simply returns one of the matching results, discards the others, and does not return an error or give any other indication that additional results are available.

Picking one (effectively random) result from a set and discarding all the others is not usually a useful strategy for handling multiple results, and the uniqueness_failure envelope parameter is not intended for use with queries like the one above. As a general rule, if a property is allowed to have more than one value, then queries of that property should be placed within square brackets.

When a property definition is changed to make the property unique after it is initially defined as non-unique, then it is possible (but rare) to find multiple values for a nominally unique property. You may want to use the uniqueness_failure envelope parameter when working with such a theoretically-unique property that is not yet unique in practice.



The callback Request Parameter

Every mqlread request must have a query or queries request parameter. A request may also have an optional callback parameter (this is a request parameter, not an envelope parameter). This parameter is used in JavaScript-based Metaweb applications and allows mqlread to be invoked using dynamically-generated <script> tags. (This <script>-based technique for client-server communication is commonly known as JSONP and we'll see examples in <xref linkend="javascriptmql"/>.)

The value of the callback parameter should be the name of a JavaScript function (without parentheses), such as processMQLResponse. Including the callback parameter in a mqlread invocation causes a small but very important change in the mqlread output. In order to understand this change, recall that mqlread always returns a JSON-serialized object in the HTTP response body. Also recall that JSON is a subset of JavaScript, which means that any JSON-serialized object can be parsed by a JavaScript interpreter to re-create the object it represents. If you include a callback parameter in a mqlread invocation, mqlread returns a JSON-serialized object inside a JavaScript function invocation of the callback function you specify. Suppose, for example, that you invoke mqlread with a URL like this:

http://api.freebase.com/api/service/mqlread?callback=cb&query={"query":[{...}]}

In this case, the response will look like this:

cb({
  "status":"200 OK",
  "code":"/api/status/ok",
  "result":[{...MQL result here...}]
})

The JSON-serialized response envelope object is prefixed with the name of the callback function and an open parenthesis and is suffixed with a matching close parenthesis. This seems like a trivial change, but it transforms a bare JSON object into a JavaScript function invocation with that object as its argument. If the mqlread query URL is used as the src attribute of a <script> tag, the mqlread response becomes executable JavaScript content, and the callback function you name gets passed the response envelope object to process however it wants.

There is one other effect of using the callback parameter in a request. It forces mqlread to always return an HTTP status code of "200 OK", even when the query envelope is malformed and unparseable. The true HTTP status (which would otherwise have been returned) is available from the status property of the response envelope, and this enables a callback function to handle errors as well as successful queries.

See <xref linkend="javascriptmql"/> for practical examples that use the callback request parameter.
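
In a browser, the <script> tag takes care of evaluating the wrapped response. When inspecting such a response from a non-browser client (for testing, say), the wrapper has to be removed before the envelope can be parsed as JSON. The sketch below assumes the response has exactly the callback(...) form shown above; the helper name unwrap_jsonp is hypothetical.

```python
import json

def unwrap_jsonp(body, callback):
    """Strip a callback(...) wrapper and parse the enclosed response envelope."""
    body = body.strip()
    prefix = callback + "("
    if not (body.startswith(prefix) and body.endswith(")")):
        raise ValueError("response is not wrapped in a call to " + callback)
    return json.loads(body[len(prefix):-1])

sample = 'cb({"status": "200 OK", "code": "/api/status/ok", "result": []})'
envelope = unwrap_jsonp(sample, "cb")

# With a callback, the HTTP status is always 200 OK, so the envelope's own
# status and code properties are the place to look for errors.
print(envelope["status"], envelope["code"])
```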

mqlread Error Codes

The mqlread response envelope does not always include the results of your query. Errors can occur if you invoke mqlread incorrectly, if you specify an invalid MQL query, if your query times out, or if there is an internal error on the Metaweb server. This section explains how to check for and handle mqlread errors.

If you invoke mqlread without either the query or queries parameter, or if the value of that parameter is not a valid JSON string, it responds with an HTTP status "400 Bad Request". Even though this is an HTTP error code, the response still includes a response body and that body is still a JSON object that you can parse. It looks something like this:

{
  "status": "400 Bad request", 
  "code": "/api/status/error", 
  "messages": [
    {
      "info": {
        "value": null
      }, 
      "message": "one of query=, or queries= must be provided", 
      "code": "/api/status/error/input/invalid"
    }
  ],
  "transaction_id":"cache;cache01.sandbox.sfo1:8101;2008-09-12T21:33:30Z;0001"
}

If you invoke mqlread with correct parameters and a parseable query envelope, then it will always return an HTTP status code of "200 OK". This does not mean, however, that no error has occurred. If the query envelope is valid JSON but does not have a query property, for example, then you get this response:

{
  "status": "200 OK", 
  "code": "/api/status/error", 
  "messages": [
    {
      "info": {
        "key": "query"
      }, 
      "message": "Missing 'query' parameter", 
      "code": "/api/status/error/envelope/parse"
    }
  ],
  "transaction_id":"cache;cache01.sandbox.sfo1:8101;2008-09-12T21:35:19Z;0001"
}

And if the invocation is correct and the envelope is correct but the MQL query is invalid, then you might get a response like this:

{
  "status": "200 OK", 
  "code": "/api/status/error", 
  "messages": [
    {
      "info": {
        "expected_type": "/music/artist", 
        "property": "albums"
      }, 
      "query": {
        "albums": [], 
        "type": "/music/artist", 
        "id": "/en/the_police", 
        "error_inside": "."
      }, 
      "message": "Type /music/artist does not have property albums", 
      "code": "/api/status/error/mql/type", 
      "path": ""
    }
  ],
  "transaction_id":"cache;cache01.sandbox.sfo1:8101;2008-09-12T21:36:51Z;0001"
}

If your query is simply too big (such as asking for the names and discographies of 10,000 bands) the query will time out and mqlread will return a response like this:


{
  "status": "200 OK", 
  "code": "/api/status/error", 
  "messages": [
    {
      "info": {
        "detail": [
          "timed out"
        ], 
        "timeout": 8.0
      }, 
      "message": "Query timeout", 
      "code": "/api/status/error/mql/timeout"
    }
  ],
  "transaction_id":"cache;cache01.sandbox.sfo1:8101;2008-09-12T21:39:01Z;0004"
}

Each of these error response envelopes includes the properties status, transaction_id, code, and messages. The status property simply repeats the HTTP status code. If you check the HTTP status code, you can ignore the status property. But if you use the callback parameter in your request, then mqlread will return an HTTP status of "200 OK", even when invocation errors occur. In this case the status property can tell you that an invocation error occurred.

The transaction_id property is always present in the response envelope whether or not an error occurred. Its value is a unique identifier for your request and enables Metaweb engineers to look it up in their internal log files. You should include the value of this property whenever you report a bug or ask a question about a query on the Metaweb developers mailing list.

The code property specifies the error code for the query. It is present in every response envelope, and you should always check this property after parsing the response from mqlread. The value of this property is always a string: if it is "/api/status/ok", then the query was successful and no error occurred. Otherwise something went wrong.

When the value of the code property is "/api/status/ok", the response envelope contains the query results as the value of a property named result. When code has any other value, the response envelope includes a messages property instead of a result property. The value of the messages property is an array (usually of length 1) of message objects each of which has the following properties:

code
A more detailed error code that more precisely specifies the nature of the error. Note that this code property of the message object is usually distinct from, and more informative than, the code property of the response envelope.
message
A human-readable description of the error.
info
An object that provides additional details about the error. The properties of this object depend on the error code.
query
For errors that result from an invalid MQL query, this property is a copy of the query object with the addition of a special error_inside property, to indicate where the error occurs.
path
When the message object contains a query property, it also contains a path property that specifies the "path" of property names from the root of the MQL query to the location of the error. This is an alternative to the error_inside property for locating the source of the error. If the error is in the outermost object of the query, then this property is just an empty string.


The descriptions above of error-related properties are valid when you use the query request parameter. If you use queries instead, there are a few differences you should be aware of. First, the status property appears only in the outer envelope, not in the individual response envelopes it contains. Second, there are code properties both in the outer envelope and in the individual response envelopes. The outer code will only indicate an error, however, if there was an invocation error (such as an unparseable query envelope), and in this case there won't be response envelopes inside the outer envelope. As long as mqlread is correctly invoked, the code property of the outer envelope will be "/api/status/ok", even if there were errors in one or more (or all) of the queries. It is the code property of each individual response envelope that specifies the status of that query. Third (and this really follows from the second), the messages property only appears in the outer envelope for invocation errors. Otherwise, messages properties always appear within the response envelopes.

<xref linkend="albumlist2.pl"/> included example code for mqlread error handling, and many examples that follow will also include error handling code. The general rule is to check the code property of any response envelope before "opening" it to extract the result. (And if the response envelope is inside an outer envelope, you must also check the code of that outer envelope before opening it to extract the response envelope.) If you find a code that is not "/api/status/ok", you can typically construct a suitable error message with messages[0].code and messages[0].message. If you have reason to expect a certain class of errors, you can refine your error reporting based on messages[0].code and messages[0].info.
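
The general rule above can be sketched in Python as follows. The names MQLError, open_envelope, and open_outer_envelope are hypothetical, and the functions operate on response envelopes that have already been parsed from JSON into dictionaries.

```python
OK = "/api/status/ok"

class MQLError(Exception):
    """Raised when an envelope's code is not /api/status/ok (hypothetical class)."""

def open_envelope(envelope):
    """Return the result of a response envelope, or raise MQLError."""
    if envelope.get("code") != OK:
        # Build the error text from messages[0].code and messages[0].message.
        first = (envelope.get("messages") or [{}])[0]
        raise MQLError("%s: %s" % (first.get("code"), first.get("message")))
    return envelope["result"]

def open_outer_envelope(outer, names):
    """Check the outer envelope, then open each named response envelope."""
    if outer.get("code") != OK:
        open_envelope(outer)   # raises MQLError describing the invocation error
    return {name: open_envelope(outer[name]) for name in names}

outer = {
    "status": "200 OK",
    "code": OK,
    "now": {"code": OK, "result": 5548},
    "then": {"code": OK, "result": 4623},
}
print(open_outer_envelope(outer, ["now", "then"]))
```

Passing the query names explicitly avoids having to guess which properties of the outer envelope are response envelopes and which are status, code, or transaction_id.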
