Business Commons/Updated business schema
The primary purpose for refactoring the business schema is to enable pipelines through which business data can be kept up to date using a combination of public sources and RABJ queues.
The current Business Commons has a type /business/company which has grown very large in an attempt to accommodate a multitude of cases. There is also a lot of redundancy in some of the commons schemas.
There are several issues I wanted to look at:
- The company and organization schemas are completely separate even though a company is usually an organization and shares almost all its properties
- Currently we have "ticker symbol" as a property for all companies. Companies do not have tickers; publicly traded companies issue stock, which is traded under one or more ticker symbols. Further, a company can have multiple issues, each trading at a different market capitalization.
- Freebase should allow us to express relationships between companies at a more detailed level than "Industry"
- Many of the fields specify a range for employment / involvement. This data is impossible to collect at any scale.
- The only way to specify a company's CEO or other corporate leaders is through the Employer schema, which is really more useful from the people/person side to allow people to create job histories.
- Title normalization is a huge problem, particularly when a person has multiple titles
In an effort to improve this, I started building a business2 schema to more accurately reflect the type of data we can acquire and the way it will be used.
A table of mappings between current properties and Business2 properties can be found here: Business and Organization Refactoring
Data sources
- SEC Filings
- Bloomberg Open Symbology (BSYM)
- Analyst research reports
- News articles
Issuer / Issue
Some companies issue multiple classes of stock with different symbols -- the most well known example is Berkshire Hathaway which has Class A and Class B shares. This is different from a stock trading under multiple symbols.
Data sources such as BSYM separate issues from companies. This allows us to have a unique identifier not just for the company, but for every class of stock (and potentially other financial instruments) that it issues. To accommodate this, the business2 domain defines two new types, Issue and Issuer.
The Issuer type simply links a company to one or more Issues.
The Issue type specifies a trading symbol, market capitalization, and the type of issue (currently only "Common Stock", but BSYM contains many more types which we can include if necessary).
This schema allows the specification of multiple issues per company and also removes properties like "Ticker Symbol" and "Market Capitalization" from companies that do not have publicly traded issues.
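The Issuer / Issue split can be pictured with a minimal Python sketch. This is only an illustration of the shape of the data, not the actual Freebase schema definition: the class and field names (`Issue`, `Issuer`, `trading_symbol`, `issue_type`, `market_capitalization`) are assumptions loosely based on the properties described above, and the symbols are included purely as sample values.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Issue:
    """One class of stock (or other instrument) issued by a company."""
    trading_symbol: str
    issue_type: str = "Common Stock"  # the only issue type mentioned so far
    market_capitalization: Optional[float] = None  # left empty until loaded


@dataclass
class Issuer:
    """Hypothetical stand-in for the Issuer type: links a company to its issues."""
    company_name: str
    issues: List[Issue] = field(default_factory=list)

    def symbols(self) -> List[str]:
        """Trading symbols of every issue, in the order they were added."""
        return [issue.trading_symbol for issue in self.issues]


# Berkshire Hathaway has Class A and Class B shares, each with its own symbol.
berkshire = Issuer(
    company_name="Berkshire Hathaway",
    issues=[Issue("BRK.A"), Issue("BRK.B")],
)

# A company with no publicly traded issues is simply never given the
# Issuer type, so it carries no ticker or market-capitalization properties.
print(berkshire.symbols())
```

Because symbol and market capitalization live on the issue rather than on the company, a second share class is just another `Issue` in the list.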
I propose either a new Organization type or a modification to the existing one, capturing properties that apply to all organizations, such as date founded, founder, and headquarters.
Further, corporations typically have a board of directors and an executive team, which can be respectively considered governance and leadership. Other types of companies and organizations may have different terms for these (partners, trustees, proprietor, etc.) but they all have the function of either governance or leadership.
In the business2 domain, almost all companies are also organizations. The board of directors is given in the governance property and the executive team are the leaders.
Organization also has properties called Child and Parent to express organizational hierarchy. I have already loaded thousands of wholly-owned subsidiaries into this tree structure. What we do with partial-ownership between companies has yet to be determined. Mergers, acquisitions, and spin-offs are also handled solely on the Organization type (the current Organization and Company types hold redundant properties for this data).
I also propose an additional co-type for people, tentatively called Leader, which links leaders to organizations through the Leadership CVT, providing a title and as-of date (described below). Currently Leadership has both normalized and non-normalized titles; this will change depending on how practical it is to normalize titles.
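As a rough sketch of how the Organization properties described above fit together, the following Python models the Parent / Child hierarchy and a Leadership record. All names here (`Organization`, `Leadership`, `add_child`, `ultimate_parent`, `descendants`) and the sample organizations are hypothetical and chosen for illustration; they are not the business2 property names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Leadership:
    """CVT-like record linking a leader to an organization."""
    person: str
    title: str   # free-text title; normalization is left open
    as_of: str   # ISO date on which the role was known to hold


@dataclass
class Organization:
    name: str
    # repr/compare are disabled on parent to avoid infinite recursion,
    # since a parent's children point back at the parent.
    parent: Optional["Organization"] = field(default=None, repr=False, compare=False)
    children: List["Organization"] = field(default_factory=list)
    leadership: List[Leadership] = field(default_factory=list)
    governance: List[str] = field(default_factory=list)  # e.g. board members

    def add_child(self, child: "Organization") -> None:
        """Record a wholly-owned subsidiary, keeping both directions in sync."""
        child.parent = self
        self.children.append(child)

    def ultimate_parent(self) -> "Organization":
        """Follow Parent links up to the root of the hierarchy."""
        org = self
        while org.parent is not None:
            org = org.parent
        return org

    def descendants(self) -> List["Organization"]:
        """All organizations below this one, depth first."""
        found: List["Organization"] = []
        for child in self.children:
            found.append(child)
            found.extend(child.descendants())
        return found


# Hypothetical three-level ownership tree.
holding = Organization("Holding Co")
subsidiary = Organization("Subsidiary A")
unit = Organization("Unit A1")
holding.add_child(subsidiary)
subsidiary.add_child(unit)
holding.leadership.append(Leadership("Person X", "Chief Executive Officer", "2009-12-31"))

print(unit.ultimate_parent().name)
print([org.name for org in holding.descendants()])
```

Partial ownership does not fit this tree, which is consistent with it still being an open question above.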
The Business Operation type captures properties that apply to companies but not all organizations. This mostly covers financial data like revenues and assets that can be extracted from SEC filings.
The full benefit of Freebase is realized when we connect objects together in new ways. SEC filings contain information about major customers, so business operation also has a property to capture this information. Further, we have a concept called "Competitive Space" (described below) which allows us to link companies together via the spaces in which they compete.
One of the most important aspects of understanding a company is learning about its competitors. Traditionally, finance sites either classify everything by its "industry", which is incredibly broad, or have a small list of competitors.
Having a graph database allows us to experiment with different approaches. In business2, I created a type called Competitive Space, which allows us to specify not just which companies compete, but also what markets they are competing for and what brand names they use in those markets. Here's an example of Smartphones
This allows a company like Apple to compete with Google in the smartphone space, Microsoft in the operating system developer space and Dell in the personal computer manufacture space.
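A small Python sketch of the Competitive Space idea follows. It assumes a space is a named set of companies, each with the brand names it uses in that space; the class name, the `competitors_of` helper, and the brand names listed are illustrative assumptions rather than data taken from the business2 domain.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class CompetitiveSpace:
    """A market, the companies competing in it, and the brands they use there."""
    name: str
    # company name -> brand names that company uses in this space
    competitors: Dict[str, List[str]] = field(default_factory=dict)


def competitors_of(company: str, spaces: List[CompetitiveSpace]) -> Dict[str, Set[str]]:
    """Map each space the company competes in to its rivals in that space."""
    result: Dict[str, Set[str]] = {}
    for space in spaces:
        if company in space.competitors:
            result[space.name] = set(space.competitors) - {company}
    return result


# Sample data mirroring the Apple example; brand names are illustrative.
spaces = [
    CompetitiveSpace("Smartphones", {"Apple": ["iPhone"], "Google": ["Android"]}),
    CompetitiveSpace("Operating system developer", {"Apple": ["Mac OS X"], "Microsoft": ["Windows"]}),
    CompetitiveSpace("Personal computer manufacture", {"Apple": ["Mac"], "Dell": ["XPS"]}),
]

for space_name, rivals in competitors_of("Apple", spaces).items():
    print(space_name, sorted(rivals))
```

The same company gets a different set of competitors per space, which is the distinction a single broad "industry" label cannot express.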
Employer is an existing type in the /business domain, whose purpose is to link people to their places of employment. My only suggestion here is to move the "number of employees" property to this type.
As of dates
The current schemas for board members and employment both have a start date and end date. While this is an intuitive level of detail to capture, these dates aren't available through any public data source.
In general, when looking at an SEC filing, all we know is that something was true at a specific time. I propose that for the leadership and governance CVTs, we use as-of dates instead of start and end dates.
In theory, we could later infer that someone who is the CEO of Company A one year and the CEO of Company B the following year has left Company A. I think it's better to capture the data that's available and potentially infer things about it later.
Update: "as of" dates will not be used; the current practice of "from" and "to" dates will be maintained.
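Although as-of dates were not adopted, the kind of later inference described above can be sketched briefly. The Python below collapses individual as-of observations into the earliest and latest dates on which each role was seen; the `Observation` record, the `observed_ranges` function, and the sample people and companies are all hypothetical. The resulting bounds are only what the filings support, not true start and end dates.

```python
from datetime import date
from typing import Dict, List, NamedTuple, Tuple


class Observation(NamedTuple):
    """One fact read from a filing: this person held this title on this date."""
    person: str
    organization: str
    title: str
    as_of: date


RoleKey = Tuple[str, str, str]


def observed_ranges(observations: List[Observation]) -> Dict[RoleKey, Tuple[date, date]]:
    """Return (first seen, last seen) for each (person, organization, title)."""
    ranges: Dict[RoleKey, Tuple[date, date]] = {}
    for obs in observations:
        key = (obs.person, obs.organization, obs.title)
        first, last = ranges.get(key, (obs.as_of, obs.as_of))
        ranges[key] = (min(first, obs.as_of), max(last, obs.as_of))
    return ranges


# Hypothetical observations taken from three consecutive annual filings.
observations = [
    Observation("Person X", "Company A", "CEO", date(2008, 12, 31)),
    Observation("Person X", "Company A", "CEO", date(2009, 12, 31)),
    Observation("Person X", "Company B", "CEO", date(2010, 12, 31)),
]

for key, (first, last) in observed_ranges(observations).items():
    print(key, first.isoformat(), last.isoformat())
```

From these bounds one could guess that Person X left Company A some time between the last Company A observation and the first Company B observation, but the exact date remains unknown.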
The venerable Company type will no longer exist, as such. All its properties will be moved elsewhere. Its keys will most likely be maintained for backwards-compatibility purposes, but the specifics of the refactoring tasks haven't been worked out yet.
Consumer Company would be a new type to hold properties related to consumer products. These properties would be moved from the current Company type.
Endowed Organization is a new type to hold information about endowments. Currently the "endowment" property is on the Organization type; however, most topics that will be considered organizations under the new schema do not have endowments, and there is no other existing type that the endowment property would make sense on.
Membership Organization would hold the "members" property that is on the current Organization type. This type will be for any kind of organization that has members, whether those members are people or other organizations.
Organization Advisor will be an extension of the current Business Advisor type, applied to organizations generally, rather than simply to companies, since many types of organizations have advisors (whether single individuals or advisory boards).
Organization Partnership represents a named entity that is a partnership between two or more organizations. This is a change from the current Organization Partnership type, which is a CVT, and therefore does not represent named entities.
Organization Types and Sectors
There is a fair bit of confusion between these two types (Organization sector and Organization type) in terms of how people are using them. We're not sure whether it would make more sense to merge these types or to keep them separate; if we keep them separate, we will probably need to make stronger distinctions between them to both cut down on misuse and help people clean them up.