Governing Enterprise Meta-Models and Value-Chain Instrumentation

Knowledge comes in multiple flavors. Information systems are used to process data or “content” that is structured or unstructured, across different flavors of knowledge. “Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource. Metadata is often called data about data or information about information,” according to the National Information Standards Organization (NISO)1.

Metadata can describe both the structured content in databases, tables and columns, and unstructured content in images, videos, audio files, and documents. Metacontent governance brings “enterprise” level discipline to the metadata creation and management processes in each area of an organization to help achieve horizontal alignment across disparate divisions and stakeholder groups.

NISO describes three main types of metadata:

  • Descriptive metadata describes a category (glossary) or single instance of a digital resource for discovery and identification with elements such as subject, title, abstract, author, and keywords. A taxonomical model is descriptive.
  • Structural metadata indicates file or data type, formal link structures or how compound objects, such as pages and chapters of a book, are put together.
  • Administrative metadata provides information to help manage a resource, such as when, by what system or how it was created, data lineage, file type and other technical information, and who can access it. Types of administrative meta-content may include:
    • Rights management metadata describing intellectual property rights.
    • Preservation metadata describing policies and procedures governing resource archival and preservation (ibid1).

 

These types also apply to meta-knowledge about structured and unstructured content needed for metacontent governance. Meta-knowledge, or semantic knowledge about content, provides insight into what is represented by the tables, columns, attributes, objects, dimensions, files, and documents that knowledge workers gather and use to make better business decisions. The semantic insights include:

  • Data element and category definitions
  • Formulas for combining data into results and performance indicators
  • Source and lineage information (where it came from and who manipulated it)
  • The nature of associations between related data elements
  • The people responsible for managing the metadata
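
The semantic insights listed above could be carried together in one record per data element. The following is a minimal sketch rather than a standard schema; the class and field names (`ElementMetadata`, `LineageStep`, `missing_fields`) and the sample values are illustrative assumptions, not part of any product or specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class LineageStep:
    """One hop in an element's history: where it came from and who touched it."""
    source: str   # upstream system or data set (hypothetical name)
    actor: str    # person or process that manipulated the data
    action: str   # e.g. "extract", "aggregate"


@dataclass
class ElementMetadata:
    """Hypothetical record holding the semantic insights for one data element."""
    name: str
    definition: str
    category: str
    steward: str                       # person responsible for this metadata
    formula: Optional[str] = None      # how the element is derived, if it is derived
    lineage: List[LineageStep] = field(default_factory=list)
    associations: Dict[str, str] = field(default_factory=dict)  # related element -> nature of the link

    def missing_fields(self) -> List[str]:
        """List the required descriptive fields that are still empty."""
        required = {
            "definition": self.definition,
            "category": self.category,
            "steward": self.steward,
        }
        return [label for label, value in required.items() if not value.strip()]


gross_margin = ElementMetadata(
    name="gross_margin",
    definition="Revenue minus cost of goods sold, as a share of revenue.",
    category="performance indicator",
    steward="",  # left empty on purpose to show the completeness check
    formula="(revenue - cogs) / revenue",
    lineage=[LineageStep("sales_mart", "nightly_etl", "aggregate")],
    associations={"revenue": "input", "contribution_margin": "related measure"},
)
print(gross_margin.missing_fields())  # ['steward']
```

A completeness check like `missing_fields` gives stewards a simple way to find records that lack an owner or a definition.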

Decision makers often use the term “insight” to refer to information that describes or predicts customer or market behaviors and trends. The introspective insights in meta-knowledge facilitate and help deliver customer and market insights that can enhance success and competitiveness.

In the new knowledge enterprise, business users will have more ability to identify critical content and customize dashboards and reports, and even the workflows and rules that feed the repositories, warehouses, marts, and lakes from which they draw meaningful information. The outputs of some of the more intelligent systems will be in the form of actionable knowledge. The better the meta-model, the more actionable knowledge can be delivered.

Automated processes for creating metadata, in bulk through mining, or for each transaction, are needed to feed the model and the users with important semantic information. The model without the instrumentation is not enough. Value-chain instrumentation is one way the lineage and transaction metadata can be captured. Ideally, services that form part of every CRUD transaction go beyond logging the transaction (logs are hard to access and often out of reach for most users) to provide a compact statement of the current step in the lineage. Along with automatically mined data, this becomes a permanent part of the historical record of that data element and its successors, aggregates, and KPIs. When needed, manual input can be used to augment and enrich metadata for content of any type.
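
One way to picture value-chain instrumentation is a thin wrapper around each CRUD service that appends a compact lineage statement whenever the operation succeeds. The sketch below is only an illustration under stated assumptions: the `instrumented` decorator, the in-memory `LINEAGE_LOG`, and the customer functions are hypothetical, and a real system would write to a durable, queryable metadata store.

```python
import functools
from datetime import datetime, timezone

# In-memory stand-in for a lineage store (assumption for illustration only).
LINEAGE_LOG = []


def instrumented(operation, element):
    """Wrap a CRUD function so each successful call appends a lineage statement."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, actor="unknown", **kwargs):
            result = func(*args, **kwargs)
            # Reached only if the wrapped call did not raise.
            LINEAGE_LOG.append({
                "element": element,
                "operation": operation,
                "step": func.__name__,
                "actor": actor,
                "at": datetime.now(timezone.utc).isoformat(),
            })
            return result
        return wrapper
    return decorator


CUSTOMERS = {}  # toy data store


@instrumented("create", "customer")
def add_customer(customer_id, name):
    CUSTOMERS[customer_id] = {"name": name}
    return customer_id


@instrumented("update", "customer")
def rename_customer(customer_id, name):
    CUSTOMERS[customer_id]["name"] = name


add_customer(1, "Acme", actor="onboarding_service")
rename_customer(1, "Acme Corp", actor="jsmith")
for entry in LINEAGE_LOG:
    print(entry["operation"], entry["element"], entry["actor"])
```

Keeping the lineage statement separate from the application log is the point of the design: the record stays small and can be attached to the data element's history rather than buried in log files.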

Retrofitting existing systems, especially commercial software, with value-chain instrumentation is naturally more difficult and sometimes requires mining logs to extract lineage data. But the instrumentation delivers such rich and helpful historical data, which is especially valuable when researching downstream data quality problems, that the effort is almost always worth the cost.

WHAT PROBLEM ARE WE SOLVING?

Meta-knowledge of any sort is used to clarify ambiguities in data and expose implications of change. In an enterprise, inspecting the metadata can resolve ambiguities when users or auditors ask, “Where did this data come from and how was it calculated?” Well-implemented canonical models combined with up-to-date metadata can help technicians answer: “How will this change affect connected systems or downstream data consumers and reports?” Both are served by the transparency provided by good metadata. In some cases, good semantic metadata can help systems interpret requests and return more complete answers or collateral information that makes the results more actionable.
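
Both questions can be treated as traversals of a lineage graph: walking forward answers the impact question, and walking backward answers the provenance question. The following sketch assumes lineage is available as a simple mapping from each element to the elements it feeds; the `FEEDS` graph and the element names are invented for illustration.

```python
from collections import deque

# Hypothetical lineage graph: each key feeds the elements listed in its value.
FEEDS = {
    "orders.amount": ["revenue"],
    "orders.cost": ["cogs"],
    "revenue": ["gross_margin", "sales_report"],
    "cogs": ["gross_margin"],
    "gross_margin": ["executive_dashboard"],
}


def downstream(element, feeds):
    """Return every element reachable from `element`, in breadth-first order."""
    seen = []
    queue = deque(feeds.get(element, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited; also guards against cycles
        seen.append(current)
        queue.extend(feeds.get(current, []))
    return seen


def upstream(element, feeds):
    """Return every element that directly or indirectly feeds `element`."""
    reverse = {}
    for source, targets in feeds.items():
        for target in targets:
            reverse.setdefault(target, []).append(source)
    return downstream(element, reverse)


# "How will this change affect downstream consumers?"
print(downstream("orders.amount", FEEDS))
# ['revenue', 'gross_margin', 'sales_report', 'executive_dashboard']

# "Where did this data come from?"
print(upstream("gross_margin", FEEDS))
# ['revenue', 'cogs', 'orders.amount', 'orders.cost']
```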

These are significant benefits of metadata for metacontent management, but how do you govern its creation and management? The same stewardship and governance strategies that improve the consistency and quality of enterprise data today can be used in the future “knowledge enterprise” with a few additional considerations.

THE META-CONTENT GOVERNANCE PROCESS

The discipline involves an end-to-end process and governance framework for creating models and controlling, enhancing, attributing, defining, and managing their knowledge definitions. The desired outcome is correct, complete, and current models and definitions that can be used to support increasing usage in search, request processing, and reporting. Meta-content governance involves a regular governance discipline in which assigned stewards assist in defining, categorizing, organizing, and transforming information assets in a business domain, then instructing, championing, and evangelizing the business and technology evolution needed for broad adoption of knowledge-based innovations.

 

  • Canonical modeling and content transformation
    • The model describes the semantics and associations in a structured way that rules can use to support complex processes.
    • The model resides in a semantic layer that can be used to improve information access.
    • The transformation is both physical and cultural: information is named and categorized within the model, and processes preserve and defend the canonical definitions.
  • Content convergence with metadata master management
    • Processes permit users with access to relevant unstructured content in any digital form to create metadata that places this content in the contexts in which it can be retrieved.
    • The benefits of crowd-sourcing the information internally are demonstrated as the meta-model in the semantic layer grows in breadth and depth.
    • Stewards scrutinize proposed additions and changes to categories and attributes to ensure managed expansion of the enterprise model and related sub-models.
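
The steward review step described above can be sketched as a small approval workflow: users propose additions, and only an assigned steward can admit them to the model. This is a hypothetical illustration; the `Proposal` and `MetaModel` classes, the status values, and the names used are assumptions rather than features of any particular governance tool.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PROPOSED = "proposed"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Proposal:
    """A user-submitted addition to the meta-model (hypothetical structure)."""
    category: str
    attribute: str
    submitted_by: str
    status: Status = Status.PROPOSED
    note: str = ""


class MetaModel:
    def __init__(self, stewards):
        self.stewards = set(stewards)
        self.categories = {}    # category -> set of approved attributes
        self.audit_trail = []   # (reviewer, attribute, decision) tuples

    def review(self, proposal, reviewer, approve, note=""):
        """Only an assigned steward may decide; approvals extend the model."""
        if reviewer not in self.stewards:
            raise PermissionError(f"{reviewer} is not an assigned steward")
        if proposal.status is not Status.PROPOSED:
            raise ValueError("proposal has already been reviewed")
        proposal.status = Status.APPROVED if approve else Status.REJECTED
        proposal.note = note
        if approve:
            self.categories.setdefault(proposal.category, set()).add(proposal.attribute)
        self.audit_trail.append((reviewer, proposal.attribute, proposal.status.value))
        return proposal.status


model = MetaModel(stewards=["dana"])
ok = Proposal("Customer", "loyalty_tier", submitted_by="sam")
dup = Proposal("Customer", "client_level", submitted_by="lee")
model.review(ok, "dana", approve=True)
model.review(dup, "dana", approve=False, note="synonym of loyalty_tier")
print(model.categories)  # {'Customer': {'loyalty_tier'}}
```

Recording every decision in `audit_trail`, including rejections, keeps the managed expansion of the model traceable.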

Standardization is a sound long-term goal, but it is not always practical to roll out a new architecture to everyone at once. Individual divisions or silos can benefit from such innovations, and meta-content management solutions almost always need to begin in an isolated enterprise sub-domain. Once understood and proven, the deep benefits come in leveraging these innovations enterprise-wide to achieve vertical and horizontal alignment.

Governance roll-out may differ from solutions roll-out. It may be most advisable to implement top-down governance frameworks, disciplines, and tools at the broad enterprise level from the beginning, even if governance solution implementations begin in isolated pockets. Establishing enterprise tools capable of combining separate sub-domain metamodels can also provide valuable perspectives on the completeness and quality of separate implementations.

Standards such as SKOS (Simple Knowledge Organization System), a W3C recommendation, can be used as connective tissue or an aggregator between different models, even if those models are implemented using different tools and standards. Many knowledge organization systems, such as thesauri, taxonomies, classification schemes, and subject heading systems, share a similar structure and are used in similar applications. SKOS enables governance professionals to capture much of the similarity, make it explicit, and enable knowledge and technology sharing across different services or applications.
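
As a rough illustration of SKOS as connective tissue, the sketch below links concepts from two sub-domain vocabularies with `skos:exactMatch` and serializes them as Turtle. The SKOS terms used (`skos:Concept`, `skos:prefLabel`, `skos:altLabel`, `skos:broader`, `skos:exactMatch`) come from the SKOS vocabulary; the `sales:` and `svc:` namespaces, the concepts, and the helper functions are hypothetical. Plain Python is used here to keep the example self-contained; an RDF library would normally handle the serialization.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Two hypothetical sub-domain vocabularies, reduced to the properties used here.
concepts = {
    "sales:Client": {"prefLabel": "Client", "altLabel": ["Account"], "broader": "sales:Party"},
    "sales:Party": {"prefLabel": "Party", "altLabel": []},
    "svc:Customer": {"prefLabel": "Customer", "altLabel": ["Subscriber"]},
}
# skos:exactMatch links concepts that mean the same thing in different schemes.
exact_matches = [("sales:Client", "svc:Customer")]


def to_turtle(concepts, exact_matches):
    """Serialize the concepts as a small Turtle document using SKOS terms."""
    lines = [
        f"@prefix skos: <{SKOS}> .",
        "@prefix sales: <http://example.org/sales/> .",
        "@prefix svc: <http://example.org/service/> .",
        "",
    ]
    for uri, props in concepts.items():
        statements = [f'skos:prefLabel "{props["prefLabel"]}"@en']
        statements += [f'skos:altLabel "{alt}"@en' for alt in props.get("altLabel", [])]
        if "broader" in props:
            statements.append(f"skos:broader {props['broader']}")
        statements += [f"skos:exactMatch {b}" for a, b in exact_matches if a == uri]
        lines.append(f"{uri} a skos:Concept ;")
        lines.append("    " + " ;\n    ".join(statements) + " .")
        lines.append("")
    return "\n".join(lines)


def equivalent_labels(uri, concepts, exact_matches):
    """Collect labels for a concept and everything linked to it by exactMatch."""
    linked = {uri}
    changed = True
    while changed:  # exactMatch is symmetric and transitive, so expand until stable
        changed = False
        for a, b in exact_matches:
            if (a in linked) != (b in linked):
                linked |= {a, b}
                changed = True
    labels = set()
    for concept in linked:
        labels.add(concepts[concept]["prefLabel"])
        labels.update(concepts[concept].get("altLabel", []))
    return sorted(labels)


print(to_turtle(concepts, exact_matches))
print(equivalent_labels("svc:Customer", concepts, exact_matches))
# ['Account', 'Client', 'Customer', 'Subscriber']
```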

Building out knowledge governance and semantic integration models across the enterprise makes it more feasible to push some governance responsibilities to individuals. This is an appealing choice for individuals who have unique expertise and unique automation needs. To mitigate some of the significant risks of such shared responsibilities, governing bodies will need to implement comprehensive technical security and auditing technologies, and institute appropriate checks and balances, periodic touch-points, and tools to manage governance responsibilities and accountabilities throughout the hierarchy.

STEWARDS CURATE METADATA QUALITY

As we push more responsibility for alignment to more subject matter experts, the middle managers become increasingly important in maintaining and curating metadata quality through small course corrections whenever misalignment occurs. Stewards watch for changes that could impair:

  • Consistency of definitions: The metadata glossary contains data element definitions to reconcile differences in terminology such as “clients” and “customers,” “revenue” and “sales,” or “members” and “subscribers,” and in formulas such as “gross margin” and “contribution margin.”
  • Clarity of relationships: The meta-model shows associations between data entities to help resolve ambiguity and inconsistencies. Hierarchical associations are important for managing inheritance of attributes, and synonymy associations connect different words used to mean the same thing in different systems.
  • Clarity of data lineage: Static lineage metadata for a data element, including its proper source of record, format, location, owner, and steward, describes lineage expectations in general terms. More granular operational metadata may capture auditable information about the users, applications, and processes that create, delete, or change data, the exact timestamp of each change, and the authorization that was used to perform these actions. This can be gathered using value-chain instrumentation that tracks the origins of a particular data set (see TechTarget2).

Tools for lineage management should support proper governance processes and audit trails by:

  • Capturing end-to-end metadata describing upstream processes and data lineage.
  • Discovering and notifying stewards of metadata inconsistencies from multiple sources.
  • Enabling traceability from concept taxonomies and terms to logical and physical data schemas.
  • Automating metadata management lifecycle to support data stewards and stakeholders.
  • Empowering business users to understand the origin of the data that ends up as information in downstream reports and BI/analytics.
  • Exposing the impact of changing a data element on other data elements, reports, and queries.
  • Documenting what information is needed and how it is used, and highlighting redundancies in purchased data sources (Adaptive3).
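
The second capability in the list, discovering metadata inconsistencies across sources, can be approximated by grouping harvested metadata per element and flagging any attribute on which the sources disagree. The sketch below is illustrative only; the `harvested` records, the attribute names, and `find_inconsistencies` are assumptions, not the interface of any lineage tool.

```python
from collections import defaultdict

# Hypothetical metadata harvested from three systems for the same elements.
harvested = [
    {"source": "crm", "element": "customer_id", "type": "string", "steward": "dana"},
    {"source": "billing", "element": "customer_id", "type": "integer", "steward": "dana"},
    {"source": "warehouse", "element": "customer_id", "type": "string", "steward": "dana"},
    {"source": "crm", "element": "signup_date", "type": "date", "steward": "lee"},
    {"source": "warehouse", "element": "signup_date", "type": "date", "steward": "lee"},
]


def find_inconsistencies(records, attributes=("type", "steward")):
    """Return {element: {attribute: {value: [sources]}}} wherever sources disagree."""
    observed = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for record in records:
        for attribute in attributes:
            value = record[attribute]
            observed[record["element"]][attribute][value].append(record["source"])
    conflicts = {}
    for element, by_attribute in observed.items():
        for attribute, by_value in by_attribute.items():
            if len(by_value) > 1:  # more than one distinct value means a conflict
                conflicts.setdefault(element, {})[attribute] = dict(by_value)
    return conflicts


for element, details in find_inconsistencies(harvested).items():
    for attribute, by_value in details.items():
        print(f"{element}.{attribute} differs: {by_value}")
# customer_id.type differs: {'string': ['crm', 'warehouse'], 'integer': ['billing']}
```

The output names the sources behind each conflicting value, which is the information a steward needs to decide which system should be corrected.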

Implementing metadata governance tools and processes requires budget and commitment, but the benefits are deep and lasting, and they help build a culture that increases agility through greater alignment across the knowledge enterprise.

Footnotes

1. Understanding Metadata
2. The Benefits of Metadata and Implementing a Metadata Strategy
3. Adaptive Metadata Manager

Joe Roushar
About Joe Roushar
Joe Roushar is an enterprise business systems architect with experience in information and systems governance, architecting knowledge frameworks, and automating knowledge tasks. With graduate-level education in Natural Language Processing at Tokyo Institute of Technology and in artificial intelligence at the University of Minnesota, Roushar has spent the last few decades working in health insurance and financial services, manufacturing, retail, and government to improve outcomes through traditional architectures; hosted and XaaS strategies; advanced, model-based technologies; and content convergence. He holds a patent for an ontological approach to natural language understanding and translation. His blog is http://understandingcontext.com/