Data Governance Lifecycle – Illustration

Introduction

Following on from the Date of Birth example in the previous post, let’s evaluate how the cycle would work in practice. Assuming the person raising the request is a consumer of this data, the appropriate steps are set out below. This post assumes that a CDO Governance function is already set up within the firm, meaning there are regular data councils with business owners and appropriate escalation routes.

The Engagement

To bring this element into governance, an engagement will first need to be set up with the CDO Business Partner, who will then follow the workflow below.

Ownership

The business partner will first need to check whether the requested element is already under governance. If it is, the requester can be informed and directed to the authoritative sources of data. If not, the partner will take it through the governance lifecycle.

This requires a metadata repository that can be searched easily. Assuming Date of Birth is not in the metadata repository, an appropriate owner will need to be identified. Here the business partner will check whether similar attributes, e.g. First Name or Last Name, are already under governance. If so, the owner of that personal data could also own Date of Birth, and discussions can then begin to bring it in. Alternatively, if no similar attributes exist, the data council with the business lines is the appropriate forum to raise the request and agree ownership.
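The ownership check can be sketched in Python. This is a minimal illustration only, assuming a hypothetical in-memory repository in which each governed element records a business `domain` and an `owner`; the element names, domains and owner titles are made up, and a real setup would query whichever catalogue tool the firm uses.

```python
from typing import Optional

# Hypothetical in-memory metadata repository: element name -> record.
# A real setup would query the firm's catalogue tool instead.
REPOSITORY = {
    "First Name": {"domain": "Personal Data", "owner": "Head of Customer Data"},
    "Last Name": {"domain": "Personal Data", "owner": "Head of Customer Data"},
    "Trade Date": {"domain": "Trading", "owner": "Head of Trading Operations"},
}


def find_owner(element: str, domain: str, repo: dict) -> tuple[str, Optional[str]]:
    """Return (next step, owner) for a newly requested data element."""
    # Already governed: point the requester at the authoritative source.
    if element in repo:
        return "already_governed", repo[element]["owner"]
    # Otherwise look for owners of similar attributes in the same domain.
    owners = {rec["owner"] for rec in repo.values() if rec["domain"] == domain}
    if len(owners) == 1:
        return "propose_existing_owner", owners.pop()
    # No single candidate owner: raise the request at the data council.
    return "escalate_to_data_council", None


print(find_owner("Date of Birth", "Personal Data", REPOSITORY))
# ('propose_existing_owner', 'Head of Customer Data')
```

Requiring a single candidate owner before proposing one is a design choice of this sketch: if several owners cover the same domain, the request goes to the data council rather than being assigned arbitrarily.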

Once ownership is determined, the business partner will bring in the steward and the data engineer to review the request further.

Metadata Capture

The Steward will now determine the metadata needed to set up Date of Birth in the repository. At a minimum this should include:

  1. Name
  2. Description
  3. Owner
  4. Authoritative sources of data – the applications where the data is stored, covering the origin, the master and the access layers. For instance, the origin could be a third-party customer tool such as Salesforce, the master could be a data lake or warehouse in use organisation-wide, and the access layers could be the reporting data marts.
  5. Lineage – where the data originates and where it flows. This is always a difficult step to achieve, as most owners do not know how the data is used once it leaves their systems. However, the business partner can facilitate those conversations with potential downstream applications. In addition, lineage will be reviewed continually with other areas during day-to-day operations.
  6. Controls – whether existing controls already apply to this element. For example, a technology ETL reconciliation may already be in place for other elements in the same application.
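As a rough sketch, the six items above could be captured as a simple record type. The class name, field names and example values (including the system and owner names) are hypothetical; the `missing_fields` helper just flags blank mandatory text fields before the steward submits the record.

```python
from dataclasses import dataclass, field


@dataclass
class DataElementMetadata:
    """Minimum metadata for a governed data element (illustrative fields)."""
    name: str
    description: str
    owner: str
    origin: str   # where the data is first captured
    master: str   # organisation-wide authoritative store
    access_layers: list[str] = field(default_factory=list)        # e.g. reporting marts
    lineage: list[tuple[str, str]] = field(default_factory=list)  # (from, to) hops
    controls: list[str] = field(default_factory=list)

    def missing_fields(self) -> list[str]:
        """Names of mandatory text fields that are still blank."""
        required = ["name", "description", "owner", "origin", "master"]
        return [f for f in required if not getattr(self, f).strip()]


dob = DataElementMetadata(
    name="Date of Birth",
    description="Customer's date of birth.",
    owner="Head of Customer Data",
    origin="Salesforce",
    master="Enterprise Data Warehouse",
    access_layers=["Customer Reporting Mart"],
    lineage=[("Salesforce", "Enterprise Data Warehouse"),
             ("Enterprise Data Warehouse", "Customer Reporting Mart")],
    controls=["ETL reconciliation"],
)
print(dob.missing_fields())  # []
```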

The Steward will now add all of this information to a metadata repository. As far as possible, avoid using Excel to manage enterprise-level metadata. Numerous tools are available on the market that can be configured for cataloguing metadata; alternatively, build something in-house tailored to specific needs.
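If the repository is built in-house, one useful capability is answering the lineage question: which systems consume this element downstream? The sketch below assumes, hypothetically, that lineage is stored as source-to-target hops per element and walks them breadth-first; the class and system names are illustrative, not an existing tool.

```python
from collections import defaultdict, deque


class LineageStore:
    """Hypothetical in-house store of lineage hops for governed elements."""

    def __init__(self) -> None:
        # element name -> source system -> set of systems it feeds
        self._edges = defaultdict(lambda: defaultdict(set))

    def add_hop(self, element: str, source: str, target: str) -> None:
        self._edges[element][source].add(target)

    def downstream(self, element: str, system: str) -> list[str]:
        """All systems reachable from `system` for `element`, found breadth-first."""
        graph = self._edges.get(element, {})
        seen, queue = set(), deque([system])
        while queue:
            for nxt in graph.get(queue.popleft(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return sorted(seen)


store = LineageStore()
store.add_hop("Date of Birth", "Salesforce", "Data Warehouse")
store.add_hop("Date of Birth", "Data Warehouse", "Customer Mart")
store.add_hop("Date of Birth", "Data Warehouse", "Risk Mart")
print(store.downstream("Date of Birth", "Salesforce"))
# ['Customer Mart', 'Data Warehouse', 'Risk Mart']
```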

Physical Data

Once the authoritative sources of data are identified, the Data Engineer comes in. Any newly onboarded data element should be run through a data profiler. A profiler surfaces the characteristics of the data and helps identify both data quality issues and candidate data quality controls. Naturally, access to the relevant physical data is essential. Below is the output of a profiler (credit: Pandas Profiling) showing the behaviour of this data point.

Notice that the captured Dates of Birth range from 1920 to 2005, so it appears there are customers over 100 years old as well as some under 18. The rules that could therefore be applied to this data are:

  1. Completeness – the field should always be populated, with a target of 100%.
  2. Validity – the date should fall between 18 and 100 years before the current date (i.e. a customer age of 18 to 100), with a target of 99%; the remaining 1% could be genuine business exceptions to be handled operationally.
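The two rules could be expressed as functions returning a pass rate to compare against their targets. This is a sketch under a few stated assumptions: the age bounds are treated as inclusive, age is computed in whole years against a supplied `today`, and the validity rule only counts populated dates (missing values are already caught by the completeness rule). The function names and sample dates are hypothetical.

```python
from datetime import date
from typing import Optional


def age_in_years(born: date, today: date) -> int:
    """Whole years elapsed between `born` and `today`."""
    had_birthday = (today.month, today.day) >= (born.month, born.day)
    return today.year - born.year - (0 if had_birthday else 1)


def completeness_pct(dobs: list[Optional[date]]) -> float:
    """Rule 1: share of records with Date of Birth populated (target 100%)."""
    if not dobs:
        return 0.0
    return 100.0 * sum(d is not None for d in dobs) / len(dobs)


def validity_pct(dobs: list[Optional[date]], today: date,
                 min_age: int = 18, max_age: int = 100) -> float:
    """Rule 2: share of populated dates implying an age in [min_age, max_age] (target 99%)."""
    populated = [d for d in dobs if d is not None]
    if not populated:
        return 0.0
    valid = sum(min_age <= age_in_years(d, today) <= max_age for d in populated)
    return 100.0 * valid / len(populated)


# Made-up sample spanning the 1920-2005 range seen in the profile.
sample = [date(1920, 3, 1), date(1985, 7, 14), None, date(2005, 11, 30)]
print(completeness_pct(sample))                                # 75.0
print(round(validity_pct(sample, today=date(2024, 1, 1)), 1))  # 66.7
```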

A process will need to be set up to measure the rule results against their targets at an agreed frequency. Any deviations will need to be raised with the data owners and with any governance forums that have been set up.
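A single measurement run might then compare each rule’s score with its agreed target and produce the items to raise with the owner. In this hypothetical sketch the scores are supplied as inputs (for example, the pass rates of the two rules above), the figures in the example run are made up, and scheduling the run at the agreed frequency is left to whatever scheduler the firm already uses.

```python
from dataclasses import dataclass


@dataclass
class RuleResult:
    element: str
    rule: str
    score_pct: float   # measured pass rate for this run
    target_pct: float  # agreed target for the rule
    owner: str


def breaches(results: list[RuleResult]) -> list[str]:
    """Messages for every rule whose score fell below its agreed target."""
    return [
        f"{r.element} / {r.rule}: {r.score_pct:.1f}% vs target "
        f"{r.target_pct:.1f}% -> raise with {r.owner}"
        for r in results
        if r.score_pct < r.target_pct
    ]


run = [
    RuleResult("Date of Birth", "completeness", 100.0, 100.0, "Head of Customer Data"),
    RuleResult("Date of Birth", "age 18-100", 97.5, 99.0, "Head of Customer Data"),
]
for line in breaches(run):
    print(line)
# Date of Birth / age 18-100: 97.5% vs target 99.0% -> raise with Head of Customer Data
```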

The issues currently visible should be reported to the data owner. A decision will then be needed on whether to formally log them as a Data Quality issue or to risk-accept them given the nature of the business.

Conclusion

A data element is now stood up under governance within the metadata repository, with metadata, lineage and data quality in place. Anyone looking to review Date of Birth should start with the repository. Beyond this initial setup, other aspects such as Data Architecture also need to be taken into account. For instance, an Architect would ensure that only one authoritative source of data is used downstream, and would challenge any change that sought to introduce a new data source.
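Part of the Architect’s check could be automated from lineage data. The sketch below assumes a hypothetical mapping from each downstream consumer to the system it reads Date of Birth from, and flags any consumer not reading from the single approved source; all names are illustrative.

```python
def non_authoritative_feeds(feeds: dict[str, str], approved_source: str) -> dict[str, str]:
    """Consumers that read the element from anything other than the approved source."""
    return {consumer: src for consumer, src in feeds.items() if src != approved_source}


# Hypothetical consumers of Date of Birth and where each one reads it from.
dob_feeds = {
    "Customer Mart": "Data Warehouse",
    "Risk Mart": "Data Warehouse",
    "Marketing App": "Salesforce",  # bypasses the master and reads the origin
}
print(non_authoritative_feeds(dob_feeds, "Data Warehouse"))
# {'Marketing App': 'Salesforce'}
```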

Involving Technology and Architecture enables a long-term technology strategy aligned with the organisation’s data needs. Any project with data needs should also seek input from a CDO-aligned Architect, so that governance stays intact as new data comes in or existing data is amended.

In addition, because organisations change, the steward will need to review and amend all of the captured information at an agreed frequency, say yearly. The engineer will also need to regularly review the physical data against the rules to identify anomalies and amend the rules as appropriate. This effectively moves governance into business-as-usual mode.
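The periodic steward review could be tracked with a simple overdue check. This is a hypothetical sketch: it assumes the repository records a last-reviewed date per element and approximates "yearly" as 365 days.

```python
from datetime import date, timedelta


def overdue_reviews(last_reviewed: dict[str, date], today: date,
                    interval: timedelta = timedelta(days=365)) -> list[str]:
    """Elements whose last steward review is older than the agreed interval."""
    return sorted(name for name, reviewed in last_reviewed.items()
                  if today - reviewed > interval)


reviews = {"Date of Birth": date(2023, 1, 10), "First Name": date(2024, 2, 1)}
print(overdue_reviews(reviews, today=date(2024, 3, 1)))  # ['Date of Birth']
```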