Data Governance Lifecycle

Introduction

People often wonder what data governance actually looks like in practice, so I thought I would pitch in with an example. I’m going to illustrate with a business that caters to retail clientele. This could be online shopping, store retail, banking and financial services, logistics and many more. All of these hold customer data, so let me pick an attribute that’s close to the customer: date of birth. This post explores what goes into bringing date of birth under governance.

Why should this be under governance?

A date of birth is one of the most personal attributes you can hold for a customer. It is usually mandatory in financial settings. For product development, sales and marketing teams, knowing it helps classify and segment customers and shape strategies that match their wants and needs. If nothing else, it is a security question put to the customer in numerous use cases, such as a password reset. So ensuring it is captured accurately, and owned by someone who can effect change, is of paramount importance.

Where is the start line?

Always start with usage and determine ownership. We need to know what physical data is being used and who uses it. Not everyone who ‘uses’ the data ‘owns’ the data, so it is worth finding out who would be best placed to own it. Whoever owns it needs to be fully invested in ensuring the validity and quality of the data. They also need to understand how it is used within the firm and whether that use is appropriate. In some instances it may be the sales or marketing teams who bring in or manage the customers. It could be Operations and Servicing, who handle customers on a regular basis. Or it could be the P&L product lines that produce what these customers want. While not ideal, it could also be the area that is impacted most if the data is wrong.

Ideally the business process that generates the data should own the data. For date of birth, a banking institution could place ownership under a product head, primarily because different lines of business may have their own customer bases and because jurisdictional laws may prevent personal data from being used across borders. Sales and marketing teams would be more appropriate for an eCommerce or retail setup because they are closest to the customer and are impacted if the data is incorrect or missing. The nature of the business is therefore important in gauging where ownership should sit, and there is no single correct answer.

Got an owner, what next?

Once an owner is invested in the data, the most appropriate course of action is to find out where the data is coming from and where it sits. Nowadays, all of this is digital and there are apps or systems that capture this information. What’s more important is where it goes and what is considered the authoritative data set.

Assuming a bank, date of birth can originate from a customer-entered online application, an in-house app used by a branch officer, or even a paper form sent in the mail. All of these are valid originating points which may eventually feed into a single data warehouse or lake that drives insights for the organisation. From here, it could also go to external parties such as research bodies or regulators. This is the lineage of the data within the organisation, and it is closely tied to business processes and technology systems. It is up to the data office to ensure the lineage and usage of data are appropriately captured so the owner knows where the data is coming from and where it’s being used.
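To make the idea of lineage concrete, here is a minimal sketch that records the bank example as a simple map of which system feeds which. The system names and the `downstream` helper are hypothetical, made up for illustration; a real data office would typically hold this in a lineage or catalogue tool rather than in code.

```python
# Hypothetical lineage map: each system lists the systems it feeds directly.
LINEAGE = {
    "online_application": ["data_warehouse"],
    "branch_app": ["data_warehouse"],
    "paper_form": ["data_warehouse"],
    "data_warehouse": ["regulator_report", "research_extract"],
}


def downstream(system, lineage):
    """Return every system that directly or indirectly receives data from `system`."""
    seen = []
    stack = list(lineage.get(system, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.append(node)
            # Follow the feeds of this node as well.
            stack.extend(lineage.get(node, []))
    return sorted(seen)


if __name__ == "__main__":
    # Where can a date of birth written on a paper form end up?
    print(downstream("paper_form", LINEAGE))
```

With a map like this, the owner can ask where a date of birth captured at any originating point may eventually be used, including by external parties.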

Is this data right?

Once we identify where the data comes from and where it sits from a usage perspective, it is essential to assess it. This is the role of controls, which are put in place and monitored on a regular basis. For a financial institution, the law usually requires an adult population for most products, so anybody who is less than 18 years of age should ideally not be a customer (leaving aside trustees etc.). Equally, you do not expect to have customers over 100 years old, because life expectancy rates make that very unlikely. These are controls that can be applied at the time of capture to prevent invalid data from flowing into the system. Alternatively, they can be evaluated against an existing population of customers so that appropriate actions can be taken.
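The two age controls described above could be sketched roughly as follows. The thresholds of 18 and 100 come from the example in this post; the function names, the result labels and the extra checks for a missing or future date are my own assumptions, and the real rules would depend on the product and jurisdiction.

```python
from datetime import date

MIN_AGE = 18   # adult threshold used in this post's example
MAX_AGE = 100  # plausibility ceiling used in this post's example


def age_on(dob, as_of):
    """Completed years between a date of birth and a reference date."""
    years = as_of.year - dob.year
    # Subtract one year if the birthday has not yet occurred in the reference year.
    if (as_of.month, as_of.day) < (dob.month, dob.day):
        years -= 1
    return years


def check_date_of_birth(dob, as_of):
    """Return a hypothetical control result label for one date of birth."""
    if dob is None:
        return "missing"            # assumption: a missing value is flagged too
    if dob > as_of:
        return "in_future"          # assumption: future dates are flagged too
    age = age_on(dob, as_of)
    if age < MIN_AGE:
        return "under_minimum_age"
    if age > MAX_AGE:
        return "over_maximum_age"
    return "ok"


if __name__ == "__main__":
    today = date(2024, 6, 1)
    for dob in (date(2010, 1, 1), date(1900, 1, 1), date(1990, 6, 2), None):
        print(dob, check_date_of_birth(dob, today))
```

The same function can serve both purposes mentioned above: called at the point of capture to reject a value, or run over the existing customer population to produce a list of records for review.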

In most instances, governance is imposed on existing data, so a data office is often required to view and profile the data that is in use. A profiler can identify basic patterns in data and give immediate insights. It can tell you how many customers don’t have a date of birth and what range of values you do have, which tells you how the controls above are holding up. It can also detect strange patterns, such as over 30% of customers having a date of birth of 1 January. That is a likely sign the value has been defaulted somewhere. Perhaps your customers are entering wrong dates of birth on purpose (in which case you should evaluate whether you really need this attribute), or perhaps there is a systemic gap. The customers may be giving you their date of birth on capture, but by the time it reaches the warehouse the transformation rules are not picking it up and are defaulting it instead. This constitutes a data quality issue. It will need to be coordinated and managed through to remediation, with the sponsorship of the business owner.
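As a rough illustration of what such a profile might compute, here is a small sketch over a list of dates of birth. The function name, the output fields and the choice to measure the share against populated records only are assumptions for this example; the 30% threshold is just the figure used above, not a standard.

```python
from collections import Counter
from datetime import date


def profile_dates_of_birth(dobs, default_share_threshold=0.30):
    """Summarise missing values, value range and the most common (month, day)."""
    present = [d for d in dobs if d is not None]
    missing = len(dobs) - len(present)

    most_common_day = None
    share = 0.0
    if present:
        # Count how often each calendar day (ignoring the year) appears.
        day_counts = Counter((d.month, d.day) for d in present)
        most_common_day, count = day_counts.most_common(1)[0]
        share = count / len(present)

    return {
        "total": len(dobs),
        "missing": missing,
        "earliest": min(present) if present else None,
        "latest": max(present) if present else None,
        "most_common_day": most_common_day,
        "most_common_day_share": share,
        # A single day holding a large share of records hints at a default value.
        "suspected_default": share > default_share_threshold,
    }


if __name__ == "__main__":
    sample = [date(1980, 1, 1), date(1975, 1, 1), date(1990, 1, 1),
              date(1985, 7, 14), None]
    print(profile_dates_of_birth(sample))
```

A flag like `suspected_default` only says that something looks odd. Working out whether the cause is customer behaviour or a transformation rule still needs someone to follow the lineage back to the source.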

Business as Usual

All of the above covers what to look for, but the key is to keep managing it constantly. Metadata for the data element needs to be published and maintained in a repository where anyone can look at it. It should also be attested to on a regular basis to ensure ownership is current. Like employees, systems and usage of data can change, so lineage should also be attested to regularly. Data quality controls should not be a one-time exercise; their results should be reported regularly to ensure there are no gaps. The issue we raised on date of birth should go through prioritisation and subsequently remediation. It is naturally useful for the data office to make sure all of this is available to view and audit in one place.
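One way to picture the repository entry and its regular attestation is a small record like the one below. Every field name, the sample values and the one-year attestation interval are hypothetical, chosen only to illustrate the idea; this is not the schema of any particular catalogue tool.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class DataElementRecord:
    """Hypothetical metadata entry for one governed data element."""
    name: str
    owner: str
    authoritative_source: str
    last_attested: date
    attestation_interval_days: int = 365  # assumption: yearly attestation

    def attestation_due(self, as_of):
        """True when more than the allowed interval has passed since the last attestation."""
        elapsed = as_of - self.last_attested
        return elapsed > timedelta(days=self.attestation_interval_days)


if __name__ == "__main__":
    record = DataElementRecord(
        name="date_of_birth",
        owner="Head of Retail Products",        # illustrative owner
        authoritative_source="data_warehouse",  # illustrative system name
        last_attested=date(2023, 1, 1),
    )
    print(record.attestation_due(date(2024, 6, 1)))
```

Running a check like this across every governed element gives the data office a simple list of owners to chase for re-attestation.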

Conclusion

This post explains what it takes to get date of birth into governance from an operational perspective. In the next post, I will explore how this is actually done with the help of tooling and what good looks like.