
White Paper

Overall Approach to Data Quality ROI

By: William McKnight, founder and president, McKnight Associates, Inc.

Sponsored by: Firstlogic, Inc.


Data quality is an elusive subject that can defy measurement and yet be critical enough to derail any single IT project, strategic initiative or even a company as a whole. The data layer of an organization is a critical component, yet it is all too easy to ignore the quality of that data or to make overly optimistic assumptions about its efficacy. Having data quality as a focus is a business philosophy that aligns strategy, business culture, company information, and technology in order to manage data to the benefit of the enterprise. Put simply, it is a competitive strategy. One day, like operational excellence, rich product features, everyday low prices, high product quality and short time-to-market, data quality will be expected by our markets. In the meantime, each company has the opportunity to differentiate itself through the quality of its data. Leading companies are now defining what the marketplace data quality expectation will be.

A parallel trend to data quality improvement in the marketplace is the comeback of return-on-investment (ROI) measurement systems for technology-based initiatives. No longer is it acceptable to "throw money at problems," target soft measures or lack accountability for results with technology spending. The approach of targeting ROI is viable even for efforts to improve the quality of a company's data, and many executives are demanding payback for quality initiatives.

It is important to note that there are many benefits that accrue from improving the data quality of an organization. Many of these benefits are “intangible” or unreasonable to measure.

Benefits such as improved speed to solutions, a "single version of the truth," improved customer satisfaction, improved morale, an enhanced corporate image and consistency between systems all accumulate, but an organization must select which of these benefits to analyze further and convert to hard dollars. ROI must be measured on hard dollars.

A program approach to data quality is required to measure data quality ROI. Data quality improvement is not just another technology to implement. We must change our way of doing business to fully exploit data. Investments in the technologies as well as in organizational changes are necessary to reap the full rewards. Data quality is right in the “sweet spot” of modern business objectives that recognize that whatever business a company is in, it is also in the business of data. Those companies with more data, cleaner data, accessible data and the means to use that data will come out ahead.

But how do organizations begin and justify data quality improvements? The cleansing process and maintenance will cost money and may require dedicated staff. Management may not release funds dedicated to data quality improvement until such time as the cleansing will improve the reliability and accuracy of key business processes like trending, analysis, or billing for product sales. Data quality can — and must — measure its success based on its contribution to the improvement of such objectives.

However, until now there has not been a methodology to articulate and improve data quality ROI. You can't improve what you can't measure, so we need a means of measuring the quality of our data warehouse. Abstracting quality into a set of agreed data rules and measuring the occurrences of rule violations provides the measurement in the methodology, which was developed in conjunction with successful IT best-practice data quality efforts.
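The core of the methodology can be sketched in a few lines: express each agreed data rule as a predicate over a record, then measure quality as counts of rule violations. The rule names and fields below are illustrative assumptions, not taken from the paper.

```python
# Sketch: each data quality rule is a named predicate; quality is measured
# by counting how many records violate each rule.
def violation_counts(records, rules):
    """Count how many records violate each named rule."""
    counts = {name: 0 for name in rules}
    for record in records:
        for name, passes in rules.items():
            if not passes(record):
                counts[name] += 1
    return counts

# Hypothetical rules for the examples used later in this paper.
rules = {
    "gender in domain": lambda r: r.get("gender") in ("M", "F", "U"),
    "income non-negative": lambda r: r.get("income", 0) >= 0,
}
records = [
    {"gender": "M", "income": 52000},
    {"gender": "female", "income": -10},  # violates both rules
]
print(violation_counts(records, rules))  # {'gender in domain': 1, 'income non-negative': 1}
```

Tracking these counts over time is what turns data quality from a vague feeling into a measurable quantity.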


Step 1: System Inventory

In an environment that has not focused on its data quality, there are likely to be many systems that require clean data to operate well and to make or influence correct decisions.

In the System Inventory step, company systems are surveyed and prioritized according to their use of and need for quality data. High-priority systems rely on their data being accurate in order to perform their intended function, and they can be found across the artificial boundary between operational systems and decision support or data warehouse systems. Systems needing clean data come from all categories, including Enterprise Resource Management (ERM), Customer Relationship Management (CRM), Data Warehouse, E-Commerce and Call Center or customer service.

For the purposes of this paper, we will use a targeted marketing example to better explain the data quality ROI process. The concepts applied to this targeted marketing example can be applied to other IT systems, whether business intelligence, CRM or other critical applications.

In the next step, we’ll drill down on the detail in those systems.

Step 2: Data Quality Rule Determination

Data quality can be defined as a lack of intolerable defects. There is a finite set of possibilities that can constitute data quality defects and that categorizes all data quality rules, as found in the list below. The rules generated in this step are the rules that you wish the data to conform to, and they can apply wherever important data resides.

□ Referential Integrity (RI) refers to the integrity of reference between data in related tables. For example, product identifiers in sales records need to also be found in the product table. RI is essential to prevent users from navigating to a "dead end." Many transactional systems utilize database-enforced RI, but many data warehouse systems do not, due to the performance and ordering restrictions it places on data loading. RI is still important, however, and must be enforced somehow. In the targeted marketing system, for example, bad RI may mean a sale with an undefined product and/or an undefined customer, which effectively takes that sale out of the analysis.

□ Field(s) assigned to uniquely identify a business entity, like a product identifier, should also be unique. It would constitute a violation when the same identifier is used to identify multiple products. This also has implications for RI. Not only do we not want “dead end” navigations in the data, but we also want a singular reference. In the targeted marketing system, for example, a single product for product identifier number 123 would be expected.

□ Cardinality restricts how many related records may exist for any given value. For example, in the targeted marketing system we may expect to find between one and three addresses for a customer, no more and no fewer. Other relationships may be 1-to-1, 1-to-many (unbounded) or 0-to-many. The 0-to-many case is interesting because it allows for the entry of a product without a sale. There are reasonable conditions on the cardinality in this case whereby such products would be "new" products, perhaps products with an entry date in the last 14 days only.

These first three components of cleanliness can be derived, for databases, from review of the data model.
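These three model-derived checks can be expressed with ordinary SQL, here run through Python's sqlite3 module against a toy version of the targeted marketing schema. All table and column names, and the sample data, are assumptions for illustration.

```python
import sqlite3

# Toy schema: product identifiers should be unique, sales should reference
# defined products, and customers should have one to three addresses.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE product  (product_id INTEGER, name TEXT);
    CREATE TABLE customer (customer_id INTEGER);
    CREATE TABLE address  (address_id INTEGER, customer_id INTEGER);
    CREATE TABLE sale     (sale_id INTEGER, product_id INTEGER);

    INSERT INTO product  VALUES (123, 'Widget'), (123, 'Deluxe Widget'), (124, 'Gadget');
    INSERT INTO customer VALUES (1), (2);
    INSERT INTO address  VALUES (10, 1), (20, 2), (21, 2), (22, 2), (23, 2);
    INSERT INTO sale     VALUES (1, 123), (2, 999);  -- 999: undefined product
""")

# Referential integrity: sales that navigate to a "dead end" product.
ri_violations = con.execute("""
    SELECT s.sale_id FROM sale s
    LEFT JOIN product p ON s.product_id = p.product_id
    WHERE p.product_id IS NULL
""").fetchall()

# Uniqueness: product identifiers used for more than one product.
dup_violations = con.execute("""
    SELECT product_id FROM product
    GROUP BY product_id HAVING COUNT(*) > 1
""").fetchall()

# Cardinality: customers with fewer than one or more than three addresses.
card_violations = con.execute("""
    SELECT c.customer_id, COUNT(a.address_id) AS n
    FROM customer c LEFT JOIN address a ON a.customer_id = c.customer_id
    GROUP BY c.customer_id HAVING n NOT BETWEEN 1 AND 3
""").fetchall()

print(ri_violations, dup_violations, card_violations)
```

Each result set is a list of violating keys, which feeds directly into the violation counts the methodology calls for.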

□ Subtype/Supertype constructs may also be found in the model. For example, the targeted marketing system may represent a customer's marital status as married, divorced, widowed or single. Different constructs may apply to each state, like former-spouse data for divorced persons. Subtype/supertype bounds the states of a given business entity. The logical progression through the various states for a business entity (i.e., prospect, customer, good customer, former customer) is another quality check.

Having a robust logical data model from which one can glean rules is an asset in determining rules for your data quality process. If the data model is missing or inadequate, building or remediating it may be required before completing this step.

There are also measures associated with data value appropriateness such as the next two rules:

□ Are there reasonable domains for values? This is especially useful for analyzing numeric data, whose data types can allow an enormous range of values, only a subset of which are reasonable for the field. Examples include incomes over $2,000,000 or below $0; last-year purchases over $500,000 (for example, at a department store) or below $0; and offers mailed over 1,000,000 or under 1. All of these values would be considered unusual at the least and quite possibly incorrect. That's why we want to flag them for investigation.

□ Are data columns being used for multiple meanings? Data types constrain the values in a column … to a degree. Any mix of 0-20 characters can go into a character(20) data type column. However, if this is a Name field, there are some characters, such as % and $, that you would not expect to find. Such values would be flagged as inappropriate data for the field. There are also numerous misspellings and incorrect alternative spellings of last names.
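These two value-appropriateness rules can be sketched as simple record checks. The numeric bounds and the disallowed character set below are illustrative assumptions; a real deployment would take them from the business's own standards.

```python
import re

# Assumed reasonable ranges per field, per the examples in the text.
REASONABLE_RANGES = {
    "income": (0, 2_000_000),
    "last_year_purchases": (0, 500_000),
    "offers_mailed": (1, 1_000_000),
}
# Characters we would not expect in a Name field (assumed set).
BAD_NAME_CHARS = re.compile(r"[%$#@0-9]")

def value_violations(record):
    """Return (field, reason) pairs for out-of-domain or inappropriate values."""
    flags = []
    for field, (lo, hi) in REASONABLE_RANGES.items():
        value = record.get(field)
        if value is not None and not lo <= value <= hi:
            flags.append((field, "out of reasonable domain"))
    name = record.get("name", "")
    if BAD_NAME_CHARS.search(name):
        flags.append(("name", "unexpected characters"))
    return flags

record = {"name": "J$hn Smith", "income": 3_500_000, "offers_mailed": 40}
print(value_violations(record))
```

Flagged records are candidates for investigation rather than automatic correction, since an extreme value may occasionally be legitimate.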

There are various formatting errors that can be found in a field. Typical formatting errors found in name columns include, for example:

□ A space in front of a name
□ Two spaces between first and last name and/or middle initial
□ No period after a middle initial
□ Inconsistent use of a middle initial (sometimes used, sometimes not)
□ Use of ALL CAPS
□ Use of "&" instead of "and" when indicating plurality
□ Use of a slash (/) instead of a hyphen (-)

In environments where original data entry is "free form," unconstrained and without the use of master data as a reference, many of these violations will be found.
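Formatting rules like these translate naturally into per-field predicates. The checks below cover the single-record rules from the list above (consistency of middle-initial use requires comparing records, so it is omitted); the exact rule set any site enforces is an assumption here.

```python
import re

# One predicate per formatting rule; each takes a name string and returns
# True when the rule is violated.
FORMAT_CHECKS = {
    "leading space": lambda n: n.startswith(" "),
    "double space": lambda n: "  " in n,
    "initial without period": lambda n: re.search(r"\b[A-Z]\b(?!\.)", n) is not None,
    "all caps": lambda n: n.isupper(),
    "ampersand for 'and'": lambda n: "&" in n,
    "slash for hyphen": lambda n: "/" in n,
}

def format_errors(name):
    """Return the names of the formatting rules this value violates."""
    return [rule for rule, check in FORMAT_CHECKS.items() if check(name)]

print(format_errors(" JOHN & JANE DOE"))
print(format_errors("John Q Smith"))
```

Counting how often each predicate fires across a column shows which formatting problems dominate and are worth standardizing first.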

Another typical type of data quality error involves columns that should only contain data if certain data values (or null) are present elsewhere. For example, in a customer table where some customers are organizations and others are individuals, only organizations will have an SIC code, organization size, etc. (unless the presence of that data actually represents data for the individual's organization).
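A conditional-content rule like this one can be checked row by row. The column names and the "ORG"/"IND" coding below are illustrative assumptions.

```python
# Rule: SIC code should be populated only for organization customers.
def conditional_violations(rows):
    """Return the ids of non-organization rows that carry an SIC code."""
    bad = []
    for row in rows:
        if row["customer_type"] != "ORG" and row.get("sic_code") is not None:
            bad.append(row["id"])
    return bad

rows = [
    {"id": 1, "customer_type": "ORG", "sic_code": "5411"},
    {"id": 2, "customer_type": "IND", "sic_code": None},
    {"id": 3, "customer_type": "IND", "sic_code": "7372"},  # individual with SIC code
]
print(conditional_violations(rows))  # [3]
```

Per the caveat in the text, a flagged row may be legitimate (the individual's employer data), so the check surfaces candidates rather than deleting values.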

Some systems derive data from data in other systems. For example, while one system may contain discount amount and unit price, the discount percentage, a simple division calculation, could be derived from them. Checking the veracity of calculations like customer lifetime value for our targeted marketing system is a common quality check.
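Verifying a stored derived value means recomputing it from its source fields and comparing within a tolerance. The column names and tolerance below are assumptions for illustration.

```python
# Recompute discount percentage from discount amount and unit price, and
# flag rows where the stored value disagrees beyond a small tolerance.
def discount_pct_mismatches(rows, tolerance=0.01):
    """Return the ids of rows whose stored discount_pct fails verification."""
    bad = []
    for row in rows:
        expected = row["discount_amount"] / row["unit_price"] * 100
        if abs(expected - row["discount_pct"]) > tolerance:
            bad.append(row["id"])
    return bad

rows = [
    {"id": 1, "unit_price": 100.0, "discount_amount": 15.0, "discount_pct": 15.0},
    {"id": 2, "unit_price": 80.0, "discount_amount": 8.0, "discount_pct": 12.0},  # should be 10.0
]
print(discount_pct_mismatches(rows))  # [2]
```

The same recompute-and-compare pattern applies to more elaborate derivations such as customer lifetime value, with the tolerance widened to match the precision of the stored figure.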

Another common quality check is that data exists consistently through all of its history, without gaps, so that users are not left to wonder what happened during missing periods.
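A gap check walks the observed periods and reports the missing ones in between. Monthly granularity is an assumption here; the same logic applies to days or quarters.

```python
# Given a sorted history of (year, month) observations, report the months
# that are missing between consecutive observations.
def missing_months(months):
    """Return the (year, month) gaps in a sorted monthly history."""
    gaps = []
    for (y1, m1), (y2, m2) in zip(months, months[1:]):
        y, m = y1, m1
        # Step forward one month at a time until we reach the next observation.
        while (y, m) != (y2, m2):
            m += 1
            if m > 12:
                y, m = y + 1, 1
            if (y, m) != (y2, m2):
                gaps.append((y, m))
    return gaps

history = [(2015, 11), (2015, 12), (2016, 3)]
print(missing_months(history))  # [(2016, 1), (2016, 2)]
```

A non-empty result tells users exactly which periods are absent before they draw trend conclusions from the data.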

There is also incorrect data, for example, the inconsistent use of initials and misspellings or inappropriate spellings (e.g., William McNight instead of William McKnight).

Finally, does the data conform to the expected set of “clean” values for the column? For example, in the gender column we would expect to find M, F, U (unknown) or the like.

Data quality rule determination establishes the rules that will govern our data. Data profiling, the next step, determines where we stand with the quality of the data.

Step 3: Data Profiling

It's one thing to sit back and pontificate about the rules to which the data ought to conform.

The next step is to determine the data quality with a data profiling and prioritization exercise. Usually no one can articulate how clean or dirty corporate data is. Without this measurement of cleanliness, the effectiveness of activities that are aimed at improving data quality cannot be measured.

Measuring data quality begins with an inventory. By taking account of important data across several tangible factors that can be used to measure data quality, you can begin to translate vague feelings of dirtiness into something measurable. In so doing, focus can be brought to those actions that can improve important quality elements. Ultimately, data quality improvement will be performed against a small subset of data elements, as you will find most elements already conform to standard. The subset must be selected carefully, however. Put another way, data quality initiatives will not be comprehensive across all corporate data elements and all possibilities.

Data profiling can be performed with software programs like IQ Insight™ from Firstlogic, or with SQL or similar queries against the data that show the spread of data in the systems and check for rule adherence. For our targeted marketing example, we may find that the gender column is distributed as shown in Figure 1 below.
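A minimal profiling query of this kind, standing in for a dedicated tool, groups and counts the values in a column and then flags those outside the expected domain. The schema and sample data below are assumptions for illustration; whether a null counts as a violation or as "unknown" is a rule each site must decide.

```python
import sqlite3

# Profile the spread of values in the gender column of a toy customer table.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, gender TEXT);
    INSERT INTO customer (gender) VALUES
        ('M'), ('F'), ('F'), ('U'), ('female'), (NULL);
""")
profile = con.execute("""
    SELECT COALESCE(gender, '<null>') AS value, COUNT(*) AS n
    FROM customer GROUP BY gender ORDER BY n DESC, value
""").fetchall()
print(profile)

# Values outside the expected domain {M, F, U}; here nulls are treated
# as violations rather than as unknowns.
violations = [(v, n) for v, n in profile if v not in ("M", "F", "U")]
print(violations)
```

The distribution answers "how dirty is this column?" with a number, which is exactly the measurement the ROI methodology needs.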
