Data Quality Challenges in Enabling eBusiness Transformation
(Research in Progress)
Arie Segev, Professor & Director*
Fisher Center for Information Technology and Marketplace Transformation
University of California, Berkeley
Associate Professor, Boston University
Research Affiliate, Fisher CITM, U. C. Berkeley
Co-Director for MIT TDQM Program
Abstract: This paper discusses data quality challenges in the context of eBusiness Transformation. It presents the major differences between traditional and eBusiness as they relate to business models, organizations, processes and technologies, and then outlines the differences with respect to data quality approaches. The scenarios described pose significant data quality (and other) challenges, and the paper discusses work in progress to construct a data quality strategy and implementation methodology.
1. INTRODUCTION
The field of data quality has witnessed significant advances over the last decade. Today, researchers and practitioners have moved beyond establishing data quality as a field to resolving data quality problems, which range from data quality definition, measurement, analysis, and improvement to tools, methods, and processes [1, 3, 5, 6, 11-19]. With many of the theoretical foundations developed, researchers have begun to go beyond fundamental data quality research to solving critical business problems. For example, research has been initiated to investigate how to develop data production maps for information supply chain management and remanufacture. Another area of active research is the conceptualization and software implementation for corporate household. One research area that has not been actively pursued, however, is data quality in the context of eBusiness.
The Internet and eBusiness have added new complexities to data quality, primarily due to the increase in a company’s interaction with its environment (externalization) and to the new levels of data integration that result from new business models. This business-to-business (B2B) integration calls for the augmentation of data manufacturing models with data logistics concepts.
Furthermore, it is imperative that organizations establish data quality strategies and implementation methodologies combined with their eBusiness transformation approaches.
In this paper we focus on B2B eBusiness, but there are obvious links to B2C eBusiness. For example, product and inventory information that is used for B2C purposes would inherit quality problems that were introduced in the data manufacturing process.
Section 2 discusses the eBusiness transformation process and elaborates on the business integration aspect. Section 3 then elaborates on the inter-company aspect and discusses four different scenarios; examples from the domain of B2B eProcurement are presented.
* The work of this author was supported by the External Acquisition Research Program (EARP) under contract N00244-99-C-0034.
2. eBUSINESS TRANSFORMATION
eBusiness Transformation entails business, organizational and technological aspects. It should be based on a comprehensive top-down view of the enterprise and its environment, and should incorporate proven principles when applicable. Basic principles of conventional information systems methodologies that have been developed in the last ten or more years still apply, but the scope and context have changed significantly. The new context is characterized by:
• New business models, applications and related requirements
• The externalization level of companies
• The degree of required interconnectivity and integration
• The rate of change (technology and business models).
The second bullet point indicates that, increasingly, companies’ processes are shifted outwards as part of new business models involving interactions with customers, suppliers and partners. This, in turn, has led to an exponential increase in the number of a company’s interfaces (i.e., the level of business connectivity). From a process and data perspective, a need for a new level of business-to-business integration has emerged. A typical methodology for addressing this need has been to extend Enterprise Application Integration (EAI) technology beyond the corporate walls and to deliver the full promise of eBusiness by integrating customers, suppliers and partners (see Figure 1). The basic principle is to create a decomposition-based application and technical infrastructure to support the business objectives and satisfy various performance constraints.
The previous figure represents sound decomposition principles, but it has the disadvantage of conveying a single company’s perspective. We prefer to use the diagram of Figure 2 below to emphasize the new eBusiness requirements. It is important to note that while the requirements at the bottom of the figure are referred to as “technical,” they of course have significant business and cost ramifications. The figure does not represent a decision model itself, but rather includes the scope, the elements, and the overall ideal order of the decision types. Frequently, one has to deal with a subset of the issues in a narrower and less systematic fashion, but whenever possible, this general framework should be followed or related to. The emphasis here is on the scope of the inter-organizational processes, the required infrastructure, and the new organizational and skill dimensions.
Figure 2: A Framework for eBusiness Integration
The new type of eBusiness application involves a business and technology change in delivering products and services. An immediate requirement that companies face is to Web-enable legacy systems. Web servers, and the application server in particular, have become the foundation of business service delivery and, consequently, the Web service model must be central to modernizing or moving out of legacy systems. It is important to understand that this model is as much a business model as it is a technology model. The range of Web-enabling possibilities is wide, but two general approaches are used when legacy systems are present. We refer to them as Level I and Level II. A Level I solution makes no significant change in functionality; it is based on creating an interface between the legacy system and the Web server. The concern here is the presentation and user interface, and the approach is similar to the “GUI wrapping” that became popular in the early days of client-server computing. A Level II solution introduces additional process functionality. The advantage of an application server is that it can be used for various degrees of Level I and Level II integration, as shown in Figure 3. It also allows various degrees of inter-process integration and data quality enhancement. As an example, the application-server-enabled legacy application in the figure can provide data to a new object-oriented Web-based application, and the integration of the two at the application server provides the unified value-added service that underlies the new eBusiness process. Furthermore, simple but important data quality enhancements can easily be introduced at Level II, e.g., validity checks on data attributes that were not implemented in the original legacy system.
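The Level II validity checks mentioned above can be illustrated with a minimal sketch. It assumes a hypothetical legacy order record delivered to the application server as a dictionary of strings; the field names (part_number, quantity, need_by) and the rules themselves are illustrative assumptions, not taken from any particular system.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, List

# Hypothetical shape of a record returned by a legacy system (an assumption).
LegacyRecord = Dict[str, str]


@dataclass(frozen=True)
class Rule:
    """One validity check applied to a single attribute."""
    field: str
    description: str
    check: Callable[[str], bool]


def _is_positive_int(value: str) -> bool:
    try:
        return int(value) > 0
    except ValueError:
        return False


def _is_iso_date(value: str) -> bool:
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False


# Illustrative rules that the legacy system is assumed not to enforce.
RULES = [
    Rule("part_number", "must be non-empty", lambda v: bool(v.strip())),
    Rule("quantity", "must be a positive integer", _is_positive_int),
    Rule("need_by", "must be an ISO date (YYYY-MM-DD)", _is_iso_date),
]


def validate(record: LegacyRecord, rules: List[Rule] = RULES) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for rule in rules:
        value = record.get(rule.field, "")  # a missing attribute is treated as empty
        if not rule.check(value):
            problems.append(f"{rule.field}: {rule.description} (got {value!r})")
    return problems
```

In such a design the application server would call validate on each record before passing it to the Web-based application, so that invalid data is rejected or flagged at the point of entry rather than downstream.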
Cleaning the data at this junction is more effective and cheaper than applying data cleaning procedures downstream. In addition to the accuracy enhancement of Level I, Level II enhancements can address not only functionality but also data quality. Relevant dimensions are completeness, enhanced by capturing more data and possibly relating it to other data (semantic completeness), and timeliness, enhanced by capturing real-time data instead of, or in addition to, data from other channels. In the next section we analyze in more detail the logistics of data as it moves across companies; the Web-enablement approach described above is also applicable to many of those cases.
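The completeness and timeliness dimensions named above can be made concrete with simple measures. The sketch below is one possible formulation under stated assumptions, not a definition given in this paper: completeness is taken as the fraction of required attributes that are populated, and timeliness as a score that decays linearly from 1 to 0 over an assumed "shelf life" of the data. All function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Mapping, Optional, Sequence


def completeness(record: Mapping[str, Optional[str]],
                 required: Sequence[str]) -> float:
    """Fraction of required attributes that hold a non-blank value."""
    if not required:
        return 1.0
    filled = sum(1 for name in required if (record.get(name) or "").strip())
    return filled / len(required)


def timeliness(captured_at: datetime, now: datetime,
               shelf_life: timedelta) -> float:
    """Score in [0, 1]: 1 for freshly captured data, 0 once the age
    reaches the assumed shelf life."""
    if shelf_life <= timedelta(0):
        raise ValueError("shelf_life must be positive")
    age = max(now - captured_at, timedelta(0))  # future timestamps count as fresh
    return max(0.0, 1.0 - age / shelf_life)
```

For example, a record with one of four required attributes populated scores 0.25 on completeness, and data captured six hours ago with an assumed shelf life of 24 hours scores 0.75 on timeliness.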
3. INTER-COMPANY eBUSINESS INTEGRATION AND QUALITY ENHANCEMENT
Four primary cases of inter-company eBusiness scenarios are discussed below with respect to data quality strategies. These cases have also been discussed in the context of coordination and negotiation. While not capturing all possible scenarios, we believe that these cases are the most important and represent the majority of real-life scenarios.
Case I: 1C
The case of a single company corresponds to the traditional intra-company data quality scenario.
As discussed earlier in this paper, this case has received intensive attention in the last ten years both in academia and industry. The application server example in the preceding section is applicable to this case.
Case II: 2C
The case of two companies corresponds to dedicated systems between two trading partners, ranging from faxed papers and telephone calls, to traditional EDI and Web-EDI, to contemporary XML-based connectivity. Data quality issues in traditional systems (many of them legacy systems) were identified and addressed long ago, both in research and in industry. In the case of fax and telephone, errors are introduced through “misunderstanding,” and more errors through retyping. There is often a “semantic reduction” as a result of translations to other systems. For electronic transmissions the following are common cases.
EDI: In addition to the cost (setup and operational), many recipients, small companies in particular, print the received data and re-input it. Translators and mappers have improved the situation somewhat. Further semantic problems arise in matching the received data with other data from the same partner but from other systems; these frequently arise because of the complexity, cost and time required to modify existing EDI systems.
Web-EDI: The primary objective was to reduce transport cost and possibly bypass expensive VANs. One quality dimension that improves is timeliness, when periodic downloads from a VAN are replaced by more “real-time” Web-based connectivity.
XML-based: These are contemporary systems, most implemented in the context of Case IV below. One should distinguish between two primary types of applications:
Transactional applications: including XML-EDI and new “pure” XML connectivity such as in Desktop Procurement Systems (DPS), e.g., DPS connectivity to inventory systems of the supplier. In many cases the 1-to-1 connectivity was changed to 3C2L by using the services as a content intermediary.
Collaborative applications: e.g., design and customer support. These add connectivity, timeliness, and more complete information. Workflow technology plays a major role.
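The “semantic reduction” that occurs when data is translated into another system, noted earlier for this case, can be sketched as follows. The example assumes a hypothetical field mapping between a sender's message and a receiver's narrower schema; the field names are illustrative only. The point of the sketch is that a translator can report what it drops, rather than losing it silently.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from the sender's field names to the receiver's schema.
FIELD_MAP = {"item_id": "sku", "qty": "quantity", "unit_price": "price"}


def translate(message: Dict[str, str],
              field_map: Dict[str, str] = FIELD_MAP
              ) -> Tuple[Dict[str, str], List[str]]:
    """Map a message onto the receiver's schema.

    Returns the translated record together with the names of the sender's
    fields that have no counterpart in the receiver's schema and are lost.
    """
    translated: Dict[str, str] = {}
    dropped: List[str] = []
    for name, value in message.items():
        target = field_map.get(name)
        if target is None:
            dropped.append(name)  # no counterpart: this information is lost
        else:
            translated[target] = value
    return translated, sorted(dropped)
```

Logging the dropped fields at each translation step gives both trading partners a record of where information was reduced, which may help in assigning responsibility for data quality.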
Common problems are similar to those encountered twenty years ago, when companies moved from file-based systems to databases by emulating the former on the latter, resulting in a more efficient GIGO (garbage in, garbage out) process. Unless this move is accompanied by a methodology-based process and data quality improvement, the outcome will be similar, but with much more serious (and perhaps catastrophic) consequences for the business. The lower portion of Figure 4 illustrates the case of direct connectivity between the buyer and the seller in the context of eProcurement. It typically involves a significant business relationship that justifies the cost of setting up the one-to-one business integration. A main obstacle to data quality enhancement is the legacy EDI conduit, which makes it difficult to add to the functional business integration and leads to parallel systems that do not integrate well relative to the end-to-end process. There is also typically ambiguity about the responsibility of each company for the data quality.