COMMON EDUCATION DATA STANDARDS
The Status of State Data Dictionaries
Introduction and Context
As the use of data for strategic decision-making in education continues to expand, states and districts have been addressing multiple issues regarding the collection, housing, use, and communication of data among various individuals and entities. State education agencies (SEAs) and local education agencies (LEAs) have been focusing on their data systems through the development and implementation of their state longitudinal data systems (SLDS), data warehouses, data dashboards, assessment systems, and various other elements of the overall data process. As these components continue to expand and improve, the introduction of a comprehensive and well-designed data dictionary has emerged as a critical need for states.
Currently, the education field is addressing the issue of standardization across many levels, from districts to states to the federal government. Many states are focusing specific attention on the development of data dictionaries. At the federal level, the Common Education Data Standards (CEDS) project has developed voluntary, common data standards for a key set of education data elements to streamline the exchange and comparison of data across institutions and sectors. In their data dictionary efforts, many states are incorporating CEDS and using its associated tools to guide that work.
This paper looks at the status of state data dictionaries in selected states in order to highlight states’ experiences, common challenges, and guidance for other states. The purpose of this paper is to provide a timely perspective on the development and implementation of state data dictionaries, and to offer guidance to states as they navigate the process. In addition, the paper considers how tools such as data dictionaries factor into larger data initiatives such as CEDS, EDFacts, and SLDS, and the relationships that can and should exist among such initiatives.
Benefits of a Data Dictionary in an Education Data System
As the use of data in educational assessment, planning, and decision-making continues to grow, so too does the sharing of data across multiple sectors. Data users must be sure that the information they are sharing and utilizing is clear, consistent, and accurate. The Data Quality Campaign states, “State policymakers and educators need a data system that not only links student records over time and across databases, but also makes it easy for users to query those databases and use up-to-date reports to adapt to the unique needs of their students” (p. 1). This type of comprehensive data system must include an effective data dictionary.
Educational organizations such as SEAs and LEAs often realize their need for an institutional data dictionary when they are faced with many of the data inconsistencies that can arise without such a standard: inconsistent definitions, inconsistent naming conventions, varying field lengths for data elements, or varied element values. A well-designed data dictionary improves the data quality of an organization by ensuring data integrity, eliminating redundancies, increasing consistency, and allowing effective communication. As SEAs continue to expand their use of data for strategic decision-making, this type of data quality is essential.
In recent years, many SEAs have chosen to share their data dictionaries with other agencies, in several cases publishing the dictionaries publicly. Doing so can provide multiple benefits, both to the agency itself and to the larger educational field. Sharing dictionaries has provided an impetus for conversation among states and districts, as data leaders bring questions or commentary to the publishing state in order to better understand their tool and its usage. Receiving questions and feedback from other states or LEAs can allow the publishing state to recognize needed clarifications or expansions to their dictionary. These conversations can also provide the publishing state a perspective on how it is faring in its dictionary project in comparison to other agencies, highlight opportunities for collaboration with other states about design or metadata decisions, and reveal areas to draw suggestions from other states about vendors and implementation of initiatives.
Visit http://ceds.ed.gov to learn more about CEDS, view the standards, explore the data model, and use the tools.
States that publish their dictionaries publicly also benefit the larger education community. They provide examples and guidelines for other states and education agencies in terms of chosen data elements, metadata, system design, and various other dictionary aspects. Having this type of example can be very helpful to SEAs that are earlier in the development process, in that they can learn from publishing states’ experiences and implement those elements or system designs that are most likely to work effectively for their particular state data system.
Beyond these benefits to states in the development process, open access to multiple data dictionaries also increases the alignment of data shared among different agencies and sectors. While data initiatives such as CEDS are designed to increase the communicability and portability of data across sectors, sharing of data dictionaries among SEAs and LEAs can be an early indicator of the alignment of data elements and metadata among education agencies. In fact, the CEDS Align tool (discussed in greater detail on page 9) builds upon the simple sharing of dictionaries to assess alignment, allowing states to compare their elements to CEDS and to the dictionaries uploaded by other states.
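The kind of element-by-element comparison described above can be sketched in a few lines. This is only an illustration of the idea, not the CEDS Align tool or its method; the function name, the element names, and the attribute values are all hypothetical.

```python
def compare_dictionaries(state_a, state_b):
    """Report shared element names and any attributes whose values differ."""
    shared = sorted(set(state_a) & set(state_b))
    mismatches = {}
    for name in shared:
        # Collect every attribute whose value differs between the two entries.
        diffs = {
            attr: (state_a[name].get(attr), state_b[name].get(attr))
            for attr in sorted(set(state_a[name]) | set(state_b[name]))
            if state_a[name].get(attr) != state_b[name].get(attr)
        }
        if diffs:
            mismatches[name] = diffs
    return {
        "shared": shared,
        "mismatches": mismatches,
        "only_a": sorted(set(state_a) - set(state_b)),
        "only_b": sorted(set(state_b) - set(state_a)),
    }


# Hypothetical excerpts from two published dictionaries (illustrative values only).
state_a = {
    "BirthDate": {"type": "date", "length": 10},
    "GradeLevel": {"type": "string", "length": 2},
}
state_b = {
    "BirthDate": {"type": "date", "length": 8},
    "EnrollmentStatus": {"type": "string", "length": 1},
}

report = compare_dictionaries(state_a, state_b)
```

In this made-up case the report would show that both states define BirthDate but with different field lengths, which is the sort of discrepancy that sharing dictionaries can surface early.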
Fundamentals of Data Dictionaries
Simply put, a data dictionary provides the names, definitions, and attributes of data elements within a data system. In addition to providing varied users a collective understanding of how data will be used and expressed within a given context, it provides important metadata about the elements in the system.
What is Metadata?
In its publication Forum Guide to Metadata: The Meaning Behind Education Data, the National Forum on Education Statistics (The Forum) defines metadata as follows: “Metadata are structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource… a robust metadata system improves the accuracy of data use and interpretation, as well as the efficiency of data access, transfer, and storage.”
The guide goes on to state:
In general terms, a robust metadata system will include system governance arrangements that include policies and procedures for metadata management and use within the organization, and related roles and responsibilities for staff; a metadata model that links metadata items to existing data elements and data sets; a list of relevant metadata items (i.e., a metadata item inventory), including a lexicon that identifies shared vocabulary for term use and naming conventions; and a comprehensive data dictionary. (pp. 11-12)
Because metadata is what drives a tool like a data dictionary, it is helpful to clarify at the outset what types of items fall under the concept “metadata.” The Forum breaks these items into three categories: data management, technical, and data reporting/use.
• Common data management metadata items: element name, definition, purpose/mandate, restrictions, related data elements/components, calculations/formulas, manipulation rules, ownership/stewardship, effective dates, retention period, business rule, security/confidentiality
• Common technical metadata items: field length, element type, permitted values, code set, translations, storage/archival destination, data source, data target
• Common data reporting/use metadata items: routine use, key words, quality metrics
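To make the three categories concrete, a single dictionary entry might be represented as follows. This is a minimal sketch that carries only a few of the items listed above; the class name, field names, and sample values are assumptions made for illustration and are not drawn from the Forum Guide or from any state's dictionary.

```python
from dataclasses import dataclass


@dataclass
class DataElement:
    """One data dictionary entry, grouped by the Forum's three metadata categories."""

    # Data management metadata
    name: str
    definition: str
    steward: str
    # Technical metadata
    element_type: str
    field_length: int
    permitted_values: tuple = ()
    # Data reporting/use metadata
    keywords: tuple = ()


# Hypothetical element; the name, steward, and code set are illustrative only.
grade_level = DataElement(
    name="GradeLevel",
    definition="The grade level in which a student is enrolled.",
    steward="Student Information Office",
    element_type="string",
    field_length=2,
    permitted_values=("KG", "01", "02", "03"),
    keywords=("enrollment", "grade"),
)

# Keying entries by element name gives users one place to look up metadata.
data_dictionary = {grade_level.name: grade_level}
```

A fuller entry could add the remaining items (effective dates, business rules, data source, and so on) as further fields in the same three groups.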
What is a Data Dictionary?
According to the Forum Guide, “a data dictionary is an agreed-upon set of clearly and consistently defined elements, definitions, and attributes—and is indispensable to any information system…Data dictionaries generally contain only some of the metadata necessary for understanding and navigating data elements and databases and, thus, contain only a subset of the metadata found in a robust metadata system” (p. 15).
When asked about the purpose and expected outcomes of their state education data dictionary, a data lead from Colorado provided the following explanation:
The goal is to create and provide a comprehensive information catalog of data definitions, relationships, collection groupings, validation rules, aggregations, and generated reports. The added benefit of a Data Dictionary will be to improve the accuracy of information and standardize data definitions within a centralized repository.
As large-scale data initiatives proliferate in the American educational system, states, districts, and other organizations are recognizing the critical need for a comprehensive data dictionary—and the metadata that drive it—to facilitate communication among varied data stewards and users and to allow for the accurate transfer of data from one entity or educational level to another.
Status of Data Dictionaries in Selected States
The central goal of this paper is to provide a direct, hands-on perspective on the development, maintenance, and philosophies behind state data dictionaries. To this end, interviews were conducted with data team members in five states: Colorado, Maine, Montana, Oregon, and Washington. These states represent various stages of data dictionary development and implementation, and the information culled from the interviews offers a range of experiences and lessons learned.
Interviews addressed the current status of the data dictionary, including where and how it is housed, who “owns” it and is responsible for its maintenance, and what procedures are in place to support it. Interviewees were also asked about challenges and successes they have experienced, their relationships (if any) with data dictionary vendors, and their overall goals and expectations for the dictionary. Finally, state representatives were asked to offer guidance or recommended practices to other states as they develop, implement, and maintain their data dictionaries. (See Appendix A for Interview Guide.)
Development and Design of Data Dictionary
Interviewed data leaders described different experiences regarding the development of their data dictionaries. Each state has followed a unique path, influenced both by the initial impetus for dictionary creation and by choices made regarding internal versus external development (e.g., the involvement of a vendor). This section provides a brief summary of each state’s development process, as well as the overall design of each data dictionary.
In-house Development: Oregon and Colorado
Development of Oregon’s data dictionary began around 2000, as the data team worked with the Oregon Student Record to identify a set of elements needed for student transfer data. This process led to the state’s Consolidated Student File Format, which was a technical document that described the specific layout required to exchange information with the state’s Department of Education.
The dictionary initially included about 120 elements, with different parts of the data system being populated depending on which data collection a user was referencing or updating. As the data team continued to develop the data dictionary, they focused on expanding the metadata and the ways it would drive the collection system. They worked to specify standards, including field names, length, and types. Standards became tighter over time as the team created the state’s data map and leveraged the information they had to make decisions about metadata.
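As an illustration of how standards for field names, lengths, and types can drive a collection system, the sketch below checks one submitted record against a small set of rules. It is not Oregon's implementation; the element names, length limits, and the validate_record function are hypothetical.

```python
# Hypothetical standards: each element name maps to an expected type and maximum length.
STANDARDS = {
    "StudentID": {"type": str, "max_length": 10},
    "LastName": {"type": str, "max_length": 35},
    "GradeLevel": {"type": str, "max_length": 2},
}


def validate_record(record, standards=STANDARDS):
    """Return a list of human-readable problems found in one submitted record."""
    problems = []
    for name, value in record.items():
        rule = standards.get(name)
        if rule is None:
            problems.append(f"{name}: not defined in the data dictionary")
            continue
        if not isinstance(value, rule["type"]):
            problems.append(f"{name}: expected {rule['type'].__name__}")
            continue
        if len(value) > rule["max_length"]:
            problems.append(f"{name}: longer than {rule['max_length']} characters")
    # Flag any standard element the record failed to supply.
    for name in standards:
        if name not in record:
            problems.append(f"{name}: missing from record")
    return problems


# Illustrative record with a GradeLevel value that exceeds the two-character limit.
record = {"StudentID": "0001234567", "LastName": "Rivera", "GradeLevel": "010"}
issues = validate_record(record)
```

Keeping the rules in one shared structure, rather than in each collection's own code, is what lets tighter standards take effect everywhere at once.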
Oregon currently houses its dictionary on Microsoft SQL Server 2008 and will soon move to SQL Server 2012.