Health Information Systems for Low-Income Countries: An Overview
Canadian Society for International Health

 


V. Modeling Health Data

Delivery of health services and monitoring health status require information to make decisions concerning resource allocation and policy administration. The information used comes from many sources:

clinical information such as diagnoses,

knowledge based information such as medical best practices,

service statistics such as hospital admissions and number of immunizations,

resource consumption statistics such as materials used and hours worked,

health registries,

various surveys,

census statistics,

social statistics such as household income, separations and divorces,

labour force statistics such as the number of physicians,

insurance statistics, for example, health and accident claims,

current events, for example, earthquakes, and

financial and accounting statistics.

The diversity of information sources is indicative of the variety and types of data elements that must be collected and analyzed. Key to the production of information is how the data is organized, which, depending on the scope of the information system, can be a complex design task.

Users who collect data perceive it to be organized in the fashion of the entry forms they use. Similarly, users who act on information perceive it (and often the data as well!) to be structured in the fashion in which it is reported. While it is true that the inputs and outputs of the information processing system are as the users perceive them, the underlying data structures may be considerably different.

A database designer must perform several inter-related data modeling tasks before he or she can design an actual database. The objective of the designer is to build a model that produces the desired information correctly and efficiently. To this end,

the scope, timing and order of the information must be determined,

the data sources must be identified, and

the processing algorithms must be defined.

 

Steps in Data Modeling

Identify the data objects

The first step is to identify data objects by means of a data analysis. A data object can be a person, thing, or event. It can even be an abstraction, such as an infant in 'ideal' health, or a construct, such as a prescription for medication.

Determine relationships among objects

Once the objects have been identified, the relationships among those objects must be analyzed. For example, if the objects are people and the information must include relationships within family units, then at a minimum, the parent-child relationship must be represented in the data model.
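
As a minimal sketch of this step, the parent-child relationship can be held as a set of identifier pairs alongside the person objects. The class and field names below (Person, FamilyModel, parent_child) are hypothetical and chosen only for illustration; other family relationships, such as siblings, can then be derived from the one relationship that is stored.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Person:
    """A data object: one person in the family unit (illustrative fields)."""
    person_id: int
    name: str


@dataclass
class FamilyModel:
    """Data objects plus the single stored relationship between them."""
    people: dict = field(default_factory=dict)      # person_id -> Person
    parent_child: set = field(default_factory=set)  # (parent_id, child_id) pairs

    def add_person(self, person):
        self.people[person.person_id] = person

    def add_parent_child(self, parent_id, child_id):
        # A relationship may only link objects that are already in the model.
        if parent_id not in self.people or child_id not in self.people:
            raise KeyError("both persons must exist before they are related")
        self.parent_child.add((parent_id, child_id))

    def children_of(self, parent_id):
        return sorted(c for p, c in self.parent_child if p == parent_id)

    def siblings_of(self, person_id):
        # Siblings are derived: other children of the same parent(s).
        parents = {p for p, c in self.parent_child if c == person_id}
        return sorted({c for p, c in self.parent_child
                       if p in parents and c != person_id})


family = FamilyModel()
for pid, name in [(1, "Mother"), (2, "First child"), (3, "Second child")]:
    family.add_person(Person(pid, name))
family.add_parent_child(1, 2)
family.add_parent_child(1, 3)

print(family.children_of(1))
print(family.siblings_of(2))
```

Storing only the parent-child pairs keeps the model small; whether that is sufficient depends on the information the system must produce.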

Verify the initial data model

The first two steps produce a data model of a sort, but it must be verified and then refined. Verification is achieved by tracing the processing algorithms from the data objects to the information desired. If, for example, a piece of information cannot be derived, then one or more data objects or relationships are missing from the model.

Refine the data model

"Normalization" is a well-defined method that database designers use to refine the data model. Normalization breaks objects into more manageable sub objects that can be easily stored in data files or in a database. Processing requirements, however, may force the designer to choose unusual data structures that are not fully normalized or are 'denormalized'. Databases and the normalization method are discussed in the following section.

Verify the refined data model

Although normalization transformations are deterministic, inconsistencies in interpretation often surface once the sub objects are exposed. The verification procedure is the same: use the processing algorithms to determine that the desired information can be derived from data elements in the data model. Unless there is justification for capturing them, those data elements, data objects or relationships that do not contribute to deriving the information should be removed from the model.
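
The verification steps can be sketched as a simple comparison of sets. The example below is an illustration only: the data element names and the two information needs are invented, and a real verification would trace each processing algorithm in more detail. It reports data elements that an algorithm needs but the model lacks, and elements that the model captures but no algorithm uses.

```python
# Hypothetical draft model: the data elements it currently captures.
model_elements = {
    "child identifier",
    "child date of birth",
    "immunization date",
    "immunization type",
    "household income",
}

# Hypothetical information needs, each listed with the data elements
# that its processing algorithm reads.
information_needs = {
    "immunization coverage by age": {
        "child identifier", "child date of birth",
        "immunization date", "immunization type",
    },
    "immunizations per physician": {
        "physician's number", "immunization date",
    },
}


def verify(model, needs):
    """Return (missing, unused) data elements for a draft data model."""
    required = set().union(*needs.values())
    missing = required - model   # needed by an algorithm but not modeled
    unused = model - required    # modeled but feeding no information
    return missing, unused


missing, unused = verify(model_elements, information_needs)
print("missing:", sorted(missing))
print("unused:", sorted(unused))
```

An element reported as unused is a candidate for removal, unless there is a separate justification for capturing it.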

 

Normalization

As shown in the following quotation for databases, the key to normalization lies in the removal of data that is duplicated and data structures that are unwieldy.

A data base may be defined as a collection of interrelated data stored together without harmful or unnecessary redundancy to serve one or more applications in an optimal fashion; the data are stored so that they are independent of programs which use the data; a common and controlled approach is used in adding new data and in modifying and retrieving existing data within the data base.

Martin, James. Computer Data-Base Organization. Englewood Cliffs: Prentice-Hall, Inc.,1975:p.19.

Although there are five stages used to normalize data structures, only the first three are performed routinely; the remaining two are useful in rare circumstances. Normalization is based on the concept of data stored in arrays. Each column of the array contains an attribute of the class of object, also called an "entity class", being modeled. Each row is a different instance of the object, also called an "entity", that can be identified uniquely by a 'primary key'.

First normal form

Those attributes that have repeated values for the same entity are split out so that each occurrence is placed in a row of its own within the same array. The primary key is augmented so that each new row has a unique identifier. For example, suppose that the entity class is children and each child can have multiple immunization events. In the list below, the child identifier is marked with an asterisk (*) to indicate that this is the primary key of this entity class.

child identifier*,
child name,
child date of birth,
immunization date (1),
immunization type (1),
immunization description (1),
physician's number (1),
physician's name (1),
immunization date (2),
immunization type (2),
immunization description (2),
physician's number (2),
physician's name (2),
...

 

The transformation results in an entity class of child immunizations, that is, each immunization that the child has received is represented by a row. Although the rows are now shorter, there are more rows in the array.

child identifier*,
child name,
child date of birth,
immunization date*,
immunization type*,
immunization description,
physician's number,
physician's name
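
A minimal sketch of the first normal form transformation follows. The record layout, field names and values are invented for illustration. The function turns one record containing a repeating group of immunizations into one flat row per immunization, and the augmented key (child identifier, immunization date, immunization type) identifies each row.

```python
# One un-normalized record: the immunizations form a repeating group.
# All names and values here are invented for illustration.
child_record = {
    "child_id": 101,
    "child_name": "A. Example",
    "child_dob": "2004-03-15",
    "immunizations": [
        {"date": "2004-05-15", "type": "DTP1",
         "description": "DTP, first dose",
         "physician_no": 7, "physician_name": "Dr. Sample"},
        {"date": "2004-07-15", "type": "DTP2",
         "description": "DTP, second dose",
         "physician_no": 7, "physician_name": "Dr. Sample"},
    ],
}


def to_first_normal_form(record):
    """Return one flat row per immunization event."""
    rows = []
    for imm in record["immunizations"]:
        rows.append({
            "child_id": record["child_id"],
            "child_name": record["child_name"],
            "child_dob": record["child_dob"],
            "imm_date": imm["date"],
            "imm_type": imm["type"],
            "imm_description": imm["description"],
            "physician_no": imm["physician_no"],
            "physician_name": imm["physician_name"],
        })
    return rows


rows = to_first_normal_form(child_record)

# The augmented primary key must be unique across the rows.
keys = {(r["child_id"], r["imm_date"], r["imm_type"]) for r in rows}
print(len(rows), "rows,", len(keys), "distinct keys")
```

Note that the child's name and date of birth are now repeated on every row; removing that repetition is the job of the later normal forms.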

 

Second normal form

Those attributes in the array that are identified by only one component of a multi-component primary key are moved into an array of their own. A relationship is created between those entities in the new and existing arrays. In the example above, the child name and date of birth are identified by the child identifier but not by the immunization date or immunization type. Similarly, the immunization description is identified by the immunization type alone. The physician's number and name depend on the whole key and therefore remain with the immunization event.

child identifier*,
child name,
child date of birth

immunization type*,
immunization description

child identifier*,
immunization date*,
immunization type*,
physician's number,
physician's name
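
The split into second normal form can be sketched with plain dictionaries keyed by the relevant part of the primary key. The rows and variable names below are invented for illustration; the column order in each tuple is child identifier, child name, date of birth, immunization date, immunization type, immunization description, physician's number, physician's name.

```python
# Rows in first normal form (illustrative values only).
first_nf_rows = [
    (101, "A. Example", "2004-03-15", "2004-05-15", "DTP1",
     "DTP, first dose", 7, "Dr. Sample"),
    (101, "A. Example", "2004-03-15", "2004-07-15", "DTP2",
     "DTP, second dose", 7, "Dr. Sample"),
    (102, "B. Example", "2004-04-01", "2004-06-01", "DTP1",
     "DTP, first dose", 8, "Dr. Other"),
]

children = {}             # child identifier -> (name, date of birth)
immunization_types = {}   # immunization type -> description
immunization_events = {}  # full key -> (physician's number, physician's name)

for cid, name, dob, date, itype, desc, phys_no, phys_name in first_nf_rows:
    # Attributes identified by the child identifier alone.
    children[cid] = (name, dob)
    # Attributes identified by the immunization type alone.
    immunization_types[itype] = desc
    # Attributes that depend on the whole key stay with the event.
    immunization_events[(cid, date, itype)] = (phys_no, phys_name)

print(len(children), "children")
print(len(immunization_types), "immunization types")
print(len(immunization_events), "immunization events")
```

Each child and each immunization type is now stored once, however many immunization events refer to it.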

Third normal form

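Non-key attributes that are uniquely identified by another non-key attribute are placed in their own array. A relationship is created between those entities in the new and existing arrays. In the example above, the physician's name is identified by the physician's number rather than by the immunization event, so physicians are moved into an array of their own.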

child identifier*,
child name,
child date of birth

immunization type*,
immunization description

child identifier*,
immunization date*,
immunization type*,
physician's number

physician's number*,
physician's name

The normalization procedure has subdivided the original child immunization object into four related sub objects. These objects are easier to process, and each fact, such as a physician's name, is now stored in only one place, which removes the duplication present in the original object. The sub objects also have face validity: they are the children, immunization types, physicians, and immunization events.
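
The four sub objects can be sketched as tables in a relational database. The example below uses SQLite through Python's standard sqlite3 module purely as an illustration; the table names, column names and sample values are assumptions, not part of the original model. A join across the four tables recovers the information held in the original, un-normalized object.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the relationships

# One table per sub object; names are illustrative.
conn.executescript("""
CREATE TABLE child (
    child_id      INTEGER PRIMARY KEY,
    child_name    TEXT NOT NULL,
    date_of_birth TEXT NOT NULL
);
CREATE TABLE immunization_type (
    imm_type    TEXT PRIMARY KEY,
    description TEXT NOT NULL
);
CREATE TABLE physician (
    physician_no   INTEGER PRIMARY KEY,
    physician_name TEXT NOT NULL
);
CREATE TABLE immunization_event (
    child_id     INTEGER NOT NULL REFERENCES child(child_id),
    imm_date     TEXT    NOT NULL,
    imm_type     TEXT    NOT NULL REFERENCES immunization_type(imm_type),
    physician_no INTEGER NOT NULL REFERENCES physician(physician_no),
    PRIMARY KEY (child_id, imm_date, imm_type)
);
""")

conn.execute("INSERT INTO child VALUES (101, 'A. Example', '2004-03-15')")
conn.executemany("INSERT INTO immunization_type VALUES (?, ?)",
                 [("DTP1", "DTP, first dose"), ("DTP2", "DTP, second dose")])
conn.execute("INSERT INTO physician VALUES (7, 'Dr. Sample')")
conn.executemany("INSERT INTO immunization_event VALUES (?, ?, ?, ?)",
                 [(101, "2004-05-15", "DTP1", 7),
                  (101, "2004-07-15", "DTP2", 7)])

# Re-assemble the original information by following the relationships.
report = conn.execute("""
SELECT c.child_name, e.imm_date, t.description, p.physician_name
FROM immunization_event AS e
JOIN child             AS c ON c.child_id     = e.child_id
JOIN immunization_type AS t ON t.imm_type     = e.imm_type
JOIN physician         AS p ON p.physician_no = e.physician_no
ORDER BY e.imm_date
""").fetchall()

for row in report:
    print(row)
```

With foreign keys enforced, an immunization event that refers to an unknown child, immunization type or physician is rejected by the database rather than stored.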

 

Database Management Systems


Figure V-1 - The ANSI/SPARC model
 

Database management systems (DBMSs) are used to achieve independence between the data and the application programs. As stated in chapter III, "a database management system (DBMS) is a structured collection of data and programs that manage that data." To achieve this independence, the DBMS manipulates data and meta data that are stored in files. It uses the meta data to structure the data according to objects and relationships. Then, it presents and formats the result according to the request submitted by the application program. Figure V-1, a schematic of the ANSI/SPARC model, shows how a DBMS achieves independence by means of three functional levels.

The bottom level retrieves and manipulates data in the file system. The middle level structures the data conceptually in accordance with the meta data. Based on this conceptual structure, the top level presents the data in the format that the application program has requested.
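
The three levels can be loosely illustrated with SQLite through Python's standard sqlite3 module. This is a sketch under stated assumptions, not a full ANSI/SPARC implementation: the table, view and function names are invented, the table definition stands in for the conceptual level, a view stands in for the top level, and SQLite's own storage handling stands in for the bottom level. The point shown is that the application keeps working when the conceptual structure changes.

```python
import sqlite3

# Bottom level: SQLite decides how the data is physically stored.
conn = sqlite3.connect(":memory:")

# Middle level: the table definition is meta data held by the DBMS.
conn.execute("""
CREATE TABLE child (
    child_id      INTEGER PRIMARY KEY,
    child_name    TEXT,
    date_of_birth TEXT
)
""")
conn.execute("INSERT INTO child VALUES (101, 'A. Example', '2004-03-15')")

# Top level: a view presents only what one application has asked for.
conn.execute("""
CREATE VIEW child_register AS
SELECT child_id, child_name FROM child
""")


def list_register(connection):
    """Application code: it knows the view, not the underlying table."""
    query = "SELECT child_id, child_name FROM child_register ORDER BY child_id"
    return connection.execute(query).fetchall()


before = list_register(conn)

# The conceptual structure changes: a new attribute is added.
conn.execute("ALTER TABLE child ADD COLUMN village TEXT")

# The application code is unchanged and returns the same result.
after = list_register(conn)
print(before == after)
```

Because the application reads only the view, the added column does not affect it; a second application that needs the new attribute could be given a view of its own.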

 

The Database Life Cycle

As with roads and buildings, databases must be planned, engineered, constructed and maintained. Also, as with roads and buildings, an assessment should be made before starting to determine whether or not a database is truly appropriate. Sometimes simple, inexpensive data storage tools like worksheets or even text documents are sufficient.

A database should be used if any of the following conditions exist:

There is a large amount of data with many types of records occurring many times.

The data structure is complex and includes many relationships between the items of data.

There are many different user requirements that use the data.

There is a need to accommodate many future changes.

Figure V-2 shows eight stages in the database life cycle. The path is a cycle, which implies that it can be repeated any number of times over the lifetime of a database.


Figure V-2 - The database life cycle
 

Planning

Identification of health and health care goals and objectives that the information system is intended to support. This stage delivers a commitment by all health information stakeholders to support the subsequent development. Financial resources should be committed to the project. Activities and needed resources, e.g., personnel, are identified.

Analysis

Identification of the information required to meet the health care goals and objectives. Knowledge is acquired about the data required to satisfy the information needs; its producers, its flow, its timing and its processing. Spatial and temporal models are made of the data flow and processing within the health information system.

Specification

Identification of the resources (hardware, software, human) required to develop and install the system and train its users. The specification should include consideration of data security and data confidentiality.

Design

Conceptual design of a database that supports the collection, storage and processing of health data. The design should incorporate interface conditions with other databases, applications, manual procedures, and the infrastructure such as the platform and network architecture.

Development

Entry of meta data, entry of internal, conceptual and external schemas, programming of internal DBMS functions and routines, and programming of procedures to build and maintain the database.

Testing

Verification of the DBMS operation against the database design and validation of the conceptual database design with respect to health care goals and objectives.

Implementation

Installation of the DBMS and its database with appropriate user training and operation.

Operation

Ongoing use of the DBMS and ongoing maintenance of the database. This stage requires that the database conceptual design is continuously reviewed for adaptive changes or new requirements (which may lead to a new life cycle), the database growth is monitored, the infrastructure of the health information system is managed, and data security and confidentiality are monitored and changed to counter risks.


© 2005 Canadian Society for International Health and the Contributors
last update: 2005-06-28