Experimental Gerontology, Vol. 24, pp. 301-316, 1989 Printed in the USA. All rights reserved.
0531-5565/89 $3.00 + .00 Copyright © 1989 Maxwell Pergamon Macmillan plc
IMPROVING THE PRECISION OF BIOLOGICAL AGE DETERMINATIONS. PART 2: AUTOMATIC HUMAN TESTS, AGE NORMS AND VARIABILITY
RICHARD HOCHSCHILD Hoch Company, 2915 Pebble Drive, Corona del Mar, California 92625
Abstract: In order to eliminate variability due to test operators, procedures for measuring 12 physiological functions that are candidate biomarkers of aging have been automated. Data was collected from a norm group of 2462 male and female office workers using an instrument which requires no test operators, administers all 12 tests in about 45 min per subject, computes biological age, prints out results, and stores data on floppy disks for transfer to other computers for analysis. This report a) describes the instrumentation and test procedures, b) presents normal age/sex standards for each of the 12 biomarkers, c) reports the variance of the data for each biomarker by sex, d) lists sources of biomarker variance, e) discusses criteria for biomarker selection and f) examines implications for information loss when biomarker data is combined to calculate biological age. After eliminating chronological age as a variable, the standard deviations of the frequency distributions of predicted age for individual biomarkers were found to vary from .226 to 1.075, a range of more than 4 to 1. Procedures are discussed for improving the ratio of useful-to-useless variance in calculating biological age. Key Words: biomarkers of aging, biological age, aging rate, aging measurement, aging and mortality, functional age, physiological age
INTRODUCTION Biomarkers of aging collect a great deal of unwanted information. Biomarkers are selected for their presumed ability to distinguish genuine aging differences between individuals. Such differences may be said to account for the useful component of their variance. In practice, however, much of biomarker variance is useless in the sense that it originates from other sources such as baseline differences between individuals, short-term variations in test subject responses, and measurement error. The determination of biological age from scores on a battery of tests of biomarkers of aging can be made more precise by minimizing unwanted variance on three fronts. These are a) instrument design, b) mathematical methods of converting biomarker scores to biological age and c) the use of external biomarker validation/weighting criteria. This second of a four-part
(Received 2 August 1988; Accepted 9 February 1989)
TABLE 1. BIOMARKER TESTS ADMINISTERED BY H-SCAN (j is biomarker number)
j    Biomarker test                             Units
1    Vibrotactile sensitivity, finger tip       1.5 dB
2    Memory, sequence of lamps                  # of jumps
3    Forced vital capacity                      10 ml
4    Forced expiratory volume, 1 second         10 ml
5    Alternate button tapping time, 30 x        .1 sec
6    Highest audible pitch                      100 Hz
7    Visual accommodation                       .1 diopter
8    Auditory reaction time                     msec
9    Visual reaction time without decision      msec
10   Movement time without decision             msec
11   Visual reaction time with decision         msec
12   Movement time with decision                msec
series of articles (Hochschild, 1989a,b,c) focuses on instrument design considerations to reduce unwanted variance. It describes an automatic measuring system, the H-SCAN, gives norm equations for the relationship between score and age for each of 12 biomarkers measured by the H-SCAN, examines each biomarker's variance, discusses remaining sources of unwanted variance and lays the groundwork for mathematical methods of controlling unwanted variance in calculating biological age that will be introduced in Parts 3 and 4.
Applicable population Employees aged 35 and up working in the home offices of life insurance companies across the U.S. constitute the norm population in this study. They may not all have been healthy by the pure definition of the term, but they were healthy enough to be on their jobs. While manageable by most 10-year-olds, the automatic biomarker tests used require the ability to read with some comprehension, to follow simple instructions, and to perform certain physical (mostly manual) tasks. Thus, the norms generated here apply to persons possessing reading, comprehension, and manual skills comparable to office workers whose first language is English. Most results and conclusions can be expected to apply generally to healthy adults.
Selected biomarkers Table 1 lists the 12 physiological functions measured as candidate biomarkers of aging, biomarker number, j, and units used in the analysis. The selected biomarkers were among the more commonly used biomarkers in the prior human studies of biological age cited in Part 1 and were, by the criteria of those studies, among the more successful markers. While several other biomarkers might have been similarly classified, those selected could be made to fit into an automated testing format.
Test automation The instrument used to collect the data for this study was Hoch Company's (Corona del Mar, CA) H-SCAN, which administers the 12 biomarker tests automatically. Prior to the development of the H-SCAN, we administered the biomarker tests using separate instruments, each in the hands of a test operator. All tests in the battery are measures of performance. Therefore, test scores are sensitive both to the quality of instructions and the level of motivation imparted to the participant. Differences in operator training, ability and personality, and varying levels of operator fatigue and boredom were observed to introduce variability into participant instruction and motivation, and into the resulting data. An issue developed as to whether results obtained at various times and locations with different staffs could be combined. These problems led to the development of the H-SCAN, an instrument which measures all 12 biomarkers automatically, collects the data, computes biological age and prints out the results without an operator having to be present. Data is stored on floppy disks from which it can be transferred to other computers for analysis. The system has been used to test subjects aged 10 to 80+, and operates reliably under untrained supervisors, who can be in a nearby room.
MATERIALS AND METHODS
Sample Seventeen life insurance companies in various parts of the U.S. contributed employees as study subjects. Participating companies each were asked to select randomly up to 200 home office employees aged 35 and above and to avoid volunteers (who might constitute a self-selected, healthier-than-average group). Participation was not mandatory, but most companies reported that only a small proportion of those asked to participate declined to do so. Use of life insurance company home office employees had the advantage of providing a study sample that was widely distributed geographically while being rather well matched educationally, occupationally, and socioeconomically. A total of 2462 individuals aged 34.5 to 77 was tested. Of the 1485 females, 1344 were white, 89 black and 23 oriental. Of the 977 males, 931 were white, 22 black, and 15 oriental. Twenty-nine females and nine males answered "other" or declined to identify their race. Figures 1 and 2 show the age distributions of the males and females, respectively. Mean age was 45.9 for males, 44.7 for females.
Instrumentation Eleven H-SCANs were used, one or two being distributed to each participating company for 3 to 8 weeks. No personnel accompanied these systems, which could be set up quickly by untrained persons at each site from simple instructions. Facilities consisted of a quiet room, table, and chair. Manpower at each testing location was a part-time scheduling person who doubled as test supervisor, remaining available in a nearby room in case there were any questions. A testing session is between the H-SCAN and the individual being tested. Simple instructions for taking the tests appear in large letters on a computer screen. There are headphones, a pulmonary flow transducer, a vibrotactile transducer, a viewer containing a computer-controlled lens system, and a module with six push buttons and lamps. These components are connected to a central unit that contains the driving, data collection and interface circuits. The software for operating the tests and collecting the data requires two floppy disks and runs on a dedicated
FIG. 1. Age distribution of 977 males in study.
computer attached to the central unit. Errors in procedure are detected by the computer program and produce appropriate, sometimes humorous, messages. Total testing time is about 45 min.
Questionnaire In the present study, the 12 biomarker tests were followed immediately by an on-screen questionnaire to gather information on mortality risk and other health related factors such as cigarette smoking, consumption of fats, red meat, coffee and alcohol, exercise habits, education, race, parental longevity, hours of sleep, contentment, and other factors. (Part 4 examines the relationship between biological age and these factors.) This questionnaire was made part of the H-SCAN program and added 10 min to the testing session. The 33 questions were multiple choice, answerable by the push of one of six buttons on the push-button module. Each answer given by the participant was immediately reprinted on the screen for review and correction if necessary. Data collection Age, sex, height, physiological test scores, and questionnaire answers are stored as they are generated on one of the two H-SCAN program disks from which, upon return of the equipment,
FIG. 2. Age distribution of 1485 females in study.
they were transferred electronically to the analysis computer, in this case an IBM PS/2 Model 50. The analysis programs used were SYSTAT, Version 4.0, and SYGRAPH, Version 1.0.
Objectives accomplished by instrumentation It was possible to design the motivational aspects of the H-SCAN software to elicit high levels of effort. Nevertheless, differences remain in individual subjects' responses to the motivating conditions, and continue to constitute a source of variability. Despite such remaining variability, the H-SCAN has accomplished the following objectives: a) elimination of test operators as a source of variance; b) standardization of the testing by providing a uniformly repeatable test procedure in the form of a software program and identical units of test equipment at the different sites; c) ability to combine data collected at different times and geographical locations; d) facilitation of mass testing; and e) a sizable reduction in testing costs. An unexpected bonus was the popularity of the video-game-like testing method, which contributed to very high participation levels among randomly selected individuals.
Descriptions of tests Vibrotactile sensitivity at finger tip. A finger is placed on a small metal bar vibrating at a frequency of 120 Hz. The vibration goes on and off unpredictably and the task is to depress a
button only while the vibration is felt. As long as the vibration is correctly identified, it continues to drop in amplitude. Button depression when the vibration is off causes an error message, restarts the test with a somewhat stronger vibration and, upon the fourth such error, cancels the test, resulting in a deleted score for this test. The program makes several passes through the limit of perception in order to define the boundary.
Memory. Six lamps light momentarily in random succession over six push buttons set in a row at a 1.5" spacing. The participant is asked to repeat the sequence a moment later by pushing the corresponding buttons in the same order. The first sequence is two buttons long. With each correct repetition, the prior sequence repeats and one position is added, for a maximum sequence length of 16 which, if reached, terminates the test. After any miss, there is a chance to review the last sequence and try again. If an error is made on two successive attempts, the test is ended. A different random sequence is generated for each test participant.
Vital capacity and FEV-1. The participant is instructed to take the deepest breath possible, then blow it out forcefully and exhaustively through a small hand-held tube. The rotation of a very light impeller blade (moment of inertia 0.00148 g) on sapphire bearings is measured to 1/16 rotation from pulses generated by four photo emitter-detector pairs encircling the tube whose beams are broken by the blade. Volume accuracy is 2% of reading, repeatability 0.5%. The tube is fitted with disposable cardboard mouthpieces and is submersible for washing. Forced vital capacity (FVC) is the total volume of air exhaled during the effort while forced expiratory volume in 1 second (FEV-1) is the volume exhaled during the first second. Just prior to each of three efforts, a target time-volume curve appears on the screen. This is the average curve predicted for the participant's sex, age, and height. A real-time curve for the actual effort is traced instantaneously during exhalation next to the target curve, permitting comparison and providing motivation to try to beat the target curve. Faulty procedure produces helpful messages and prompts a repeat of the effort. If there is a lung impairment, the test can be skipped by pushing a button.
Highest audible pitch. The H-SCAN's headphones are capable of reproducing 30,000 Hz (well above the human auditory frequency range) and are provided with ear cups to seal out external noise. A binaural tone in the headphones goes on and off at arbitrary times while it rises in pitch. The participant responds by depressing a button as long as the tone is heard. Button depression when the tone is off produces an error message and requires the test to be repeated up to four times, after which the test is abandoned, resulting in a deleted score. The program determines the boundary frequency at which the tone becomes inaudible, making several passes through that boundary. Persons who believe that they have a hearing impairment can push a button to skip the test.
Visual accommodation. This test of the focal range of the eye is partly a measure of lens and lens-capsule elasticity and is largely independent of refractive errors. The program first establishes the kind of glasses (if any) the participant has brought. Contact lenses may be worn during the test, as may glasses correcting distance vision. Not permitted are multifocal or
reading glasses. Persons who cannot read the instructions without them can push a button to skip the test, as can anyone who has difficulty with the first few steps. The objective is to identify a small symbol visible in a 13" long viewer containing an actuator-driven lens system under program control. The symbol, either a 0 or an 8, flashes on for 0.2 seconds, then off, at 1.3-second intervals. Usually a 0 comes up, but at random times there is an 8. The 0s are to be ignored and a button is to be pressed whenever an 8 is seen. Starting at the point of clearest vision (determined by a push-button focusing procedure), the program advances the image in small steps toward infinity, reversing the motion slightly and narrowing the steps at each error until several errors have been made. The far point limit of focus, where the ability to distinguish the symbols is lost, is determined by several passes through that boundary. The near point limit of focus is determined in similar fashion. The score is the difference in diopters between the near- and far-points.
Psychomotor indices. In the test of auditory reaction time, a button is pressed as quickly as possible in response to a tone on the headphones. After a short practice session, the best 5 responses (measured to 0.001 seconds) in 10 trials are averaged for the score. Visual reaction time (VRT) and visual movement time (VMT), both without decision, are measured in a single procedure. A button is held depressed until the illuminated lamp above it goes out, whereupon the finger is jumped as quickly as possible to a second, predetermined, button. VRT is the time to release the first button, VMT the travel time to the second button. Each is measured to 0.001 seconds. In each case, the score is the average of the 5 best of 10 tries. A short practice session precedes the scored run. Reaction and movement time with decision are measured similarly except that six lamps light, one at a time, in an unpredictable sequence over the six buttons. The participant follows the jumping light by jumping a finger as quickly as possible from button to button. The light waits for the participant to jump, then proceeds after a short, unpredictable delay. Release time and travel time to the new button are the two measured parameters. Again for each parameter, the best 5 scores obtained in 10 jumps are averaged for the final scores. Alternate button tapping time is the time required to jump with one finger back and forth 30 times between two buttons spaced 7.5" apart. In all six of the psychomotor tests, program detection of faulty procedure or attempts to cheat prompts appropriate messages and sets up a repeat of the mismanaged step. 
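The scoring rule shared by the reaction and movement time tests (the average of the best 5 of 10 tries) can be illustrated with a short Python sketch. It assumes that "best" means the shortest times; the function name and the trial values are hypothetical and are not taken from the H-SCAN software.

```python
def best_of(times_ms, keep=5):
    """Average the `keep` shortest times from one scored run of trials."""
    if len(times_ms) < keep:
        raise ValueError("fewer trials than the number to keep")
    fastest = sorted(times_ms)[:keep]  # shortest times first
    return sum(fastest) / keep


# Hypothetical run of 10 trials, in msec (not data from the study).
trials = [231, 254, 218, 305, 240, 226, 289, 222, 247, 236]
score = best_of(trials)
print(round(score, 1))
```

Discarding the slowest half of the trials makes the score less sensitive to occasional lapses of attention, which is one way of limiting short-term variation within a session.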
Missing data Biomarker scores were listed as missing if a test was skipped at the election of the participant (usually for certain impairments), abandoned by the H-SCAN program due to excessive errors by the participant, or if the score was deleted for being sufficiently beyond expected limits (listed below) to suggest a physical handicap, an improperly carried-out test or a testing malfunction. No questionnaire answers were missing for any participant because the program would not proceed until an answer was given. Conditions for biomarker score deletions were as follows, with the total number of male/female scores missing given in parentheses: vibrotactile sensitivity score if under 6 dB (38/45), memory if less than 3 jumps (34/126), both lung function scores if vital capacity was under 1 liter or height was outside the working range of 140 to 220 cm (2/8), button tapping time if under 10 seconds (24/61), highest audible pitch if under 5 kHz (91/60), and accommodation
TABLE 2. NORM COEFFICIENTS, BY SEX, FOR CALCULATING PREDICTED AGE, PA(j), FOR BIOMARKER j FROM BIOMARKER SCORE, Y(j), USING EQUATION (1)
j:            1        2        3        4        5        6        7        8        9       10       11       12
Males
  b_j0    176.6    141.9   -66.54   -34.71   -144.1    113.8    109.8   -209.0   -327.1   -64.44   -273.7   -79.31
  b_j1   -6.894   -9.437    -.223    -.275    -.927    -.530   -1.475    1.947    1.833     .713    1.298    1.034
  b_j2                      1.239    1.058
Females
  b_j0    170.0    149.4   -55.44   -38.99   -140.6    127.7    111.0   -139.8   -358.3   -75.21   -227.5   -58.65
  b_j1    -6.31   -11.74    -.280    -.329     .834    -.619   -1.656    1.286    1.925     .651    1.129     .688
  b_j2                      1.201    1.075
Substituting the tabulated coefficients for a selected biomarker and sex in equation (1) and plotting the results produces a line relating age to score. Applicable units are given in Table 1. Height is in cm.
if under 1.3 diopter (224/475). The large number of missing accommodation scores was due to malfunctioning viewers at 3 of the 17 test sites and the fact that a number of participants had difficulty with this test. No reaction or movement time scores were missing. All 12 test scores were available for 645 males and 881 females. No outliers were deleted, nor were deletions made for any reason not stated. To avoid bias, all deletions were made before data analysis was begun.
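The deletion conditions listed above can be summarized in a minimal Python sketch. The dictionary keys, the function name, and the choice to express each limit in the physical units quoted in the text (rather than the analysis units of Table 1) are assumptions made for illustration; this is not the H-SCAN program's own logic.

```python
# Lower limits quoted in the text; a score below its limit is deleted.
LOWER_LIMITS = {
    "vibrotactile_db": 6.0,
    "memory_jumps": 3,
    "tapping_s": 10.0,
    "pitch_khz": 5.0,
    "accommodation_diopters": 1.3,
}


def apply_deletions(scores, height_cm):
    """Return a copy of `scores` with out-of-limit values set to None."""
    out = dict(scores)  # leave the caller's record untouched
    for key, low in LOWER_LIMITS.items():
        value = out.get(key)
        if value is not None and value < low:
            out[key] = None
    # Both lung function scores are deleted together.
    fvc = out.get("fvc_liters")
    if (fvc is not None and fvc < 1.0) or not (140 <= height_cm <= 220):
        out["fvc_liters"] = None
        out["fev1_liters"] = None
    return out


# Hypothetical participant record (not data from the study).
scores = {
    "vibrotactile_db": 12.0, "memory_jumps": 7, "fvc_liters": 4.1,
    "fev1_liters": 3.3, "tapping_s": 21.5, "pitch_khz": 4.2,
    "accommodation_diopters": 2.0,
}
cleaned = apply_deletions(scores, height_cm=172)
print(cleaned["pitch_khz"])
```

Applying fixed rules of this kind before any analysis, as was done here, keeps the deletions from being influenced by the results.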
Confounding variables Disease, to the extent that it would keep participants away from work, was excluded as a variable by limiting this study to working employees. However, differences among participants in ability to interpret and carry out instructions, conditioning, fitness, motivation, mood, fatigue, effect of time of day, state of digestion, and so on, remained as confounding variables, contributing to data variance (see Discussion).
Norm coefficients and calculation of predicted age, PA(j), for each biomarker There are appreciable sex differences in the distribution of scores on most biomarker tests. Therefore the development of the age norms, and all other analyses, were carried out separately by sex. Examination of the scatterplots of scores against age suggested that a linear function would adequately satisfy the relationships between chronological age and raw test score, Y(j), for each of the 12 biomarkers, designated j = 1 to 12 using the numbering in Table 1. Therefore, predicted age, PA(j), for biomarker j is defined as a linear function of Y(j) as follows:
$$PA(j) = b_{j0} + b_{j1}\,Y(j)\;\bigl(+\,b_{j2}\,H\bigr) \qquad (1)$$
The term in parentheses is added only for the lung function biomarkers, forced vital capacity and FEV-1, both of which depend on height, H, as well as age. The age/sex norms for our population of office workers, that is, the coefficients b_j0, b_j1 and, as needed, b_j2, were calculated according to the procedure given in Part 1, equations (4) through (7). The resulting coefficients are listed in Table 2 by sex for each biomarker, j. Corresponding units for the biomarker measurements are listed in Table 1. H is in cm. From these coefficients, predicted age, PA(j), for each available raw score, Y(j), for each biomarker, j = 1 to 12, was calculated using equation (1).
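As a worked illustration of equation (1), the following Python sketch evaluates predicted age for two male examples, using coefficients as they read in Table 2 (j = 6, highest audible pitch; j = 3, forced vital capacity). The function name and the raw scores are hypothetical; scores are expressed in the Table 1 units (100 Hz and 10 ml) and height in cm.

```python
def predicted_age(score, b0, b1, b2=0.0, height_cm=0.0):
    """Equation (1): PA(j) = b_j0 + b_j1 * Y(j) (+ b_j2 * H).

    The height term is used only for the two lung function biomarkers;
    for all others b2 stays at 0 and height is ignored.
    """
    return b0 + b1 * score + b2 * height_cm


# Male, j = 6: highest audible pitch of 13,000 Hz = 130 units of 100 Hz.
pa_pitch = predicted_age(130, b0=113.8, b1=-0.530)

# Male, j = 3: forced vital capacity of 4.80 liters = 480 units of 10 ml,
# height 175 cm.
pa_fvc = predicted_age(480, b0=-66.54, b1=-0.223, b2=1.239, height_cm=175)

print(round(pa_pitch, 1), round(pa_fvc, 1))
```

For a fixed height, the lung function form reduces to a straight line in the score; changing the height shifts that line up or down without changing its slope.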
Relative predicted age The deviation of a given biomarker score from its age/sex norm can be expressed by relative predicted age, RPA(j), defined as follows:
$$RPA(j) = \frac{PA(j)}{CA}$$
where CA is chronological age. For example, RPA(j) = 1 means that the participant's raw score for biomarker j exactly equals the norm score for the participant's age and sex for that biomarker. RPA(j) = .9 indicates a raw score that equals the norm score for a 10% younger age than the participant's CA, and so on. RPA(j) has the advantage that it expresses results on the same scale regardless of participant age. Therefore, the RPA(j) can be analyzed simultaneously and combined, although it will be better to standardize them first, as will be seen. It is the standardized RPA(j), rather than PA(j), which are used in Part 3 to combine biomarker results into a single variable, that is, biological age.
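A minimal Python sketch of this quantity follows, taking RPA(j) to be the ratio PA(j)/CA, which is the reading consistent with the examples in the text (RPA = 1 at the norm, .9 for a predicted age 10% below chronological age). The function name and the input values are hypothetical.

```python
def relative_predicted_age(pa, ca):
    """RPA(j) = PA(j) / CA, with CA the chronological age in years."""
    if ca <= 0:
        raise ValueError("chronological age must be positive")
    return pa / ca


# A 50-year-old whose score corresponds to the norm for age 45.
rpa = relative_predicted_age(pa=45.0, ca=50.0)
print(round(rpa, 3))
```

Because the result is a ratio, a 45-year predicted age at age 50 and a 54-year predicted age at age 60 both give the same value, which is what allows results from participants of different ages to be pooled.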
Prediction equations: Age norms by sex for each biomarker (Table 2) How the test scores for each biomarker depend on age is shown in Table 2. Table 2 is organized by biomarker number j = 1 to 12 across the top, and by sex. Biomarker numbers and units are identified in Table 1. If, for a selected biomarker and sex, the coefficients of Table 2 are substituted in equation (1) and plotted, the result is a line relating age to score. Height coefficients are included for the two lung function biomarkers, forced vital capacity and FEV-1, where height, H, is in cm. This added coefficient generates a series of parallel lines at different values of H.
Applicable age range of the norms Most of our sample fell in the age range from 35 to 65 (see Figs. 1 and 2). It is doubtful that the norm coefficients in Table 2 are applicable much beyond this range. Below age 30, dramatic departures from linear are known to occur for the relationship between score, Y(j), and age, CA, for some of the biomarkers used in this study. For example, starting from birth, scores on the lung function biomarkers, forced vital capacity and FEV-1, tend to improve until about age 20 in women and age 27 in men (Knudson et al., 1976), then begin their decline. Similarly, the psychomotor indices appear to improve with age throughout the teens. In contrast, touch
TABLE 3. CORRELATIONS AND t STATISTICS FOR REGRESSIONS OF EQUATION (4) OF PART 1 BY SEX
N is the number of subjects for which a score for biomarker j was available, r is the correlation between score and age, and t4 is the t statistic which tests the significance of the correlation between score and age, that is, of regression coefficient b_j4. Similarly, for the two lung function biomarkers, t5 tests the significance of the correlation between score and height, that is, of b_j5. (All correlations, r, are significant with two-tailed p < 0.000000001.)
sensitivity, highest audible pitch, and visual accommodation tend to begin to decline with age early in life (by age 12 if not earlier). Thus, biological age calculations become almost meaningless for persons in their mid-20s and younger because of the competition between changes associated with development and those associated with aging. Accordingly, extrapolation of the reported norms to persons much younger than 35 is not justified. Nor was sufficient data generated to warrant extrapolation to ages beyond 70.
Correlations and t statistics (Table 3) The coefficients in Table 2 resulted from separate linear regressions per biomarker and per sex of scores, Y(j), on chronological age, CA. It is of interest to examine the correlation between scores and age, and the significance of the relationship, for each biomarker and sex, that is, for each set of coefficients in Table 2. This issue is confused by the fact that the coefficients in Table 2 fit equation (1) of age in terms of score, but they were derived from regressions of score on age, see equation (4) in Part 1. Thus the coefficients in Table 2 are not the actual regression coefficients. The important reason for having reversed the direction of the regression is explained in detail in Part 1. The actual regression coefficients are the b_j3, b_j4, and b_j5 of equation (4), Part 1. They are transformed into the b_j0, b_j1, b_j2 of Table 2 by equations (5), (6), and (7) of Part 1. Like Table 2, Table 3 is organized by biomarker number, j, and sex. For each biomarker and sex, it gives the number of participants, N, for whom test score Y(j) was available, and the Pearson correlation, r, for the regression of scores on age, equation (4), Part 1. This relationship is, of course, independent of the direction of the regression. Its significance is given by the t statistic that tests the significance of b_j4, listed as t4 for each j in Table 3. Thus the 12 tabulated
TABLE 4. STANDARD DEVIATIONS OF RELATIVE PREDICTED AGES FOR BIOMARKERS j COVER MORE THAN A 4-TO-1 RANGE
(Rows: Males SD, Females SD; columns: j = 1 to 12.)
t4 values also give the significance of the matching b_j1 in Table 2. The corresponding F ratio (not tabulated) is equal to t4^2. Added for the lung function tests is t5, the t statistic that tests the significance of the height coefficient, b_j5, that is, the significance of the correlation between Y(j) and H. For both males and females, the three biomarkers with the highest correlation between scores and age (r > .500) are highest audible pitch (j = 6), forced vital capacity (j = 3), and FEV-1 (j = 4). The lowest correlations for both sexes (r < .225) occur for the three measures of reaction time (j = 8, 9 and 11), except that for females, the correlation for memory (j = 2) falls
FIG. 3. Frequency distribution histogram of RPA(6), the relative predicted age for highest audible pitch, for females. Mean = 1. Standard deviation = 0.226, making this the narrowest of the 24 distributions of RPA(j). Normal curve with same mean and SD is superimposed.
FIG. 4. Frequency distribution histogram of RPA(9), the relative predicted age for visual reaction time without decision, for females. Mean = 1. With a standard deviation = 1.075, this is the widest of the 24 distributions of RPA(j). Normal curve with same mean and SD is superimposed.
into the same low range. For reasons to be considered under Discussion, correlation with chronological age is not necessarily an indicator of the quality of a biomarker for determining aging rate differences between individuals.
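Two relationships used in this section can be sketched in Python. The first is the standard identity linking a Pearson correlation to its t statistic when there is a single predictor (it therefore does not cover the lung function regressions, which also include height). The second is a plain algebraic inversion of a score-on-age regression into the age-on-score form of equation (1); it is shown only as an assumed form, since equations (5) to (7) of Part 1 are not reproduced here and may differ. Function names and numerical values are hypothetical.

```python
import math


def t_from_r(r, n):
    """t statistic for a Pearson correlation r based on n pairs.

    Valid for a regression with one predictor; requires |r| < 1.
    """
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def invert_regression(b3, b4, b5=0.0):
    """Solve Y = b3 + b4*CA + b5*H for CA, giving CA = b0 + b1*Y + b2*H.

    Assumed form of the transformation, not taken from Part 1.
    """
    if b4 == 0:
        raise ValueError("score does not depend on age; cannot invert")
    return -b3 / b4, 1.0 / b4, -b5 / b4


# Hypothetical correlation and sample size (not values from Table 3).
t4 = t_from_r(0.5, 902)
f_ratio = t4 ** 2  # with one predictor, F equals t squared

# Round trip with hypothetical score-on-age coefficients.
b3, b4, b5 = 100.0, -4.0, 5.0
b0, b1, b2 = invert_regression(b3, b4, b5)
score = b3 + b4 * 50 + b5 * 170        # norm score at age 50, height 170 cm
age_back = b0 + b1 * score + b2 * 170  # recovers the age from the score
print(round(t4, 2), round(f_ratio, 1), age_back)
```

The round trip shows why the correlation is unaffected by the direction of the fit: the inverted line is the same line, read from the other axis.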
Biomarker variance (Table 4, Figs. 3 and 4) Most of the techniques for improving the precision of biological age determinations proposed in this series of articles concern the control of biomarker variance, that is, improving the ratio of useful-to-useless variance. Table 4 compares the 12 biomarkers with respect to variance, or rather its square root, standard deviation. Table 4 is organized like the two preceding tables in terms of biomarker number j = 1 to 12 and sex. It lists the standard deviations of histograms of the number of subjects scoring a particular relative predicted age, RPA(j), vertically versus RPA(j) horizontally. Two such frequency distributions are illustrated as examples in Figs. 3 and 4. In each case, mean RPA(j) = 1, which follows from the above definition of RPA(j). Table 4 shows that the standard deviations of the RPA(j) differ widely from one biomarker to another. Figures 3 and 4 illustrate the extremes encountered in this study. The narrowest frequency distribution occurred for highest audible pitch (j = 6) for females, see Fig. 3, with a standard deviation of .226. The widest distribution occurred for visual reaction time without
decision (j = 9) also in the case of females, see Fig. 4. The standard deviation for the Fig. 4 case is 1.075, which is more than four times greater than that of the Fig. 3 case. That the distributions are close to normal in shape is indicated by comparison to the superimposed normal curves based on each sample's mean and standard deviation. In general, the results in Table 4 reflect the differences in correlations reported in Table 3. Biomarker tests j = 3, 4 and 6 have the lowest standard deviations, while tests j = 8, 9 and 10 (and for females, j = 2) have the highest.
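The spread between the two extreme distributions, and one possible way of putting the RPA(j) on a common scale, can be sketched in Python. Standardization is assumed here to mean conversion to z-scores (subtracting the mean and dividing by the standard deviation); the method actually used in Part 3 may differ. The RPA values in the example are hypothetical, while the two standard deviations are those reported for Figs. 3 and 4.

```python
from statistics import mean, stdev


def standardize(rpa_values):
    """Convert RPA(j) values for one biomarker and sex to z-scores."""
    m = mean(rpa_values)
    s = stdev(rpa_values)  # sample standard deviation
    return [(x - m) / s for x in rpa_values]


# Ratio of the widest to the narrowest reported standard deviation.
sd_ratio = 1.075 / 0.226

# Hypothetical RPA values for five participants on one biomarker.
rpa_values = [0.78, 0.95, 1.02, 1.10, 1.15]
z = standardize(rpa_values)
print(round(sd_ratio, 2), [round(v, 2) for v in z])
```

After such a conversion every biomarker has the same unit spread, so a wide distribution such as RPA(9) no longer dominates a narrow one such as RPA(6) simply because of its scale.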
DISCUSSION This article describes an instrument design approach for eliminating the variable influence of test operators on scores obtained on tests of biomarkers of aging. Operator influence tends to be important because most biomarker tests are performance-oriented and depend on how strongly the subject is motivated and how well he or she is instructed. Data was collected from 12 biomarker tests administered automatically to 2462 male and female office workers. Norm equations were computed relating scores to age for each biomarker, by sex (Table 2 and equation (1)). The correlation and significance of the relationship between scores and age for each biomarker were reported by sex (Table 3). The biomarkers were compared according to variance (Table 4) and it was found that the standard deviations for relative predicted ages based on the 12 biomarkers were distributed over more than a 4-to-1 range.
Sources of variability
Ratio of useful-to-useless variance. An unknown part of each biomarker's variance is related to genuine aging differences between individuals and constitutes the information to be gathered by the biomarkers. This was designated earlier as useful variance. The rest of the variance, the useless portion, represents differences not related to aging, such as measurement error, short-term variations in a participant's proficiency in performing the tests and baseline differences between participants (the last of which are not generally a factor in longitudinal studies). It can be expected that the technology of biomarkers of aging and biological age testing will become increasingly concerned with improving the ratio of useful-to-useless variance. This can be done a) at the level of instrument design (e.g., automation to eliminate operator-contributed variance, automatic calibration circuits, etc.), b) by computational techniques for combining scores into biological age (such as standardization) and c) by validating and weighting biomarkers according to external criteria of efficacy (such as ability to predict mortality risk). Points b) and c) are the subjects of Parts 3 and 4. To lay the groundwork for their consideration, it will be useful next to examine remaining sources of biomarker variability, discuss variability as a criterion for biomarker selection and consider implications for information loss when biomarker data is combined for purposes of calculating biological age.
Variance contributed by measurement error. Measurement error is attributable to the instrument and consists of errors of calibration, drift in transducers, drift in the analog portion of the electronic circuits, and so on. This potential source of error was carefully addressed in the design of the H-SCAN and is not believed to have contributed significantly to the variance reported in this study.
Variance due to short-term variations in participants. Errors attributable to short-term variations are associated with random changes with time in the mood, motivation, alertness, fatigue, conditioning, state of digestion, etc., of test participants, and to chance differences in participant response with no specific explanation. Automatic operation, as in the H-SCAN, helps to present uniform instructions and motivating conditions but can do little about some of these other sources of short-term variations. Because they express themselves in lack of repeatability, errors attributable to short-term variations can be reduced by having participants repeat the test battery two or more times, preferably spaced over several days or even months, as suggested by Harrison (personal communication, November 1988). Harrison emphasizes the potential of repeated biomarker tests, especially when they are automatic and therefore procedurally repeatable, to give a real measure of individual aging rates, at least for the biological systems tested. Test repetition introduces a new variable, the training effect, which produces a slight improvement in scores on some biomarker tests due to greater familiarity or ease with the test procedure. This effect is generally small and corrections for it can be applied. Repeated measurements should be considered de rigueur in longitudinal and other studies where comparisons focus on individuals. However, when the focus is on group comparisons as in the present study, use of a relatively larger sample size reduces the error attributable to random short-term variations on the same principle as repeated measurements. The sample size used in this study was inadequate for some comparisons and excessive for others, as illustrated in Part 4.
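The principle that repetition averages out random short-term variation can be illustrated with a small simulation. The following Python sketch is not part of the H-SCAN software. The function name, the score scale and the short-term standard deviation of 5 units are illustrative assumptions, every simulated participant is given the same true score so that all spread is short-term error, and the training effect is ignored.

```python
import random
import statistics

def sd_of_mean_score(n_repeats, true_score=50.0, short_term_sd=5.0,
                     n_subjects=4000, seed=0):
    """SD across simulated participants of the mean of n_repeats noisy scores.

    All values are hypothetical; each repeat adds independent Gaussian
    short-term error to the same true score.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(n_subjects):
        scores = [rng.gauss(true_score, short_term_sd) for _ in range(n_repeats)]
        means.append(statistics.fmean(scores))
    return statistics.pstdev(means)

# Spread of the per-participant mean for 1, 2 and 4 repetitions of the battery.
sd_by_repeats = {n: sd_of_mean_score(n) for n in (1, 2, 4)}
```

Under these assumptions the error in a participant's mean score shrinks approximately as the inverse square root of the number of repetitions, so four repetitions roughly halve it. The same arithmetic applies to group means as the sample size grows.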
Variance due to baseline differences between individuals. More difficult to address is the other source of participant variability, namely baseline differences between individuals. Some people are born with the potential for a larger lung capacity, better hearing, greater touch sensitivity, a faster nervous system, and so on. Moreover, personality differences cause some persons to attack the tests with more effort and vigor than do others under identical motivating conditions. In contrast to short-term variations, baseline differences are not generally a factor in longitudinal studies, where scores for the same subjects tested at different times, and perhaps after different treatments, are being compared. But they are clearly a source of variance in cross-sectional studies. To what extent baseline differences between individuals contribute to longevity and aging rate differences is not known. There is evidence, for example, that a larger lung capacity may contribute to longevity (reviewed in Part 3). Moreover, personality differences clearly influence health habits and possibly the development and progression of some diseases and, thereby, longevity. In short, it is unclear what relative importance to assign to baseline differences with respect to their influence on aging. A partial solution to the control of this source of variance may be to distinguish baseline differences that are correlated with mortality rate from those that are not. The rationale for this is discussed in Part 3, and a method is developed for calculating biological age by weighting biomarker scores according to their correlation with mortality rate or mortality risk.
Correlation with chronological age as an inappropriate criterion of biomarker validity
As shown in Table 3, scores for some biomarkers are relatively well correlated with CA (r = .624 for highest audible pitch, females). For other biomarkers, the correlation is poor even if still highly significant thanks to the large N (r = .167, p < .000000001, for visual reaction time, females). Some investigators have used correlation of scores with CA as a weighting factor in combining biomarker data (Conard et al., 1966). Presumably, this was done on the assumption that higher variance in a biomarker means higher useless, not higher useful, variance. Moreover, such weighting occurs automatically when multiple regression is used in the calculation of biological age (see Part 1). That association with CA is not a rational criterion for selecting, validating or weighting biomarkers of aging has been discussed in Part 1, as has the necessary role of CA in the calculation of biological age. Firstly, CA is needed in the regressions to determine the age/sex norms for biomarkers that depend on age (equation (2), Part 1). Secondly, CA is needed if biomarker results are to be compared for groups whose members vary in age. In the latter case, division of PA(j) by CA, as in equation (2) above, provides an age-independent measure of the deviation of a given biomarker score from the age/sex norm.
Variance as an inappropriate criterion for biomarker selection
Another possible way to select or weight biomarkers is on the basis of the wide-ranging standard deviations of the RPA(j) (Table 4). However, by the definition of RPA(j), this amounts to the same criterion as correlation of biomarker scores with CA. Correlation with CA is related inversely to the variance of the RPA(j), perfect correlation corresponding to zero variance. It is tempting to speculate that high-variance, low-correlation biomarkers got that way because of higher contributions from non-age-related short-term and baseline variations. But it is also possible that the low-variance, highly CA-correlated biomarkers are less sensitive to aging rate differences. There is evidence for this, some of it presented in Part 4. For example, highest audible pitch is by far the lowest-variance biomarker (see Table 4). In Part 4, it is found to be inferior to two (for males) or three (for females) other biomarkers in ability to predict mortality risk. As long as the ratios of useful-to-useless variance of the different biomarkers remain unknown, the magnitudes of the variances alone are an insufficient criterion for determining biomarker quality. Tables 3 and 4 offer little in the way of information usable for the selection of biomarkers of aging. An outside criterion is needed to gauge the ratio of useful-to-useless variance in each biomarker if weights are to be assigned to the RPA(j) when they are averaged in the process of calculating biological age.
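The inverse relation between correlation with CA and the spread of the RPA(j) can be seen in a toy simulation. The sketch below is illustrative only and does not reproduce equations (1) and (2). It assumes a linear norm with made-up coefficients, assumes that predicted age is obtained by inverting the fitted norm line, and takes RPA as PA divided by CA. The function name and all numerical values are hypothetical.

```python
import math
import random
import statistics

def simulate_biomarker(noise_sd, n=3000, seed=1):
    """Return (|r| with CA, SD of RPA) for one simulated biomarker.

    Assumptions (not from the source): scores decline linearly with age,
    and non-age-related variation is Gaussian with SD noise_sd.
    """
    rng = random.Random(seed)
    ages = [rng.uniform(20.0, 70.0) for _ in range(n)]
    scores = [100.0 - 0.8 * ca + rng.gauss(0.0, noise_sd) for ca in ages]

    # Least-squares norm line: score = intercept + slope * CA.
    mean_ca = statistics.fmean(ages)
    mean_sc = statistics.fmean(scores)
    sxx = sum((ca - mean_ca) ** 2 for ca in ages)
    syy = sum((s - mean_sc) ** 2 for s in scores)
    sxy = sum((ca - mean_ca) * (s - mean_sc) for ca, s in zip(ages, scores))
    slope = sxy / sxx
    intercept = mean_sc - slope * mean_ca
    r = sxy / math.sqrt(sxx * syy)

    # Predicted age by inverting the norm line, then relative predicted age.
    rpa = [((s - intercept) / slope) / ca for ca, s in zip(ages, scores)]
    return abs(r), statistics.pstdev(rpa)

r_low_noise, sd_low_noise = simulate_biomarker(noise_sd=5.0)
r_high_noise, sd_high_noise = simulate_biomarker(noise_sd=20.0)
```

In this sketch the noisier biomarker shows both the weaker correlation with CA and the larger standard deviation of RPA. The simulation cannot say whether the added scatter is useful or useless variance, which is the point made above.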
Implications for information loss when biomarker data is combined
Weighted or unweighted, there is a reason not to base the calculation of biological age on an average of the predicted ages, PA(j), or relative predicted ages, RPA(j). The PA(j) or RPA(j) will contribute to the variance of their mean in proportion to their individual variances, quite the opposite of what is desired if most of the variability is useless rather than useful. This leads to the undesired condition that the useless variance contributed to the mean by high-variance biomarkers will swamp the useful variance contributed by all biomarkers. The result is information loss and a reduction of the ratio of useful-to-useless variance. This problem can be circumvented by standardizing the RPA(j) before averaging, that is, transforming to z scores, thereby allowing each biomarker to contribute equal (unit) variance. But more can be accomplished in the way of variance control by using a weighted average of the standardized RPA(j), where the weights are the correlations between the standardized RPA(j) and an external measure of biomarker efficacy (such as association with mortality rate or mortality risk). This method of calculating biological age will be presented in Part 3. Results are given in Part 4, which examines the relationship between biological age and a composite measure of mortality risk before and after weighting, and the relationship between biological age and each of 17 surveyed risk or health-related factors.
Acknowledgments -- I wish to thank the 17 life insurance companies who made employees available for testing and provided facilities and supervisory staff. Without their willingness to cooperate, this study could not have been done.
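As a supplementary illustration of the standardization step discussed under information loss, the following Python sketch compares a plain average of two simulated RPA(j) series with an average of their z scores. It is a sketch under stated assumptions, not the Part 3 method. The two series are independent Gaussian draws with hypothetical standard deviations of 0.2 and 1.0, the helper names are invented for this example, and the weights in the last line are placeholders rather than measured correlations with mortality risk.

```python
import math
import random
import statistics

def z_scores(values):
    """Transform values to zero mean and unit (population) SD."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

def weighted_mean_rows(columns, weights):
    """Per-participant weighted mean across biomarker columns."""
    total = sum(weights)
    return [sum(w * col[i] for w, col in zip(weights, columns)) / total
            for i in range(len(columns[0]))]

def corr(x, y):
    """Pearson correlation coefficient."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(2)
n = 2000
rpa_low = [rng.gauss(1.0, 0.2) for _ in range(n)]   # low-variance biomarker
rpa_high = [rng.gauss(1.0, 1.0) for _ in range(n)]  # high-variance biomarker

# Plain average: the high-variance biomarker dominates the result.
raw_mean = weighted_mean_rows([rpa_low, rpa_high], [1.0, 1.0])

# Average of z scores: each biomarker contributes unit variance.
z_low, z_high = z_scores(rpa_low), z_scores(rpa_high)
std_mean = weighted_mean_rows([z_low, z_high], [1.0, 1.0])

# Hypothetical external-criterion weights, for illustration only.
weighted = weighted_mean_rows([z_low, z_high], [0.3, 0.1])
```

Comparing `corr(raw_mean, rpa_high)` with `corr(raw_mean, rpa_low)` shows the plain average tracking the high-variance series almost exclusively, whereas `std_mean` correlates equally with both standardized series.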
REFERENCES
CONARD, A., LOWREY, A., EICHER, M., THOMPSON, K., and SCOTT, W.A. Ageing studies in a Marshallese population exposed to radioactive fall-out in 1954. In: Radiation & Ageing, P.J. Lindop and G.A. Sacher, Editors, pp. 345-360. Taylor & Francis, Ltd., London, 1966.
HOCHSCHILD, R. Improving the precision of biological age determinations. Part 1: A new approach to calculating biological age. Exp. Gerontol. 24, 289-300, 1989.
HOCHSCHILD, R. Improving the precision of biological age determinations. Part 3: Redefining biological age. In preparation.
HOCHSCHILD, R. Improving the precision of biological age determinations. Part 4: Biological age vs. mortality risk in 2,462 office workers. In preparation.
KNUDSON, R.J., SLATIN, R.C., LEBOWITZ, M.D., and BURROWS, B. The maximal expiratory flow-volume curve. Normal standards, variability and effects of age. Am. Rev. Respir. Dis. 113, 587-600, 1976.