Applied Ergonomics 40 (2009) 997–1003
Applied Ergonomics journal homepage: www.elsevier.com/locate/apergo
Physiological compliance and team performance
Amanda N. Elkins a, Eric R. Muth a,*, Adam W. Hoover b, Alexander D. Walker a, Thomas L. Carpenter a, Fred S. Switzer a
a Department of Psychology, Clemson University, 418 Brackett Hall, Clemson, SC 29634-1355, USA
b Department of Electrical and Computer Engineering, Clemson University, SC 29634-1355, USA
Article info
Article history: Received 4 January 2008; Accepted 5 February 2009
Keywords: Teams; Heart rate variability; Compliance
Abstract
Physiological compliance (PC) refers to the correlation between physiological measures of team members over time. The goals of this study were to examine ways of measuring PC in heart rate variability (HRV) data and the relationship between PC and team performance. Teams were tasked with entering both real and simulated rooms and “shooting” individuals with a weapon and identifying individuals without a weapon. The linear correlation and directional agreement PC methods were shown to be the most sensitive to differences in performance, with greater PC being associated with better performance. The correlation method applied to a measure of respiratory sinus arrhythmia (RSA) revealed a significant difference between high and low performers (t = 2.31, p = 0.03), and directional agreement applied to inter-beat-intervals and RSA revealed trend-level differences (t[4.62] = 1.86, p = 0.06 and t = 1.68, p = 0.07). These results suggest that PC may have merit for predicting team performance.
© 2009 Elsevier Ltd. All rights reserved.
* Corresponding author. Tel.: +1 864 656 6741; fax: +1 864 656 0358. E-mail address: [email protected] (E.R. Muth).
0003-6870/$ – see front matter © 2009 Elsevier Ltd. All rights reserved. doi:10.1016/j.apergo.2009.02.002
This study had two goals. The first was to examine ways of measuring physiological compliance (PC) in heart rate variability (HRV) data and to identify the most viable measure. The second was to examine the relationship between PC in HRV and performance. Psychophysiological measures have often been used in studies of human-environment systems and teamwork (Henning et al., 2001). These measures have several advantages in the evaluation of environments and systems (Backs and Boucsein, 2000). First, psychophysiological measures are noninvasive. Second, they can be acquired unobtrusively without requiring any explicit behavior from the subject. Lastly, they provide continuous information about an individual and are sensitive to changes in physiological state. However, these measures have most often been analyzed at the individual level and only occasionally at the dyad level (Cacioppo and Petty, 1983; Henning et al., 2001). PC takes a different approach from traditional psychophysiological studies by focusing on psychophysiological measures at the team level rather than the individual level. Understanding how PC relates to team performance could lead to improved team training and assessment.
PC refers to psychophysiological changes of a joint nature, i.e., changes involving two or more people (Smith and Smith, 1987). Physiological changes that involve two or more people and exhibit close correspondence reflecting mutual influence are considered to be compliant (Henning and Korbelak, 2005). Levenson and Gottman (1983, p. 588) described physiological linkage as physiological responses between members of an interacting dyad that “show considerable relatedness or linkage.” In general, PC can be operationally defined as the correlation between physiological measures of team members over time. Although PC is not a new notion in research, it is somewhat understudied (Henning et al., 2001). A limited number of studies have departed from the conventional individual level of psychophysiological data analysis and have explored the implications of psychophysiology among dyads. These studies provide evidence that psychophysiological responses among interacting people can exhibit relatedness and linkage that can then be used to predict performance. DiMascio et al. (1955) conducted a study involving the physiological responses of psychotherapists and their clients during interviews. They found that the heart rates of the therapists and clients frequently covaried. Later, in a study of the effects of a therapist’s praise on female patients’ responses, Malmo et al. (1957) clearly demonstrated the covariation of the physiological signals of two people, i.e., PC, after each manipulation. In their study, the electromyogram amplitudes of both the examiner and the patient fell after the examiner issued praise and remained constant after the examiner expressed criticism. The authors noted the need for objective investigation of the interaction between patient and therapist. The psychophysiological relationship between individuals in dyads was also explored by Kaplan et al. (1964). They examined
skin conductance among dyads of participants who either “liked” or “disliked” each other. In this study, dyads composed of people who disliked each other were more likely to show significant physiological correlations than dyads composed of people who liked each other. More recently, Levenson and Gottman (1983) examined social interactions during topical discussions between thirty married couples and found that 60% of the variance in marital satisfaction was accounted for by physiological linkage. They found evidence that distressed couples’ interactions were more likely to exhibit physiological interrelatedness and that this linkage was more likely to occur in periods of high negative affect. They concluded that PC was associated with periods of negative affect and could therefore be problematic. Hatfield et al. (1994) offered a different interpretation of Levenson and Gottman’s findings. They suggested that PC is not problematic, but “simply accompanies periods of intense social interaction” (Henning et al., 2001, p. 222). In one of the most recent studies focused on PC, Henning et al. (2001) examined compliance in electrodermal activity (EDA), heart rate, and breathing in two-person teams performing a team tracking task, to evaluate whether PC is a determinant of team performance. They found that increased PC was associated with improved performance and decreased team tracking error. Furthermore, Henning et al. (2001, p. 230) found no correlation between joystick control actions and PC, which countered the interpretation that PC might be attributable to “matched task behaviors resulting in matched physiological responses.” In a subsequent study, Henning and Korbelak (2005) investigated the predictive value of PC for team performance. In this study, teams of two performed a tracking task randomly interspersed with unexpected switches in task control dynamics.
The study revealed that higher cardiac PC predicted lower team tracking error, confirming the hypothesis that PC could be used as a predictor of team performance and possibly as a means of determining a team’s preparedness to manage the unanticipated. In perhaps the only study of PC in a four-person team, Henning et al. (2006) analyzed HRV data and subjective teamwork effectiveness ratings from a research team during meetings over a six-month period. Although PC scores from entire meetings provided no predictive information, PC during periods of sequential speech activity (successive speech by different persons) was predictive of team effectiveness ratings. Further study of this targeted approach was suggested. Physiological measures used to study PC in the past have included heart rate and HRV, skin conductance, pulse transmission time to the finger, and respiration (Henning et al., 2001; Levenson and Gottman, 1983). Of the measures previously studied, HRV has been shown to have the strongest predictive relationship with performance (Henning et al., 2001). The current study examined three different HRV measures as bases for measuring PC. HRV is a measurement of the naturally occurring changes over time in an individual’s inter-beat-intervals (IBIs), the time between successive heartbeats (Berntson et al., 2007). Variability in heart rate over time is predominantly composed of three frequency components: high, medium, and low. The first measure of HRV examined in the current study was the high frequency component of spectrally decomposed HRV, which normally ranges from 0.15 Hz to 0.5 Hz. In the high frequency component, IBI increases during exhalation and decreases during inhalation (Porges, 1995). This cyclical change is known as respiratory sinus arrhythmia (RSA) and, when respiration is controlled for, can be used as a measure of parasympathetic nervous system (PNS) activity (Berntson et al., 1997).
RSA derived from the high frequency component of spectrally decomposed HRV was one of the physiological measures examined because of its relationship with PNS activity. RSA was examined both as raw high frequency power (raw RSA) and as loge high frequency power (loge RSA). Both were examined because raw RSA data form a skewed distribution, and taking the loge of raw RSA data yields an approximately normal distribution. The second measure of HRV examined was the mean successive difference (MSD) of IBI. MSD is a measure of short-term, beat-to-beat variability in heart period. It is computed as the average of the absolute differences between successive IBIs for a particular time period (Allen et al., 2000), which filters out low frequency sources of variability in the IBI series. MSD has been validated in previous studies and has been shown to effectively track manipulated cardiac vagal control (Hayano et al., 1991). The final measure examined was mean IBI, the average of a set of IBI data for a particular time period. Mean IBI was examined because of its simplicity and its direct relationship to a physiological system, i.e., the rate of heart activity. While previous studies have provided a great deal of information about PC, few, if any, have specifically examined various measures of PC in HRV between team members performing a highly physical and applied task. The current analyses aimed to continue exploring PC as an objective assessment of team performance by examining PC of HRV across individuals in four-person teams as they performed building clearing tasks related to military operations. It was hypothesized that a useful measure of PC of HRV could be derived and that higher PC would be associated with better performance.
1. Methods
1.1. Participants
Participant data were mined from a previous study on training four-person teams in the military task of building clearing. In the task of building clearing, an armed team enters and moves through an entire building. As they do this, they must eliminate (“shoot”) all of the opposing combatant forces.
At the same time, they must identify, and not “shoot,” the non-combatants in the building. A subset of 10 teams of four participants each (40 participants total, all male, ages 18–30) was selected from the previous study based on the availability of usable physiological data. Participants had previously been screened for good physical condition, a moderate level of experience with first-person shooter video games, no formal combat or weapons training, and English as their first language (Carpenter, 2006). Only male participants were selected because the previous study was related to combat Marines. Participants were randomly assigned to four-person teams and were compensated at approximately $10 per hour for their time. All participants signed a consent form approved by Clemson University’s Institutional Research Board for the Protection of Human Subjects prior to participation.
1.2. Task description
The building clearing task was selected for several reasons. First, the current analyses sought to apply the concept of PC to a highly physical and applied task, which had not previously been addressed. Second, this task required a degree of cooperation among team members to accomplish a common objective. The degree of cooperation was left to the participants, allowing for a variety of PC and performance scores. There were four experimental conditions in the previous study based on the training received: watch a training video only; watch the video plus play a first-person shooter computer game; watch
the video and practice by physically moving through the real world and pretending to shoot or identify paper targets; or a combination of the video, real world, and computer game (Carpenter, 2006). The current analysis did not use data from the video-only condition because physiological data were not collected in that condition. Further, training condition was ignored in the current analyses because only compliance during training and post-training performance were of interest, not the particular method of training received. In the training phase, each team completed six trials that totaled about one hour of training time (Carpenter, 2006). Depending on the training method assigned, a trial was defined either by a predetermined amount of time (computer game) or by a number of room entries (real world). Appropriate feedback was given between trials during training. IBI data were collected during training. In the testing phase, participants completed six testing trials in a real-world “shoot-house” against live opposition forces and non-combatants using laser-tag-type equipment. The teams were told to complete each trial while incurring a minimum number of casualties. They were to clear the five-room shoot-house while engaging combatants (individuals with a weapon) and acknowledging unarmed non-combatants (individuals without a weapon). Teams had to eliminate (shoot with a simulated weapon) combatants but only identify non-combatants. Performance metrics were monitored during the testing phase using the equipment described below.
1.3. Physiological recordings during training
The IBI data in the current analyses were obtained during the training session using a UFI 3991x/1-IBI Biolog (Morro Bay, CA) that weighed approximately 2 pounds and was worn on a belt close to the small of the back.
This device automatically derived IBIs in real time from electrocardiogram recordings and digitally stored asynchronous IBI data for off-line download and viewing. Three disposable electrodes were used to obtain the recordings. One electrode was placed in the middle of the sternum and another just below the ribcage on the left side of the body. The last electrode was placed on the lower right abdomen and served as a reference electrode. This configuration was intended to maximize detection of the sequential electrical events of the cardiac cycle while minimizing the effect of body movement.
1.4. HRV data cleaning and reduction
Because accelerometer equipment was not available to control for movement artifacts and noise, these had to be handled during data reduction, which required examining the quality of the raw IBI data. Initially, data files were examined and labeled usable or unusable. Files with errors comprising more than approximately 10% of the recording were considered unusable. Several teams had three participants who provided usable data, while other teams had only two. To balance this, the two data files with the least artifact were chosen for analysis from teams with three usable team members. Because it has been previously demonstrated that synchronized physical movement was not a primary cause of PC (Henning et al., 2001), the physical action roles of team members would not be expected to be the sole determinants of the degree of PC observed. Therefore, two team members with usable physiological data were chosen as representative of the entire team. After this initial inspection, a total of 20 data files, two per team, were carried into the next data processing step.
In the next step, the raw IBI data were split by training trial. To accomplish this, Biolog DPSx1.4 software (UFI, Morro Bay, CA) was used to “region select” periods of interest in the raw data files and save each separately. Periods of interest were defined by trials; therefore, each participant’s file was divided and saved as six separate files, one per trial. Next, the split raw data were cleaned and artifacts were manually corrected with a locally developed editor program that allows IBI series to be viewed and stored as corrected according to a user’s input and decisions. Data required an average of six corrections per 100 IBI values. The most common errors in the IBI recordings were detection of a false peak, which created a short IBI, and a missed peak, which created a long IBI. The corrections used were “combine” (merge two consecutive IBI values), “combine three” (merge three consecutive IBI values), “combine split” (merge two consecutive IBI values and split two ways), “combine three split two” (merge three consecutive IBI values and split two ways), “combine two split three” (merge two consecutive IBI values and split three ways), and “combine three split three” (merge three consecutive IBI values and split three ways). After data cleaning was completed, the data were resampled at 1 Hz to create a continuous time series. After resampling, each data file (one for each trial of each participant) was further split into 65 s windows. Sixty-five seconds was chosen as the window size because it was the smallest window usable by the analysis software described in the next step. In the final data reduction step, each windowed data file was analyzed using additional locally developed software to compute statistics from the data. Mean IBI values, MSDs, and RSA (spectral power at the high frequency peak) were all derived. RSA was derived using a running fast Fourier transform (FFT) analysis with non-overlapping windows.
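The cleaning corrections, 1 Hz resampling, and per-window statistics described in this subsection can be sketched in dependency-free Python. This is a hedged illustration under stated assumptions, not the study's locally developed software: merged IBIs are split into equal parts, resampling uses linear interpolation with each IBI time-stamped at the beat that ends it, MSD is the mean absolute successive difference of the resampled window, and RSA is taken as the largest periodogram bin between 0.15 and 0.5 Hz, computed with a direct DFT (same bins as an FFT, just slower) over the last 64 of the 65 samples. Power scaling is arbitrary, and all function names are hypothetical.

```python
import cmath
import math
from bisect import bisect_right


def combine_split(ibis, i, n_merge, n_split):
    """Merge n_merge consecutive IBIs starting at index i and split the sum into
    n_split IBIs, e.g. "combine" = (2, 1), "combine three split two" = (3, 2).
    Splitting into equal parts is an assumption."""
    total = sum(ibis[i:i + n_merge])
    return ibis[:i] + [total / n_split] * n_split + ibis[i + n_merge:]


def resample_ibi(ibi_ms, fs=1.0):
    """Linearly interpolate an irregular IBI series (ms) onto a regular fs-Hz grid.
    Each IBI is time-stamped at the beat that ends it (assumption)."""
    times, t = [], 0.0
    for ibi in ibi_ms:
        t += ibi / 1000.0
        times.append(t)
    out, k = [], 0
    while times[0] + k / fs <= times[-1] + 1e-9:
        g = times[0] + k / fs
        j = bisect_right(times, g) - 1  # last beat at or before grid time g
        if j >= len(times) - 1:
            out.append(float(ibi_ms[-1]))
        else:
            frac = (g - times[j]) / (times[j + 1] - times[j])
            out.append(ibi_ms[j] + frac * (ibi_ms[j + 1] - ibi_ms[j]))
        k += 1
    return out


def hf_peak(segment, fs=1.0, band=(0.15, 0.5)):
    """(frequency, power) of the largest periodogram bin inside the HF band."""
    n = len(segment)
    mean = sum(segment) / n
    x = [v - mean for v in segment]  # remove the DC (mean IBI) component
    best_f, best_p = None, 0.0
    for k in range(1, n // 2 + 1):
        f = k * fs / n  # bin width fs/n: 1/64 Hz for 64 samples at 1 Hz
        if band[0] <= f <= band[1]:
            c = sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            p = abs(c) ** 2 / n
            if p > best_p:
                best_f, best_p = f, p
    return best_f, best_p


def window_stats(series, win=65):
    """Mean IBI, MSD, raw RSA and log_e RSA for each non-overlapping window."""
    rows = []
    for start in range(0, len(series) - win + 1, win):
        w = series[start:start + win]
        peak_hz, rsa = hf_peak(w[1:])  # drop the first point to get 64 samples
        rows.append({
            "mean_ibi": sum(w) / len(w),
            "msd": sum(abs(b - a) for a, b in zip(w, w[1:])) / (len(w) - 1),
            "rsa": rsa,
            "ln_rsa": math.log(rsa) if rsa > 0 else float("-inf"),
            "peak_hz": peak_hz,
        })
    return rows


if __name__ == "__main__":
    # Hypothetical 1 Hz series with a 0.25 Hz (15 breaths/min) oscillation.
    series = [800 + 50 * math.sin(2 * math.pi * 0.25 * n) for n in range(130)]
    for row in window_stats(series):
        print(row)
```

A 130-sample series yields two non-overlapping 65 s windows; any trailing samples shorter than a full window are discarded in this sketch.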
The FFT analysis requires a number of samples that is a power of two (in this case a 64-point window); the local software skipped the first data point, hence the 65 s windows. Non-overlapping windows were used because the PC calculations described below required multiple, non-overlapping HRV data points. Spectral density estimates were derived at a bin width of 0.017 Hz [1 cycle per minute]. The spectral power at the high frequency peak between 0.15 and 0.5 Hz was taken as the measure of RSA. After deriving RSA values, the loge of each RSA data point was taken to account for the fact that RSA data form a skewed rather than normal distribution. The MSD and mean IBI values were also calculated for the same 65 s windows. For the signal matching and instantaneous derivative matching measures described below, a further processing step was required: the data had to be z-scored. z-Scores were calculated within each subject, across all data points from every processed window (ignoring trial), for each measure (MSD, mean IBI, RSA and loge RSA). This additional processing was not required for the directional agreement and correlation methods. However, directional agreement was also calculated ignoring trial, using the total number of data points in agreement over all trials to establish an agreement percentage. The correlation method was the only method that required trial to be considered.
1.5. Measures of PC during training
The purpose of the first analysis was to create several measures of PC from the mean IBI, MSD, raw RSA, and loge RSA data. The methods chosen were signal matching (SM), instantaneous derivative matching (IDM), directional agreement (DA), and correlation. Data used for these measures were synchronized over time by having a constant sampling rate for both team members and ensuring that measurements began at the same point by using digital markers in the data files. These measures provided four different ways of comparing the similarity of the HRV of the two team members.
1.6. Signal matching (SM)
SM was used to quantify PC by examining the area between the data curves of team members. Fig. 1 illustrates this concept; the area of interest between the curves is highlighted with lined shading. Greater area between the curves indicates less similarity between the signals, while less area indicates more similarity; therefore, a lower SM score indicates higher PC. The SM process was accomplished in several steps. First, the values from each physiological signal were normalized so that both signals were on an equal and comparable scale. To accomplish this, z-scores were derived within-subject considering all data points for that subject simultaneously (trial and window were ignored). Next, the absolute difference between each data point and its counterpart in the other signal was derived. For example, point 1 of team member A’s data was compared with point 1 of team member B’s data, point 2 of A with point 2 of B, and so on. After the differences had been derived, the overall mean difference of each team was calculated.
Fig. 1. Example area between two data curves examined by signal matching. [Figure: signals of team member A and team member B plotted against time.]
1.7. Instantaneous derivative matching (IDM)
The z-scored data used for SM calculations were also used for IDM calculations. IDM examined how well the slopes of two physiological signal curves matched each other. The derivative at a point is the slope of the tangent to the curve at that point. Because these analyses used discrete values, the instantaneous derivative at each point was approximated as the difference between that point and the next. The instantaneous derivatives were then compared with the corresponding derivatives from the other team member’s signal, and the absolute differences were averaged. This is expressed by the following equation, where a is team member A, b is team member B, and t is time:
$$\mathrm{IDM} = \frac{1}{T}\sum_{t=0}^{T-1}\left|\left(a_{t+1} - a_t\right) - \left(b_{t+1} - b_t\right)\right|$$
Fig. 2. Instantaneous derivatives of 2 signal curves. [Figure: two signals plotted against time, with tangent arrows at four corresponding points.]
Fig. 2 illustrates an example of physiological signals from 2 team members. Four corresponding points on each curve show where instantaneous derivatives (represented by tangent arrows) would be compared with the corresponding derivative on the other curve. These curves would receive a low score because their derivatives are similar at each point, indicating high PC.
1.8. Directional agreement (DA)
DA provided a very basic measure of PC. The direction of movement of each data point relative to the previous point was determined. For example, if the value at data point 1 were lower than the value at data point 2, the movement at point 2 would be labeled “increasing”; if the value at data point 1 were higher than the value at data point 2, it would be labeled “decreasing.” Next, the two team members’ data were compared to determine whether they were in directional agreement at each data point (i.e., both increasing or both decreasing). Fig. 3 illustrates this concept with example data. In this figure, data points 2A and 2B are in directional agreement because both increase from the previous point. However, data points 4A and 4B are not in agreement because one increases relative to the previous point while the other decreases. Points 6A and 6B both increase relative to the previous point and are in directional agreement.
Fig. 3. Illustration of directional agreement/disagreement. [Figure: example data points for team members A and B over time.]
A percent agreement was derived from this comparison for a team’s entire data set and used as a measure of PC. For example, using only the 3 point pairs mentioned above, the percent agreement would be about 67% (2 of 3). Higher percentages represented higher levels of PC.
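The four PC methods listed in this section can be sketched in a few lines of dependency-free Python, including the per-trial correlation detailed in the correlation subsection (1.9). This is a minimal sketch under stated assumptions, not the study's software: inputs are equal-length, time-aligned lists of windowed HRV values (for example, mean IBI per 65 s window) with trials concatenated for SM, IDM and DA; `zscore` uses the sample standard deviation; and DA counts ties (no change between points) as disagreement. All function names and the example values are hypothetical.

```python
import math


def zscore(xs):
    """Within-subject z-scores over all windows (trial ignored); sample SD assumed."""
    n = len(xs)
    mu = sum(xs) / n
    sd = math.sqrt(sum((x - mu) ** 2 for x in xs) / (n - 1))
    return [(x - mu) / sd for x in xs]


def signal_matching(a, b):
    """SM: mean absolute difference between z-scored signals (lower = more PC)."""
    za, zb = zscore(a), zscore(b)
    return sum(abs(x - y) for x, y in zip(za, zb)) / len(za)


def instantaneous_derivative_matching(a, b):
    """IDM: mean |(a[t+1]-a[t]) - (b[t+1]-b[t])| on z-scored data (lower = more PC)."""
    za, zb = zscore(a), zscore(b)
    diffs = [abs((za[t + 1] - za[t]) - (zb[t + 1] - zb[t])) for t in range(len(za) - 1)]
    return sum(diffs) / len(diffs)


def directional_agreement(a, b):
    """DA: fraction of steps where both signals rise or both fall (higher = more PC)."""
    agree = 0
    for t in range(1, len(a)):
        da, db = a[t] - a[t - 1], b[t] - b[t - 1]
        if (da > 0 and db > 0) or (da < 0 and db < 0):  # ties count as disagreement
            agree += 1
    return agree / (len(a) - 1)


def pearson_r(a, b):
    """Pearson correlation between two equal-length series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def correlation_pc(trials_a, trials_b):
    """Correlation PC: Pearson r per trial, averaged across trials (higher = more PC)."""
    rs = [pearson_r(x, y) for x, y in zip(trials_a, trials_b)]
    return sum(rs) / len(rs)


if __name__ == "__main__":
    a = [810.0, 790.0, 805.0, 820.0, 800.0, 815.0]  # hypothetical mean IBI per window (ms)
    b = [760.0, 745.0, 750.0, 772.0, 751.0, 768.0]
    print(signal_matching(a, b), instantaneous_derivative_matching(a, b))
    print(directional_agreement(a, b), correlation_pc([a[:3], a[3:]], [b[:3], b[3:]]))
```

Because SM and IDM operate on z-scores, a signal and any positive rescaling of it score zero (perfect matching), whereas DA and correlation need no normalization, which mirrors the processing differences described above.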
1.9. Correlation
A correlation was used to indicate the strength of the linear relationship between the team members being examined. This was possible because the current data were synchronized over time. Correlation coefficients were calculated between team members for each trial. The average correlation across trials was then calculated to provide one score for the degree to which PC was demonstrated between team members. Positive correlation values were related to PC, with higher positive values indicating higher levels of compliance. It is important to note that in this study, measures derived for consecutive 65 s windows were correlated over time. For example, the mean IBI values derived from a series of consecutive 65 s windows were correlated. This differs from the correlation approach used by Henning et al. (2001), in which a beat-to-beat measure of raw IBIs, rather than an average over a 65 s period, was correlated.
1.10. Team performance measures during testing
Performance ratings for the testing phase were derived by averaging two metrics: team velocity and the percentage of non-combatants acknowledged. z-Scores were derived separately for these two metrics across all testing data (ignoring trial) to guarantee equal weighting of the two scores, since there was no evidence to suggest that one was more important than the other. These measures were chosen from the metrics monitored during the testing phase because they model the speed–accuracy tradeoff present in human movement. To derive team velocity, position data for each subject were recorded as $(X_{pt}, Y_{pt})$, where $p = 1, \ldots, 4$ is the ID of the subject and $t = 1, \ldots, N$ is the time index of the record. The data were recorded at 20 Hz, so the time delta between consecutive samples $t$ and $t+1$ equaled 50 ms. Position measurements are in cm.
To calculate team velocity, each individual’s distance moved over a given time period was first derived. $D_{pt}$ is the distance moved by subject $p$ at time $t$, computed as
$$D_{pt} = \sqrt{\left(Y_{pt} - Y_{p(t-1)}\right)^2 + \left(X_{pt} - X_{p(t-1)}\right)^2}.$$
The velocity for an individual subject $p$ at time $t$ was then computed as
$$V_{pt} = \frac{\sum_{i=t-4}^{t+5} D_{pi}}{10 \times 50\,\text{ms}}.$$
This calculation smoothed the velocity over a 10-sample (500 ms) window, making the computation more robust in the presence of noise. The units of $V_{pt}$ are cm/s, which can be converted to m/s by dividing by 100. The average team velocity $A_t$ at time $t$ was computed as
$$A_t = \frac{1}{4}\sum_{i=1}^{4} V_{it}.$$
If a team member was shot (killed) at time $k$, his velocity was not used in any computation of average team velocity for $t \ge k$. Note that all four team members, rather than only the two selected for PC analysis, were included in team velocity calculations.
Percentage of non-combatants acknowledged was the accuracy portion of the performance rating and refers to the number of non-combatants acknowledged by a team relative to the number encountered by that team. There were twelve possible non-combatants to be acknowledged over all testing trials. However, since some teams were unable to complete all trials successfully, it was also possible to encounter fewer than the total number of non-combatants present. To remedy this, the percentage of correct acknowledgements out of the total number of non-combatants confronted was used. For example, if a team encountered eleven non-combatants and acknowledged only eight, they would receive a percentage of 72.7% (8/11).
2. Results
2.1. Analysis I
The purpose of Analysis I was to examine the ability of each combination of HRV measure (RSA, MSD and mean IBI) and PC method (SM, IDM, DA and correlation) to differentiate between recordings that exhibited compliance and those that did not. After data reduction, plots of each team were generated for each measure (mean IBI, MSD, RSA, and loge RSA) by trial. To establish the “ground truth” regarding the existence of PC in each recording, three raters individually inspected the plots for the appearance of PC. Raters were blind to the team they were rating. Raters were instructed to determine whether each trial exhibited compliance and then whether the overall recording exhibited compliance. Plots were rated for compliance separately for each measure (mean IBI, MSD, RSA, and loge RSA). The definition of PC was explained to the raters; however, no specific strategies or rules for the rating process were given. When the raters did not all agree, the majority rating (two out of three) was chosen. The PC scores calculated as described above were then compared between the two groups identified by the raters (those exhibiting PC vs. those not exhibiting PC), separately for each measure, using between-subjects t-tests. Degrees of freedom were adjusted for unequal variances where appropriate. All tests were directional, and statistical significance was determined with alpha set at 0.05. Table 1 summarizes the combinations of physiological measures and compliance methods and indicates which combinations correctly differentiated, as shown by significant t-tests, between the compliant and non-compliant records identified by the raters. Based on the results of Analysis I, DA and linear correlation were used in conjunction with mean IBI, raw RSA, and loge RSA data to provide six PC measures for use in Analysis II.
2.2. Analysis II
The purpose of the second analysis was to examine the relationship between the performance of four-person teams and PC. Between-subjects t-tests were used to compare the average PC scores of high versus low performance teams for each of the six compliance measures. Degrees of freedom were adjusted for unequal variances where appropriate. All tests were directional, and statistical significance was determined with alpha set at 0.05. Because performance values had been standardized as z-scores, the split between performance groups was made at zero. Recall that PC was measured during training and performance was measured during subsequent testing.
Table 1
Measures which showed a statistically significant differentiation between recordings visually identified as compliant vs. non-compliant.
Method | Mean IBI | MSD | Raw RSA | Loge RSA
Signal Matching | no | no | no | no
Instantaneous Derivative Matching | no | no | no | no
Directional Agreement | yes | no | yes | yes
Correlation | yes | no | yes | yes
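The performance rating described in Section 1.10, together with the zero split used in Analysis II, can be sketched as follows. This is a hedged illustration, not the study's code: function names are hypothetical, the first distance sample is set to zero because the distance at the first sample is undefined, team velocity is averaged over surviving members only (the paper divides by 4 while dropping killed members' velocities, so the exact divisor is an assumption), z-scores use the sample standard deviation, one score per team is assumed, and a score of exactly zero is assigned to the low group.

```python
import math
import statistics


def step_distances(xs, ys):
    """D_pt: distance (cm) moved between consecutive 20 Hz samples; D_p0 set to 0."""
    return [0.0] + [math.hypot(xs[t] - xs[t - 1], ys[t] - ys[t - 1])
                    for t in range(1, len(xs))]


def smoothed_velocity(d, t, dt_s=0.05):
    """V_pt: sum of D over samples t-4..t+5 divided by 10 * 50 ms, in cm/s."""
    if t < 4 or t + 5 >= len(d):
        raise ValueError("t must leave 4 samples before and 5 after")
    return sum(d[t - 4:t + 6]) / (10 * dt_s)


def team_velocity(dists, t, death_times=None):
    """A_t: mean smoothed velocity of members not yet killed at time t."""
    death_times = death_times or {}
    alive = [p for p in range(len(dists)) if t < death_times.get(p, math.inf)]
    return sum(smoothed_velocity(dists[p], t) for p in alive) / len(alive)


def percent_acknowledged(acknowledged, encountered):
    """Accuracy: non-combatants acknowledged as a percentage of those encountered."""
    return 100.0 * acknowledged / encountered


def performance_scores(velocities, accuracies):
    """Average of z-scored velocity and z-scored accuracy (equal weighting)."""
    def z(xs):
        mu, sd = statistics.mean(xs), statistics.stdev(xs)
        return [(x - mu) / sd for x in xs]
    return [(v + a) / 2 for v, a in zip(z(velocities), z(accuracies))]


def split_high_low(scores):
    """Analysis II split at zero: indices of high (> 0) and low (<= 0) performers."""
    high = [i for i, s in enumerate(scores) if s > 0]
    low = [i for i, s in enumerate(scores) if s <= 0]
    return high, low


if __name__ == "__main__":
    ys = [0.0] * 40
    d = [step_distances([float(t) for t in range(40)], ys),   # 1 cm per sample
         step_distances([2.0 * t for t in range(40)], ys)]    # 2 cm per sample
    print(team_velocity(d, 10))            # both members alive
    print(team_velocity(d, 10, {1: 8}))    # member 1 killed at sample 8
    print(percent_acknowledged(8, 11))
```

With members moving 1 and 2 cm per 50 ms sample, the smoothed velocities are 20 and 40 cm/s, so the team average is 30 cm/s while both are alive and 20 cm/s after the faster member is killed.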
There was a statistically significant difference between the PC scores of high (mean = 0.16, SD = 0.19) and low (mean = −0.10, SD = 0.16) performance groups when using correlation and loge RSA (t = 2.31, p = 0.03). There were also trend-level differences between the mean PC scores of high (mean = 0.66, SD = 0.17) and low (mean = 0.51, SD = 0.05) performance groups when using DA with mean IBI data (t[4.62] = 1.86, p = 0.06), and between high (mean = 0.55, SD = 0.08) and low (mean = 0.45, SD = 0.12) performance groups when using DA with loge RSA data (t = 1.68, p = 0.07). Although all measures trended in the expected direction, no other statistically significant differences were found between high and low performance groups. Fig. 4 illustrates the mean PC of the high and low performance groups for the three correlation combinations, while Fig. 5 illustrates the mean PC of the high and low performance groups for the three DA combinations. A follow-up independent t-test on average team respiration rate (cycles per minute), conducted to rule out respiration as a confounding variable in the raw RSA and loge RSA scores, revealed no statistically significant difference between high (mean = 13.2, SD = 2.07) and low (mean = 12.34, SD = 1.29) performance groups (t(8) = 0.79, p > 0.05).
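As a check on the reported statistics, Welch's unequal-variance t and its Welch–Satterthwaite degrees of freedom, and Student's pooled-variance t, can be recomputed from the rounded group summaries above. The sketch assumes five teams per performance group, which the text does not state but which is consistent with 10 teams and the reported t(8) for respiration; small differences from the published values reflect rounding of the published means and SDs. It computes only t and df (the paper's p-values are one-tailed), and `welch_t` and `pooled_t` are hypothetical helper names.

```python
import math


def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def pooled_t(m1, s1, n1, m2, s2, n2):
    """Student's t with pooled variance; df = n1 + n2 - 2."""
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    t = (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


# DA + mean IBI: high (0.66, SD 0.17) vs low (0.51, SD 0.05), n = 5 per group assumed
t_da, df_da = welch_t(0.66, 0.17, 5, 0.51, 0.05, 5)  # about t = 1.89, df = 4.69
# Respiration rate: high (13.2, SD 2.07) vs low (12.34, SD 1.29)
t_resp, df_resp = pooled_t(13.2, 2.07, 5, 12.34, 1.29, 5)  # about t = 0.79, df = 8
```

The recomputed values land close to the published t[4.62] = 1.86 and t(8) = 0.79, which supports the five-per-group reading of the split at zero.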
Fig. 4. Mean physiological compliance for high and low performance teams by correlation measure. [Figure: y-axis, Physiological Compliance Score (Pearson's r); x-axis, Physiological Compliance Measure; * difference is statistically significant.]
Fig. 5. Mean physiological compliance for high and low performance teams by directional agreement measure. [Figure: y-axis, DA Physiological Compliance Score (percentage); x-axis, Physiological Compliance Measure; * difference is statistically significant.]
The objective of these analyses was to begin exploring the relationship between PC and performance by developing candidate PC measures for HRV data and applying them to existing performance data. Analysis I yielded six viable compliance measures. Subsequent analyses supported the hypothesis that compliance and performance are positively related. These findings suggest that the compliance–performance relationship is not limited to stationary tasks and may extend to complex task settings. Because PC was measured during training and performance during subsequent testing, compliance may also have predictive value for selection and training. Using independent raters as the "ground truth" for the presence of PC, these analyses indicated that DA and linear correlation, each combined with mean IBI, RSA, and log_e RSA data, are valid measures of PC. All of these method-measure combinations indicated that PC is positively related to performance (one difference was statistically significant, two reached trend level, and the rest trended in the expected direction). However, the results of Analysis II also suggest that some method-measure combinations are more sensitive to differences in performance than others: DA appears to work best when paired with mean IBI or log_e RSA data, whereas linear correlation appears to work best with log_e RSA data.
RSA data likely provided sensitive measures of PC because RSA responds quickly through vagal activity (Berntson et al., 1997); because it is vagally mediated, RSA is often used as an indirect measure of parasympathetic nervous system (PNS) activity. This is consistent with the behavioral cybernetic model proposed by Smith and Smith (1966), which asserts that PC occurs before the behavior. Therefore, the mechanism chosen to measure PC must react quickly in order to provide a good measure of compliance. The sensitivity of these measures could also be attributed to similar mental models among team members (Beng-Chong and Klein, 2006); anticipating the same events could likewise produce measurable changes in these signals.
However, these results suggest that physiological covariation is not the origin of PC (Grossman, 1983). The teams examined in these analyses were completing a task that did not require simultaneous starts or actions; team members often completed separate tasks in different areas (e.g., splitting off to clear separate parts of rooms). If physiological covariation were the basis of PC, significant effects would likely not have been detectable without considerable phase shifting and without restricting the analysis to periods when team members acted together. Nonetheless, because free-moving subjects introduce multiple sources of possible artifact, the present results explore only a few of the possible origins of PC.
The current analyses justified the PC measures using ground truth data determined by human raters. Additional work is needed to validate the measures further; for example, the validity of these PC measures depends on the results being replicated.
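The phase-shifting argument above can be made concrete with a lagged correlation: one member's signal is shifted against another's over a range of lags, and the correlation is recomputed at each lag. This is a generic technique offered for illustration, not a procedure used in the study; the function names and toy series are hypothetical, and the sketch assumes both signals have already been resampled onto a common, evenly spaced time grid of equal length.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lagged_correlations(x, y, max_lag):
    """Return {lag: r} for every lag in -max_lag..max_lag.

    At each lag, x[t] is paired with y[t + lag], so a positive lag means
    y follows x. Only the overlapping part of the two series is used.
    """
    n = len(x)
    out = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xs, ys = x[:n - lag], y[lag:]
        else:
            xs, ys = x[-lag:], y[:n + lag]
        out[lag] = pearson(xs, ys)
    return out


# Toy series (hypothetical): member B's signal is member A's delayed by 2 samples.
a = [0, 1, 3, 2, 5, 4, 6, 3, 2, 4, 1, 0]
b = [0, 0] + a[:-2]

corrs = lagged_correlations(a, b, max_lag=3)
best_lag = max(corrs, key=corrs.get)
for lag, r in corrs.items():
    print(f"lag {lag:+d}: r = {r:.3f}")
print("best lag:", best_lag)
```

In the toy data the second series is simply the first delayed by two samples, so the correlation is highest at lag 2.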
These measures also need to be examined in a study specifically designed for their use, so that all possible confounds can be ruled out. Such a study would also allow the relationship between PC and performance to be examined as performance both increases and decreases. Validating these measures would provide firmer evidence linking PC and performance. These findings provide evidence suggesting a positive relationship between compliance and team performance, which would mean that PC may be an important part of team proficiency. The overall results agree with recent research in this area that also reports a positive relationship (Henning et al., 2001). As in previous investigations, the correlation measure showed the strongest predictive relationship with performance. The simplest measures proved to be the most sensitive, which suggests that measures devised to quantify PC should be straightforward and uncomplicated.
In the future, measures of PC may find many uses throughout the field of human factors. As mentioned previously, PC could become an integral part of assessing team training methods and performance, providing the continuous, objective assessment of performance that is sorely needed in the billion-dollar area of training. Team members could be selected based on the degree to which they exhibit PC while completing training. PC could also be an important tool when designing systems for multiple concurrent users; for example, cooperative workstations could be rated on the extent to which they encourage PC (Henning et al., 2001). Although there are many possible applications for PC as an objective score of team performance, PC could also indicate poor performance. As previously mentioned, Levenson and Gottman (1983) concluded that PC could be problematic because of its association with periods of negative affect. Moreover, some teams are formed expressly to bring together different viewpoints and opinions, and PC among those team members might be detrimental to performance; a design group trying to generate new ideas is one example. Therefore, while PC may be a good measure of performance for many team activities, it will not apply to all situations, and its relevance should be assessed for each situation individually.
In conclusion, this study, taken together with the results of existing studies, provides evidence that PC may be a useful tool in training and numerous other applications and should continue to be explored. The two measures presented here, directional agreement and correlation, are computationally simple, easily applied, and appear to be sensitive to variations in performance. These measures should be examined further in future studies.
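Both measures highlighted in the conclusion are indeed simple to compute. The sketch below shows one common formulation of each for a four-person team, under assumptions not stated in this section: the members' series (e.g., log_e RSA values per analysis window) are aligned and equally sampled; directional agreement is the fraction of successive steps on which two series change in the same direction; correlation is Pearson's r; and the team-level score is the mean over all six member pairs. Function names and data are hypothetical, and the paper's own definitions may differ in detail.

```python
import itertools
import math


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def directional_agreement(x, y):
    """Fraction of successive steps on which x and y move in the same direction.

    ASSUMPTION: a step agrees when the signs of the first differences match,
    so a flat step agrees only with another flat step.
    """
    dx = [sign(b - a) for a, b in zip(x, x[1:])]
    dy = [sign(b - a) for a, b in zip(y, y[1:])]
    return sum(p == q for p, q in zip(dx, dy)) / len(dx)


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def team_pc(signals, method):
    """HYPOTHETICAL team-level rule: mean of a pairwise PC method over all member pairs."""
    pairs = list(itertools.combinations(signals, 2))
    return sum(method(x, y) for x, y in pairs) / len(pairs)


# Toy per-window values for four team members (hypothetical, not study data).
team = [
    [1, 2, 3, 2, 1],
    [2, 3, 4, 3, 2],
    [1, 3, 2, 1, 0],
    [5, 4, 3, 4, 5],
]

team_da = team_pc(team, directional_agreement)  # proportion in [0, 1]
team_r = team_pc(team, pearson)                 # mean Pearson r in [-1, 1]
print(f"team DA = {team_da:.3f}, team r = {team_r:.3f}")
```

Averaging over pairs is only one way to move from dyads to a four-person team; other aggregations (for example, the minimum pairwise score) would weight a single non-compliant member differently.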
References
Allen, M.T., Matthews, K.A., Kenyon, K.L., 2000. The relationships of resting baroreflex sensitivity, heart rate variability and measures of impulse control in children and adolescents. International Journal of Psychophysiology 37 (2), 185–194.
Backs, R.W., Boucsein, W. (Eds.), 2000. Engineering Psychophysiology. Lawrence Erlbaum Associates, Mahwah, NJ.
Beng-Chong, L., Klein, K.J., 2006. Team mental models and team performance: a field study of the effects of team mental model similarity and accuracy. Journal of Organizational Behavior 27, 403–418.
Berntson, G.G., Bigger, J.T., Eckberg, D.L., Grossman, P., Kaufmann, P.G., Malik, M., Nagaraja, H.N., Porges, S.W., Saul, J.P., Stone, P.H., Van Der Molen, M.W., 1997. Heart rate variability: origins, methods, and interpretive caveats. Psychophysiology 34, 623–648.
Berntson, G.G., Quigley, K.S., Lozano, D., 2007. Cardiovascular psychophysiology. In: Cacioppo, J.T., Tassinary, L.G., Berntson, G.G. (Eds.), Handbook of Psychophysiology, third ed. Cambridge University Press, Cambridge, UK, pp. 182–210.
Cacioppo, J.T., Petty, R.E., 1983. Social Psychophysiology: A Sourcebook. Guilford Press, New York.
Carpenter, T.L., 2006. The Effects of Game-based Virtual Environment Training on Novice Team Performance of a Building Clearing Task. Unpublished master's thesis, Clemson University, Clemson, South Carolina, USA.
DiMascio, A., Boyd, R.W., Greenblatt, M., Solomon, H.C., 1955. The psychiatric interview: a sociophysiological study. Diseases of the Nervous System 16, 4–9.
Grossman, P., 1983. Respiration, stress, and cardiovascular function. Psychophysiology 20 (3), 284–300.
Hatfield, E., Cacioppo, J.T., Rapson, R.L., 1994. Emotional Contagion. Cambridge University Press.
Hayano, J., Sakakibara, Y., Yamada, A., Mukai, S., Fujinami, T., Yokoyama, K., Watanabe, Y., Takata, K., 1991. Accuracy of assessment of cardiac vagal tone by heart rate variability in normal subjects. American Journal of Cardiology 67, 199–204.
Henning, R.A., Boucsein, W., Gil, M.C., 2001. Social-Physiological compliance as a determinant of team performance. International Journal of Psychophysiology 40, 221–232.
Henning, R.A., Ferris, J.K., Armstead, A.G., 2006. Longitudinal study of social psychophysiological compliance in a four-person research team. CD-ROM proceedings. In: Proceedings of the 16th World Congress on Ergonomics. Maastricht, Netherlands, July 10–14, 2006. Elsevier Ltd., pp. 1–6. ISSN 0003-6870.
Henning, R.A., Korbelak, K., 2005. Social-psychophysiological compliance as a predictor of team performance. Psychologia 48 (2), 84–92.
Kaplan, H.B., Burch, N.R., Bloom, S.W., 1964. Physiological covariation and sociometric relationships in small peer groups. In: Leiderman, P.H., Shapiro, D. (Eds.), Psychobiological Approaches to Social Behavior. Stanford University Press, Stanford, CA.
Levenson, R.W., Gottman, J.M., 1983. Marital interaction: physiological linkage and affective exchange. Journal of Personality and Social Psychology 45 (3), 587–597.
Malmo, R.B., Boag, T.J., Smith, A.A., 1957. Physiological study of personal interaction. Psychosomatic Medicine 19, 105–119.
Porges, S.W., 1995. Cardiac vagal tone: a physiological index of stress. Neuroscience and Biobehavioral Reviews 19 (2), 225–233.
Smith, K.U., Smith, M.F., 1966. Cybernetic Principles of Learning and Educational Design. Holt, Rinehart, and Winston, New York.
Smith, T.J., Smith, K.U., 1987. Feedback control mechanisms of human behavior. In: Salvendy, G. (Ed.), Handbook of Human Factors. Wiley, New York, pp. 251–293.