ULTRASONIC IMAGING 8, 165-180 (1986)
PATTERN RECOGNITION METHODS FOR OPTIMIZING MULTIVARIATE TISSUE SIGNATURES IN DIAGNOSTIC ULTRASOUND

Michael F. Insana¹, Robert F. Wagner¹, Brian S. Garra², Reza Momenan³, and Thomas H. Shawker²

¹ Office of Science and Technology, Center for Devices and Radiological Health, FDA, Rockville, MD 20857
² Dept. of Diagnostic Radiology, National Institutes of Health, Bethesda, MD 20205
³ Dept. of Electrical Eng. and Computer Science, George Washington University, Washington, DC 20052
Described is a supervised parametric approach to the detection and classification of disease from acoustic data. Statistical pattern recognition techniques are implemented to design the best ultrasonic tissue signature from a set of measurements and for a given task, and to rate its performance in a way that can be compared with other diagnostic tools. In this paper, we considered combinations of four ultrasonic tissue parameters to discriminate, in vivo, between normal liver and chronic active hepatitis. The separation between normal and diseased samples was made by application of the Bayes decision rule for minimum risk, which includes the prior probability for the presence of disease and the cost of misclassification. Large differences in classification performance of various tissue parameter combinations were demonstrated using the Hotelling trace criterion (HTC) and receiver operating characteristic (ROC) analysis. The ability of additional measurements to increase or decrease discriminability, even measurements from other diagnostic modalities, can be evaluated directly in this manner. © 1986 Academic Press, Inc.

Key words: Classification, discriminant analysis, hepatic disease, Hotelling trace criterion, pattern recognition, principal components, quantitative ultrasound, ROC analysis.

I. INTRODUCTION
The fundamental problem in diagnostic ultrasound is the detection and classification of target signals in noisy backgrounds. In a detection task, data is analyzed and, based on specified criteria, a decision is made between two classes, e.g., disease or no disease. In a classification task, the choice is among any number of possible classes, e.g., normal tissue types or disease conditions. Conventional B-scan ultrasonography is currently limited by considerable variation in observer performance of such tasks, particularly in the case of liver disease [1,2]. However, recent studies have shown that quantitative ultrasonography, i.e., acoustic and image parameter estimation, can improve the diagnostic performance of an ultrasound exam [2,3].

In quantitative ultrasound, parameters are estimated from the acoustic data to form a tissue signature that, in an ideal implementation, would uniquely define the tissue type and state of health. Past efforts to quantitatively differentiate between normal and diseased tissues using single ultrasound parameters have not produced a consistent diagnostic tool, suggesting that a combination of measurement parameters or features may be required to uniquely identify each tissue type encountered. Obviously, the best features for classifying tissues have a large variability between or among the different tissue types and a small variability within a tissue type. Likely candidates are parameters that describe physical properties of the tissues, such as ultrasonic attenuation, speed of sound, and scattering. These have demonstrated potential clinical effectiveness beyond that of conventional ultrasonography by revealing quantitative information that may be hidden or otherwise unavailable to the viewer of the image.

Recently, a number of papers have appeared that describe multiparameter methods for characterizing liver and breast tissue directly from purely statistical features of the B-scan texture [2,4-8]. The general approach has been to first measure a large number of statistical properties or features for a region of interest, typically between forty [2] and several hundred [4], for a population of patients with known disease states. The multivariate methods of statistical pattern recognition have been used to reduce the feature space by selecting the least correlated features which best discriminate among the various disease classes of the population. Invariably, these investigations have revealed that fewer than ten image features were required to accurately classify the tissue states of interest, suggesting that the intrinsic dimensionality of the tissue characterization problem (at least for liver and breast tissue) is probably between three and ten. Examples of image features that were found to be significant for classifying disease states are gray level mean, and measures of correlation, entropy and skewness. Unfortunately, this approach does not readily yield a physical interpretation of the statistical features selected.
We have recently described a method of obtaining a tissue signature from the statistical properties of the acoustic speckle that describes the structure of the organ scanned. This work is an extension of our investigations into the statistics of the radio-frequency (rf) echo signal, its envelope (B-scan or magnitude signal), and the squared envelope (intensity signal) for simple [9] and more complex [10-12] scattering media found in diagnostic imaging. Three parameters were derived for use in grading subtle changes in image texture [13] and machine detecting low-contrast lesions [11]. In this paper we describe methods of optimizing the diagnostic performance and automating the use of a multidimensional tissue signature. The goal of our analysis is to determine the smallest number of physically-based tissue parameters required to provide the physician with objective criteria to detect and classify the presence of disease.

II. METHODS

1. Describing tissue structure from image texture
Histological studies have shown that tissue scatterers vary in size and shape, and that the different structures have varying degrees of spatial order. The simplest biological scattering medium is unclotted blood, which is completely disordered, consisting of randomly distributed Rayleigh scatterers. At the other extreme is the very complex anisotropic structure of skeletal muscle tissue. This tissue is highly ordered, with nearly periodic scatterers that repeat over a long range. The organization of scattering structures for most biological media falls somewhere in between blood and skeletal muscle. In this analysis, soft tissues are represented as an acoustically uniform medium in which sets of identical discrete scatterers of three classes are positioned (Fig. 1).

[Fig. 1. Schematic diagram of our model of a soft tissue scattering medium. Class I: random ($I_d$). Class II: long-range order ($\bar{I}_s$, var($I_s$)). Class III: short-range order (e.g., blood vessels).]
Class I consists of small, randomly positioned scatterers of sufficient concentration to give an echo signal with circular Gaussian statistics [14], i.e., the echo signal is a complex, zero-mean Gaussian random variable in which the real and imaginary parts have equal variance. Approximately seven to ten scatterers per resolution cell are sufficient to claim Gaussian statistics to second order. A histogram of the gray-scale pixel values for the B-scan (magnitude) data will follow a Rayleigh probability density function (pdf) and the point signal-to-noise ratio (SNR$_0$), i.e., the ratio of mean to standard deviation, will be 1.91 [9,1?]. The histogram of pixels in the squared B-scan (intensity) image is distributed exponentially with an SNR$_0$ of 1.0. The medium is entirely characterized by first-order statistics such as the average incoherent backscattered intensity from this diffuse tissue component, $I_d$.

Class II consists of small but nonrandomly distributed tissue scatterers with regular (quasi-periodic) and long-range order. This class of scatterer contributes a coherent or specular component to the echo signal that is spatially varying with mean $\bar{I}_s$ and variance var($I_s$). If both classes of scatterers, I and II, are present and if the dimension of the regular structure is well below the resolution of the imaging system, then var($I_s$) is zero and the SNR$_0$ for magnitude and intensity are larger than if only class I scatterers are present. The nature of the image texture is now more dependent on the properties of the medium. The medium can be described by two first-order statistical quantities, the average backscattered intensities $I_d$ and $\bar{I}_s$, and the statistics of the speckle are described by a Rician pdf [10,11]. If the quasi-periodic class II scatterers are resolved, then an average scatterer spacing $\bar{d}$ can be estimated and var($I_s$) is greater than zero.

A modified Rician pdf, which has been generalized to include a spatially structured specular signal, describes the statistics of the speckle and has been derived in [10,11]. Examples of specular scatterers in tissues with resolvable long-range order are the portal triads in liver parenchyma and the collagenous sheaths that surround muscle fascicles [16,17]. The parameters $\bar{d}$ and var($I_s$) are calculated from the second-order statistics of the intensity data and, when resolvable structures are present, are necessary to uniquely identify the medium. This topic is discussed further in section IV.
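The Class I values quoted above (point SNR of 1.91 for magnitude and 1.0 for intensity) can be checked numerically. The following is a minimal sketch that assumes the limiting circular Gaussian model directly (independent, zero-mean, equal-variance real and imaginary parts); the function names, sample count, and seed are illustrative choices, not part of the original analysis.

```python
import math
import random

def point_snr(values):
    """Ratio of the sample mean to the sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean / math.sqrt(var)

def simulate_class1(n_pixels=100_000, seed=1):
    """Draw circular Gaussian echo samples; return (magnitude SNR, intensity SNR)."""
    rng = random.Random(seed)
    magnitude, intensity = [], []
    for _ in range(n_pixels):
        re, im = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        i = re * re + im * im          # squared envelope (intensity)
        intensity.append(i)
        magnitude.append(math.sqrt(i))  # envelope (B-scan magnitude)
    return point_snr(magnitude), point_snr(intensity)

snr_magnitude, snr_intensity = simulate_class1()
# Analytic mean/std ratio of a Rayleigh pdf, about 1.913.
rayleigh_snr = math.sqrt(math.pi / (4.0 - math.pi))
print(round(snr_magnitude, 2), round(snr_intensity, 2), round(rayleigh_snr, 3))
```

The simulated magnitude SNR should land close to the analytic Rayleigh value and the intensity SNR close to 1.0, up to sampling error.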
Class III scatterers are nonrandom and specular, but with short-range order, such as organ surfaces and blood vessels. These short-ranged specular structures produce deterministic signals that must be eliminated from the data if tissue homogeneity is to be expected. A simple matched filter technique to automatically identify suspected blood vessels from the image data has been reported [12].

Three scattering features that describe the tissue structure are measured from the autocorrelation and power spectrum, i.e., second-order statistical properties, of the squared B-scan signal. These are $\bar{d}$, $r = \bar{I}_s/I_d$, and $\sigma'_s = \mathrm{var}^{1/2}(I_s)/I_d$, and have been described in detail previously [12]. To this three-dimensional feature space we add a fourth feature, the slope of the ultrasonic attenuation coefficient with frequency, $\alpha_0$, having the units dB/cm-MHz. This quantity is measured from the rf signals using a spectral difference method originally described by Kuc [18], with modifications to eliminate the effects of acoustic beam diffraction [19].

The ratio $r$ is an indication of the relative scattering intensities of class I and class II scatterers. Like the attenuation coefficient, $r$ is a first-order statistical measure. The parameters $\bar{d}$ and $\sigma'_s$, on the other hand, are principally second-order tissue features. They describe the structure of the scattering tissue, in particular the average characteristic scale and magnitude of specular scatterers.

2. Data collection and processing
In this work, rf echo signals are recorded directly from a specially modified Diasonics mechanical sector scanner (Diasonics model DS-20). A 3.5 MHz, 19 mm diameter transducer, focused at 8 cm, was used. The spectrum of the pulse is approximately Gaussian-shaped with a FWHM bandwidth of 0.88 MHz, corresponding to a FWHM pulse length of 0.6 mm. An operator scanning the patient and viewing the real-time B-scan display monitor selects a region of interest (ROI) for analysis. Logarithmic amplification is then disabled, and the rf signals for that ROI are digitized at 8 bits and at a rate of 22.1 MHz. Before recording the signals, the operator has the opportunity to adjust the (linear) depth-gain compensator to visually obtain a constant overall image brightness. The four parameters are calculated off-line [20] from the average power spectrum for each ROI and the results are averaged. In this study, four to six ROIs were collected. The ROIs were approximately 4 cm in depth and 2 cm in width (~20 data vectors) and were recorded during suspended respiration.

An intensity signal is calculated for each data vector recorded by calculating the squared modulus of the analytic signal [14]. In a preprocessing step, blood vessels are eliminated from each data vector when detected by matched filtering and the result is detrended to eliminate low frequency variance due to incomplete depth-gain compensation. The data vectors are multiplied by a cosine taper window [21] and zeros are added to interpolate the spectrum and obtain vector lengths which are a power of two for FFT analysis. The power spectrum is calculated for each vector in the ROI and averaged.
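The preprocessing chain described above (analytic-signal intensity, detrending, cosine taper, zero padding, averaged power spectrum) can be sketched as follows. This is an illustrative outline only: the 10 percent taper fraction, the naive DFT, the function names, and the synthetic ROI are assumptions, and the matched-filter removal of blood vessels is omitted.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform; adequate for short vectors."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * f * t / n) for t in range(n))
           for f in range(n)]
    return [v / n for v in out] if inverse else out

def intensity(rf):
    """Squared modulus of the analytic signal of a real rf data vector."""
    n = len(rf)
    spectrum = dft(rf)
    # Keep DC (and Nyquist when n is even), double the positive
    # frequencies, and zero the negative frequencies.
    gain = [0.0] * n
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
    for k in range(1, (n + 1) // 2):
        gain[k] = 2.0
    analytic = dft([s * g for s, g in zip(spectrum, gain)], inverse=True)
    return [abs(z) ** 2 for z in analytic]

def detrend(y):
    """Remove the least-squares straight line from a data vector."""
    n = len(y)
    t_mean = (n - 1) / 2.0
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    slope = sxy / sxx
    return [v - y_mean - slope * (t - t_mean) for t, v in enumerate(y)]

def cosine_taper(n, fraction=0.1):
    """Window that is flat in the middle with cosine-shaped ends (assumed 10%)."""
    edge = max(1, int(fraction * n))
    window = [1.0] * n
    for k in range(edge):
        w = 0.5 * (1.0 - math.cos(math.pi * (k + 0.5) / edge))
        window[k] = window[n - 1 - k] = w
    return window

def vector_power_spectrum(rf):
    """Power spectrum of the detrended, tapered, zero-padded intensity signal."""
    y = detrend(intensity(rf))
    y = [v * w for v, w in zip(y, cosine_taper(len(y)))]
    n_fft = 1 << (len(y) - 1).bit_length()   # next power of two
    y = y + [0.0] * (n_fft - len(y))          # zero padding
    return [abs(c) ** 2 for c in dft(y)]

def roi_power_spectrum(vectors):
    """Average the power spectra of all data vectors in one ROI."""
    spectra = [vector_power_spectrum(v) for v in vectors]
    return [sum(col) / len(spectra) for col in zip(*spectra)]

# Synthetic ROI (hypothetical): a carrier whose envelope repeats 3 times per vector,
# standing in for a periodic scattering structure.
roi = [[(1.0 + 0.5 * math.cos(2 * math.pi * 3 * t / 48)) *
        math.cos(2 * math.pi * 12 * t / 48 + 0.3 * v) for t in range(48)]
       for v in range(4)]
spectrum = roi_power_spectrum(roi)
peak_bin = max(range(1, len(spectrum) // 2 + 1), key=lambda k: spectrum[k])
```

In this synthetic case the envelope periodicity of 3 cycles per 48 samples corresponds to bin 4 of the 64-point padded spectrum, which is where the averaged spectrum peaks.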
III. DETECTION AND CLASSIFICATION OF DIFFUSE LIVER DISEASE

The specific clinical objective in this paper is to discriminate with the smallest error between two classes of ultrasound data: 31 studies from individuals with no known clinical evidence of liver disease and 48 studies from patients with clinically-proven chronic hepatitis. This data is plotted in figure 2a for two of the four possible dimensions. Initially we apply principal components analysis to this training data set to study the statistical properties of each class of data in 4-space. Then, a discriminant function is calculated to partition the feature space into the two classes. By varying the decision threshold of the discriminant function, an ROC (Receiver Operating Characteristic) curve can be generated to compare the diagnostic performance of features individually and in combinations. Features are selected for a given diagnostic task based on a scalar index of performance, $A_z$, which is the area under the ROC curve. A large $A_z$ indicates good diagnostic performance. The ROC results are compared with another scalar performance measure, the Hotelling trace criterion.

[Fig. 2. (a) Two-dimensional scatter diagram of $\sigma'_s$ vs. $\bar{d}$ (mm) for liver data from 31 normal subjects (○) and 48 patients with chronic hepatitis (△). (b) Scatter diagram of the first two principal components, $y_1$ and $y_2$, calculated for all four dimensions of the data population in figure 2a. The corresponding eigenvalues for $y_1$ and $y_2$ represent 71 percent of the total variance in the data. Discriminability between the two classes for the original and rotated feature spaces remained essentially unchanged.]
1. Principal components analysis

An efficient and effective tissue characterization program is one which reduces the dimension of its feature space to include only those features that contribute toward detection and classification of disease states. The feature space, of course, is defined by the number and range of parameters used in the tissue signature. Ideally, we would like each feature to contain uncorrelated information about the tissues. Several highly correlated features may provide good classification performance while conveying essentially the same information due to the high degree of correlation. Principal components (PC) analysis is a tool for studying correlations between features and establishing confidence intervals for predicting where in feature space the members of a class may be expected to fall.

The first principal component of a multidimensional observation is the linear combination of observed features that accounts for the largest fraction of the total feature variance [22]. It can also be described as a vector in the direction of the best least squares line through the data. The second principal component is orthogonal to the first and is the linear combination that accounts for the next largest fraction of variance, and so on. Each is found from the covariance matrix of the sample populations. PC analysis determines sets of features that are optimal for representing the data. They are not optimal for distinguishing among the classes of data. This can be seen by plotting the four-dimensional patient data in the plane of the first two principal components (Fig. 2b). We have used PC analysis to graphically visualize the degree of correlation between features.

To begin, the tissue signature or feature vector $\mathbf{x} = (x_1, x_2, x_3, x_4)^T$ is defined, where $x_1 = \bar{d}$, $x_2 = r$, $x_3 = \sigma'_s$, and $x_4 = \alpha_0$. The data is assumed to be normally distributed in each dimension, i.e.,

$$p(\mathbf{x}) = (2\pi)^{-n/2}\,|\boldsymbol{\Sigma}|^{-1/2}\exp\left[-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right] \tag{1}$$

where $n = 4$ is the number of features and $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ are the mean vector and covariance matrix of the parent population. The sample mean is $\bar{\mathbf{x}} = \langle\mathbf{x}\rangle$ and the sample covariance matrix is given by

$$\mathbf{S} = \langle(\mathbf{x}-\bar{\mathbf{x}})(\mathbf{x}-\bar{\mathbf{x}})^T\rangle = [\sigma_{ij}^2] \tag{2}$$

where $\sigma_{ij}^2 = \langle(x_i-\bar{x}_i)(x_j-\bar{x}_j)\rangle$ and $\langle\ \rangle$ denotes expected value. By convention, $\mathbf{x}$ is a column vector and $(\mathbf{x}-\bar{\mathbf{x}})^T$, its transpose, is a row vector.

The first principal component of the set of observations is a linear combination of the measurement features

$$y_1 = \mathbf{a}_1^T\mathbf{x} = a_{11}x_1 + a_{21}x_2 + a_{31}x_3 + a_{41}x_4. \tag{3}$$

The coefficient vector $\mathbf{a}_1$ is an eigenvector of the covariance matrix. That is, $\mathbf{a}_1$ satisfies the equation

$$(\mathbf{S} - \lambda_1\mathbf{I})\,\mathbf{a}_1 = 0 \tag{4}$$

where $\lambda_1$ is the greatest eigenvalue that satisfies the determinant equation $|\mathbf{S} - \lambda\mathbf{I}| = 0$. The subscript 1 denotes the first principal component and here $\mathbf{I}$ is the identity matrix. The vector $\mathbf{a}_1$ is normalized so that $\mathbf{a}_1^T\mathbf{a}_1 = 1$.

The first principal component is interpreted geometrically as the axis in the direction of greatest scatter in the measurement data. The coefficients of $\mathbf{a}_1$ are in fact the directional cosines that determine the orientation of the principal axis relative to the axes in the original data space:

$$a_{11} = \cos\theta_1, \quad a_{21} = \cos\theta_2, \quad \ldots \tag{5}$$

Figure 3 shows the class of normal data represented in two dimensions and the corresponding two principal axes, $y_1$ and $y_2$. Along the major axis, $y_1$, the elliptical boundary marks the contour of $\pm 1.5\sigma$ (standard deviations). If the two roots $\lambda_1$ and $\lambda_2$ are equal, the variance ellipse is circular. The variance in the direction of the first principal axis is given by [22]

$$\lambda_1 = \frac{1}{m-1}\sum_{k=1}^{m}(y_{1k}-\bar{y}_1)^2 \tag{6}$$

where $m$ is the number of patients in the class. The contour in figure 3 represents the 87 percent ($1.5\sigma$) confidence interval for the normal data population in this two-space and was selected arbitrarily. The center of the cluster is given by the mean vector, $\bar{\mathbf{x}}$.

[Fig. 3. Normal liver data plotted in two of the four original measurement dimensions, $\sigma'_s$ vs. $\bar{d}$ (mm). $\theta_1$ and $\theta_2$ are the angles between the first principal axis $y_1$ and the original axes, $\bar{d}$ and $\sigma'_s$, respectively. The ellipse is the constant density contour containing 87 percent of the probability mass for data that is normally distributed.]
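For two features, the principal axes and the variance ellipse of the kind shown in figure 3 follow in closed form from the sample covariance matrix. The sketch below illustrates this for a pair such as ($\bar{d}$, $\sigma'_s$); the data values are hypothetical, and the function names and the $m-1$ normalization are assumptions.

```python
import math

def covariance_2d(points):
    """Sample mean and 2x2 covariance (m - 1 normalization) of (x1, x2) pairs."""
    m = len(points)
    mx = sum(p[0] for p in points) / m
    my = sum(p[1] for p in points) / m
    sxx = sum((p[0] - mx) ** 2 for p in points) / (m - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (m - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (m - 1)
    return (mx, my), (sxx, sxy, syy)

def principal_axes_2d(cov):
    """Eigenvalues (descending) and the unit first eigenvector of a symmetric 2x2 matrix."""
    sxx, sxy, syy = cov
    half_trace = 0.5 * (sxx + syy)
    root = math.sqrt((0.5 * (sxx - syy)) ** 2 + sxy ** 2)
    lam1, lam2 = half_trace + root, half_trace - root
    # Angle between the first principal axis and the x1 axis.
    theta1 = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    a1 = (math.cos(theta1), math.sin(theta1))  # direction cosines (a11, a21)
    return lam1, lam2, a1

# Hypothetical (d in mm, sigma'_s) pairs for one class; not data from the study.
normal_points = [(1.05, 0.42), (1.10, 0.55), (0.95, 0.48),
                 (1.20, 0.60), (1.00, 0.38), (1.15, 0.50)]
mean, cov = covariance_2d(normal_points)
lam1, lam2, a1 = principal_axes_2d(cov)
fraction_first = lam1 / (lam1 + lam2)           # share of total variance along y1
semi_axes = (1.5 * math.sqrt(lam1), 1.5 * math.sqrt(lam2))  # 1.5-sigma ellipse
```

Here `a1` plays the role of the coefficient vector $\mathbf{a}_1$, and `semi_axes` gives the half-lengths of a $1.5\sigma$ ellipse along the two principal axes.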
The principal axes in figure 3 are nearly parallel to the original feature axes, showing that the two features are weakly correlated. (The correlation coefficient, $\rho$, between $\bar{d}$ and $\sigma'_s$ was found to be 0.2.) Often the number of representative samples in the training sets is small and several of the features may be partially or entirely correlated; in this case, the covariance matrix may be ill-conditioned or singular. One method of improving the condition of the covariance matrix is to increase the size of the training set. Another method is to reduce the order of the matrix by combining or eliminating correlated features. Since the principal components are uncorrelated linear combinations of the original features, features may be eliminated by using only those principal components $y_i$ with the greatest eigenvalues $\lambda_i$. This way a large number of correlated features can be reduced to a smaller number of uncorrelated features. Usually feature reduction is then performed on the uncorrelated features to avoid redundant information. We have used PC analysis to identify correlated features but have chosen to retain the unrotated feature space to allow for a more direct physical interpretation of the results.

Previously [12] we proposed the parameter $v$, another normalized measure of var($I_s$), instead of $\sigma'_s = \mathrm{var}^{1/2}(I_s)/I_d$. We replaced $v$ with $\sigma'_s$ when it was discovered that $v$ and $r$ were highly correlated ($\rho$ = 0.7) whereas $\sigma'_s$ and $r$ were less correlated ($\rho$ < 0.1).

2. Quadratic and linear discriminant functions
A discriminant function is a rule for classifying a multivariate observation into one of several classes [23,24]. The decision rules or discriminants are developed from a training set by what are called supervised methods. A training set of data is a set of observations with known class membership. The form of the decision rule depends on characteristics of the distribution of $\mathbf{x}$, which we have assumed to be multivariate normal. The two classes in our application are $\omega_1$ (normal liver tissue) and $\omega_2$ (chronic active hepatitis).

[Fig. 4. (a) Normal (○) and chronic hepatitis (△) liver data in two dimensions, plotted against $\bar{d}$ (mm). The line is the decision boundary, Eq. (7), for $\ell' = 0$, i.e., equal prior probabilities and misclassification costs. The accuracy at this threshold is 83 percent. (b) Same as figure 4a, except that the data has been replaced by constant density contours.]

We have used the standard Bayes classifier for two classes with different mean vectors $\bar{\mathbf{x}}_1$ and $\bar{\mathbf{x}}_2$ and different covariance matrices $\mathbf{S}_1$ and $\mathbf{S}_2$. This rule is quadratic in the data $\mathbf{x}$ and has the form

$$\tfrac{1}{2}(\mathbf{x}-\bar{\mathbf{x}}_1)^T\mathbf{S}_1^{-1}(\mathbf{x}-\bar{\mathbf{x}}_1) - \tfrac{1}{2}(\mathbf{x}-\bar{\mathbf{x}}_2)^T\mathbf{S}_2^{-1}(\mathbf{x}-\bar{\mathbf{x}}_2) + \tfrac{1}{2}\ln\frac{|\mathbf{S}_1|}{|\mathbf{S}_2|} \le \ell'. \tag{7}$$

Eq. (7) is a decision rule that assigns the measurement $\mathbf{x}$ to class $\omega_1$ if true, and otherwise assigns $\mathbf{x}$ to $\omega_2$. $P(\omega_1)$ and $P(\omega_2)$ are the prior probabilities that the patient is normal and has hepatitis, respectively, where $P(\omega_1) + P(\omega_2) = 1$. The term $\ell'$ is the threshold value for the decision rule and is equal to

$$\ell' = \ln\left[\frac{c_{21}\,P(\omega_1)}{c_{12}\,P(\omega_2)}\right],$$

where $c_{ij}$ is the cost of misclassifying responses from class $j$ as those of class $i$. Eq. (7) is known as the Bayes decision rule for minimum risk since it was derived to minimize the expected cost of misclassifying data. As we shall see later, the performance of the classifier is studied for a range of $\ell'$ values. Therefore precise estimates of disease prevalence and misclassification costs are not critical. The normal and hepatitis data are plotted in two of the four dimensions in figure 4. The line is the resulting quadratic decision function of Eq. (7) for equal prior probabilities and costs, i.e., the right side of Eq. (7) is zero.

Eq. (7) can be reduced to a linear function if we assume that the two classes have the same covariance matrix. In general, the sample covariance matrices will be different and therefore an average covariance matrix is used: $\mathbf{S} = P(\omega_1)\mathbf{S}_1 + P(\omega_2)\mathbf{S}_2$. The linear Bayes classifier is expressed as

$$(\bar{\mathbf{x}}_2-\bar{\mathbf{x}}_1)^T\mathbf{S}^{-1}\mathbf{x} - \tfrac{1}{2}(\bar{\mathbf{x}}_2-\bar{\mathbf{x}}_1)^T\mathbf{S}^{-1}(\bar{\mathbf{x}}_1+\bar{\mathbf{x}}_2) \le \ell'. \tag{8}$$
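The decision statistics of Eqs. (7) and (8) can be written compactly for a two-feature signature. The sketch below is one possible reading, with values at or below the threshold assigned to class 1 (normal); the helper names, the cost arguments, and the numerical values are hypothetical.

```python
import math

def inv2(s):
    """Inverse and determinant of a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = s
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det)), det

def quad_form(v, m):
    """v^T M v for a 2-vector v and a 2x2 matrix M."""
    return (v[0] * (m[0][0] * v[0] + m[0][1] * v[1]) +
            v[1] * (m[1][0] * v[0] + m[1][1] * v[1]))

def quadratic_statistic(x, mean1, s1, mean2, s2):
    """Left side of the quadratic rule; class 1 is chosen when it is <= threshold."""
    s1_inv, det1 = inv2(s1)
    s2_inv, det2 = inv2(s2)
    d1 = (x[0] - mean1[0], x[1] - mean1[1])
    d2 = (x[0] - mean2[0], x[1] - mean2[1])
    return (0.5 * quad_form(d1, s1_inv) - 0.5 * quad_form(d2, s2_inv)
            + 0.5 * math.log(det1 / det2))

def linear_statistic(x, mean1, mean2, s_pooled):
    """Left side of the linear rule using one averaged covariance matrix."""
    s_inv, _ = inv2(s_pooled)
    diff = (mean2[0] - mean1[0], mean2[1] - mean1[1])
    # Prewhitening matched filter m = S^-1 (mean2 - mean1).
    m = (s_inv[0][0] * diff[0] + s_inv[0][1] * diff[1],
         s_inv[1][0] * diff[0] + s_inv[1][1] * diff[1])
    mid = (0.5 * (mean1[0] + mean2[0]), 0.5 * (mean1[1] + mean2[1]))
    return m[0] * (x[0] - mid[0]) + m[1] * (x[1] - mid[1])

def threshold(p1, p2, c21=1.0, c12=1.0):
    """ln[c21 P(w1) / (c12 P(w2))]; zero for equal priors and equal costs."""
    return math.log((c21 * p1) / (c12 * p2))

def classify(statistic, thr):
    """Return 1 (normal) if the statistic is at or below the threshold, else 2."""
    return 1 if statistic <= thr else 2

# Hypothetical class statistics for a two-feature signature.
mean_normal, mean_hepatitis = (1.0, 0.4), (1.5, 0.6)
s_common = ((0.04, 0.005), (0.005, 0.01))
x = (1.1, 0.45)
label = classify(quadratic_statistic(x, mean_normal, s_common, mean_hepatitis, s_common),
                 threshold(0.5, 0.5))
```

When both classes are given the same covariance matrix, as in this usage example, the quadratic statistic coincides with the linear one, which is the reduction from Eq. (7) to Eq. (8) described above.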
Eq. (8) says to average the noise in the two classes and use the combined covariance matrix to prewhiten the data. The first term in Eq. (8) is the linear discriminant function, the second term is the balance point half way between the means, $\tfrac{1}{2}(\bar{\mathbf{x}}_1+\bar{\mathbf{x}}_2)$, and the right side is the threshold. If the discriminant function is less than the balance point plus the threshold, $\mathbf{x}$ is assigned to $\omega_1$. It may be instructive to view the vector $\mathbf{m} = \mathbf{S}^{-1}(\bar{\mathbf{x}}_2-\bar{\mathbf{x}}_1)$ as a prewhitening matched filter [25], in which case

$$\mathbf{m}^T\mathbf{x} - \tfrac{1}{2}\mathbf{m}^T(\bar{\mathbf{x}}_1+\bar{\mathbf{x}}_2) \le \ell'. \tag{9}$$

More information on the likelihood function approach to statistical decision making can be found in [26-28].

In our experience with small patient numbers (currently less than 50 patients per class), the linear and quadratic Bayes classifiers have given approximately the same net discriminability.

3. Feature selection
Features are selected based on diagnostic performance. The performance of a classifier can be evaluated for any combination of measurement features by testing data points for condition $\omega_2$ (positive for hepatitis) and measuring the true positive fraction (TPF) and false positive fraction (FPF). One possible summary measure is the diagnostic accuracy [29]. This requires, in addition, the prior probability $P(\omega_2)$ and is defined by the equation

$$A = (\mathrm{TPF})\,P(\omega_2) + (1-\mathrm{FPF})\,(1-P(\omega_2)). \tag{10}$$

$A$ is the accuracy of diagnosis at a fixed decision threshold. The true positive fraction (TPF) is the sensitivity of the test and the true negative fraction (TNF = 1 - FPF) is the specificity of the test.
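Because Eq. (10) weights sensitivity and specificity by prevalence, accuracy alone can be misleading when disease is rare. The short sketch below evaluates Eq. (10) for two cases; the 80 percent sensitivity and 85 percent specificity figures are hypothetical.

```python
def accuracy(tpf, fpf, prevalence):
    """Eq. (10): A = TPF * P(w2) + (1 - FPF) * (1 - P(w2))."""
    return tpf * prevalence + (1.0 - fpf) * (1.0 - prevalence)

# A "test" that always answers negative: TPF = 0 and FPF = 0.
always_negative = accuracy(tpf=0.0, fpf=0.0, prevalence=0.10)
# A hypothetical test with 80% sensitivity and 85% specificity.
informative_test = accuracy(tpf=0.80, fpf=0.15, prevalence=0.10)
print(always_negative, informative_test)
```

At 10 percent prevalence the always-negative rule scores a higher accuracy than the informative test, which is one reason to examine performance over a range of thresholds rather than at a single operating point.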
If $A$ is measured using the same set of data that was used to train the classifier, obviously the results will be biased. Using another set of data with known clinical findings is a better test, but one that is not always available. As a compromise, we have chosen the "round robin" approach to performance estimation suggested by Castleman [30] when labeled data is at a premium. In this method, one patient point is withheld from the calculation of the decision rule. The withheld point is then classified from the result and the score is kept. When this is done iteratively for each point in both classes, the overall performance of the set of measurement parameters is estimated for a set of decision thresholds. For example, the accuracy of the classifier shown in figure 4 is 83 percent. The decision boundary curve in figure 4 results when Eq. (7) is an equality and $\ell' = 0$, i.e., equal prior probabilities and misclassification costs. Accuracy depends on disease prevalence $P(\omega_2)$ [31,32]. If, for example, 90 percent of all patients tested do not have disease, i.e., $P(\omega_2) = 0.10$, and the decision, regardless of the data, was always negative, then this highly biased test would be 90 percent accurate!

4. ROC analysis

The performance of the tissue parameters may be studied by measuring the accuracy over the range of possible decision thresholds. This can be accomplished by varying the right side of the decision function, Eqs. (7), (8), or (9). The actual level of the decision threshold will depend on the task and the penalties for a wrong answer. In one screening procedure, for example, where the prior probability for disease may be small, the false positive fraction should be kept small and therefore a strict threshold is used [33]. When testing a high risk population, one may wish to relax the strict threshold and accept a higher false positive fraction to be sure of detecting as many of the positive cases as possible. Varying the right side of the decision function will generate a family of (TPF, FPF) pairs which span the range of operating thresholds, and ROC (Receiver Operating Characteristic) curves are a convenient way of displaying these results. ROC curves for three combinations of the four ultrasonic features are given in figure 5. These curves are plotted using a probability scale that is linear in the normal deviate [29]. With this scale, the ROC curve is generally a straight line [43]. A single scalar index of performance is the area under the ROC curve, $A_z$, evaluated when the data is plotted on a linear (not probability) scale. Values for $A_z$ range between 0.5 (guessing) and 1.0 (perfect discrimination) and may be described as the sensitivity of the tissue signature averaged over all specificities [32]. $A_z$ values [34] are plotted in figure 6 for all possible combinations of the four measurement features, ($\bar{d}$, $r$, $\sigma'_s$, $\alpha_0$).

[Fig. 5. ROC curves for (a) all four ultrasound parameters considered, ($\bar{d}$, $r$, $\sigma'_s$, $\alpha_0$), and the pairs (b) ($\bar{d}$, $\sigma'_s$) and (c) ($r$, $\alpha_0$), plotted on normal deviate axes, e.g., z(FP). $A_z$ is the area under the ROC curve (on a linear scale [29]) and has the range $0.5 \le A_z \le 1.0$.]
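The round robin estimate and the threshold sweep can be combined in a few lines. The sketch below uses a single feature with a 1-D form of the quadratic statistic, hypothetical measurements, and a trapezoidal area on the linear scale; the trapezoid rule is a simple stand-in and is not the binormal fit that underlies $A_z$ in [29,34].

```python
import math

def mean_var(xs):
    """Sample mean and variance (m - 1 normalization)."""
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def statistic(x, normal, disease):
    """1-D form of the quadratic statistic; larger values favour the disease class."""
    m1, v1 = mean_var(normal)
    m2, v2 = mean_var(disease)
    return (0.5 * (x - m1) ** 2 / v1 - 0.5 * (x - m2) ** 2 / v2
            + 0.5 * math.log(v1 / v2))

def round_robin_scores(normal, disease):
    """Score each sample with a rule trained on all of the other samples."""
    neg = [statistic(x, normal[:i] + normal[i + 1:], disease)
           for i, x in enumerate(normal)]
    pos = [statistic(x, normal, disease[:i] + disease[i + 1:])
           for i, x in enumerate(disease)]
    return neg, pos

def roc_points(neg, pos):
    """(FPF, TPF) pairs obtained by sweeping the decision threshold."""
    points = [(0.0, 0.0)]
    for thr in sorted(set(neg + pos), reverse=True):
        tpf = sum(s >= thr for s in pos) / len(pos)
        fpf = sum(s >= thr for s in neg) / len(neg)
        points.append((fpf, tpf))
    return points

def trapezoid_area(points):
    """Area under the piecewise-linear ROC curve on a linear scale."""
    return sum((x1 - x0) * (y1 + y0) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical single-feature measurements (e.g., scatterer spacing in mm).
normal = [0.90, 1.00, 1.10, 1.05, 0.95, 1.15, 1.00, 0.85]
disease = [1.20, 1.40, 1.30, 1.50, 1.10, 1.35, 1.25, 1.45]
neg_scores, pos_scores = round_robin_scores(normal, disease)
roc = roc_points(neg_scores, pos_scores)
area = trapezoid_area(roc)
```

Each withheld point is scored by a rule that never saw it, so the resulting (FPF, TPF) pairs avoid the optimistic bias of testing on the training data.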
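5. Hotelling trace criterion

Another measure of overall diagnostic performance is the Hotelling trace criterion (HTC) [35-37]. The HTC is expressed by the scalar quantity

$$J = \mathrm{tr}\{\mathbf{S}_w^{-1}\mathbf{S}_b\} \tag{11}$$

where $\mathbf{S}_w$ is the within-class scatter matrix, given by

$$\mathbf{S}_w = \sum_{i=1}^{L} P(\omega_i)\,\mathbf{S}_i,$$

$\mathbf{S}_b$ is the between-class scatter matrix,

$$\mathbf{S}_b = \sum_{i=1}^{L} P(\omega_i)\,(\bar{\mathbf{x}}_i-\bar{\mathbf{x}}_0)(\bar{\mathbf{x}}_i-\bar{\mathbf{x}}_0)^T,$$

$\bar{\mathbf{x}}_0$ is the expected vector of the mixture of all $L$ classes, i.e., the mean of the means, and the operator tr{ } is the trace of the matrix within the brackets.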
[Fig. 6. Two summary measures of the performance of the four ultrasound parameters considered to discriminate between normals and patients with chronic hepatitis: the Hotelling trace criterion, $J$, and the area under the ROC curve, $A_z$, plotted for each of the 15 combinations of one to four features. The error bar on the $A_z$ measurement indicates one standard deviation. Note: $a \equiv \alpha_0$, $\sigma \equiv \sigma'_s$, and $d \equiv \bar{d}$.]
The HTC may be considered as a generalized SNR$^2$ to be used in selecting the best features for classification. $J$ will be large when the difference in the class means is large as compared to the within-class variability in the data. Therefore, features may be selected by maximizing $J$. One criterion for an economical reduction in dimensionality involves finding a $k$-dimensional subspace, where $k \le n$ and $n$ is the total number of features, such that the sum of the eigenvalues of $\mathbf{S}_{wk}^{-1}\mathbf{S}_{bk}$ in the subspace is larger than the sum in any other $k$-dimensional subspace. Experimentally, this can be done by simply comparing $J$ values for all proposed tissue signatures. HTC is a simple method of evaluating the intrinsic separability of a tissue signature that is faster and easier to calculate than ROC curves, but provides less detailed information.
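Comparing $J$ across candidate signatures amounts to evaluating Eq. (11) on each subset of features. The sketch below does this for a small hypothetical data set with three features and equal priors; the data values, the priors, and the helper names are assumptions made for illustration.

```python
from itertools import combinations

def solve(a, b):
    """Return a^-1 b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def hotelling_trace(classes, priors, features):
    """J = tr(Sw^-1 Sb) restricted to the chosen feature indices."""
    sub = [[[row[f] for f in features] for row in cls] for cls in classes]
    n = len(features)
    means = [[sum(col) / len(cls) for col in zip(*cls)] for cls in sub]
    grand = [sum(p * mu[k] for p, mu in zip(priors, means)) for k in range(n)]
    sw = [[0.0] * n for _ in range(n)]
    sb = [[0.0] * n for _ in range(n)]
    for cls, mu, p in zip(sub, means, priors):
        m = len(cls)
        for i in range(n):
            for j in range(n):
                s_ij = sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in cls) / (m - 1)
                sw[i][j] += p * s_ij                                   # within-class
                sb[i][j] += p * (mu[i] - grand[i]) * (mu[j] - grand[j])  # between-class
    x = solve(sw, sb)
    return sum(x[i][i] for i in range(n))

# Hypothetical rows of (d, r, sigma) for each class; not data from the study.
normal = [(1.00, 0.30, 0.45), (1.10, 0.35, 0.50), (0.95, 0.28, 0.40),
          (1.05, 0.40, 0.55), (1.15, 0.33, 0.42), (0.90, 0.37, 0.48)]
hepatitis = [(1.30, 0.32, 0.60), (1.45, 0.38, 0.70), (1.25, 0.30, 0.58),
             (1.50, 0.41, 0.66), (1.35, 0.36, 0.75), (1.40, 0.29, 0.62)]
priors = (0.5, 0.5)  # assumed equal priors

j_values = {combo: hotelling_trace((normal, hepatitis), priors, combo)
            for k in (1, 2, 3) for combo in combinations(range(3), k)}
best_pair = max((c for c in j_values if len(c) == 2), key=j_values.get)
```

Ranking the entries of `j_values` for a fixed subset size gives the kind of comparison among proposed tissue signatures described above.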
Unlike ROC analysis, HTC may be extended to the $L \ge 2$ class discrimination problem. The HTC is simply the multiclass generalization of the Hotelling $T^2$ statistic, and the Hotelling $T^2$ statistic is the multivariate generalization of the univariate Student $t$ test statistic [Xl]. The evolution of the $T^2$ statistic to the HTC is straightforward, but no such direct evolution to the multiclass discrimination problem appears imminent in applications of ROC analysis [29].
Fukunaga [23] and Barrett et al. [36] have shown that $J$ can be related to the minimum probability of error $E$ for classification by the inequality

$$E \le [P(\omega_1)P(\omega_2)]^{1/2}\exp\left[-\frac{J}{8\,P(\omega_1)P(\omega_2)}\right].$$

Values of $J$ are plotted in figure 6 for all possible combinations of the four measurement features.
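To give a feel for the numbers, the bound relating $J$ to the probability of error can be evaluated directly. The sketch below assumes the bound has the form $[P(\omega_1)P(\omega_2)]^{1/2}\exp[-J/(8P(\omega_1)P(\omega_2))]$; the function name and the example values of $J$ are illustrative.

```python
import math

def error_bound(j, p1=0.5):
    """Upper bound on the minimum probability of error for a given J and prior P(w1)."""
    p2 = 1.0 - p1
    return math.sqrt(p1 * p2) * math.exp(-j / (8.0 * p1 * p2))

# Bound for a few illustrative J values with equal priors.
bounds = {j: error_bound(j) for j in (0.0, 1.0, 2.0, 4.0)}
```

With equal priors the bound starts at 0.5 for $J = 0$ (no separation) and falls off exponentially as $J$ grows.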
IV. DISCUSSION

We infer from the results of figure 6 that the overall performance of this diagnostic procedure is equivalently summarized by either the HTC index $J$ or by the binormal ROC area $A_z$. A fundamental relationship between $A_z$ and $J$ may be derived for the two-class case when the underlying data distributions are multivariate normal [44]. This relationship can be expressed in terms of the error integral

$$A_z = \ldots \tag{12}$$

The index $J$ has a range of 0 to $\infty$, corresponding to a range in $A_z$ of 0.5 to 1.0. We have compared our measurements of $A_z$ with those predicted by Eq. (12) in which measured $J$ values are used. The agreement between the measured and predicted values was very good, well within the one standard deviation error bar on $A_z$ (Fig. 6).
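For orientation, one route to a relation of this kind is sketched below. It assumes two normal classes that share the covariance matrix $\mathbf{S}_w$ and uses the scatter-matrix definitions of the HTC; the symbols $\Delta$, $d'$ and $\Phi$ (the standard normal cumulative distribution) are introduced here for illustration only, and the result is not claimed to be the printed form of Eq. (12).

```latex
\begin{align*}
\mathbf{S}_b &= P(\omega_1)P(\omega_2)\,\Delta\Delta^{T},
  \qquad \Delta = \bar{\mathbf{x}}_2-\bar{\mathbf{x}}_1,\\
J &= \operatorname{tr}\{\mathbf{S}_w^{-1}\mathbf{S}_b\}
   = P(\omega_1)P(\omega_2)\,\Delta^{T}\mathbf{S}_w^{-1}\Delta
   = P(\omega_1)P(\omega_2)\,d'^{2},\\
A_z &= \Phi\!\left(\frac{d'}{\sqrt{2}}\right)
   = \frac{1}{2}+\frac{1}{2}\operatorname{erf}\!\left(\frac{d'}{2}\right)
   = \frac{1}{2}+\frac{1}{2}\operatorname{erf}\!\left(\frac{1}{2}\sqrt{\frac{J}{P(\omega_1)P(\omega_2)}}\right).
\end{align*}
```

Under these assumptions and with equal priors the last line reduces to $A_z = \tfrac{1}{2} + \tfrac{1}{2}\operatorname{erf}(\sqrt{J})$, which runs from 0.5 at $J = 0$ to 1.0 as $J \to \infty$, in line with the ranges stated above.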
The HTC is an overall measure of class separability that is well suited to, and presently used for [37], the design of diagnostic systems. If one wishes to study and compare performance over a broad range of decision thresholds, then ROC analysis is ideal. To illustrate this point, consider the data in figure 6. If resources are limited to measuring one feature, the obvious choice for detecting chronic hepatitis is the average scatterer spacing, d. Including σ'_s, r, and α_0 in the tissue signature may increase overall detectability by an amount that may be compared with the added cost of making the additional measurements. Intuitively, one might expect the performance to improve, or at least remain unchanged, when additional measurements are made on the tissue. However, as Devijver [45] points out, the overall detectability may decrease when a correlated feature with marginal or no discriminability is included. This behavior was not observed in our data to any significant degree since the correlation coefficients were all less than 0.3. Now examine the ROC curves in figure 5 and note that curve (a) is for features (d, r, σ'_s, α_0), curve (b) is for (d, σ'_s), and curve (c) is for (r, α_0). Including the features (r, α_0) in the signature has little effect at low false positive fractions (FPFs) but does add to the detectability at larger FPFs. This information would be missed with such summary measures as A_z or J. The ROC curve can be very important in the complete evaluation of diagnostic performance.
As with any analysis, the results are highly task dependent. When studying other patient populations, the preliminary results have indicated that the other parameters can be the dominant discriminating features. This is the topic of a future report.
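The point that a summary index can hide threshold-dependent differences can be shown with the conventional binormal ROC model, in which TPF = Phi(a + b Phi^-1(FPF)) and A_z = Phi(a / sqrt(1 + b^2)). The sketch below uses two hypothetical parameter pairs (not fitted to the data of this paper) chosen to give identical A_z but different true positive fractions at a given FPF.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # zero mean, unit variance


def tpf(fpf: float, a: float, b: float) -> float:
    """True positive fraction on a binormal ROC curve at a given FPF (0 < FPF < 1)."""
    return _STD_NORMAL.cdf(a + b * _STD_NORMAL.inv_cdf(fpf))


def az(a: float, b: float) -> float:
    """Area under the binormal ROC curve."""
    return _STD_NORMAL.cdf(a / (1.0 + b * b) ** 0.5)


# Hypothetical curves: a2 is chosen so that both curves have the same area.
CURVE_1 = (1.0, 1.0)
CURVE_2 = (0.625 ** 0.5, 0.5)

if __name__ == "__main__":
    print("A_z:", az(*CURVE_1), az(*CURVE_2))
    for fpf in (0.05, 0.5):
        print(f"FPF = {fpf}: TPF_1 = {tpf(fpf, *CURVE_1):.3f}, "
              f"TPF_2 = {tpf(fpf, *CURVE_2):.3f}")
```

The two curves cross: the second is better at low FPF and the first is better at FPF = 0.5, although their areas agree. A comparison based on A_z or J alone would not reveal this.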
The performance of a classifier is also strongly affected by the ratio of the training set size, m, to the dimensionality of the feature space, n. It has been shown for the two-class problem that if the classifier is to yield meaningful generalizations beyond the data, the ratio m/n must be greater than two [38], and where possible on the order of twenty [39]. For the data considered in this paper, m/n is greater than or equal to twenty. Jurs [40] points out that classification results are a function of m/n and should be cautiously interpreted. In his example for m/n = 5, the probability is one-half that 77 percent of the members of the training set will be correctly classified as a result of chance alone. A less biased estimate of the performance may be found with increasing m/n. Given the economics of increasing the training set, the importance of optimizing the dimensionality becomes evident.
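The origin of the m/n > 2 guideline can be illustrated with Cover's function-counting result [38]: for m points in general position and a linear threshold function with n adjustable weights, the number of linearly separable two-class labelings is 2 * sum_{k=0}^{n-1} C(m-1, k). The sketch below evaluates the fraction of all 2^m labelings that are separable. It assumes points in general position and that n counts the adjustable weights (the feature dimension plus one if a bias term is included); the function name is hypothetical, and the sketch does not reproduce the 77 percent example of [40].

```python
from math import comb


def separable_fraction(m: int, n: int) -> float:
    """Fraction of the 2**m labelings of m points (general position) that a
    linear threshold function with n adjustable weights can realize."""
    count = 2 * sum(comb(m - 1, k) for k in range(n))
    return count / 2 ** m


if __name__ == "__main__":
    n = 5
    for ratio in (1, 2, 5, 20):
        m = ratio * n
        print(f"m/n = {ratio:2d}: fraction separable = {separable_fraction(m, n):.3g}")
```

At m/n = 2 exactly half of all arbitrary labelings are separable, so perfect separation of the training set carries little information. As m/n grows the fraction falls rapidly, and separation by chance becomes unlikely.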
We searched for important characterization features by investigating the strong and weak points of human observers of textured images for detecting and classifying disease. Based on the work of Julesz [41], Burgess et al. [42], and our own simulations [13], it seems that human observers are very efficient at discriminating differences in first-order image properties, e.g., image brightness, while their efficiency is much lower for higher-order detection tasks. Therefore, image processing using second-order statistics, e.g., correlation properties of the image, might offer new information. This hypothesis is consistent with our data for normal-hepatitis discrimination. Hepatitis cannot be consistently diagnosed from B-scans, possibly because the viewer does not have full access to the second-order statistical information in the image. Using our quantitative analysis, the second-order features (d, σ'_s) clearly outperform the first-order features (r, α_0). Psychophysical experiments are currently under way to better understand the role of second-order statistical properties in conventional diagnostic ultrasonography.
Using exclusively first-order statistical measures to characterize soft tissue often gives ambiguous results. For example, we have found that the variance in the B-scan image from resolvable class II scatterers (ordered structure) in normal liver tissues reduces the SNR from 1.9 to 1.7 or less, falsely indicating a non-Gaussian medium, i.e., few scatterers per resolution cell. Whenever class II scatterers are present in tissues, second-order statistical measures are needed to separate the specular variance (variance in the image that is due to regularly spaced coherent scatterers) from the classical Rician variance to obtain a unique signature [12]. The importance of using features that are associated with physical variables is emphasized in this case.
V. CONCLUSIONS
Pattern recognition techniques are an effective way to design and evaluate multivariate tissue signatures. For designing a tissue signature, the HTC was shown to be a fast and simple method of selecting the best measurements for detection and classification based on maximizing object-class separability. ROC analysis may be used to evaluate the diagnostic performance of the tissue signature for comparison with other diagnostic tools.
We have applied this formalism to diagnostic ultrasound to automatically discriminate between normal liver and chronic active hepatitis, in vivo. Tissue classification was based on a four-dimensional feature vector. Three of the four measurements are derived from the first- and second-order statistical properties of the acoustic data and describe structural and scattering properties of the organ. The fourth is an estimate of the ultrasonic attenuation. When evaluating this feature vector as a tissue signature, we found that the second-order statistical properties of the image provide diagnostic information that is not otherwise accessible to observers.
ACKNOWLEDGEMENTS
The authors gratefully acknowledge the sustained efforts of Mary Ann Russell at the National Institutes of Health in acquiring and managing the growing body of patient data. We also wish to thank Charles E. Metz and colleagues at the University of Chicago for sharing with us programs for statistically analyzing ROC curves and Harry Barrett for many helpful discussions.
The mention of commercial products herein is not to be construed as either an actual or implied endorsement of such products by the Department of Health and Human Services.
REFERENCES
[1] Gosink, B.B., Lemon, S.K., Scheible, W., and Leopold, G.R., Accuracy of ultrasonography in diagnosis of hepatocellular disease, AJR 133, 19-23 (1979).
[2] Raeth, U., Schlaps, D., Limberg, B., Zuna, I., Lorenz, A., van Kaick, G., Lorenz, W.J., and Kommerell, B., Diagnostic accuracy of computerized B-scan texture analysis and conventional ultrasonography in diffuse parenchymal and malignant liver disease, J. Clin. Ultrasound 13, 87-99 (1985).
[3] Insana, M.F., Wagner, R.F., Garra, B.S., and Shawker, T.H., A Statistical Approach to an Expert Diagnostic Ultrasonic System, in Proc. SPIE, Vol. 626, pp. 24-29 (1986).
[4] Finette, S., Bleier, A., and Swindell, W., Breast tissue classification using diagnostic ultrasound and pattern recognition techniques: I. Methods of pattern recognition, Ultrasonic Imaging 5, 55-70 (1983).
[5] Finette, S., Bleier, A.R., Swindell, W., and Haber, K., Breast tissue classification using diagnostic ultrasound and pattern recognition techniques: II. Experimental results, Ultrasonic Imaging 5, 71-86 (1983).
[6] Nicholas, D., Nassiri, D.K., Garbutt, P., and Hill, C.R., Tissue characterization from ultrasonic B-scan data, Ultrasound Med. Biol. 12, 135-143 (1986).
[7] Cloostermans, M.J.T.M., Mol, H., Verhoef, W.A., and Thijssen, J.M., In vitro estimation of acoustic parameters of the liver and correlations with histology, Ultrasound Med. Biol. 12, 39-51 (1986).
[8] Crawford, D.C., Morris, D.T., Fenton, D.W., and Pryce, W.I., Possible applications of digital analysis of ultrasonic images of the placenta, Ultrasound Med. Biol. 11, 79-84 (1985).
[9] Wagner, R.F., Smith, S.W., Sandrik, J.M., and Lopez, H., Statistics of speckle in ultrasound B-scans, IEEE Trans. Sonics Ultrason. SU-30, 156-163 (1983).
[10] Wagner, R.F., Insana, M.F., and Brown, D.G., The statistics of radio frequency and envelope detected signals, J. Opt. Soc. Am. (submitted).
[11] Wagner, R.F., Insana, M.F., and Brown, D.G., Unified approach to the detection and classification of speckle texture in diagnostic ultrasound, Optical Eng. 25, 738-742 (1986).
[12] Insana, M.F., Wagner, R.F., Garra, B.S., Brown, D.G., and Shawker, T.H., Analysis of ultrasound texture via generalized Rician statistics, Optical Eng. 25, 743-748 (1986).
[13] Wagner, R.F., Insana, M.F., and Brown, D.G., Progress in Signal and Texture Discrimination in Medical Imaging, in Proc. SPIE, Vol. 535, pp. 57-64 (1985).
[14] Goodman, J.W., Statistical Optics, (John Wiley & Sons, New York, 1985).
[15] Burckhardt, C.B., Speckle in ultrasonic B-mode scans, IEEE Trans. Sonics Ultrason. SU-25, 1-6 (1978).
[16] Insana, M.F., Wagner, R.F., Garra, B.S., Hall, T.J., Stong, G.C., Shawker, T.H., and Smith, S.W., Identification of periodic, specular, and diffusely scattering structures in skeletal muscle and TM phantoms via generalized Rician statistics, Ultrasonic Imaging 7, 87 (1985) (Abstract only).
[17] Shawker, T.H., Garra, B.S., Insana, M.F., Wagner, R.F., Stong, G.C., and Jones, B., Detection of tissue texture in human skeletal muscle: Preliminary results of an in vivo and in vitro study, Ultrasonic Imaging 8, 71-72 (1986) (Abstract only).
[18] Kuc, R., Clinical application of an ultrasound attenuation coefficient estimation technique for liver pathology characterization, IEEE Trans. Biomed. Eng. BME-27, 312-319 (1980).
[19] Insana, M.F., Zagzebski, J.A., and Madsen, E.L., Improvements in the spectral difference method for measuring ultrasonic attenuation, Ultrasonic Imaging 5, 331-345 (1983).
[20] Although the methods described in this paper are targeted for off-line processing, a high speed device for estimating the statistical properties of B-scan or intensity images at real-time rates has been proposed. U.S. Patent Application filed November 18, 1985.
[21] Bendat, J.S. and Piersol, A.G., Random Data: Analysis and Measurement Procedures, Chapt. 9, (Wiley-Interscience, New York, 1971).
[22] Morrison, D.F., Multivariate Statistical Methods, Chapt. 7, (McGraw-Hill, New York, 1967).
[23] Fukunaga, K., Introduction to Statistical Pattern Recognition, Chapts. 3, 4, 9, (Academic Press, New York, 1972).
[24] Morrison, D.F., Multivariate Statistical Methods, Chapt. 4, (McGraw-Hill, New York, 1967).
[25] Wagner, R.F., and Brown, D.G., Unified SNR analysis of medical imaging systems, Phys. Med. Biol. 30, 489-518 (1985).
[26] Van Trees, H.L., Detection, Estimation, and Modulation Theory, Vol. 1, Section 2.2, (John Wiley and Sons, New York, 1968).
[27] Whalen, A.D., Detection of Signals in Noise, Chapt. 6, (Academic Press, New York, 1971).
[28] Green, D.M. and Swets, J.A., Signal Detection Theory and Psychophysics, Chapt. 1, (Robert E. Krieger Pub., Huntington, NY, 1974).
[29] Swets, J.A. and Pickett, R.M., Evaluation of Diagnostic Systems: Methods from Signal Detection Theory, Chapt. 1, (Academic Press, New York, 1982).
[30] Castleman, K.R., Digital Image Processing, p. 323, (Prentice-Hall, Inc., Englewood Cliffs, NJ, 1979).
[31] Kundel, H.L., Disease prevalence and radiological decision making, Investigative Radiology 17, 107-109 (1982).
[32] Wagner, R.F., Fundamentals and Applications of Signal Detection Theory in Imaging, in Proc. SPIE, Vol. 626, pp. 765761 (1986).
[33] Metz, C.E., Basic principles of ROC analysis, Seminars Nuclear Med. 8, 283-298 (1978).
[34] A_z values were calculated using a FORTRAN program ROCFIT modified by C.E. Metz, P.L. Wang, and H.B. Kronman, at the University of Chicago from the program RSCORE II written by D.D. Dorfman. RSCORE II may be found in appendix D of [29].
[35] Gu, Z.H. and Lee, S.H., Optical implementation of the Hotelling trace criterion for image classification, Optical Eng. 23, 727-731 (1984).
[36] Barrett, H.H., Myers, K.J., and Wagner, R.F., Beyond Signal Detection Theory, in Proc. SPIE, Vol. 626, pp. 231-239 (1986).
[37] Smith, W.E. and Barrett, H.H., Hotelling trace criterion as a figure of merit for the optimization of imaging systems, J. Opt. Soc. Am. A 3, 717-725 (1986).
[38] Cover, T.M., Geometrical and statistical properties of systems of linear inequalities with applications to pattern recognition, IEEE Trans. Electronic Computers EC-14, 326-334 (1965).
[39] Tou, J.T. and Gonzalez, R.C., Pattern Recognition Principles, pp. 186-187, (Addison-Wesley, Reading, MA, 1974).
[40] Jurs, P.C., Pattern recognition used to investigate multivariate data in analytic chemistry, Science 232, 1219-1224 (1986).
[41] Julesz, B., Textons, the elements of texture perception, and their interactions, Nature 290, 91-97 (1981).
[42] Burgess, A.E., Wagner, R.F., Jennings, R.J., and Barlow, H.B., Efficiency of human visual discrimination, Science 214, 93-94 (1981). Also see J. Appl. Photog. Eng. 5, 76-80.
[43] Swets, J.A., Form of empirical ROCs in discrimination and diagnostic tasks: Implications for theory and measurement of performance, Psychological Bulletin 99, 181-198 (1986).
[44] Private communications: C.E. Metz to R.F. Wagner, 1980; between H.H. Barrett and R.F. Wagner, 1985-86; C.E. Metz to H.H. Barrett, 1986; and Fiete, R.D., Barrett, H.H., Smith, W.E., and Myers, K.J., The Hotelling trace criterion and its correlation with human performance, J. Opt. Soc. Am. (submitted).
[45] Devijver, P.A., Statistical pattern recognition, in Applications of Pattern Recognition, K.S. Fu, ed. (CRC Press Inc., Florida, 1982).