MCA -- MULTIPLE CLASSIFICATION ANALYSIS

GENERAL DESCRIPTION

MCA examines the relationships between several categorical independent variables and a single dependent variable, and determines the effects of each predictor before and after adjustment for its inter-correlations with other predictors in the analysis. It also provides information about the bivariate and multivariate relationships between the predictors and the dependent variable. See Andrews et al., Multiple Classification Analysis, for a complete description of the methodology used.

COMMAND FEATURES

Missing Data:  Cases with missing data on the independent variables may be eliminated (see DELETE option). Cases with missing data on the dependent variable are automatically excluded from the analysis. MCA produces RECODE control statements for computing residuals if requested.

PRINTED OUTPUT

Dependent Variable Statistics:  For the dependent variable (Y):

Grand mean

Standard deviation (square root of the unbiased estimator of the population variance)

Sum of Y

Sum of Y-squared

Total sum of squares

Explained sum of squares

Residual sum of squares

Number of cases used in the analysis

The sum of weights

Independent Variable Category Statistics:  For each category of an independent variable:

The number of cases (raw, weighted, and percentages)

Mean and standard deviation

Deviation of the category mean (unadjusted and adjusted)

Adjusted class mean MCA coefficient

Eta and eta squared

Partial beta and beta-squared coefficients

Unadjusted and adjusted sum of squares

Bivariate frequency tables for every pair of predictors (optional)

One-Way Analysis of Variance Summary Statistics:  If only one independent variable is specified, the following are printed:

Eta squared

Adjustment factor

Adjusted eta and eta squared

Total sum of squares

Between-mean sum of squares

Within-groups sum of squares

F value (degrees of freedom are printed)
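
As a rough illustration of how the one-way quantities listed above relate to one another, the following Python sketch derives them from sums of squares. The formulas are the conventional one-way analysis of variance definitions and are an assumption here, since this document does not state them. The function name `one_way_summary` is hypothetical. The inputs are the total and unadjusted deviation sums of squares printed for V251 in Example 1, treated as if V251 were the only predictor, with unweighted data.

```python
def one_way_summary(between_ss, total_ss, n_cases, n_categories):
    """Conventional one-way ANOVA summary (assumed formulas, unweighted data)."""
    within_ss = total_ss - between_ss
    df_between = n_categories - 1
    df_within = n_cases - n_categories
    eta_squared = between_ss / total_ss
    # Degrees-of-freedom adjustment factor applied to the unexplained share.
    adjustment = (n_cases - 1) / df_within
    eta_squared_adj = 1.0 - (1.0 - eta_squared) * adjustment
    f_value = (between_ss / df_between) / (within_ss / df_within)
    return {
        "within_ss": within_ss,
        "df_between": df_between,
        "df_within": df_within,
        "eta_squared": eta_squared,
        "adjustment": adjustment,
        "eta_squared_adj": eta_squared_adj,
        "f_value": f_value,
    }


# Figures taken from the V251 block of Example 1 (299 cases, 10 categories).
summary = one_way_summary(
    between_ss=0.646484e10,
    total_ss=0.1700208e11,
    n_cases=299,
    n_categories=10,
)
for name, value in summary.items():
    print(f"{name:16s} {value:.6g}")
```

Under these assumptions the computed eta squared and adjusted eta squared agree with the values printed for V251 in Example 1 (.380238 and .360938) to the precision shown there.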

Interpretation of Results
(from Multiple Classification Analysis, Andrews, Morgan, et al., 1973)

The major interpretation in an MCA is of the adjusted and unadjusted coefficients printed out for each subclass. In a population where there was no correlation among the predictors, the observations in one class of characteristic A would be distributed over all classes of the other characteristics in a fashion identical to the way in which those in other classes of A were distributed. Hence, the unadjusted mean Y for each subclass of A would be an unbiased estimate of the effect of belonging to that class of characteristic A. In the real world, however, characteristics are correlated. Young people are more likely to be in lower income groups and in higher education groups than are older people. The multivariate process is essentially one of adjusting for these "non-orthogonalities." The adjusted means are estimates of what the mean would have been if the group had been exactly like the total population in its distribution over all the other predictor classifications. It is useful not only to have the "pure" effects of each class adjusted for all the other characteristics, but also to see how these adjusted effects differ from the unadjusted effects.

Both the adjusted and unadjusted coefficients are expressed by the program as deviations from the overall mean, and are constrained so that their sum, weighted by the proportion in each subclass, is zero.
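
The zero-sum constraint can be checked against printed output. The short Python sketch below uses the marital status (V30) figures from Example 1 in this document. The helper name `weighted_mean_deviation` is hypothetical, and the small nonzero results are assumed to come from rounding in the printed values.

```python
# Category counts and deviations for V30 (MARITAL STATUS) from Example 1.
counts = [221, 17, 41, 16, 4]
unadjusted = [1921.575, -3412.439, -6795.858, -4779.571, -2888.321]
adjusted = [1123.470, -2828.932, -2956.380, -4603.841, -1330.495]


def weighted_mean_deviation(counts, deviations):
    """Sum of deviations weighted by the proportion of cases in each class."""
    total = sum(counts)
    return sum(n * d for n, d in zip(counts, deviations)) / total


# Both results should be zero apart from rounding in the printed figures.
print(weighted_mean_deviation(counts, unadjusted))
print(weighted_mean_deviation(counts, adjusted))
```

With weighted data, the sums of weights would presumably take the place of the case counts.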

The adjusted coefficients for any predictor may be considered an estimate of the effect of that predictor alone "holding constant" all other predictors in the analysis. Differences between the adjusted and unadjusted coefficients can be analyzed, and explanations for these differences may often be found in the two-way tables of predictors. It is often valuable to compare the coefficients within a predictor to see whether there is a pattern or, possibly, a lack of pattern which is of theoretical interest.

The coefficients for the predictors do not provide definitive information about logical priorities, chains of causation, or about interaction effects. It is possible for the program to assign considerable explanatory power to a variable late in a causal chain, such as an attitude, when much of the credit "really" belongs to a logically prior, but not as powerful variable, such as race.

Interaction effects of two or more predictors on the dependent variable will not be revealed by the program, since the assumption is that the effects of all the predictors are additive, i.e. the effect for predictor A is assumed to be the same for one class of predictor B as it is for every other class.

A difficulty in using the adjusted coefficients as a presentational device is that the additivity assumptions may lead to absurd adjusted means for some groups (less than zero, for instance) if the assumption is inappropriate for the data being analyzed. This is particularly likely when the dependent variable is a dichotomy, such as home ownership. Clearly, it is not sensible to predict that less than 0 percent of a subgroup own a home.
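
The additive assumption, and the way it can produce an impossible value, can be sketched in a few lines of Python. The grand mean and coefficients below are copied from Example 1 in this document. The function names and the two case profiles are hypothetical, and only a few categories are included to keep the sketch short.

```python
GRAND_MEAN = 10528.32  # grand mean of V268 in Example 1

# Adjusted coefficients from Example 1 (subset of categories only).
COEFFICIENTS = {
    "V251": {2: 7577.927, 9: -5901.890},   # occupation
    "V30": {1: 1123.470, 3: -2956.380},    # marital status
    "V32": {2: -2085.182, 8: 7518.277},    # education of head
}


def adjusted_mean(variable, code):
    """Adjusted class mean: grand mean plus the coefficient of one category."""
    return GRAND_MEAN + COEFFICIENTS[variable][code]


def predict(case):
    """Additive prediction: grand mean plus one coefficient per predictor."""
    return GRAND_MEAN + sum(COEFFICIENTS[var][code] for var, code in case.items())


high = {"V251": 2, "V30": 1, "V32": 8}
low = {"V251": 9, "V30": 3, "V32": 2}

print(round(adjusted_mean("V251", 2), 2))  # compare with 18106.25 in the listing
print(round(predict(high), 3))
print(round(predict(low), 3))              # negative: an impossible family income
```

The second profile combines three categories with large negative coefficients, so the additive prediction falls below zero even though family income cannot be negative. A residual for a case is its observed value minus this prediction.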

Presentation of Results

It is most informative to the reader to present first the etas and betas, measures of the relative importance of each predictor singly and in competition with the others, and then to present the unadjusted and adjusted sub-group averages, together with a detailed description of what the subclasses represent and with the number of cases in each. (The number of cases should be included because it is an indicator of the potential variability of the estimates.) Multiple R-squared unadjusted and multiple R-squared adjusted are also usually reported.
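
For readers who want to see where the etas and betas come from, the Python sketch below recomputes them from the sums of squares printed in Example 1. It assumes that eta squared is the unadjusted deviation sum of squares divided by the total sum of squares, and that beta squared is the adjusted deviation sum of squares divided by the total sum of squares. This document does not state these formulas, but they reproduce the printed values. The variable names are illustrative.

```python
import math

TOTAL_SS = 0.1700208e11  # total sum of squares from Example 1

# (unadjusted deviation SS, adjusted deviation SS) for each predictor.
deviation_ss = {
    "V251": (0.646484e10, 0.332309e10),
    "V30": (0.330640e10, 0.111955e10),
    "V32": (0.346507e10, 0.161373e10),
}

results = {}
for name, (unadjusted_ss, adjusted_ss) in deviation_ss.items():
    eta_squared = unadjusted_ss / TOTAL_SS
    beta_squared = adjusted_ss / TOTAL_SS
    results[name] = {
        "eta_squared": eta_squared,
        "eta": math.sqrt(eta_squared),
        "beta_squared": beta_squared,
        "beta": math.sqrt(beta_squared),
    }

# Rank predictors by beta, as in the "LISTING OF BETAS" part of the output.
ranking = sorted(results, key=lambda name: results[name]["beta"], reverse=True)
for rank, name in enumerate(ranking, start=1):
    r = results[name]
    print(f"{rank}  {name:5s} eta={r['eta']:.6f}  beta={r['beta']:.6f}")
```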

We recommend that the results be given in the form of unadjusted and adjusted subgroup averages rather than in the form of deviations because the user finds it easier to scan unadjusted and adjusted subgroup averages than positive and negative deviations. However, the adjusted deviations can be included for convenience in seeing the net effects of each predictor. As noted above, a complication of subgroup averages is that occasionally the expected value is impossible (e.g. negative although the dependent variable is a variable with no negative values); if impossible expected values are presented, a short explanatory note should be included.

Examples of presentation of MCA results can be found in Barfield and Morgan (1969), Blumenthal, Kahn, Andrews and Head (1972), Johnston and Bachman (1972), Johnston (1973), Katona, Strumpel and Zahn (1971), Morgan, David, Cohen and Brazer (1962), Mueller (1969), and Pelz and Andrews (1966).

RESIDUAL RECODE CONTROL STATEMENT OUTPUT

RECODE control statements to compute predicted and residual values based on the MCA regression may be written to the file assigned to RESIDUAL (option RESIDUALS). These statements may be used with LISTDATA to list the residuals or with TRANS to create a permanent residuals dataset.

INPUT DATA

The dependent variable must be measured on an interval scale or be a dichotomy. Predictor variables must be categorical, preferably with six or fewer categories. When more than one predictor is used, all predictor codes must be in the range 0 to 31.

RESTRICTIONS

1.  Predictor codes must be in the range 0 - 31 when more than one predictor is defined.

2.  The total number of predictor codes, obtained by summing the number of codes for each predictor, must be less than or equal to the value assigned to the MAXC parameter. Thus, if there are two predictors, one with codes 0,1,2 and the other with codes 1,2,3,4, set MAXC to 7 or higher.
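
The two restrictions can be checked before a run. The Python sketch below is only an illustration and is not part of MCA. The function names are hypothetical, and the Example 1 code lists are read off the category tables in the sample output.

```python
def required_maxc(codes_by_predictor):
    """Total number of distinct predictor codes, summed over all predictors."""
    return sum(len(set(codes)) for codes in codes_by_predictor.values())


def codes_in_range(codes_by_predictor, low=0, high=31):
    """True if every code lies in the range required for several predictors."""
    return all(
        low <= code <= high
        for codes in codes_by_predictor.values()
        for code in codes
    )


# The two-predictor case described in restriction 2.
two_predictors = {"A": [0, 1, 2], "B": [1, 2, 3, 4]}
print(required_maxc(two_predictors))   # 7

# Predictor codes appearing in Example 1.
example_1 = {
    "V251": list(range(0, 10)),  # codes 0-9
    "V30": list(range(1, 6)),    # codes 1-5
    "V32": list(range(1, 9)),    # codes 1-8
}
print(required_maxc(example_1))        # 23, within the default MAXC=99
print(codes_in_range(example_1))       # True
```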

CONTROL STATEMENTS

Filter (optional)

Job Title (required if using a Runfile)

Options and Parameters

CRITERION=n                 Tolerance (0.0-1.0) for the selected convergence test (see TEST).
Default: CRITERION=.005.

DELETE=(MD1,MD2)     

MD1           Delete all cases where any independent variable equals its first missing-data code.

MD2           Delete all cases where any independent variable equals its second missing-data code.

DEPV=variable number   The dependent variable.

MAXC=n         The maximum total number of predictor codes for all predictors.
Default:  MAXC=99.

MAXI=n           The maximum number of iterations.
Default:  25 iterations.

 

PRINT=(DICT|CODES,TABLES,TRACE)

DICT           Print the input dictionary.

CODES      Print the input dictionary and category labels.

TABLES     Print pair-wise cross-tabulations of independent variables.

TRACE       Print the coefficients from all iterations.

RECODE=n    Use RECODE n, previously entered via the RECODE command.

RESIDUALS   Write RECODE control statements for computing predicted and residual values to the RESIDUAL file. The predicted value variable number will be R10000 and the residual value variable number will be R10001.

TEST=%MEAN|CUTOFF|%RATIO

                        The convergence test desired. If no test is specified, MCA iterates until the maximum number of iterations (MAXI) is reached. (See CRITERION.)

%MEAN     Test whether the change in all coefficients from one iteration to the next is below a specified fraction of the grand mean.

CUTOFF    Test whether the change in all coefficients from one iteration to the next is less than a specified value.

%RATIO     Test whether the change is less than a specified fraction of the ratio of the standard deviation of the dependent variable to its mean.
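
The three tests differ only in the threshold against which the change in coefficients is compared. The Python sketch below shows one way to read the descriptions above. It is an interpretation, not MCA's actual code. The function name is hypothetical, and the use of absolute values and of the largest single change are assumptions.

```python
def has_converged(previous, current, test, criterion, grand_mean=None, std_dev=None):
    """Compare the largest coefficient change with the threshold for `test`."""
    largest_change = max(abs(c - p) for p, c in zip(previous, current))
    if test == "CUTOFF":
        threshold = criterion                              # a fixed value
    elif test == "%MEAN":
        threshold = criterion * abs(grand_mean)            # fraction of the grand mean
    elif test == "%RATIO":
        threshold = criterion * abs(std_dev / grand_mean)  # fraction of sd / mean
    else:
        raise ValueError(f"unknown convergence test: {test}")
    return largest_change < threshold


# Hypothetical coefficients from two successive iterations.
previous = [100.0, -50.0]
current = [120.0, -45.0]

# Grand mean and standard deviation of V268 from Example 1, CRITERION=.005.
print(has_converged(previous, current, "%MEAN", 0.005, grand_mean=10528.32))
print(has_converged(previous, current, "CUTOFF", 10.0))
print(has_converged(previous, current, "%RATIO", 0.005,
                    grand_mean=10528.32, std_dev=7553.407))
```

With the Example 1 grand mean and the default criterion of .005, the %MEAN threshold works out to about 52.6 in the units of the dependent variable, so a largest change of 20 passes that test but fails the other two as configured here.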

VARS=variable numbers 
The list of independent variables. One-way analysis of variance is performed if only one variable is specified.

WT=n              Use variable n as a weight variable.

REFERENCES

Andrews, F. M., J. N. Morgan, J. A. Sonquist and L. Klem. Multiple Classification Analysis. Second edition. Ann Arbor: Institute for Social Research, The University of Michigan, 1973.

EXAMPLES

Example 1:   Predicting income (V268) from occupation, marital status, and education.

File assignments:           dictin=scf.dic datain=scf.dat

Filter                               include v37=1

Job Title                          PREDICTING INCOME

Options and parameters: print=(dict) depv=v268 V=v251,v30,v32 del=(md1,md2) test=%mean

 

 

               *** MCA -- MULTIPLE CLASSIFICATION ANALYSIS ***

 

                                PREDICTING INCOME

Number of variables: 4 

 

The data are not weighted

 

For the independent variables, cases with MD1 or MD2 values will be deleted

 

The iteration maximum is  25

 

The convergence test is %MEAN

 

The tolerance factor is   .00500     

 

INPUT DICTIONARY:                     

 

  VNUM      NAME                 TYPE  LOC  WID  NDEC      MD1      MD2  REFNO

 

   V30  MARITAL STATUS             I     9   2     0                  9    30

 

   V32  EDUC OF HEAD               I    11   2     0                  9    32

 

   V37  RACE                       I    13   2     0                  9    37

 

  V251  OCCUPATION B               I    25   2     0                      251

 

  V268  TOTAL FAMILY INC           I    27   4     0                      268

 

        0 cases deleted due to missing data on the dependent variable.

 

        0 cases deleted due to missing data on the independent variables.

 

        0 cases deleted due to predictor codes outside the range 0 to 31.

 

      299 cases were used in the analysis.

 

RESULTS BASED ON ITERATION    6

 

DEPENDENT VARIABLE (Y) =   V268   TOTAL FAMILY INC       

 

MEAN                         10528.32   

 

STANDARD DEVIATION           7553.407   

 

SUM OF Y                     3147968.   

SUM OF Y SQUARE              .5014490E+11

 

TOTAL SUM OF SQUARES         .1700208E+11

 

EXPLAINED SUM OF SQUARES     .8352816E+10

RESIDUAL SUM OF SQUARES      .8649263E+10

 

NUMBER OF CASES                       299

 

PREDICTOR  V251   OCCUPATION B           

                                                UNADJUSTED

       NO OF  SUM OF             CLASS        DEVIATION FROM

CLASS  CASES  WEIGHTS   %        MEAN           GRAND MEAN       COEFFICIENT

   0     68       68  22.7     4592.206         -5936.115         -4256.094   

   1     30       30  10.0     16396.07          5867.746          1165.547   

   2     22       22   7.4     19716.09          9187.770          7577.927   

   3     14       14   4.7     15615.71          5087.393          3987.124   

   4     22       22   7.4     9988.636         -539.6847          547.4017   

   5     42       42  14.0     12596.05          2067.727          1663.999   

   6     36       36  12.0     10407.06         -121.2655          461.7471   

   7     36       36  12.0     7910.333         -2617.988         -1574.841   

   8     21       21   7.0     11960.00          1431.679          1774.740   

   9      8        8   2.7     4009.000         -6519.321         -5901.890   

 

                              STANDARD

CLASS     ADJUSTED MEAN      DEVIATION

   0        6272.228          4161.586   

   1        11693.87          9158.358   

   2        18106.25          6896.417   

   3        14515.45          11944.88   

   4        11075.72          5269.902   

   5        12192.32          5372.033   

   6        10990.07          4254.318   

   7        8953.480          5063.992   

   8        12303.06          6163.097   

   9        4626.431          2196.427   

 

 ETA-SQUARE =     .380238        BETA-SQUARE       .195452   

        ETA =     .616634        BETA              .442099   

 

 ETA-SQUARE (ADJ) =     .360938   

        ETA (ADJ) =     .600781   

 

 UNADJUSTED DEVIATION SS =     .646484E+10

   ADJUSTED DEVIATION SS =     .332309E+10

 

PREDICTOR   V30   MARITAL STATUS         

                                                UNADJUSTED

       NO OF  SUM OF             CLASS        DEVIATION FROM

CLASS  CASES  WEIGHTS   %        MEAN           GRAND MEAN       COEFFICIENT

   1    221      221  73.9     12449.90          1921.575          1123.470   

   2     17       17   5.7     7115.882         -3412.439         -2828.932   

   3     41       41  13.7     3732.463         -6795.858         -2956.380   

   4     16       16   5.4     5748.750         -4779.571         -4603.841   

   5      4        4   1.3     7640.000         -2888.321         -1330.495

                              STANDARD

CLASS     ADJUSTED MEAN      DEVIATION

   1        11651.79          7563.060   

   2        7699.389          4465.809   

   3        7571.941          2752.520   

   4        5924.480          4340.339   

   5        9197.826          8306.206   

 

 ETA-SQUARE =     .194470        BETA-SQUARE       .658475E-01

        ETA =     .440988        BETA              .256608   

 

 ETA-SQUARE (ADJ) =     .183511   

        ETA (ADJ) =     .428382   

 

 UNADJUSTED DEVIATION SS =     .330640E+10

   ADJUSTED DEVIATION SS =     .111955E+10

PREDICTOR SUMMARY STATISTICS

 

PREDICTOR   V32   EDUC OF HEAD           

                                                UNADJUSTED

       NO OF  SUM OF             CLASS        DEVIATION FROM

CLASS  CASES  WEIGHTS   %        MEAN           GRAND MEAN       COEFFICIENT

   1     16       16   5.4     5973.375         -4554.946         -564.7311   

   2     71       71  23.7     6579.493         -3948.828         -2085.182   

   3     44       44  14.7     11013.86          485.5426          397.8526   

   4     70       70  23.4     10257.70         -270.6211         -789.0604   

   5     37       37  12.4     11210.03          681.7060         -1273.955   

   6     30       30  10.0     14161.87          3633.546          2836.744   

   7     17       17   5.7     16022.71          5494.385          3034.737   

   8     14       14   4.7     19327.71          8799.393          7518.277

 

                              STANDARD

CLASS     ADJUSTED MEAN      DEVIATION

   1        9963.590          6006.004   

   2        8443.139          4868.404   

   3        10926.17          8730.284   

   4        9739.261          6009.121   

   5        9254.365          5760.727   

   6        13365.06          7470.542   

   7        13563.06          6769.267   

   8        18046.60          12470.24   

 

 ETA-SQUARE =     .203802        BETA-SQUARE       .949135E-01

        ETA =     .451445        BETA              .308080   

 

 ETA-SQUARE (ADJ) =     .184650   

        ETA (ADJ) =     .429709   

 

 UNADJUSTED DEVIATION SS =     .346507E+10

   ADJUSTED DEVIATION SS =     .161373E+10

 

ANALYSIS SUMMARY STATISTICS

 

DEPENDENT VARIABLE (Y) =   V268   TOTAL FAMILY INC       

 

R-SQUARED(UNADJUSTED) = PROP. OF VARIATION EXPLAINED BY FITTED MODEL:  .49128

 

 ADJUSTMENT FOR DEGREES OF FREEDOM =   1.07194

 

*** MULTIPLE R (ADJUSTED) =  .67430    MULTIPLE R-SQUARED (ADJUSTED) =  .45468

 

LISTING OF BETAS IN DESCENDING ORDER

 

RANK     VAR. NO.      NAME                                 BETA

 

   1     V251          OCCUPATION B                       .442099 

   2      V32          EDUC OF HEAD                       .308080 

   3      V30          MARITAL STATUS                     .256608 

 

*** MULTIPLE R (ADJUSTED) =  .67430    MULTIPLE R-SQUARED (ADJUSTED) =  .45468
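
The summary figures in the listing above can be cross-checked with a short Python sketch. It assumes that unadjusted R-squared is the explained sum of squares divided by the total sum of squares, and that the degrees-of-freedom adjustment is (N - 1) / (N + P - C - 1), where N is the number of cases, P the number of predictors, and C the total number of predictor categories. This document does not state these formulas; they are used here because they reproduce the printed values. The variable names are illustrative.

```python
import math

# Values printed in Example 1.
TOTAL_SS = 0.1700208e11
EXPLAINED_SS = 0.8352816e10
RESIDUAL_SS = 0.8649263e10
N_CASES = 299
CATEGORIES_PER_PREDICTOR = {"V251": 10, "V30": 5, "V32": 8}

n_predictors = len(CATEGORIES_PER_PREDICTOR)
n_categories = sum(CATEGORIES_PER_PREDICTOR.values())

# Explained and residual sums of squares should add up to the total.
ss_gap = TOTAL_SS - (EXPLAINED_SS + RESIDUAL_SS)

r_squared = EXPLAINED_SS / TOTAL_SS
# Assumed degrees-of-freedom adjustment: (N - 1) / (N + P - C - 1).
adjustment = (N_CASES - 1) / (N_CASES + n_predictors - n_categories - 1)
r_squared_adj = 1.0 - (1.0 - r_squared) * adjustment
r_adj = math.sqrt(r_squared_adj)

print(f"R-squared (unadjusted)  {r_squared:.5f}")
print(f"Adjustment factor       {adjustment:.5f}")
print(f"R-squared (adjusted)    {r_squared_adj:.5f}")
print(f"Multiple R (adjusted)   {r_adj:.5f}")
```

Under these assumptions the results agree with the printed .49128, 1.07194, .45468, and .67430.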