MNA -- MULTIVARIATE NOMINAL ANALYSIS

 

MNA performs a multivariate analysis of a nominal-scale dependent variable. It runs a series of parallel dummy-variable regressions, one for each code of the dependent variable, with each code dichotomized to a 0-1 variable. The program's major use is to give an additive multivariate model showing the relationship between a set of predictors and the dependent variable in terms of a set of coefficients analogous to MCA coefficients.
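
The dichotomization step can be illustrated with a minimal Python sketch. It is not MNA's own code and it does not run the regressions; it only shows how each code of the dependent variable becomes a 0-1 variable. The data are rebuilt from the frequencies in the example output later in this write-up, and the names `frequencies`, `dichotomize`, and `dummies` are hypothetical.

```python
# Hypothetical data: dependent-variable codes rebuilt from the example
# frequencies (1=Small, 2=Compact, 3=Mid-Size, 5=Large).
frequencies = {1: 10, 2: 12, 3: 20, 5: 96}
y = [code for code, n in frequencies.items() for _ in range(n)]


def dichotomize(values):
    """Return one 0-1 indicator list per distinct code in `values`."""
    codes = sorted(set(values))
    return {code: [1 if v == code else 0 for v in values] for code in codes}


dummies = dichotomize(y)

# The mean of each indicator is the overall proportion for that code.
overall_percents = {code: 100 * sum(d) / len(d) for code, d in dummies.items()}

for code, pct in overall_percents.items():
    print(code, round(pct, 1))
```

Each case has a 1 on exactly one indicator, so the indicator means reproduce the overall percent distribution of the dependent variable.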

The advantage MNA has over other techniques applicable to the same data is the simplicity and direct interpretability of the MNA coefficients and the categorical prediction algorithm. See Andrews and Messenger, Multivariate Nominal Scale Analysis, for a complete description of the MNA technique.

Statistics:  MNA computes the univariate distribution of the dependent variable, gives (in effect) a bivariate distribution of the dependent variable with each predictor, and computes and prints the multivariate "MNA coefficients."  Bivariate statistics are the bivariate theta and the code-specific and generalized eta-square; they provide two alternatives for measuring the strength of the simple bivariate relationship between a specific predictor and the dependent variable. The program also prints a series of statistics for each predictor called "Beta Square."  These indicate the relative importance of the predictor when all other independent variables are held constant. Multivariate statistics are the multivariate theta and the code-specific and generalized R-square.

References:

Andrews, F. M., J. N. Morgan, J. A. Sonquist and L. Klem. Multiple Classification Analysis. Second edition. Ann Arbor: Institute for Social Research, The University of Michigan, 1973.

Andrews, F. M. and R. C. Messenger. Multivariate Nominal Scale Analysis. Ann Arbor:  Institute for Social Research, The University of Michigan, 1973.

 

Information on the analysis.

Numbers of cases eliminated due to missing data on the dependent variable and range of valid codes

Non-empty predictor codes

Minimum number of significant digits in solution vectors

Dependent Variable Statistics.

Frequency distribution

Weighted frequency distribution

Weighted frequency distribution expressed as a percent

R-squared (for each dependent variable code)

Adjusted R-squared (for each dependent variable code)

Predictor Variable Statistics.

Frequency for each code

Weighted frequency for each code

Weighted frequency expressed as a percent for each code

For each predictor code:

Weighted frequency marginal for each code of the dependent variable (Y) expressed as percents

Adjusted percents (sums of percents and coefficients) for each code of the dependent variable

Coefficients for each code of the dependent variable

Theta

Eta-squared (for each dependent variable code)

Beta-squared (for each dependent variable code)

Generalized eta-squared

Joint and Multivariate Prediction.

Generalized R-squared

Joint theta (proportion of cases correctly classed)

Classification matrix. Rows of the matrix indicate actual codes; columns indicate predicted codes.

INTERPRETING MNA OUTPUT

Consult the example at the end of this write-up as noted in the following discussions. See Multivariate Nominal Scale Analysis (Andrews and Messenger, 1973) for a complete description of how to interpret MNA results.

Examination Strategies

When looking at the large number of detailed statistics from MNA, two things are of particular interest: 1) large coefficients, and 2) large differences between the percents and the adjusted percents.

If an independent variable is ordinal scale, the occurrence of monotonic change across successive coefficients or percentages may also be of interest. This occurs in the example in the way V46, "Better or worse a year from now," affects the likelihood of the first car being a compact.

Theta Statistic

The multivariate statistic Theta indicates the proportion of cases correctly classified after taking into account each respondent's scores on all independent variables. In the example, Theta is .8043, indicating that about 80% of the cases were correctly classified. This is a gain of more than 10 percentage points over the mode of the overall percentage distribution (69.6% for "Large" car).

Identifying the mode is important; it shows that even if you know nothing about the respondents, you could predict the first car for everyone to be large and be correct 69.6% of the time. Relationships of the independent variables to the dependent variable act to increase predictability above this 69.6% level.

The bivariate Theta statistic indicates the proportion correctly classified for a single independent variable.
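
The arithmetic behind the multivariate Theta and the modal baseline can be checked with a short Python sketch. It is an illustration, not part of MNA; the counts are copied from the classification matrix in the example at the end of this write-up, and the variable names are hypothetical.

```python
# Classification matrix from the example output:
# rows are actual codes, columns are predicted codes.
labels = ["Small", "Compact", "Mid-Size", "Large"]
matrix = [
    [2, 1, 1, 6],    # actual Small
    [0, 7, 0, 5],    # actual Compact
    [0, 1, 9, 10],   # actual Mid-Size
    [0, 1, 2, 93],   # actual Large
]

row_totals = [sum(row) for row in matrix]
total = sum(row_totals)

# Theta: proportion of cases on the diagonal (predicted code == actual code).
correct = sum(matrix[i][i] for i in range(len(matrix)))
theta = correct / total

# Baseline: predict the modal category for every case.
mode_share = max(row_totals) / total

# Gain over the baseline, in percentage points.
gain_points = 100 * (theta - mode_share)

# Proportion correctly classed within each actual code.
per_code = {labels[i]: matrix[i][i] / row_totals[i] for i in range(len(matrix))}

print(f"theta = {theta:.4f}, mode share = {mode_share:.4f}, gain = {gain_points:.1f} points")
```

With these counts, 111 of the 138 cases fall on the diagonal, which gives the Theta of .8043 printed in the example.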

Forecasts and the Proportion Classed Correctly

For any case a forecast can be derived. The forecast consists of a set of probabilities; it shows the likelihood of that case falling into each category of the dependent variable. You compute the probability for each category by summing the coefficients relevant to that case and adding in the overall percent. Assume we have a person who earns $20,000 a year, is 28 years old, single, has a college degree, expects to be about as well off next year, expects his/her income to be a little bit more next year, and holds a professional position. The forecast is computed as shown in the table below:

   Size of First Car              Small    Compact    Mid-Size     Large

 

   Overall Percents                7.2        8.7       14.5       69.6

 

   Coeff:  $20,000/yr             -5.05       8.10       2.30      -5.35

   Coeff:  28 Years old           11.41       -.25      13.13     -24.29

   Coeff:  Single                 10.40      -2.10      -1.11      -7.19

   Coeff:  College Degree         15.64      -4.17       1.23     -12.69

   Coeff:  About the Same          6.73      -4.91      -4.25       2.44

   Coeff:  A Little More Income    2.03       -.97      -2.74       1.68

   Coeff:  Professional          -19.31       1.05      15.73       2.52

 

   Forecast:                      29.10       5.45      38.78      26.69

The forecast gives a set of predicted scores for each case; you predict a case to be in the category of the dependent variable for which the probability is highest. The person represented in the table above would be assigned to the "Mid-Size" category.
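
The forecast in the table above can be reproduced with a minimal Python sketch: add the relevant coefficients to the overall percent for each category, then pick the category with the highest result. This illustrates the arithmetic only and is not MNA's code. As an assumption, the overall percents are recomputed from the example frequencies (10, 12, 20, and 96 of 138 cases) rather than taken from the rounded values in the table, because that reproduces the printed forecasts to two decimals. All variable names are hypothetical.

```python
categories = ["Small", "Compact", "Mid-Size", "Large"]

# Overall percents, recomputed from the example frequencies (assumption:
# unrounded percents are used so the sums match the printed forecasts).
frequencies = [10, 12, 20, 96]
n_cases = sum(frequencies)
overall = [100 * f / n_cases for f in frequencies]

# Coefficients for the predictor codes that apply to this person,
# copied from the table (one row per predictor, one column per category).
coefficients = {
    "$20,000/yr":           [-5.05,  8.10,  2.30,  -5.35],
    "28 years old":         [11.41, -0.25, 13.13, -24.29],
    "Single":               [10.40, -2.10, -1.11,  -7.19],
    "College degree":       [15.64, -4.17,  1.23, -12.69],
    "About the same":       [6.73,  -4.91, -4.25,   2.44],
    "A little more income": [2.03,  -0.97, -2.74,   1.68],
    "Professional":         [-19.31, 1.05, 15.73,   2.52],
}

# Forecast = overall percent + sum of the relevant coefficients.
forecast = [
    overall[j] + sum(row[j] for row in coefficients.values())
    for j in range(len(categories))
]

# Assign the case to the category with the highest forecast.
predicted = categories[forecast.index(max(forecast))]

for name, value in zip(categories, forecast):
    print(f"{name:10s} {value:6.2f}")
print("Predicted category:", predicted)
```

Because the forecast is a simple sum, individual values are not constrained to lie between 0 and 100, although in this example the four forecasts total approximately 100.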

 

Example:   Explaining size of first car for childless families. Predictors are income (bracketed), age of head of household (bracketed), marital status, education of head, feelings of "well-offness" a year from now, expected income next year, and occupation.

 

                *** MNA - MULTIVARIATE NOMINAL SCALE ANALYSIS **

 

                EXPLAINING SIZE OF FIRST CAR FOR CHILDLESS FAMILIES

 

 Number of variables:   8

 

The data are not weighted

 

 Transforming the data by RECODE number 1  

 

 For the dependent variable, cases with MD1 or MD2 values will be deleted

 

 Number of cases =       138

 

         0 cases deleted due to missing data on the dependent variable.

 

         0 cases deleted due to missing data on the independent variables.

 

         0 cases deleted due to predictor codes outside range -99 to 999.

 

     PREDICTOR                          NON-EMPTY CODES

         R1 BRACKETED INCOME           2   3   4   5   6                       

         R2 BRACKETED AGE              1   2   3   4                           

        V30 MARITAL STATUS             1   2   3   4   5                       

        V32 EDUC OF HEAD               1   2   3   4   5   6   7   8           

        V46 B/W YEAR FROM NOW          1   3   5   8                            

        V49 SM/LG INC NEXT YEAR        0   1   3   5   8   9                   

       V251 OCCUPATION B               0   1   2   3   4   5   6   7   8   9   

 

 *** THE MINIMUM NUMBER OF SIGNIFICANT DIGITS IN THE SOLUTION VECTORS IS 4

 

 DEPENDENT VARIABLE  V193   SIZE OF CAR

 

 Code              1        2        3        5

               Small  Compact Mid-Size    Large   Totals 

 Frequency        10       12       20       96      138

 Percent         7.2      8.7     14.5     69.6    100.0

           

  R-squared    .2532    .3325    .3349    .3295

  Adjusted     .0000    .1035    .1066    .0994

 

        

                         *** MULTIVARIATE STATISTICS ***                        

 

      GENERALIZED R-SQUARED    .3207     MULTIVARIATE THETA    .8043    

 

 

                          CASES CORRECTLY CLASSED

 

                                  1        2        3        5

                              Small  Compact Mid-Size    Large

                         N    2.000    7.000    9.000   93.000

                PROPORTION     .200     .583     .450     .969

 

         ACTUAL(rows) vs. PREDICTED(columns) CLASSIFICATION MATRIX

 

                       |       1|       2|       3|       5|

                       |   Small| Compact|Mid-Size|   Large| Totals 

                       |--------|--------|--------|--------|

              Small   1|       2|       1|       1|       6|      10

                 ROW  %|    20.0|    10.0|    10.0|    60.0|   100.0

                       |--------|--------|--------|--------|

            Compact   2|       0|       7|       0|       5|      12

                 ROW  %|      .0|    58.3|      .0|    41.7|   100.0

                       |--------|--------|--------|--------|

           Mid-Size   3|       0|       1|       9|      10|      20

                 ROW  %|      .0|     5.0|    45.0|    50.0|   100.0

                       |--------|--------|--------|--------|

              Large   5|       0|       1|       2|      93|      96

                 ROW  %|      .0|     1.0|     2.1|    96.9|   100.0

                       |--------|--------|--------|--------|

 

                 Totals        2       10       12      114      138

                 ROW  %      1.4      7.2      8.7     82.6    100.0