# Exploratory factor analysis

In multivariate statistics, exploratory factor analysis (EFA) is a statistical method used to uncover the underlying structure of a relatively large set of variables. EFA is a technique within factor analysis whose overarching goal is to identify the underlying relationships between measured variables. It is commonly used by researchers when developing a scale (a collection of questions used to measure a particular research topic) and serves to identify a set of latent constructs underlying a battery of measured variables. It should be used when the researcher has no a priori hypothesis about factors or patterns of measured variables. Measured variables are any one of several attributes of people that may be observed and measured. Examples include the physical height, weight, and pulse rate of a human being. Usually, researchers have a large number of measured variables, which are assumed to be related to a smaller number of "unobserved" factors. Researchers must carefully consider the number of measured variables to include in the analysis. EFA procedures are more accurate when each factor is represented by multiple measured variables in the analysis.

EFA is based on the common factor model. In this model, manifest variables are expressed as a function of common factors, unique factors, and errors of measurement. Each unique factor influences only one manifest variable and does not explain correlations between manifest variables. Common factors influence more than one manifest variable, and "factor loadings" are measures of the influence of a common factor on a manifest variable. For the EFA procedure, we are most interested in identifying the common factors and the related manifest variables.

EFA assumes that any indicator/measured variable may be associated with any factor. When developing a scale, researchers should use EFA first before moving on to confirmatory factor analysis (CFA). EFA is essential for determining the underlying factors/constructs for a set of measured variables, while CFA allows the researcher to test the hypothesis that a relationship between the observed variables and their underlying latent factor(s)/construct(s) exists. EFA requires the researcher to make a number of important decisions about how to conduct the analysis because there is no single set method.

## Fitting procedures

Fitting procedures are used to estimate the factor loadings and unique variances of the model (factor loadings are the regression coefficients between items and factors and measure the influence of a common factor on a measured variable). There are several factor analysis fitting methods to choose from; however, there is little information on their strengths and weaknesses, and many do not even have an exact name that is used consistently. Principal axis factoring (PAF) and maximum likelihood (ML) are two extraction methods that are generally recommended. In general, ML or PAF gives the best results, depending on whether data are normally distributed or the assumption of normality has been violated.

### Maximum likelihood (ML)

The maximum likelihood method has many advantages: it allows researchers to compute a wide range of indexes of the goodness of fit of the model, to test the statistical significance of factor loadings, to calculate correlations among factors, and to compute confidence intervals for these parameters. For these reasons, ML is the best choice when data are normally distributed.

### Principal axis factoring (PAF)

It is called “principal” axis factoring because the first factor accounts for as much common variance as possible, the second factor for the next most variance, and so on. PAF is a descriptive procedure, so it is best used when the focus is on your sample alone and you do not plan to generalize the results beyond your sample. A downside of PAF is that it provides a limited range of goodness-of-fit indexes compared to ML and does not allow for the computation of confidence intervals and significance tests.
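
As a rough illustration of the extraction step, PAF can be sketched as an iterated eigendecomposition of the "reduced" correlation matrix (the correlation matrix with communality estimates on the diagonal), starting from squared multiple correlations. This is a minimal NumPy sketch under simplifying assumptions; real implementations add safeguards for Heywood cases and convergence failures:

```python
import numpy as np

def principal_axis_factoring(R, n_factors, n_iter=100, tol=1e-6):
    """Iterated principal axis factoring on a correlation matrix R."""
    # Initial communality estimates: squared multiple correlations.
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    loadings = None
    for _ in range(n_iter):
        R_reduced = R.copy()
        np.fill_diagonal(R_reduced, h2)            # reduced correlation matrix
        vals, vecs = np.linalg.eigh(R_reduced)
        order = np.argsort(vals)[::-1][:n_factors]  # largest eigenvalues first
        # Loadings = eigenvectors scaled by sqrt of (non-negative) eigenvalues.
        loadings = vecs[:, order] * np.sqrt(np.clip(vals[order], 0, None))
        h2_new = np.sum(loadings**2, axis=1)       # updated communalities
        converged = np.max(np.abs(h2_new - h2)) < tol
        h2 = h2_new
        if converged:
            break
    return loadings, h2
```

Applied to a correlation matrix generated by a clean two-factor structure, the iteration converges to the communalities implied by the true loadings.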

## Selecting the appropriate number of factors

When selecting how many factors to include in a model, researchers must try to balance parsimony (a model with relatively few factors) and plausibility (enough factors to adequately account for correlations among measured variables).

Overfactoring occurs when too many factors are included in a model and may lead researchers to put forward constructs with little theoretical value.

Underfactoring occurs when too few factors are included in a model. If not enough factors are included, there is likely to be substantial error. Measured variables that load onto a factor not included in the model can falsely load on factors that are included, altering true factor loadings. This can result in rotated solutions in which two factors are combined into a single factor, obscuring the true factor structure.

There are a number of procedures designed to determine the optimal number of factors to retain in EFA. These include Kaiser's (1960) eigenvalue-greater-than-one rule (or K1 rule), Cattell's (1966) scree plot, Revelle and Rocklin's (1979) very simple structure criterion, model comparison techniques, Raiche, Roipel, and Blais's (2006) acceleration factor and optimal coordinates, Velicer's (1976) minimum average partial, Horn's (1965) parallel analysis, and Ruscio and Roche's (2012) comparison data. Recent simulation studies assessing the robustness of such techniques suggest that the latter five can better assist practitioners to judiciously model data. These five modern techniques are now easily accessible through integrated use of IBM SPSS Statistics software (SPSS) and R (R Development Core Team, 2011). See Courtney (2013) for guidance on how to carry out these procedures for continuous, ordinal, and heterogeneous (continuous and ordinal) data.

With the exception of Revelle and Rocklin's (1979) very simple structure criterion, model comparison techniques, and Velicer's (1976) minimum average partial, all other procedures rely on the analysis of eigenvalues. The eigenvalue of a factor represents the amount of variance of the variables accounted for by that factor. The lower the eigenvalue, the less that factor contributes to explaining the variance of the variables.

A short description of each of the nine procedures mentioned above is provided below.

### Kaiser's (1960) eigenvalue-greater-than-one rule (K1 or Kaiser criterion)

Compute the eigenvalues for the correlation matrix and determine how many of them are greater than 1. This number is the number of factors to include in the model. A disadvantage of this procedure is that it is quite arbitrary (e.g., an eigenvalue of 1.01 is included whereas an eigenvalue of 0.99 is not). This procedure often leads to overfactoring and sometimes underfactoring; therefore, it should not be used. A variation of the K1 criterion has been created to lessen the severity of the criterion's problems, in which a researcher calculates confidence intervals for each eigenvalue and retains only factors whose entire confidence interval is greater than 1.0.
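
Although the rule itself is not recommended, it makes a compact illustration of counting eigenvalues of a correlation matrix. A minimal NumPy sketch (the simulated data are purely illustrative):

```python
import numpy as np

def kaiser_k1(data):
    """Count eigenvalues of the correlation matrix greater than 1 (K1 rule)."""
    R = np.corrcoef(data, rowvar=False)       # variables in columns
    eigenvalues = np.linalg.eigvalsh(R)
    return int(np.sum(eigenvalues > 1.0))

# Illustrative data: six items driven by a single strong common factor.
rng = np.random.default_rng(42)
f = rng.normal(size=(500, 1))
data = f @ np.full((1, 6), 0.9) + 0.4 * rng.normal(size=(500, 6))
k = kaiser_k1(data)
```

With one dominant factor, only the first eigenvalue exceeds 1, so the rule retains one factor here.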

### Cattell's (1966) scree plot

Compute the eigenvalues for the correlation matrix and plot the values from largest to smallest. Examine the graph to determine the last substantial drop in the magnitude of eigenvalues. The number of plotted points before the last drop is the number of factors to include in the model. This method has been criticized for its subjective nature (i.e., there is no clear objective definition of what constitutes a substantial drop). Because this procedure is subjective, Courtney (2013) does not recommend it.

### Revelle and Rocklin (1979) very simple structure

Revelle and Rocklin's (1979) VSS criterion operationalizes the tendency toward simple structure by assessing the extent to which the original correlation matrix is reproduced by a simplified pattern matrix, in which only the highest loading for each item is retained, with all other loadings set to zero. The VSS criterion can take values between 0 and 1 and is a measure of the goodness of fit of the factor solution. The criterion is gathered from factor solutions that involve one factor (k = 1) up to a user-specified theoretical maximum number of factors. The factor solution that provides the highest VSS criterion then determines the optimal number of interpretable factors in the matrix. To accommodate datasets where items covary with more than one factor (i.e., more factorially complex data), the criterion can also be carried out with simplified pattern matrices in which the highest two loadings are retained, with the rest set to zero (Max VSS complexity 2). Courtney also does not recommend VSS because of the lack of robust simulation research concerning the performance of the VSS criterion.

### Model comparison techniques

Choose the best model from a series of models that differ in complexity. Researchers use goodness-of-fit measures to fit models, beginning with a model with zero factors and gradually increasing the number of factors. The goal is ultimately to choose a model that explains the data significantly better than simpler models (with fewer factors) and explains the data as well as more complex models (with more factors).

There are several methods that can be used to assess model fit:

• Likelihood ratio statistic: Used to test the null hypothesis that a model has perfect fit. It should be applied to models with an increasing number of factors until the result is nonsignificant, indicating that the model is not rejected as a good fit to the population. This statistic should be used with a large sample size and normally distributed data. The likelihood ratio test has some drawbacks. First, when there is a large sample size, even small discrepancies between the model and the data result in model rejection. When there is a small sample size, even large discrepancies between the model and the data may not be significant, which leads to underfactoring. Another disadvantage of the likelihood ratio test is that the null hypothesis of perfect fit is an unrealistic standard.
• Root mean square error of approximation (RMSEA) fit index: RMSEA is an estimate of the discrepancy between the model and the data per degree of freedom for the model. Values less than 0.05 constitute good fit, values between 0.05 and 0.08 acceptable fit, values between 0.08 and 0.10 marginal fit, and values greater than 0.10 poor fit. An advantage of the RMSEA fit index is that it provides confidence intervals, which allow researchers to compare a series of models with varying numbers of factors.
• Information criteria: Information criteria such as the Akaike information criterion (AIC) or the Bayesian information criterion (BIC) can be used to trade off model fit against model complexity and select an optimal number of factors.
• Out-of-sample prediction errors: Using the connection between model-implied covariance matrices and standardized regression weights, the number of factors can be selected using out-of-sample prediction errors.
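
The information-criterion trade-off can be sketched with a toy calculation. A k-factor model on p variables has p·k loadings plus p uniquenesses, minus k(k−1)/2 rotational constraints, as free parameters; AIC and BIC penalize that count against the maximized log-likelihood. The log-likelihood values below are hypothetical numbers chosen only to illustrate the selection step, not results from any real dataset:

```python
import math

def n_free_params(p, k):
    """Free parameters of a k-factor model on p variables:
    p*k loadings + p uniquenesses - k(k-1)/2 rotational constraints."""
    return p * k + p - k * (k - 1) // 2

# Hypothetical maximized log-likelihoods for 1- to 4-factor models
# (illustrative numbers only): fit improves sharply up to 2 factors,
# then only marginally.
log_liks = {1: -1250.0, 2: -1180.0, 3: -1172.0, 4: -1169.5}
p, n = 12, 300  # number of variables and sample size (assumed)

aic = {k: -2 * ll + 2 * n_free_params(p, k) for k, ll in log_liks.items()}
bic = {k: -2 * ll + math.log(n) * n_free_params(p, k) for k, ll in log_liks.items()}
best_aic = min(aic, key=aic.get)  # smallest AIC wins
best_bic = min(bic, key=bic.get)  # BIC penalizes complexity more heavily
```

Both criteria pick the two-factor model here: the small likelihood gains from a third or fourth factor do not justify the extra parameters.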

### Optimal Coordinate and Acceleration Factor

In an attempt to overcome the subjective weakness of Cattell's (1966) scree test, Raiche, Roipel, and Blais (2006) presented two families of non-graphical solutions. The first method, coined the optimal coordinate (OC), attempts to determine the location of the scree by measuring the gradients associated with eigenvalues and their preceding coordinates. The second method, coined the acceleration factor (AF), pertains to a numerical solution for determining the coordinate where the slope of the curve changes most abruptly. Both of these methods have outperformed the K1 method in simulation. In the Ruscio and Roche (2012) study, the OC method was correct 74.03% of the time, rivaling the PA technique (76.42%). The AF method was correct 45.91% of the time, with a tendency toward underestimation. Both the OC and AF methods, generated with the use of Pearson correlation coefficients, were reviewed in Ruscio and Roche's (2012) simulation study. Results suggested that both techniques performed quite well under ordinal response categories of two to seven (C = 2-7) and quasi-continuous (C = 10 or 20) data situations. Given the accuracy of these procedures under simulation, they are among Courtney's five recommended modern procedures for determining the number of factors to retain in EFA.
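
A simplified reading of the acceleration factor is the second difference of the descending eigenvalue curve: the factors retained are those preceding the point where the curve bends most sharply. This NumPy sketch captures only that core idea; the published procedure combines AF with additional safeguards (e.g., parallel-analysis thresholds):

```python
import numpy as np

def acceleration_factor(eigenvalues):
    """Retain the factors that precede the point where the (descending)
    eigenvalue curve bends most sharply (largest second difference)."""
    ev = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    # Second difference at interior points i: f(i+1) - 2 f(i) + f(i-1).
    af = ev[2:] - 2 * ev[1:-1] + ev[:-2]
    elbow = int(np.argmax(af)) + 1   # index of the sharpest bend
    return elbow                      # number of eigenvalues before the bend
```

For an eigenvalue profile such as [5.0, 3.0, 0.5, 0.4, 0.3, 0.2], the sharpest bend occurs at the third eigenvalue, so two factors are retained.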

### Velicer's Minimum Average Partial test (MAP)

Velicer's (1976) MAP test “involves a complete principal components analysis followed by the examination of a series of matrices of partial correlations” (p. 397). The squared correlation for Step “0” is the average squared off-diagonal correlation for the unpartialed correlation matrix. On Step 1, the first principal component and its associated items are partialed out, and the average squared off-diagonal correlation for the resulting matrix is computed. On Step 2, the first two principal components are partialed out and the resultant average squared off-diagonal correlation is again computed. The computations are carried out for k minus one steps (k representing the total number of variables in the matrix). Finally, the average squared correlations for all steps are lined up, and the step number that resulted in the lowest average squared partial correlation determines the number of components or factors to retain (Velicer, 1976). By this method, components are maintained as long as the variance in the correlation matrix represents systematic variance, as opposed to residual or error variance. Although methodologically akin to principal components analysis, the MAP technique has been shown to perform quite well in determining the number of factors to retain in multiple simulation studies. However, in a very small minority of cases MAP may grossly overestimate the number of factors in a dataset for unknown reasons. This procedure is made available through SPSS's user interface; see Courtney (2013) for guidance. It is one of his five recommended modern procedures.
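
The steps above can be sketched directly in NumPy: compute PCA loadings from the correlation matrix, partial out the first m components at each step, and track the average squared off-diagonal partial correlation. This is a minimal sketch for well-conditioned matrices (the diagonal clipping is a crude guard against division by values near zero, not part of the original procedure):

```python
import numpy as np

def velicer_map(R):
    """Velicer's (1976) MAP test on a correlation matrix R: return the
    number of components minimizing the average squared partial correlation."""
    p = R.shape[0]
    vals, vecs = np.linalg.eigh(R)
    order = np.argsort(vals)[::-1]
    loadings = vecs[:, order] * np.sqrt(vals[order])   # PCA loadings
    off = ~np.eye(p, dtype=bool)
    avg_sq = [np.mean(R[off] ** 2)]                    # Step 0: unpartialed R
    for m in range(1, p):                              # Steps 1 .. p-1
        A = loadings[:, :m]
        C = R - A @ A.T                                # residual (partial) covariance
        d = np.sqrt(np.clip(np.diag(C), 1e-12, None))  # guard tiny diagonals
        partial = C / np.outer(d, d)                   # partial correlations
        avg_sq.append(np.mean(partial[off] ** 2))
    return int(np.argmin(avg_sq))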

### Parallel analysis

To carry out the PA test, users compute the eigenvalues for the correlation matrix, plot the values from largest to smallest, and then plot a set of eigenvalues obtained from random data of the same size. The number of eigenvalues before the intersection point indicates how many factors to include in the model. This procedure can be somewhat arbitrary (i.e., a factor just meeting the cutoff will be included and one just below will not). Moreover, the method is very sensitive to sample size, with PA suggesting more factors in datasets with larger sample sizes. Despite its shortcomings, this procedure performs very well in simulation studies and is one of Courtney's recommended procedures. PA has been implemented in a number of commonly used statistics programs such as R and SPSS.
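
Parallel analysis compares the observed eigenvalues against eigenvalues obtained from random data of the same dimensions. A minimal NumPy sketch (this version uses the mean of the simulated eigenvalues as the threshold; implementations often use the 95th percentile instead):

```python
import numpy as np

def parallel_analysis(data, n_sims=100, seed=0):
    """Horn's parallel analysis: retain factors whose sample eigenvalues
    exceed the mean eigenvalues of random normal data of the same shape."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    real = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    sims = np.empty((n_sims, p))
    for i in range(n_sims):
        noise = rng.normal(size=(n, p))
        sims[i] = np.sort(np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False)))[::-1]
    threshold = sims.mean(axis=0)    # random-data baseline, per position
    k = 0
    while k < p and real[k] > threshold[k]:
        k += 1                       # count leading eigenvalues above baseline
    return k
```

On simulated data with two clear common factors, the first two observed eigenvalues sit well above the random baseline and the rest fall below it.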

### Ruscio and Roche's comparison data

In 2012, Ruscio and Roche introduced the comparison data (CD) procedure in an attempt to improve upon the PA method. The authors state that "rather than generating random datasets, which only take into account sampling error, multiple datasets with known factorial structures are analyzed to determine which best reproduces the profile of eigenvalues for the actual data" (p. 258). The strength of the procedure is its ability to incorporate not only sampling error but also the factorial structure and multivariate distribution of the items. Ruscio and Roche's (2012) simulation study determined that the CD procedure outperformed many other methods aimed at determining the correct number of factors to retain. In that study, the CD technique, making use of Pearson correlations, accurately predicted the correct number of factors 87.14% of the time. However, the simulation study never involved more than five factors, so the applicability of the CD procedure for estimating factorial structures beyond five factors is yet to be tested. Courtney includes this procedure in his recommended list and gives guidelines showing how it can easily be carried out from within SPSS's user interface.

### Convergence of multiple tests

A review of 60 journal articles by Henson and Roberts (2006) found that none used multiple modern techniques in an attempt to find convergence, such as the PA and Velicer's (1976) minimum average partial (MAP) procedures. Ruscio and Roche's (2012) simulation study demonstrated the empirical advantage of seeking convergence: when the CD and PA procedures agreed, the estimated number of factors was correct 92.2% of the time. Ruscio and Roche (2012) also demonstrated that when further tests were in agreement, the accuracy of the estimation could be increased even further.

### Tailoring Courtney's recommended procedures for ordinal and continuous data

Recent simulation studies in the field of psychometrics suggest that the parallel analysis, minimum average partial, and comparison data techniques can be improved for different data situations. For example, in simulation studies the performance of the minimum average partial test with ordinal data can be improved by utilizing polychoric correlations rather than Pearson correlations. Courtney (2013) details how each of these three procedures can be optimized and carried out simultaneously from within the SPSS interface.

## Factor rotation

Factor rotation is a commonly employed step in EFA, used to aid the interpretation of factor matrices. For any solution with two or more factors there are an infinite number of orientations of the factors that will explain the data equally well. Because there is no unique solution, a researcher must select a single solution from the infinite possibilities. The goal of factor rotation is to rotate factors in multidimensional space to arrive at a solution with the best simple structure. There are two main types of factor rotation: orthogonal and oblique.

### Orthogonal rotation

Orthogonal rotations constrain factors to be perpendicular to each other and hence uncorrelated. An advantage of orthogonal rotation is its simplicity and conceptual clarity, although there are several disadvantages. In the social sciences there is often a theoretical basis for expecting constructs to be correlated, so orthogonal rotations may not be very realistic because they do not allow this. Also, because orthogonal rotations require factors to be uncorrelated, they are less likely to produce solutions with simple structure.

Varimax rotation is an orthogonal rotation of the factor axes that maximizes the variance of the squared loadings of a factor (column) across all the variables (rows) in a factor matrix, which has the effect of differentiating the original variables by extracted factor. Each factor will tend to have either large or small loadings for any particular variable. A varimax solution yields results that make it as easy as possible to identify each variable with a single factor. This is the most common orthogonal rotation option.

Quartimax rotation is an orthogonal rotation that maximizes the squared loadings for each variable rather than for each factor. This minimizes the number of factors needed to explain each variable. This type of rotation often generates a general factor on which most variables load to a high or medium degree.

Equimax rotation is a compromise between varimax and quartimax criteria.

### Oblique rotation

Oblique rotations permit correlations among factors. An advantage of oblique rotation is that it produces solutions with better simple structure when factors are expected to correlate, and it produces estimates of the correlations among factors. These rotations may produce solutions similar to orthogonal rotation if the factors do not correlate with each other.

Several oblique rotation procedures are commonly used. Direct oblimin rotation is the standard oblique rotation method. Promax rotation is often seen in older literature because it is easier to calculate than oblimin. Other oblique methods include direct quartimin rotation and Harris-Kaiser orthoblique rotation.

### Unrotated solution

Common factor analysis software is capable of producing an unrotated solution. This refers to the result of principal axis factoring with no further rotation. The so-called unrotated solution is in fact an orthogonal rotation that maximizes the variance of the first factors. The unrotated solution tends to give a general factor with loadings for most of the variables. This may be useful if many variables are correlated with each other, as revealed by one or a few dominating eigenvalues on a scree plot.

The usefulness of an unrotated solution was emphasized by a meta-analysis of studies of cultural differences, which revealed that many published studies of cultural differences have given similar factor analysis results but rotated them differently. Factor rotation has obscured the similarity between the results of different studies and the existence of a strong general factor, whereas the unrotated solutions were much more similar.

## Factor interpretation

Factor loadings are numerical values that indicate the strength and direction of a factor's influence on a measured variable. To label the factors in the model, researchers should examine the factor pattern to see which items load highly on which factors and then determine what those items have in common. Whatever the items have in common will indicate the meaning of the factor.
