This guide presents a full walkthrough tutorial: from loading your survey data into SPSS, to preparing the data, validity and reliability testing, and finally theory testing with correlation and regression analysis. The guide is a collection of SPSS instructions from the Research Seminar course at VU University. Good luck with your quantitative thesis or research!
How to get ThesisTools questionnaire data into SPSS
ThesisTools is a free survey/questionnaire website that many Dutch students use to create their questionnaires. Importing your ThesisTools data into SPSS can be done by following these steps.
1. Go to the ThesisTools website > Modify Questionnaire. Login, click ‘Results’ and download the data as an Excel file.
2. Open the Excel file (if you get a warning, just click ‘Yes’). Delete the Page, Title and Legend rows.
3. Rename what is now the top-left cell (‘Question’) to respondentID (or something like that). Also, you can delete any blank columns. The result is a nice, clean Excel file (see below). Save the file as an Excel (.xls) file. Close Excel.
4. Open SPSS > File > Open Data, set “Type of files” to “Microsoft Excel (*.xls)” and select your Excel file.
5. A dialogue screen will appear. Make sure that “Read variable names from the first row of data” is selected (which it is by default) and click ‘OK’. SPSS should now have opened your file, which you can later save as a proper SPSS file (.sav). For practical purposes, you may want to shorten the labels/names of the variables. You can do this by going to the ‘Variable View’ sheet.
SPSS Analyses of Survey Data tutorial
This document contains a short description of the analyses that have to be performed on the data collected with the survey that is part of the BRM course. The aim of these analyses is to test your hypotheses, for instance by getting insight into (1) the differences between groups within the organization and (2) the relationships between different variables.
In this process, three main steps are distinguished:
- Prepare data
- Test operationalizations
- Test theory (hypotheses)
First, we have to make sure the data are ready for further analyses. This concerns the following steps:
Check for errors
Make sure that there are no errors in the dataset, such as impossible values or other mistakes that would affect your analysis.
- Scan the dataset for anomalies (e.g., values outside the answer scale)
- Get the descriptives for all items
- Analyze > Descriptives > Frequencies
- Analyze > Descriptives > Descriptives
Make sure that all items (= questions in the dataset) “point the same way”. Sometimes, the same concept is measured by items that are both positive and negative in terms of that concept. For instance, measuring attitude towards a product with items saying both “I like…” and “I dislike…”.
Reverse the codes for items that “point the wrong way”. Recode into new variables, and change values from 5>1, 4>2, 3>3, 2>4 and 1>5.
Transform > Recode > into different variables
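For readers who want to check the recode outside SPSS, here is a minimal Python sketch of the same reverse coding, assuming 1–5 Likert items in a pandas DataFrame (the column name is hypothetical):

```python
import pandas as pd

# Hypothetical example data: a 5-point Likert item that "points the wrong way"
df = pd.DataFrame({"attitude3": [1, 2, 3, 4, 5]})

# Reverse-code into a NEW variable (keep the original), mirroring
# Transform > Recode > Into Different Variables with 5>1, 4>2, 3>3, 2>4, 1>5.
# For a scale running from 1 to 5 this is simply 6 - x.
df["attitude3_r"] = 6 - df["attitude3"]

print(df["attitude3_r"].tolist())  # [5, 4, 3, 2, 1]
```

Recoding into a new variable (rather than overwriting) lets you check the recode against the original item afterwards.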
Make new variables for open questions (if necessary)
Some questions may be open questions, e.g. “What’s your function?”. If that is the case, you don’t have a variable you can use to compare groups (for instance). In that case, you have to make a new variable with categories (e.g., management, sales, engineering, etc.).
Insert a new variable in the dataset, and manually assign categories to the different functions. Go through the dataset case per case.
Edit > Insert variables
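The same case-by-case categorization can be sketched in Python, assuming the open answers sit in a pandas DataFrame (all names and categories below are hypothetical):

```python
import pandas as pd

# Hypothetical free-text answers to "What's your function?"
df = pd.DataFrame({"function": ["sales rep", "CFO", "software engineer", "sales rep"]})

# Manually defined mapping from answers to categories -- the analyst's
# judgment, just as when going through the SPSS dataset case per case
categories = {
    "sales rep": "sales",
    "CFO": "management",
    "software engineer": "engineering",
}
df["function_cat"] = df["function"].map(categories)

print(df["function_cat"].tolist())  # ['sales', 'management', 'engineering', 'sales']
```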
With the data cleaned up, we can check the quality of our operationalizations – in other words, the quality of the way we have measured our variables. Before continuing with the data analysis, we have to make sure that the measures we base our analyses on are OK.
Test whether the different items that we assume to measure one variable can indeed be taken together into one scale (e.g., kshare1 up to kshare8).
Perform a reliability analysis. Pay attention to the following criteria:
o Cronbach’s alpha (is this higher than .65?)
o Corrected item-total correlation (higher than .30?)
o Alpha if item deleted (can we relevantly improve reliability by deleting items?).
Based on these criteria, decide whether to continue with the scale, to delete some items or do away with the scale altogether.
- Analyze > Scale > Reliability Analysis
- You can click “Statistics” here and ask for Descriptives for scale, items and scale if item deleted.
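If you want to verify SPSS's reliability output by hand, Cronbach's alpha and a corrected item-total correlation can be computed directly. A minimal Python sketch on simulated item scores (all data below are hypothetical):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) array:
    k/(k-1) * (1 - sum of item variances / variance of the total score)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Simulated responses to three items of one scale (rows = respondents),
# built around a common underlying score so the items hang together
rng = np.random.default_rng(0)
base = rng.normal(size=200)
items = np.column_stack([base + rng.normal(scale=0.5, size=200) for _ in range(3)])

alpha = cronbach_alpha(items)

# Corrected item-total correlation for item 0:
# correlate the item with the sum of the OTHER items
r_item_total = np.corrcoef(items[:, 0], items[:, 1:].sum(axis=1))[0, 1]

print(f"alpha = {alpha:.2f}, corrected item-total r = {r_item_total:.2f}")
```

With these simulated data, both values comfortably clear the .65 and .30 thresholds mentioned above.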
Create the scales that you have just determined to be sound measurements of the variables you are measuring.
By adding up the different items you have decided belong together, or (preferably) computing the means of these sets of items. The advantage of using the means is that the scale will have scores between 1 and 5, and that you are able to compare the scores on different scales. Otherwise, you could not compare the score on a scale consisting of 6 items (with sum scores ranging from 6 to 30) with the score on a scale with 8 items (with a range from 8 to 40).
Transform > Compute
o The “target variable” is the scale you are creating (for instance, “kshare”).
o The “numeric expression” would be the formula to create this scale, e.g.
§ Mean (kshare1, kshare2, kshare3,…, kshare8)
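The scale computation itself is just a row-wise mean. A minimal Python sketch, assuming three hypothetical kshare items in a pandas DataFrame:

```python
import pandas as pd

# Hypothetical item scores (1-5) for a three-item knowledge-sharing scale
df = pd.DataFrame({
    "kshare1": [5, 3, 1],
    "kshare2": [4, 3, 2],
    "kshare3": [3, 3, 3],
})

# Scale score = mean of the items, so it stays on the original 1-5 range,
# mirroring Transform > Compute with MEAN(...) as the numeric expression
df["kshare"] = df[["kshare1", "kshare2", "kshare3"]].mean(axis=1)

print(df["kshare"].tolist())  # [4.0, 3.0, 2.0]
```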
The previous steps have left you with a dataset in which all the key concepts are represented by scales that are valid measurements of these concepts. You now have a scale for each relevant concept, and of course the first few questions in the survey that provide you information on functions, departments, length of employment etc. You can now use these scales and other variables to test assumptions about differences and relationships.
Test for differences
It is interesting to see whether (for instance) people fulfilling different positions in the organization score differently on relevant variables. Do high level experts share more knowledge than administrative personnel? Do managers have a more positive view of the organization than non-managers? Does IT have a more positive perception of IT performance than the business side?
- For differences between two groups: Perform a t-test (for two independent groups, the independent samples t-test). The grouping variable is the variable based on which you make groups – for instance, function, where group 1 is managers and group 2 is non-managers. The test variables are the variables for which you want to test whether the groups score differently (for instance, knowledge sharing). Pay attention to the following criteria in the output:
o F-value and significance of F-value (under “Levene’s Test for Equality of Variances”). This value tells you nothing about the actual differences found; it basically tells you in which row of the output you should look for results. What is tested here is whether the variances of the scores of both groups on the test variable are equal. If they’re not, SPSS is extra cautious with a number of things. If the significance of this F-value is lower than .05, the variances differ significantly – you cannot assume equal variances and should look in the bottom row of the output. That is where you look for the following statistics.
o T-value: gives an indication of the strength of the difference.
o Degrees of freedom.
o Significance of t-value: this is the decisive statistic. If the t-value is significant, this means that there is a significant difference between the groups. Check the group means in the output to see which group scores higher – for instance, that high-level experts share more knowledge than administrative personnel.
- For differences between more than two groups: Perform an analysis of variance (ANOVA). The “Factor” is the variable determining your groups (for instance, “department”), the “Dependent List” contains the variables for which you want to test whether there are differences between these groups (for instance, knowledge sharing). Pay attention to the following criteria in the output:
o F-value and significance of F-value. For this test, this is the decisive statistic: If the F-value is significant, this means that you have found a significant difference between the groups on this variable. Look at the descriptives (you can click that in the menu for the ANOVA) to get more insight into the actual nature of these differences.
o The F-value and descriptives don’t provide a definitive insight into which of the groups you distinguish differ from each other. In order to be able to determine exactly which departments (for instance) score higher or lower than the other ones, click Post Hoc. Then, you can choose from a number of post hoc tests that do tell you which groups differ from each other. Common post hoc tests are LSD, Bonferroni and Tukey. LSD is relatively simple, as it performs a number of t-tests between each of the groups. If the F-value is not significant, doing a Post Hoc test does not make much sense. Then, the conclusion is that there are no differences.
o T-test: Analyze > Compare Means > Independent Samples T-test
o ANOVA: Analyze > Compare Means > One-way ANOVA
§ Don’t forget Post Hoc
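The t-test and ANOVA steps above can be sketched in Python with scipy, using simulated group scores (the group names and effect sizes below are hypothetical):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Simulated 1-5 scale scores for three hypothetical departments
sales = rng.normal(3.0, 0.6, 40)
engineering = rng.normal(3.8, 0.6, 40)
management = rng.normal(3.9, 0.6, 40)

# Levene's test decides which t-test row applies:
# "equal variances assumed" (top row) or "not assumed" (bottom row)
lev_stat, lev_p = stats.levene(sales, engineering)
t, p = stats.ttest_ind(sales, engineering, equal_var=(lev_p >= 0.05))
print(f"t = {t:.2f}, p = {p:.4f}")

# One-way ANOVA across all three groups; a significant F-value only says
# SOME groups differ -- post hoc tests (e.g., LSD = pairwise t-tests) say which
f, p_anova = stats.f_oneway(sales, engineering, management)
print(f"F = {f:.2f}, p = {p_anova:.4f}")
```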
Test for relationships between variables
Both theoretically and practically, you can have a lot of assumptions about relationships between variables. For instance, is a knowledge friendly organizational culture positively related to the level of relational social capital? Do the different dimensions of social capital explain the level of knowledge sharing in the organization?
Correlations: If you’re only interested in the way variables are related, but not in the distinction between dependent and independent variables, you can compute correlations between these variables. Compute the Pearson correlation coefficient and see if the variables are positively related, negatively related, or not related at all. Pay attention to the following criteria in the output:
o Pearson’s r: this tells you the strength and direction of the relationship. If this statistic is 1, there is a perfectly positive correlation; if it is -1, a perfectly negative correlation; and if it is 0, no correlation at all. In practice, of course, it will lie somewhere in between these extremes. There are a number of different views concerning what constitutes a strong or weak correlation, but the leading criterion is:
o The significance of this coefficient. If the level of significance is below .05 (p<.05), the correlation is significant and we can conclude that the two variables are related.
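A minimal Python sketch of the Pearson correlation on simulated scale scores (variable names and effect sizes are hypothetical):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Simulated scale scores: culture and relational social capital,
# built so that the two are positively related
culture = rng.normal(3.5, 0.7, 100)
relational = 0.6 * culture + rng.normal(0, 0.5, 100)

# Pearson's r (strength/direction) and its significance (the decisive statistic)
r, p = stats.pearsonr(culture, relational)
print(f"r = {r:.2f}, p = {p:.4f}")
```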
Regression analysis: Most of the time, you will be interested in the distinction between dependent and independent variables, and in the extent to which the dependent variable is explained (or influenced) by these independent variables. Then, you will perform a regression analysis, which enables you to test a model like the following:
Y = b0 + b1*X1 + b2*X2 + b3*X3 + e
Typically, Y will be what you enter in the “dependent” box, and X1, X2 and X3 will be entered in the “independents” box. In a regression analysis, pay attention to the following criteria in the output:
o (Adjusted) R-square. This tells you how much variance in the dependent variable is explained by the independents in your model. For instance, an R-square of .48 means that 48% of the variance in the dependent is explained by your model. There is some discussion whether you should look at the R-square itself or the Adjusted R-square, but since the latter is a more conservative estimate it is preferable to choose the Adjusted R-square.
o F-value and significance of F-value. Here, the F-value tells you whether the proportion of variance explained by your model is significant – in other words, whether the (Adjusted) R-square is significant and the model has enough explanatory power to be valuable. If the F-value is not significant, it makes no sense to continue your analysis and you have to design a different model.
o Betas and significance of Betas. The (standardized) Beta can be compared to the correlation coefficient, and tells you how strong the relationship between your independent and dependent variable is, and what direction it has (positive or negative). Typically, standardized Beta values will lie between -1 and 1, and the most important thing here is whether the Beta value is significant (p<.05).
o Analyze > Correlations > Bivariate
o Analyze > Regression > Linear
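The R-square and adjusted R-square can also be computed by hand. A minimal Python sketch of an OLS regression on simulated data (the coefficients below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 120
# Simulated independents X1..X3 and a dependent built from them plus noise
X = rng.normal(size=(n, 3))
y = 0.5 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(scale=0.8, size=n)

# OLS: add an intercept column and solve by least squares
X_design = np.column_stack([np.ones(n), X])
coefs, *_ = np.linalg.lstsq(X_design, y, rcond=None)

# R-square = 1 - residual sum of squares / total sum of squares
resid = y - X_design @ coefs
ss_res = (resid ** 2).sum()
ss_tot = ((y - y.mean()) ** 2).sum()
r2 = 1 - ss_res / ss_tot

# Adjusted R-square penalizes for the number of predictors k
k = 3
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(f"R2 = {r2:.2f}, adjusted R2 = {adj_r2:.2f}")
```

Note that the adjusted R-square is always the more conservative of the two, which is why the guide recommends reporting it.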
Moderation & Mediation Analysis in SPSS
Here follow the steps for a moderation or mediation analysis in SPSS. Both analyses use linear regression.
Moderation:
a. Analyze > Descriptives > Descriptives; tick “Save standardized values as variables” (select the independent and the moderating variable)
b. Transform > Compute (calculate the product of the 2 standardized variables)
c. Analyze > Regression > Linear (select your dependent variable, insert the independent and moderating variable in step 1, click Next, and add the product in step 2)
d. Is the beta of the product term significant? Then there is moderation.
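The moderation steps above can be sketched in Python on simulated data (the interaction effect and variable names below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 150

def zscore(v):
    """Standardize a variable (mean 0, sd 1), like SPSS's saved z-values."""
    return (v - v.mean()) / v.std(ddof=1)

# Simulated independent (x), moderator (m), and a dependent with a
# built-in interaction effect of 0.5
x = rng.normal(size=n)
m = rng.normal(size=n)
y = 0.4 * x + 0.2 * m + 0.5 * x * m + rng.normal(scale=0.7, size=n)

# Steps a-b: standardize, then compute the product term
zx, zm = zscore(x), zscore(m)
product = zx * zm

# Step c: regression with main effects plus the product term
X = np.column_stack([np.ones(n), zx, zm, product])
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)

# Step d: a clearly non-zero product coefficient indicates moderation
print(f"coefficient of product term = {coefs[3]:.2f}")
```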
Mediation:
a. Is there an association of the independent variable with the mediator? (Analyze > Regression > Linear; the independent variable is inserted as independent variable and the mediator is added as dependent variable)
b. Is there an association of the mediator with the dependent variable? (Analyze > Regression > Linear; the mediator is inserted as independent variable and the dependent variable is added as dependent variable)
c. Is there an association of the independent variable with the dependent variable? (Analyze > Regression > Linear; the independent variable is inserted as independent variable and the dependent variable is added as dependent variable)
d. Does the association of independent and dependent variable reduce significantly (partial mediation) or disappear (full mediation) when the mediator is added? (Analyze > Regression > Linear; the independent variable is inserted as independent variable and the dependent variable is added as dependent variable in step 1; click Next, and add the mediator in step 2)
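The four mediation regressions can be sketched in Python on simulated data constructed to be fully mediated (all effects and names below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200

def ols(y, *preds):
    """Return the regression slopes (after the intercept) of y on the predictors."""
    X = np.column_stack([np.ones(len(y))] + list(preds))
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs[1:]

# Simulated fully mediated chain: x -> mediator -> y (no direct x -> y effect)
x = rng.normal(size=n)
mediator = 0.7 * x + rng.normal(scale=0.5, size=n)
y = 0.8 * mediator + rng.normal(scale=0.5, size=n)

a = ols(mediator, x)[0]           # step a: x -> mediator
b = ols(y, mediator)[0]           # step b: mediator -> y
c = ols(y, x)[0]                  # step c: x -> y (total effect)
c_prime = ols(y, x, mediator)[0]  # step d: x -> y controlling for the mediator

print(f"a = {a:.2f}, b = {b:.2f}, c = {c:.2f}, c' = {c_prime:.2f}")
```

With these simulated data, c' shrinks toward zero once the mediator is added, the pattern the guide calls full mediation.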
More video tutorials on SPSS
For more information and video tutorials on SPSS go to SPSS student movies and view the following flash movies (these videos run more smoothly when you download them first onto your hard drive):
o Entering Data
o The Syntax Window
o Transforming Data
o One-Way Independent ANOVA