KNR 445
Statistical Applications in Science & Technology

Correlation

The purpose of this assignment is to demonstrate how the Pearson product-moment correlation coefficient (r) quantifies the degree of association (relationship) between two variables. You will calculate r by hand to emphasize that the measure accounts for the position of each score in a pair relative to the mean of its own variable. You will also use SPSS to create a correlation matrix, create scattergrams, and fit a regression line through the data points in a scattergram. The data sets used are the CDC smoking data and the rain data from the Pantagraph.
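To make the "position relative to the mean" idea concrete, here is a minimal Python sketch of the deviation-score form of r: the sum of the cross-products of the deviations, divided by the square root of the product of the two sums of squared deviations. It is only an illustration of the arithmetic done in a hand calculation. The function name `pearson_r` and the five pairs of scores are made up for this example and are not taken from the CDC or rain data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson r computed from deviations about each variable's mean."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Deviation of each score from the mean of its own variable.
    dev_x = [xi - mean_x for xi in x]
    dev_y = [yi - mean_y for yi in y]
    # Cross-products are positive when both scores of a pair fall on the
    # same side of their means, and negative when they fall on opposite sides.
    sum_cross = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    sum_sq_x = sum(dx * dx for dx in dev_x)
    sum_sq_y = sum(dy * dy for dy in dev_y)
    return sum_cross / sqrt(sum_sq_x * sum_sq_y)


# Hypothetical illustrative scores, NOT the assignment data.
x = [1, 2, 3, 4, 5]
y = [3, 5, 4, 9, 9]

r = pearson_r(x, y)
print(round(r, 3))  # 0.894
```

Pairs in which both scores lie above (or both below) their means push r toward +1, while pairs on opposite sides of the means push it toward -1.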

Homework

Textbook Questions: you should be able to answer them all.

1. Identify all the procedures in SPSS that can be used to calculate the Pearson product-moment correlation coefficient (r).

2. Using the data rain.sav (created last assignment),
    a. create a scattergram and calculate the Pearson product-moment correlation coefficient between:
        i. Year and Year-to-date rainfall.
        ii. Summer rainfall and Year-to-date rainfall.
    b. From the scattergram, explain if the relationship is positive (one variable increases while the other increases) or negative (one variable increases while the other decreases).
    c. Is the interpretation of the relationship between Year and Year-to-date rainfall meaningful? Explain.
    d. Is the interpretation of the relationship between Summer rainfall and Year-to-date rainfall meaningful? Explain.
SPSS Output for Question 2.

3. In the editorial and the letter to the editor of the Indianapolis Star, questions are raised regarding the relationship among tax per pack of cigarettes, the number of persons of legal age who smoke, and the death rate related to cigarettes. Use the CDC smoking data for the following questions investigating the hypothesized relationships:
    a. Using hand calculations ONLY (even for mean values), calculate r between TaxRate and SmokerDeath for the Northwest and Southeast regions. Show all of your calculations.
    b. Using SPSS, calculate a matrix of correlation coefficients (r values) for the variables TaxRate, SmokerDeath and Smkr18.
        i. What value is on the diagonal (from top left to bottom right)? Why is this value along the diagonal?
        ii. What do you notice about the values below the diagonal and those above the diagonal?
        iii. To present a matrix of r values in a report, what changes would you make to the SPSS printout?
        iv. One of the options when creating a scattergram is "Matrix". Select this option and enter the variables TaxRate, SmokerDeath and Smkr18 in the same order as you did for the Correlate procedure. Comment on any similarity between the matrix of scattergrams and the matrix of correlation coefficients created in step b above.
    c. Create z-scores for TaxRate, SmokerDeath and %smokers.
    d. Create a scattergram using the z-scores for the variables TaxRate and SmokerDeath.
    e. Create a scattergram for the variables TaxRate and SmokerDeath.
    f. Are scattergrams in 3d and 3e the same or different? Explain why this happens.
    g. Create a scattergram using the z-scores for the variables TaxRate and %smokers.
    h. Create a scattergram for the variables TaxRate and %smokers.
    i. Edit the scattergrams created in g and h to add lines representing the mean values of each variable. (After getting to Chart Editor, select Chart, choose Reference Line and play with this option.)
        i. Does the alignment relative to the means of individual data points representing pairs of scores concur with r?
    j. Edit axis titles on scattergram 3.h, and provide a more descriptive title for the entire scattergram.
    k. Edit the scattergram created in 3.h to draw in the regression line (After getting to Chart Editor, select Chart, choose Chart Options, toggle Fit Line and play with this option).
        i. Describe the slope of the regression line in each scattergram. Does the slope concur with the calculated r value?
    l. Using the r and r² values, interpret the relationship between:
        i. TaxRate and SmokerDeath
        ii. TaxRate and %smokers
        iii. %smokers and SmokerDeath
    m. How do your interpretations compare to those of the editorial writer and the writer of the letter to the editor?
        i. What advice would you give to someone wanting to draw a conclusion regarding a relationship between two or more variables?
SPSS Output for Question 3
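
The steps in Question 3 that involve z-scores, the regression line and r² can be sketched outside SPSS as follows. This is a minimal Python illustration under stated assumptions, not the SPSS procedure: the helper names (`sample_sd`, `z_scores`, `pearson_r`) are hypothetical, the standard deviation uses the n - 1 denominator, and the scores are made up rather than taken from the CDC smoking data.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation (n - 1 in the denominator)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def z_scores(values):
    """Express each score as a number of standard deviations from the mean."""
    m = mean(values)
    s = sample_sd(values)
    return [(v - m) / s for v in values]


def pearson_r(x, y):
    """r as the sum of z-score cross-products divided by n - 1."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)


# Hypothetical illustrative scores, NOT the assignment data.
x = [1, 2, 3, 4, 5]
y = [3, 5, 4, 9, 9]

r_raw = pearson_r(x, y)                    # r from the raw scores
r_z = pearson_r(z_scores(x), z_scores(y))  # r from the z-scores
slope_raw = r_raw * sample_sd(y) / sample_sd(x)  # slope of y regressed on x
r_squared = r_raw ** 2                     # proportion of shared variance

print(round(r_raw, 3), round(r_z, 3))
print(round(slope_raw, 3), round(r_squared, 3))
```

In this sketch, converting to z-scores changes the units on both axes but not r, and the regression slope carries the same sign as r because it is r rescaled by the ratio of the two standard deviations.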