## Sep 14/15

Sept 14/15: Statistics
Chapter 4: scatter plots
Terminology: think in terms of input and output.
The input is an explanatory variable (the quantity which one can change).
The output is the response variable (the quantity which changes as a result of the activity which is studied).
Scatterplot: one plots the explanatory variable on the horizontal axis, and the response variable on the vertical axis. Example: state SAT scores.
Example: mass versus energy use.
Example: airline outsourcing versus delayed flights.
Example: manatees versus boats in Florida.
To use the textbook's data on your calculator without typing it in, you will need (1) the cable that connects your calculator to a USB port on your computer and (2) the program TICONNECT, which is downloadable from http://education.ti.com.
First download the data into an array (matrix) in TICONNECT, then copy each column into a list in TICONNECT, and finally send the lists to your TI. Unfortunately, StatsPortal is not set up to download data directly into lists.
It sounds harder than it is.
Alternatively, you can have the StatsPortal app draw the scatterplot for you.
The TI will compute the regression line $y = ax + b$ for you, as will CrunchIt.
The regression line comes from the method of least squares, which is due to Gauss.
The idea is to take all of the equations $y_i = ax_i + b$ (one for each data point) and choose $a$ and $b$ so that the sum of the squared errors
$\sum{(y_i - (ax_i + b))^2}$
is as small as possible. Minimizing over $b$ first gives $b = \bar{y} - a\bar{x}$; substituting this back turns the sum into a quadratic in $a$, and we find its vertex (lots of algebra).
This gives us $a=\frac{\sum{x_i y_i} - n\bar{x}\bar{y}}{\sum{x_i^2} - n\bar{x}^2}, \quad b=\bar{y}-a\bar{x}$. The TI finds these for you.
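The slope and intercept formulas translate directly into a few lines of code. Here is a minimal Python sketch for checking hand or calculator work; it is not necessarily how the TI or CrunchIt compute the line internally, and the function name `least_squares_line` and the five data points are hypothetical, made up for illustration.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y = a*x + b."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    denominator = sum_xx - n * x_bar ** 2
    if denominator == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    # a = (sum of xy - n * xbar * ybar) / (sum of x^2 - n * xbar^2)
    a = (sum_xy - n * x_bar * y_bar) / denominator
    # b = ybar - a * xbar
    b = y_bar - a * x_bar
    return a, b


# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = least_squares_line(xs, ys)
print(f"y = {a:.2f}x + {b:.2f}")
```

For these five points, $\bar{x} = 3$, $\bar{y} = 4$, $\sum x_i y_i = 66$ and $\sum x_i^2 = 55$, so $a = 6/10 = 0.6$ and $b = 4 - 0.6 \cdot 3 = 2.2$.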

The TI will compute the correlation for you if you turn on the diagnostics. This is done from the CATALOG (2nd 0): choose DiagnosticOn and press ENTER twice. You can jump ahead in the CATALOG by pressing the green letter keys A–Z.
Once the diagnostics are on, you will see $r$ and $r^2$ displayed; these quantities measure how strongly the explanatory and response variables are linearly related, or run together.
In the case of the manatees, the data strongly suggest that the number of manatees killed is related to the number of propellers in the water.
The definition of $r$ is shown on page 104. It is quite complicated: $r$ is essentially the average of the products of the z-values (standardized values) of the explanatory and response variables, where the "average" divides by $n - 1$ rather than $n$.
An equivalent computational formula, which looks much more complicated, is:
$r=\frac{\sum{x_i y_i} - n\bar{x}\bar{y}}{\sqrt{\left(\sum{x_i^2} - n\bar{x}^2\right)\left(\sum{y_i^2} - n\bar{y}^2\right)}}$
Note that $r$ and $a$ have the same numerator. Since both denominators are positive, $r$ and the slope $a$ always have the same sign.
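To see that the z-value definition and the computational formula agree, and that $r$ shares its numerator with the slope $a$, here is a minimal Python sketch. The function names `r_from_z_values` and `slope_and_r` and the four data points are hypothetical; the z-value version assumes sample standard deviations (dividing by $n - 1$).

```python
from math import sqrt


def r_from_z_values(xs, ys):
    """r as the sum of products of z-values, divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    # Sample standard deviations (divide by n - 1)
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)


def slope_and_r(xs, ys):
    """Slope a and correlation r from the shared-numerator formulas."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    # Common numerator: sum of xy - n * xbar * ybar
    numerator = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    s_xx = sum(x * x for x in xs) - n * x_bar ** 2
    s_yy = sum(y * y for y in ys) - n * y_bar ** 2
    a = numerator / s_xx
    r = numerator / sqrt(s_xx * s_yy)
    return a, r


# Hypothetical data with a downward trend
xs = [1, 2, 3, 4]
ys = [10, 7, 5, 2]
a, r = slope_and_r(xs, ys)
print(f"a = {a:.2f}, r = {r:.4f}, r from z-values = {r_from_z_values(xs, ys):.4f}")
```

Here the shared numerator is $47 - 4 \cdot 2.5 \cdot 6 = -13$, so both $a$ ($= -2.6$) and $r$ come out negative, and the two ways of computing $r$ give the same value.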
A perfect correlation, with all points exactly on a line, gives $r = 1$ (rising line) or $r = -1$ (falling line). No linear correlation gives $r = 0$ or a value very close to 0.
See page 106 for scatterplots which have various correlation values from 1 to -1.
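As a quick check of these extremes, here is a minimal Python sketch that applies the computational formula for $r$ to three small hypothetical data sets: points on a rising line, points on a falling line, and a pattern that rises and then falls symmetrically, so it has no linear trend. The function name `correlation` and all the data are made up for illustration.

```python
from math import sqrt


def correlation(xs, ys):
    """Correlation r via the computational formula."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    s_xx = sum(x * x for x in xs) - n * x_bar ** 2
    s_yy = sum(y * y for y in ys) - n * y_bar ** 2
    return s_xy / sqrt(s_xx * s_yy)


xs = [1, 2, 3, 4]
r_rising = correlation(xs, [2 * x + 1 for x in xs])  # points on y = 2x + 1
r_falling = correlation(xs, [-3 * x for x in xs])    # points on y = -3x
r_flat = correlation(xs, [1, 2, 2, 1])               # rises, then falls symmetrically
print(r_rising, r_falling, r_flat)                   # prints 1.0 -1.0 0.0
```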
Correlation is NOT resistant; outliers have a significant effect on the correlation coefficient.
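The lack of resistance is easy to see numerically. This minimal Python sketch computes $r$ with the computational formula for six hypothetical points lying close to $y = 2x$, then again after adding a single hypothetical outlier at $(7, 0)$; the function name `correlation` and all the data are made up for illustration.

```python
from math import sqrt


def correlation(xs, ys):
    """Correlation r via the computational formula."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    s_xx = sum(x * x for x in xs) - n * x_bar ** 2
    s_yy = sum(y * y for y in ys) - n * y_bar ** 2
    return s_xy / sqrt(s_xx * s_yy)


# Hypothetical points lying close to y = 2x
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
r_clean = correlation(xs, ys)

# The same data plus one hypothetical outlier at (7, 0)
r_outlier = correlation(xs + [7], ys + [0.0])

print(f"without outlier: r = {r_clean:.3f}")
print(f"with one outlier: r = {r_outlier:.3f}")
```

One outlying point drops $r$ from about 0.999 to about 0.25.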
See the significant items for chapter 4 on the chapter review page, page 111.