Introduction to Statistics

In this section of the website, you can find information and clarification on some of the fundamental parts of statistics, along with answers to questions you may have on these topics. Each of these tabs leads to a page on that specific subject.

Ten Things to Know

1. Basic arithmetic and algebra: When coming into statistics, it can be really helpful to remember the order of operations (parentheses, exponents, multiplication and division, addition and subtraction). You will also be using a little bit of algebra every once in a while, so make sure you brush up on the rules before you jump into stats.

2. Meaning of Greek symbols in math: Looking at statistical formulas might seem a little overwhelming the first time, because they tend to be filled with a lot of Greek letters and symbols. To make that first day of statistical formulas easier, try looking up the meanings of the different symbols being used and write them down.

3. How to find mean, median, and mode: Sometimes, it can be easy to mix up these three when referring to “average.” The differences are important to remember, so if you’re not sure how to find each of these, it might be helpful to look at our FAQ page to find the answer. You’ll definitely need to know for later on in your stats class.

4. Basic (REALLY basic) computer skills: In most statistics classes, you will have to use a software program (such as SPSS or Excel) to interpret the data or run different statistical tests. You don’t have to know how to run them beforehand (that’s what the classes are for!), but it does help to have a basic understanding of how to work a computer. Also, make sure to take notes in class on the steps needed for running tests, because it can be easy to mix them up later on in the semester.

5. Basic probability: In some statistics classes, you might be required to solve problems about probability. It would be helpful to have a simple understanding of what probability is and how to find it.

6. Ability to interpret word problems: Statistics classes LOVE to use word problems and different scenarios to present information to you. Sometimes it can be difficult to weed through all of the random facts to find what the actual problem is. Before you start working on a problem, try to make sure you have a clear understanding of what the question is asking; that way, you won’t waste time solving a problem you weren’t even asked to solve.

7. Understanding “variance” and “spread”: Variance and spread are terms that you will hear a lot in statistics. They basically refer to how close or far apart the numbers are in the data set (for example, if all of the numbers fall between 1 and 5, this data set would have a smaller spread than a data set with numbers ranging from 1 to 15). Having an understanding of these terms will be helpful when studying statistics.

8. Interpreting graphs: Statisticians often show their information in the form of graphs. It is helpful to know what the graphs mean and how to interpret them. Not only does this help you understand the data given, but it also helps you know how to present your own data, should you need to make your own graphs someday.

9. How to ask the right questions: In research as well as in statistics, it is extremely helpful to know how to ask the right questions. When you begin to ask questions like, “How does X connect to Y?” or “What happens to Y when it is influenced by X?” it becomes easier to understand the questions given in class or in homework assignments.

10. Know that you can learn this! This is probably the most important point in this whole list. Coming into a statistics class with confidence is crucial to the learning process. Once you realize that you can learn the information, you can push past mistakes or information that confuses you, because you know that you can grow in the process. So when you get that homework assignment that seems impossible, or your teacher is explaining material that just doesn’t make sense, don’t lose hope! Statistics is learnable.

The Normal Distribution

Properties of the Normal Distribution

1. The normal curve is symmetrical about the mean μ

2. The mean is at the middle and divides the area into halves

3. The total area under the curve is equal to 1

4. It is completely determined by its mean μ and standard deviation σ (or variance σ²)
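The properties above can be checked numerically. The following is a minimal sketch using `NormalDist` from Python's standard library; the mean of 100 and standard deviation of 15 are made-up values chosen only for illustration.

```python
from statistics import NormalDist

# Hypothetical parameters, chosen only for illustration.
# Property 4: these two numbers are all that is needed to define the curve.
mu, sigma = 100, 15
dist = NormalDist(mu, sigma)

# Properties 1 and 2: the curve has the same height at equal distances on
# either side of the mean, and half of the area lies below the mean.
same_height = abs(dist.pdf(mu - 10) - dist.pdf(mu + 10)) < 1e-12
area_left_of_mean = dist.cdf(mu)

# Because the total area is 1 (property 3), areas can be read as proportions:
# this is the share of values within one standard deviation of the mean.
area_within_one_sd = dist.cdf(mu + sigma) - dist.cdf(mu - sigma)

print(same_height)                   # True
print(area_left_of_mean)             # 0.5
print(round(area_within_one_sd, 4))  # 0.6827
```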

The Standard Normal Distribution

Central Limit Theorem

Given a sufficiently large sample size from a population with a finite variance, the distribution of the sample means tends toward a normal distribution, and the mean of all the sample means will be approximately equal to the mean of the population.
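A short simulation can make the theorem easier to picture. This is only a sketch: the exponential population, the sample size of 50, and the 2,000 repetitions are all assumptions picked for illustration, not values from this page.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical skewed population: exponential with mean 1 and standard deviation 1.
def draw_sample(n):
    return [random.expovariate(1.0) for _ in range(n)]

n = 50               # size of each sample
num_samples = 2000   # how many samples to draw

sample_means = [mean(draw_sample(n)) for _ in range(num_samples)]

# The sample means should center near the population mean (1.0), and their
# spread should be close to sigma / sqrt(n) = 1 / sqrt(50), about 0.141.
print(round(mean(sample_means), 3))
print(round(stdev(sample_means), 3))
```

Even though the population itself is strongly skewed, a histogram of `sample_means` would look roughly bell-shaped.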

Applications of the Normal Distribution

Confidence Intervals

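When the population standard deviation σ is known, the confidence interval for the population mean is calculated as:

$\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$

When the population standard deviation is unknown, the sample standard deviation s is used in its place, and the confidence interval for the population mean is calculated as:

$\bar{x} \pm t_{\alpha/2,\, n-1} \frac{s}{\sqrt{n}}$

Here $\bar{x}$ is the sample mean, n is the sample size, and z and t are critical values from the standard normal distribution and the t-distribution with n - 1 degrees of freedom.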
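As a rough illustration, both intervals can be computed with the standard library. The sketch assumes the usual textbook forms (sample mean ± z × σ/√n when σ is known, and sample mean ± t × s/√n when it is not). The data, the known σ of 0.25, and the 95% confidence level are all invented for the example.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Hypothetical sample of 10 measurements, invented for illustration.
sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.5, 12.1]
n = len(sample)
x_bar = mean(sample)

# Case 1: population standard deviation is known (assumed to be 0.25 here).
sigma = 0.25
z_crit = NormalDist().inv_cdf(0.975)   # about 1.96 for 95% confidence
z_margin = z_crit * sigma / sqrt(n)
z_interval = (x_bar - z_margin, x_bar + z_margin)

# Case 2: population standard deviation is unknown, so use the sample value s
# and a t critical value with n - 1 = 9 degrees of freedom. The value 2.262 is
# read from a standard t-table, since the standard library has no t-distribution.
s = stdev(sample)
t_crit = 2.262
t_margin = t_crit * s / sqrt(n)
t_interval = (x_bar - t_margin, x_bar + t_margin)

print(z_interval)
print(t_interval)
```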

Descriptions of Data

Mean– Mean is the average of a set of data points. This is calculated by adding up all of the data points in a set of data and then dividing that number by the number of data points in the set.

Median– Median is the middle number of a set of data once the data is arranged in order. If the data set has an even number of terms, then the median is the average of the middle two terms of the data set. If the data set has an odd number of terms, then the median is just the middle number of the data set.

Mode– The mode is the data point that occurs most frequently in a set of data.

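Standard Deviation– The standard deviation measures how far the data points typically fall from the mean. It can be calculated using the formula $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$ if it is a sample of data and $\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$ if it is a population of data.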

Variance– The variance can be calculated by squaring the standard deviation of a data set.

Range– The range can be calculated by subtracting the lowest data point from the highest data point.

Percentiles– A percentile represents how much of the data falls below a certain point in the data. Thus, the twenty-fifth percentile is the point below which twenty-five percent of the data falls. The fiftieth percentile is the point below which fifty percent of the data falls, and the seventy-fifth percentile is the point below which seventy-five percent of the data falls.

Quartiles– Quartiles are used primarily with box and whisker plots and divide the ordered data set into four parts, each containing twenty-five percent of the data. So, the first quartile corresponds to the twenty-fifth percentile, the second quartile is the median of the data, or the fiftieth percentile, and the third quartile corresponds to the seventy-fifth percentile.
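The measures described above can all be computed with Python's built-in `statistics` module. This is a small sketch on a made-up data set. Textbooks and software use slightly different quartile conventions, so Q1 and Q3 from this code may differ a little from a hand calculation.

```python
import statistics

# Hypothetical data set, invented for illustration.
data = [2, 4, 4, 5, 7, 9, 10, 12]

mean = statistics.mean(data)              # sum of the values divided by the count
median = statistics.median(data)          # average of the two middle values (5 and 7)
mode = statistics.mode(data)              # most frequent value
sample_sd = statistics.stdev(data)        # sample formula, divides by n - 1
population_sd = statistics.pstdev(data)   # population formula, divides by N
sample_var = statistics.variance(data)    # square of the sample standard deviation
data_range = max(data) - min(data)        # highest point minus lowest point

# Quartiles: three cut points that split the ordered data into four parts.
q1, q2, q3 = statistics.quantiles(data, n=4)

print(mean, median, mode, data_range)     # 6.625 6.0 4 10
print(q2 == median)                       # True
```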

Counting Techniques

The Multiplication Principle

The multiplication principle states that if there are x ways of doing one action and y ways of doing another action, then there are x times y ways of doing both actions together.

Permutations

Permutations are groupings where the order of the objects matters. The formula for the number of permutations of r objects chosen from n objects is:

$_nP_r = \frac{n!}{(n-r)!}$

Combinations

Combinations are groupings where the order of the objects does not matter. The formula for the number of combinations of r objects chosen from n objects is:

$_nC_r = \frac{n!}{r!\,(n-r)!}$

Tree Diagrams

Tree diagrams are useful for looking at any sort of grouping, as they give a clear visual of the different combinations that can occur.
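The counting techniques in this section can be sketched in a few lines of Python. The shirts, pants, and letters below are made-up items used only for illustration. `math.perm` and `math.comb` require Python 3.8 or later and use the standard formulas n!/(n-r)! and n!/(r!(n-r)!).

```python
from itertools import combinations, permutations, product
from math import comb, perm

# Multiplication principle: 3 shirts and 2 pants give 3 * 2 outfits.
shirts = ["red", "blue", "green"]
pants = ["jeans", "khakis"]
outfits = list(product(shirts, pants))   # every path through the tree diagram
print(len(outfits))                      # 6

# Permutations: ordered arrangements of 2 letters chosen from 5.
letters = "ABCDE"
print(perm(5, 2))                             # 20
print(len(list(permutations(letters, 2))))    # 20

# Combinations: unordered selections of 2 letters chosen from 5.
print(comb(5, 2))                             # 10
print(len(list(combinations(letters, 2))))    # 10
```

Listing `outfits` shows the same branches a tree diagram would: each shirt splits into one branch per pair of pants.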

Frequency Distributions & Graphs

Pie Chart

Include a key (legend)
Use different colors for different sections
Sections should add up to 100%

Histogram

Displays quantitative data
Data is split into bins or classes
Class width = (Highest Point - Lowest Point) / Number of Bins
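As an illustration of the class width formula, the sketch below bins a set of made-up test scores into five classes. Rounding the width up to a whole number is an assumption made here so that the bins cover the highest value; it is a common classroom convention, not something this page specifies.

```python
from math import ceil

# Hypothetical test scores, invented for illustration.
scores = [52, 58, 61, 67, 70, 72, 75, 78, 81, 84, 88, 93, 96]
num_bins = 5

# Class width = (highest point - lowest point) / number of bins, rounded up.
class_width = ceil((max(scores) - min(scores)) / num_bins)   # 44 / 5 = 8.8 -> 9

# Count how many scores fall into each bin.
lowest = min(scores)
counts = [0] * num_bins
for score in scores:
    # The min() guards against a top score landing exactly on the upper edge.
    index = min((score - lowest) // class_width, num_bins - 1)
    counts[index] += 1

# Print a simple text histogram, one row per class.
for i, count in enumerate(counts):
    start = lowest + i * class_width
    print(f"{start}-{start + class_width - 1}: {'*' * count}")
```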

Stem & Leaf Plot

“Stem” values are listed vertically, and “leaf” values are listed horizontally
A legend is needed to show how to read the data

Ogive

Graphs cumulative frequencies
Cumulative frequency is the sum of all frequencies up to a certain point
Relative frequency is the frequency of a certain class divided by the total frequency
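Cumulative and relative frequencies are simple to compute from a list of class frequencies. The frequencies below are hypothetical values chosen for illustration.

```python
from itertools import accumulate

# Hypothetical class frequencies, invented for illustration.
frequencies = [2, 2, 4, 2, 3]
total = sum(frequencies)

# Cumulative frequency: running total of the frequencies (the heights an ogive plots).
cumulative = list(accumulate(frequencies))
print(cumulative)   # [2, 4, 8, 10, 13]

# Relative frequency: each class frequency divided by the total frequency.
relative = [f / total for f in frequencies]
print([round(r, 3) for r in relative])
```

The last cumulative frequency always equals the total number of data points, and the relative frequencies always add up to 1.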

Box & Whisker Plot

Clearly shows the quartiles (Q1, Median, Q3)
Shows range of data (minimum and maximum)
Box outlines the interquartile range
Interquartile Range (IQR) = Q3 - Q1
How to find Outliers:
- Lower limit: Q1 - (1.5)IQR
- Upper limit: Q3 + (1.5)IQR
- If a data point is outside of this range, the point is an outlier
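The outlier rule above can be sketched as follows. The data set is made up, and the quartiles are assumed to have been found already by taking the median of each half of the ordered data.

```python
# Hypothetical ordered data, invented for illustration.
data = [3, 7, 8, 9, 10, 11, 12, 13, 14, 30]
q1, q3 = 8, 13   # medians of the lower half and the upper half

iqr = q3 - q1                     # 13 - 8 = 5
lower_limit = q1 - 1.5 * iqr      # 8 - 7.5 = 0.5
upper_limit = q3 + 1.5 * iqr      # 13 + 7.5 = 20.5

# Any point below the lower limit or above the upper limit is an outlier.
outliers = [x for x in data if x < lower_limit or x > upper_limit]

print(lower_limit, upper_limit)   # 0.5 20.5
print(outliers)                   # [30]
```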

Probability & Probability Distributions

Sample Spaces:

A sample space is simply all of the possible outcomes of an experiment.

Probability Rules:

The subtraction rule– The probability that an event A will occur is one minus the probability that event A does not occur.

The multiplication rule– The probability that both A and B occur is equivalent to the probability that A occurs times the probability of B occurring given that A has already occurred.

The addition rule– The probability that event A or event B occurs is equivalent to the probability that A occurs plus the probability that B occurs minus the probability that both A and B occur.
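The three rules can be checked on a small sample space. This sketch uses one roll of a fair six-sided die; the events A and B and the helper function `prob` are made up for illustration.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die.
sample_space = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event (a set of outcomes) when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

A = {2, 4, 6}    # the roll is even
B = {4, 5, 6}    # the roll is greater than 3

# Subtraction rule: P(A) = 1 - P(not A)
not_A = sample_space - A
print(prob(A) == 1 - prob(not_A))                        # True

# Multiplication rule: P(A and B) = P(A) * P(B given A)
p_B_given_A = Fraction(len(A & B), len(A))
print(prob(A & B) == prob(A) * p_B_given_A)              # True

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
print(prob(A | B) == prob(A) + prob(B) - prob(A & B))    # True
```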

Complementary Events: Two events are complementary when exactly one of them must happen: either event A happens or it does not. The probabilities of complementary events add up to 1.

The Binomial Distribution: This follows from an experiment that consists of n independent trials, where each trial has only two possible outcomes (success or failure) and the probability of success, p, is the same for every trial. The mean is equivalent to n*p, the variance is n*p*(1-p), and the standard deviation is the square root of the variance.
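Here is a brief sketch of these quantities for 10 flips of a fair coin, a made-up example. The function `binomial_pmf` is a hypothetical helper that applies the standard binomial probability formula, which this page does not state.

```python
from math import comb, sqrt

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Hypothetical example: 10 flips of a fair coin, counting heads as successes.
n, p = 10, 0.5

mean = n * p                    # 5.0
variance = n * p * (1 - p)      # 2.5
std_dev = sqrt(variance)        # square root of the variance

print(mean, variance)           # 5.0 2.5
print(binomial_pmf(5, n, p))    # 0.24609375
```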

Expected Value– This is the weighted mean of all the possible values of a distribution, where each value is weighted by its probability.
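For example, the expected value of a simple game can be found by multiplying each payout by its probability and adding the results. The game and its probabilities below are invented for illustration.

```python
from fractions import Fraction

# Hypothetical game: win $10 with probability 1/10, win $2 with probability 3/10,
# and win nothing otherwise.
outcomes = {10: Fraction(1, 10), 2: Fraction(3, 10), 0: Fraction(6, 10)}

assert sum(outcomes.values()) == 1   # the probabilities must add up to 1

# Expected value: each value weighted by its probability, then summed.
expected_value = sum(value * prob for value, prob in outcomes.items())
print(expected_value)   # 8/5
```

An expected value of 8/5 means the average winnings come to about $1.60 per play over many plays.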

Correlation & Regression

What is Correlation

Calculating Correlation

Linear Regression

Hypothesis Testing

Hypothesis testing is vital to the world of applied statistics. It compares two mutually exclusive statements, the null hypothesis (Ho) and the alternative hypothesis (Ha), to determine which is better supported by the collected data.

The Null Hypothesis

Written as Ho, the null hypothesis is the prediction that there is no difference between the samples being compared.

One-Tailed vs. Two-Tailed

A one-tailed or directional hypothesis is defined as a prediction in which the region of rejection exists on only one side of the sampling distribution. Essentially, the direction of the effect of the independent variable is predicted.

A two-tailed or non-directional hypothesis is defined as a prediction in which the region of rejection exists on both sides of the sampling distribution. It is predicted that the independent variable will have an effect, but the direction of that effect is not specified.

The Alternative Hypothesis

Written as Ha, the alternative hypothesis is the prediction that a significant difference does exist between the samples being compared.
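To make the one-tailed versus two-tailed distinction concrete, here is a sketch of a one-sample z-test. All of the numbers are hypothetical, and the z-test was chosen only because it needs nothing beyond the standard library.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical one-sample z-test: Ho says the population mean is 100.
mu_0 = 100        # mean claimed by the null hypothesis
sigma = 15        # population standard deviation, assumed known
n = 36            # sample size
x_bar = 105       # observed sample mean

z = (x_bar - mu_0) / (sigma / sqrt(n))    # 5 / 2.5 = 2.0
std_normal = NormalDist()

# One-tailed (Ha: the mean is greater than 100): rejection region in the upper tail only.
p_one_tailed = 1 - std_normal.cdf(z)

# Two-tailed (Ha: the mean is different from 100): rejection region in both tails.
p_two_tailed = 2 * (1 - std_normal.cdf(abs(z)))

print(z)                          # 2.0
print(round(p_one_tailed, 4))     # 0.0228
print(round(p_two_tailed, 4))     # 0.0455
```

The two-tailed p-value is twice the one-tailed value because the same evidence has to be spread across both ends of the distribution.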

Goals of Hypothesis Testing

Reject the Null

This is usually the goal! When you reject the null, the data support the alternative hypothesis. This means a difference does exist between the samples being compared.

Fail to Reject the Null

In this instance the null hypothesis is retained. This means the data did not provide enough evidence of a difference between the samples being compared; it does not prove that no difference exists.
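The two outcomes can be written as a simple decision rule. This page does not name a rule, so the sketch assumes the common convention of comparing a p-value with a significance level (alpha, set to 0.05 here). The function `decide` and the p-values are hypothetical.

```python
def decide(p_value, alpha=0.05):
    """Return the test decision for a given p-value and significance level."""
    if p_value < alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"

# Hypothetical p-values, invented for illustration.
print(decide(0.0228))   # reject the null hypothesis
print(decide(0.2100))   # fail to reject the null hypothesis
```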