# Topic 3 - Data Collection, Sampling and Probability Distributions

## Contents

Gerald Keller (2011), Statistics for Management and Economics (Abbreviated), 9th Edition, pp. 161-209.

## Data Collection And Sampling - Chapter 5

### Direct Observation

Direct observation is relatively inexpensive. We can take measurements directly and ask questions, while generally avoiding influencing behaviour.

### Experiments

Experiments can be expensive. Generally, a sample is split into two groups: a 'control group' that is left untreated, and a second group on which we impose a treatment. The results of the two groups are then compared.

### Surveys

The problem with surveys is that the response rate is usually low, which reduces their validity: only a specific type of person is likely to respond, so the respondents often do not provide a good cross-section of the entire population under study.

### Personal Interview

Personal interviews involve asking prepared questions. They tend to be expensive but have high response rates and provide detailed information. Telephone interviews are cheaper but a lower response rate is generally the trade-off.

### Sampling Plans

Statisticians use samples because studying the entire population (whether it be people, houses, animals) takes too long and is too expensive (it is uneconomical).

• Self-Selecting Samples
  • Biased: participants are more keenly interested in the issue than other members of the population, so only a certain type of person takes part, and their behavioural characteristics may not represent those of the entire population. The sample therefore does not represent the population truthfully.
  • E.g. talk-back radio, SLOP (Self-Selected Opinion Poll)
• Simple Random Sampling
  • A sample selected from the population of interest in such a way that every possible sample with the same number of observations is equally likely to be chosen
  • Removes selection bias
• Stratified Random Sampling
  • Obtained by separating the population into mutually exclusive sets, or strata, and then drawing simple random samples from each stratum
  • E.g. strata = gender, age, occupation, income
• Cluster Sampling
  • A simple random sample of groups or clusters of elements (e.g. a block of houses)
  • Increases sampling error (people in the same cluster, such as those living in the same household, are likely to be similar)
  • Cost savings
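
The three random sampling plans above can be sketched in Python. This is only a minimal illustration under stated assumptions: the population of 1,000 people, the `gender` and `block` fields, and all function names are hypothetical and do not come from the textbook.

```python
import random
from collections import defaultdict

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: each person has a stratum (gender) and a cluster (block).
population = [
    {"id": i, "gender": rng.choice(["F", "M"]), "block": i // 50}
    for i in range(1000)
]

def simple_random_sample(units, n, rng):
    """Every possible sample of size n is equally likely to be chosen."""
    return rng.sample(units, n)

def stratified_sample(units, key, n_per_stratum, rng):
    """Split units into mutually exclusive strata, then draw a simple random sample from each."""
    strata = defaultdict(list)
    for unit in units:
        strata[unit[key]].append(unit)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster_sample(units, key, n_clusters, rng):
    """Draw a simple random sample of clusters and keep every unit in the chosen clusters."""
    clusters = defaultdict(list)
    for unit in units:
        clusters[unit[key]].append(unit)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for c in chosen for unit in clusters[c]]

srs = simple_random_sample(population, 100, rng)
strat = stratified_sample(population, "gender", 50, rng)
clus = cluster_sample(population, "block", 2, rng)
```

All three calls return 100 people, but the cluster sample only ever visits two blocks, which is where both its cost saving and its extra sampling error come from.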

### Sampling & Non-Sampling Errors

• Sampling Error: differences between the sample and the population that exist only because of the observations that happened to be selected for the sample
• Non-Sampling Error: results from mistakes made in acquiring the data or in selecting the sample observations
  • Errors in data acquisition (recording incorrect responses or measurements)
  • Non-response error
  • Selection bias (when some members of the target population cannot possibly be selected for the sample)

## Probability - Chapter 6

### Assigning Probability to Events - Definitions And Concepts

• Random Experiment - an action or process that leads to one of several possible outcomes
  • E.g. flipping a coin, measuring the time taken to assemble a computer
• Event - a collection or set of one or more simple events in a sample space
  • An individual outcome of a sample space is called a simple event
• Probability of an Event - the sum of the probabilities of the simple events that constitute the event
  • E.g. for the events Pass and Fail, if we know the probabilities of the individual grades (A, B, C, D), the probability of Pass is the sum of the probabilities of the passing grades
• Sample Space - a list of all possible outcomes of an experiment; the outcomes must be exhaustive and mutually exclusive
  • Exhaustive - all possible outcomes are included
  • Mutually Exclusive - no two outcomes can occur at the same time
• Mathematical Approach - based on equally likely outcomes, e.g. the probability of flipping a head with a fair coin is ½
• Relative Frequency Approach - the long-run relative frequency with which the outcome occurs; more observations give a more accurate estimate
• Subjective Approach - the degree of belief that we hold in the occurrence of the event
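
The relative frequency approach can be illustrated with a small simulation. The sketch below assumes a fair coin simulated with Python's `random` module. The function name and the flip counts are illustrative choices, not part of the source material.

```python
import random

def relative_frequency_of_heads(n_flips, rng):
    """Flip a simulated fair coin n_flips times and return the proportion of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

rng = random.Random(0)  # fixed seed so repeated runs give the same flips
for n in (10, 100, 10_000, 100_000):
    # The proportion of heads tends to settle near 0.5 as n grows.
    print(n, relative_frequency_of_heads(n, rng))
```

With only a handful of flips the proportion can be far from ½, whereas with many flips it should sit close to the value the mathematical approach gives.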

### Marginal And Joint Probability

Marginal probability is computed by adding across the rows or down the columns of a table of joint probabilities; it gives the probability of a single event occurring. Joint probability is the probability of the intersection of two events (A & B), i.e. the probability of A and B occurring at the same time. Joint probabilities can be worked out using conditional probability (below).

In the above example, the joint probability of passing & having Facebook is 0.5.

In contrast, marginal probabilities tell you the probability of any one person passing, failing, having Facebook or not having Facebook. For example, the probability of not having Facebook = 0.15 + 0.1 = 0.25.
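
As a rough sketch, the marginal probabilities can be computed by summing cells of a joint probability table. The values 0.5, 0.15 and 0.1 come from the text, and 0.25 for Fail & Facebook follows from the cells summing to 1. Which of 0.15 and 0.1 belongs to Pass and which to Fail is not stated in these notes, so the split used below is an assumption, and the helper `marginal` is a hypothetical name.

```python
# Joint probabilities for (result, Facebook status).
# ASSUMPTION: the notes give 0.15 and 0.10 for the "No Facebook" cells but do
# not say which is Pass and which is Fail; the assignment below is a guess.
joint = {
    ("Pass", "Facebook"): 0.50,      # stated in the notes
    ("Fail", "Facebook"): 0.25,      # implied, since all cells sum to 1
    ("Pass", "No Facebook"): 0.15,   # assumed assignment
    ("Fail", "No Facebook"): 0.10,   # assumed assignment
}

def marginal(joint, position, value):
    """Sum the joint probabilities of every cell whose label at `position` equals `value`."""
    return sum(p for labels, p in joint.items() if labels[position] == value)

p_no_facebook = marginal(joint, 1, "No Facebook")  # 0.15 + 0.10
p_facebook = marginal(joint, 1, "Facebook")        # 0.50 + 0.25
p_pass = marginal(joint, 0, "Pass")                # depends on the assumed split
```

The Facebook marginals do not depend on the assumed split, but the Pass and Fail marginals do.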

### Conditional Probability (If/Given)

#### Rules

1. P(e | f) = P(e and f) / P(f) = probability of 'e' occurring given that 'f' has occurred.
   • The ratio of the joint probability to the marginal probability
2. Rearranging yields the multiplication rule for joint probability: P(e and f) = P(e | f) P(f)
3. If P(e | f) = P(e), conditioning has no effect, and e & f are said to be independent

In the above example, the probability of passing given that the person has Facebook is P(Pass | Has Facebook) = P(Pass & Has Facebook) / P(Has Facebook) = 0.5 / 0.75 = 2/3, or about 66.67%.
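
The same calculation can be written as a short Python sketch. The two input probabilities are the ones quoted in the text, while the function name `conditional` is an illustrative choice.

```python
def conditional(p_joint, p_given):
    """P(e | f) = P(e and f) / P(f); undefined when P(f) is zero."""
    if p_given <= 0:
        raise ValueError("P(f) must be positive to condition on f")
    return p_joint / p_given

p_pass_and_fb = 0.5   # joint probability P(Pass & Has Facebook)
p_fb = 0.75           # marginal probability P(Has Facebook)

p_pass_given_fb = conditional(p_pass_and_fb, p_fb)

# Multiplication rule: multiplying back by P(f) recovers the joint probability.
recovered_joint = p_pass_given_fb * p_fb
```

Guarding against a zero denominator is a design choice here, since conditioning on an event that cannot occur has no meaning.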

### Probability Rules And Trees

• Multiplication Rule
  • P(A & B) = P(A) * P(B | A), or equivalently P(A & B) = P(B) * P(A | B)
  • If A and B are independent, P(A & B) = P(A) * P(B)
• Addition Rule
  • P(A or B) = P(A) + P(B) – P(A & B)
  • If A and B are mutually exclusive, P(A or B) = P(A) + P(B)
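
One way to check these rules is to enumerate a small sample space and count outcomes. The sketch below uses two fair dice, an example chosen here for illustration rather than taken from the notes. The events A to D and the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sample space: the 36 equally likely outcomes of rolling two fair dice.
space = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event, given as a predicate over outcomes, by counting."""
    return Fraction(sum(1 for o in space if event(o)), len(space))

def cond_prob(event, given):
    """P(event | given), found by counting inside the reduced sample space."""
    reduced = [o for o in space if given(o)]
    return Fraction(sum(1 for o in reduced if event(o)), len(reduced))

def both(e, f):
    return lambda o: e(o) and f(o)

def either(e, f):
    return lambda o: e(o) or f(o)

def A(o): return o[0] == 6           # first die shows 6
def B(o): return o[1] == 6           # second die shows 6 (independent of A)
def C(o): return o[0] + o[1] == 7    # total is 7
def D(o): return o[0] + o[1] == 12   # total is 12 (cannot occur together with C)

# Multiplication rule: P(A & C) = P(A) * P(C | A)
assert prob(both(A, C)) == prob(A) * cond_prob(C, A)
# Independent events: P(A & B) = P(A) * P(B)
assert prob(both(A, B)) == prob(A) * prob(B)
# Addition rule: P(A or B) = P(A) + P(B) - P(A & B)
assert prob(either(A, B)) == prob(A) + prob(B) - prob(both(A, B))
# Mutually exclusive events: P(C or D) = P(C) + P(D)
assert prob(both(C, D)) == 0
assert prob(either(C, D)) == prob(C) + prob(D)
```

Using `Fraction` keeps the arithmetic exact, so the equalities can be asserted directly without rounding tolerances.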

### Other Probabilities & Mathematical Expectation

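• Expected value: for a coin flip where heads gains \$10 and tails loses \$5, the expected value of the game 'X' is E(X) = ½ * 10 – ½ * 5 = \$2.50
• Odds into probabilities
  • E.g. \$1.67 ALP [Pa], \$2.15 Coalition [Pc]
  • Assuming a fair game, a \$1 bet on the ALP wins \$0.67 with probability Pa and loses \$1 with probability (1 – Pa): 0.67 * Pa – 1 * (1 – Pa) = 0, so 1.67 * Pa = 1 and Pa = 1/1.67 = 59.9%. Similarly, Pc = 1/2.15 = 46.5%
  • Pa + Pc = 1.064 > 1 (the excess of about 6% is the betting agency's profit margin)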
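
The expected value and the odds conversion above can be reproduced with a few lines of Python. This is a sketch only. It assumes the quoted prices are decimal odds (the total returned per unit staked), and the function names are illustrative.

```python
def expected_value(outcomes):
    """E(X) = sum of payoff * probability over all (payoff, probability) pairs."""
    return sum(payoff * p for payoff, p in outcomes)

# Coin game from the notes: heads gains 10, tails loses 5, each with probability 1/2.
coin_game = [(10, 0.5), (-5, 0.5)]
ev = expected_value(coin_game)

def implied_probability(decimal_odds):
    """Probability that makes a unit bet at these decimal odds a fair game.

    Fair game: (odds - 1) * p - 1 * (1 - p) = 0, which rearranges to p = 1 / odds.
    """
    return 1 / decimal_odds

p_alp = implied_probability(1.67)
p_coalition = implied_probability(2.15)

# The implied probabilities sum to more than 1; the excess is the agency's margin.
margin = p_alp + p_coalition - 1
```

Rounded to three decimal places, the two implied probabilities are 0.599 and 0.465, matching the figures worked out by hand.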
