# Topic 3 - Data Collection, Sampling and Probability Distributions

This article is a topic within the subject Business & Economic Statistics.


## Required Reading

Gerald Keller (2011), Statistics for Management and Economics (Abbreviated), 9th Edition, pp. 161-209.

## Data Collection And Sampling - Chapter 5

^{[1]}

### Methods of Collecting Data

#### Direct Observation

Direct observation is **relatively inexpensive**: we can **make direct measurements or ask questions, but generally avoid influencing behaviour**.

#### Experiments

Experiments can be **expensive**. Generally, a sample is split into two groups: a 'control group', and a group on which we impose a treatment. The results of the two groups are then compared.

#### Surveys

The problem with surveys is that the response rate is usually low, which reduces the validity of the results: a specific type of person is more likely to respond, so the respondents often do not provide a good cross-section of the entire population under study.

#### Personal Interview

**Personal interviews** involve asking prepared questions. They tend to be **expensive** but have **high response rates and provide detailed information**. **Telephone interviews** are cheaper, but the trade-off is generally a **lower response rate**.

### Sampling Plans

Statisticians use samples because studying the entire population (whether of people, houses or animals) takes too long and costs too much, i.e. it is uneconomical.

**Self-Selecting Samples** - Biased: participants are more keenly interested in the issue than other members of the population, i.e. only a certain type of person, who may not represent the behavioural characteristics of the entire population, will take part in the data collection. This means the sample does not truthfully represent the population.
- E.g. talkback radio, SLOP (Self-Selected Opinion Poll)

**Simple Random Sampling** - *A sample selected in such a way that every possible sample with the same number of observations is equally likely to be chosen (from the population of interest).*
- Removes selection bias
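A minimal Python sketch of simple random sampling, assuming a hypothetical population of 1,000 numbered units; the helper name `simple_random_sample` is illustrative only. `random.Random.sample` draws without replacement, so every subset of the requested size is equally likely, which is the definition given above.

```python
import random

# Hypothetical population: 1,000 numbered units (e.g. households)
population = list(range(1, 1001))

def simple_random_sample(units, n, seed=None):
    """Draw n units without replacement; every size-n subset is equally likely."""
    rng = random.Random(seed)  # seeded generator so the draw can be reproduced
    return rng.sample(units, n)

sample = simple_random_sample(population, 10, seed=42)
print(sample)
```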

**Stratified Random Sampling** - Obtained by separating the population into mutually exclusive sets, or strata, and then drawing simple random samples from each stratum.
- E.g. strata = gender, age, occupation, income
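A sketch of stratified random sampling under stated assumptions: the 300-person population, its two age groups and the helper `stratified_sample` are hypothetical. Each unit is placed in exactly one stratum, then a simple random sample is drawn within every stratum. This version takes the same number from each stratum; allocating in proportion to stratum size is another common choice.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Split units into mutually exclusive strata, then draw a simple
    random sample of n_per_stratum units from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)  # each unit lands in exactly one stratum
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

# Hypothetical population of (person_id, age_group) pairs
people = [(i, "under 35" if i % 3 else "35 and over") for i in range(300)]
result = stratified_sample(people, lambda p: p[1], 5, seed=1)
for group, chosen in result.items():
    print(group, [p[0] for p in chosen])
```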

**Cluster Sampling** - *A simple random sample of groups, or clusters, of elements* (e.g. a block of houses).
- Increases sampling error (people living in the same cluster are likely to be similar)
- Cost savings
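A sketch of cluster sampling, assuming a hypothetical city of 20 blocks with 5 houses each (the block names, addresses and `cluster_sample` helper are illustrative). The random draw is made over whole clusters, and every house in a chosen block is kept, which is why fieldwork is cheaper but the sample can be less varied.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Take a simple random sample of whole clusters and keep every element in them."""
    rng = random.Random(seed)
    chosen = rng.sample(list(clusters), n_clusters)  # sample cluster labels, not houses
    return {label: clusters[label] for label in chosen}

# Hypothetical city blocks, each a list of house addresses
blocks = {f"Block {b}": [f"{b}-{h}" for h in range(1, 6)] for b in range(1, 21)}
selected = cluster_sample(blocks, 3, seed=7)
for block, houses in selected.items():
    print(block, houses)
```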

### Sampling & Non Sampling Errors

**Sampling Error**: *Differences between the sample and the population that exist only because of the observations that happened to be selected for the sample.*

**Non-Sampling Error**: *Errors in data acquisition (recording incorrect responses or measurements), non-response error, and selection bias (when some members of the target population cannot possibly be selected for the sample).*
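To illustrate sampling error, the sketch below assumes a hypothetical population of 10,000 incomes (in $000s) generated with `random.gauss`; the figures are invented for illustration. Each sample is drawn correctly and recorded without mistakes, yet its mean still differs from the population mean purely because of which units happened to be selected.

```python
import random
import statistics

rng = random.Random(0)
# Hypothetical population of 10,000 incomes in $000s, generated for illustration
population = [rng.gauss(60, 15) for _ in range(10_000)]
mu = statistics.mean(population)

# Each sample mean misses mu only because of which units were drawn:
# that gap is the sampling error.
for _ in range(3):
    sample = rng.sample(population, 50)
    error = statistics.mean(sample) - mu
    print(f"sample mean error: {error:+.2f}")
```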

## Probability - Chapter 6

^{[2]}

### Assigning Probability to Events - Definitions And Concepts

**Random Experiment** - an action or process that leads to one of several possible outcomes.
- E.g. flip a coin, measure the time taken to assemble a computer

**Event** - a collection or set of one or more simple events in a sample space.
- An individual outcome of a sample space is called a simple event.

**Probability of an Event** - the sum of the probabilities of the simple events that constitute the event.
- E.g. if the event 'Pass' consists of the grades A, B, C and D, then P(Pass) = P(A) + P(B) + P(C) + P(D).

**Sample Space** - a list of all possible outcomes of an experiment; the list must be exhaustive and mutually exclusive.
- **Exhaustive** - all possible outcomes are included.
- **Mutually Exclusive** - no two outcomes can occur at the same time.
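A small sketch of the Pass/Fail idea, assuming hypothetical grade probabilities (the 1/10, 2/10, ... values are invented for illustration, and `event_probability` is an illustrative name). The five grades form the sample space, so their probabilities sum to 1, and the probability of 'Pass' is the sum over the simple events A to D. `fractions.Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

# Hypothetical probabilities for the simple events (grades A, B, C, D, F).
# Together they form the sample space, so the probabilities must sum to 1.
grade_probs = {
    "A": Fraction(1, 10),
    "B": Fraction(2, 10),
    "C": Fraction(3, 10),
    "D": Fraction(2, 10),
    "F": Fraction(2, 10),
}
assert sum(grade_probs.values()) == 1  # probabilities over the whole sample space

def event_probability(event, probs):
    """P(event) = sum of the probabilities of the simple events in the event."""
    return sum(probs[outcome] for outcome in event)

p_pass = event_probability({"A", "B", "C", "D"}, grade_probs)
print(p_pass)  # 4/5
```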

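Probabilities can be assigned using three approaches:
- *Mathematical (classical) approach* - e.g. the probability of flipping a head with a fair coin is ½.
- *Relative frequency approach* - the long-run frequency with which the outcome occurs; more observations generally give a more accurate estimate.
- *Subjective approach* - the degree of belief that we hold in the occurrence of the event.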
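The relative frequency approach can be sketched with a simulated fair coin (the helper `relative_frequency_of_heads` and the seed are illustrative assumptions). The estimate is simply heads divided by flips; as the number of flips grows, it tends to settle near the classical value of ½, although any single short run can stray.

```python
import random

def relative_frequency_of_heads(n_flips, seed=None):
    """Estimate P(head) as (number of heads) / (number of flips) of a simulated fair coin."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))  # True counts as 1
    return heads / n_flips

for n in (10, 1_000, 100_000):
    print(n, relative_frequency_of_heads(n, seed=3))
```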

### Probabilities

### Marginal And Joint Probability

**Marginal probability** is computed by adding the joint probabilities across rows or down columns of a joint probability table; it gives the probability of a single event occurring. The intersection of events A and B is known as their **joint probability**: the probability of A and B occurring at the same time. A joint probability can also be worked out from a conditional probability using the multiplication rule (below).

*In the above example, the joint probability of passing & having Facebook is 0.5.*

*In contrast, marginal probabilities tell you the probability of any one person passing, failing, having Facebook or not having Facebook. For example, the probability of not having Facebook = 0.15 + 0.1 = 0.25.*
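A sketch of computing marginals from a joint probability table. The "Has Facebook" row uses P(Pass & Has Facebook) = 0.5 from the example and 0.25 for Fail, which follows from P(Has Facebook) = 0.75. **Assumption:** the notes do not say which "No Facebook" cell is 0.15 and which is 0.1, so placing 0.15 under Pass is a guess made only to complete the table. The row marginals do not depend on that guess, but the column marginals (P(Pass), P(Fail)) do.

```python
# Joint probability table: rows = Facebook status, columns = exam result.
# ASSUMPTION: the split of the "No Facebook" row (Pass 0.15, Fail 0.10) is a guess;
# only its total (0.25) is given in the notes.
joint = {
    "Has Facebook": {"Pass": 0.50, "Fail": 0.25},
    "No Facebook": {"Pass": 0.15, "Fail": 0.10},
}

def row_marginals(table):
    """Add across each row to get P(row event)."""
    return {row: sum(cells.values()) for row, cells in table.items()}

def column_marginals(table):
    """Add down each column to get P(column event)."""
    totals = {}
    for cells in table.values():
        for col, p in cells.items():
            totals[col] = totals.get(col, 0.0) + p
    return totals

print(row_marginals(joint))
print(column_marginals(joint))
```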

### Conditional Probability (If/Given)

#### Rules

**P(e|f) = P(e and f) / P(f)** = probability of 'e' occurring given that 'f' has occurred.
- The ratio of the joint probability to the marginal probability.
- Rearranging yields the multiplication rule for **joint probability**: P(e and f) = P(e|f)P(f).
- If **P(e|f) = P(e)**, conditioning has no effect and e and f are said to be **independent**.

*In the above example, the probability of passing given that the person has Facebook is P(Pass | Has Facebook) = P(Pass & Has Facebook) / P(Has Facebook) = 0.5/0.75 = 2/3, or about 66.67%.*
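The conditional probability rule can be checked with the figures from the Facebook example. This is a minimal sketch; the helper name `conditional` is illustrative. Exact fractions show the result is 2/3, and multiplying back by P(Has Facebook) recovers the joint probability, as the multiplication rule says.

```python
from fractions import Fraction

def conditional(p_joint, p_given):
    """P(e | f) = P(e and f) / P(f); undefined when P(f) = 0."""
    if p_given == 0:
        raise ValueError("P(f) must be positive")
    return p_joint / p_given

# Figures from the Facebook example
p_pass_and_fb = Fraction(1, 2)  # P(Pass & Has Facebook) = 0.5
p_fb = Fraction(3, 4)           # P(Has Facebook) = 0.75

p_pass_given_fb = conditional(p_pass_and_fb, p_fb)
print(p_pass_given_fb)  # 2/3

# Multiplication rule: P(e and f) = P(e|f) * P(f)
assert p_pass_given_fb * p_fb == p_pass_and_fb
```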

### Probability Rules And Trees

**Multiplication Rule** - P(A & B) = P(A) × P(B|A) = P(B) × P(A|B)
- If independent, P(A & B) = P(A) × P(B)

**Addition Rule**- P(A or B) = P(A) + P(B) – P(A&B)
- If Mutually Exclusive P(A or B) = P(A) + P(B)
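Both rules can be verified by enumerating a small sample space. The example is hypothetical and not from the notes: one roll of a fair die, with A = "even number" and B = "greater than 3". Probabilities are assigned classically (favourable outcomes over total outcomes), and the final check shows A and B are not independent.

```python
from fractions import Fraction

# Hypothetical experiment: one roll of a fair six-sided die
sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}  # even number
B = {4, 5, 6}  # greater than 3

def prob(event):
    """Classical probability: favourable outcomes / total outcomes."""
    return Fraction(len(event), len(sample_space))

def prob_given(e, f):
    """P(e | f) = P(e and f) / P(f)."""
    return prob(e & f) / prob(f)

# Multiplication rule: P(A & B) = P(A) * P(B|A) = P(B) * P(A|B)
assert prob(A & B) == prob(A) * prob_given(B, A) == prob(B) * prob_given(A, B)

# Addition rule: P(A or B) = P(A) + P(B) - P(A & B)
assert prob(A | B) == prob(A) + prob(B) - prob(A & B)

print(prob(A & B), prob(A | B))       # 1/3 2/3
print(prob(A & B) == prob(A) * prob(B))  # False: A and B are not independent
```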

### Other Probabilities & Mathematical Expectation

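**Mathematical Expectation** - the expected value is the sum of each possible outcome multiplied by its probability.
- E.g. Heads ($10 gain), Tails ($5 loss): expected value of the game X = E(X) = ½ × 10 - ½ × 5 = $2.50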
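The coin game can be written as a list of (payoff, probability) pairs; this is a minimal sketch, and `expected_value` is an illustrative name. Losses are entered as negative payoffs, and exact fractions reproduce the $2.50 figure.

```python
from fractions import Fraction

def expected_value(outcomes):
    """E(X) = sum of x * P(x) over all possible values x."""
    return sum(x * p for x, p in outcomes)

# Coin game from the notes: win $10 on heads, lose $5 on tails
game = [(10, Fraction(1, 2)), (-5, Fraction(1, 2))]
print(expected_value(game))  # 5/2, i.e. $2.50 per play
```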
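**Odds into probabilities** - e.g. odds of $1.67 for ALP [Pa] and $2.15 for the Coalition [Pc]
- A $1 bet on ALP returns $1.67 if ALP wins (a $0.67 profit) and loses the $1 otherwise. Assuming a fair game (expected value of 0): 0.67 × Pa - 1 × (1 - Pa) = 0, so 1.67Pa = 1 and Pa = 1/1.67 = 59.9%. Similarly, Pc = 1/2.15 = 46.5%.
- Pa + Pc = 1.064 > 1 (the roughly 6% excess is the betting agency's profit margin)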
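The same conversion as a short sketch, using the odds from the example; `implied_probability` is an illustrative name. Setting the expected value of a $1 bet to zero gives p = 1/odds, and summing the implied probabilities shows how far the book exceeds 1.

```python
def implied_probability(odds):
    """Fair-game probability: (odds - 1) * p - (1 - p) = 0 gives p = 1 / odds."""
    return 1 / odds

odds = {"ALP": 1.67, "Coalition": 2.15}
probs = {side: implied_probability(o) for side, o in odds.items()}
book = sum(probs.values())  # exceeds 1 by the agency's margin

for side, p in probs.items():
    print(f"{side}: {p:.1%}")
print(f"total: {book:.3f}")
```

Run as written, this prints 59.9% for ALP, 46.5% for the Coalition and a total of 1.064, matching the hand calculation.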


## References

**Textbook** refers to Gerald Keller (2011), Statistics for Management and Economics (Abbreviated), 9th Edition.