# Topic 1 - An Introduction to Statistics

## Contents

Gerald Keller (2011), Statistics for Management and Economics (Abbreviated), 9th Edition, pp. 1-96.

## What is Statistics?

### Key Concepts

• Statistics: The collection, analysis and interpretation of data
• Descriptive Statistics: Organizing, summarizing and presenting data in an informative way
• Inferential Statistics: A body of methods for drawing conclusions or inferences about characteristics of populations based on sample data
• Population: The group of all items of interest (e.g. people, animals, plants or things)
• Population Parameter: A descriptive measure of a population
  • Used to represent certain population characteristics
  • Usually represents the information we need (example - the mean # of soft drinks consumed at the Uni (page 5 of textbook))
• Sample: A set of data drawn from the studied population
  • Use small & manageable samples to draw conclusions about the larger group (population)
  • Example - computing the mean # of soft drinks consumed by 500 Uni students to infer the population mean
• Statistic: A descriptive measure of a sample, i.e. a quantity calculated from sample data
• Statistical Inference: The process of making an estimate, prediction or decision about a population based on sample data
  • Confidence Level – The proportion of times the statistical inference will be correct
  • Significance Level – How frequently the conclusions will be wrong
  • Exit Poll Example: A random sample of voters is asked who they voted for. The sample result is a statistic, and through statistical inference we estimate the result for the whole population of voters
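
The difference between a parameter and a statistic can be illustrated with a small simulation. This is only a sketch: the population below is made-up data (random numbers of soft drinks per student), and the population size of 50,000 is an assumption chosen for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: soft drinks consumed by each of 50,000 students (made-up data).
population = [random.randint(0, 10) for _ in range(50_000)]

# Parameter: a descriptive measure of the whole population (normally unknown).
population_mean = statistics.mean(population)

# Sample: 500 students drawn at random, without replacement.
sample = random.sample(population, 500)

# Statistic: a descriptive measure calculated from the sample.
sample_mean = statistics.mean(sample)

print(f"Population mean (parameter): {population_mean:.2f}")
print(f"Sample mean (statistic):     {sample_mean:.2f}")
```

In practice only the sample mean is observed. Statistical inference is the step of using it to say something about the population mean.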

## Graphical Descriptive Techniques

### Type of Data & Information

A variable is some characteristic of a population or sample. For example, a variable could be the mark received on a statistics exam.

• Values: Possible observations of the variable
• Data: The observed values of a variable
• Discrete Variables: Whole numbers or categories. E.g. cannot get 2.5 heads (coin flipping)
  • Cannot take on all values between the max/min value
• Continuous Variables: Can take any value between the max/min value, including fractions
  • E.g. time taken to finish an exam
• Interval/Quantitative/Numerical Data - Real numbers
  • All calculations permitted
• Nominal/Qualitative/Categorical Data - Categories
  • No calculations permitted, only frequency or % occurrence
• Ordinal Data - Appears to be nominal, but the order of the values has meaning
  • E.g. poor - 1, fair - 2, good - 3, very good - 4, excellent - 5 (the numbers could be 6, 18, 23, 45, 88 so long as the order is maintained) (textbook page 14)
  • Only calculations based on ranking or ordering the data are permitted, e.g. the median (pg 16)
  • The mean cannot be meaningfully calculated
• Time Series: Data referring to measurements at different points in time
• Cross Sectional: Data measured at a single point in time
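
The point about ordinal data can be checked with a short sketch. The responses below are hypothetical. The two codings (1 to 5 and 6, 18, 23, 45, 88) are the ones mentioned in the notes. The median identifies the same category under either coding, while the mean changes with the arbitrary codes.

```python
import statistics

# Five-point scale from the notes; the responses are made-up survey data.
labels = {1: "poor", 2: "fair", 3: "good", 4: "very good", 5: "excellent"}
responses = [1, 2, 3, 3, 3, 4, 4, 5, 5]

# An alternative coding that keeps the same order.
recode = {1: 6, 2: 18, 3: 23, 4: 45, 5: 88}
recoded = [recode[r] for r in responses]
decode = {new: labels[old] for old, new in recode.items()}

# The median picks out the same category under both codings.
median_original = labels[statistics.median(responses)]
median_recoded = decode[statistics.median(recoded)]
print(median_original, median_recoded)

# The mean depends on the arbitrary codes, so it has no real meaning here.
print(statistics.mean(responses), statistics.mean(recoded))
```

An odd number of responses is used so that the median is an actual observation and can be mapped back to a category.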

### Nominal Data

#### Frequency Distribution Tables

• Frequency: How often the event occurs.
• Relative Frequency: Lists the categories/bins and the proportion with which they occur (of the total)
• Cumulative Frequency: Accumulating total of each category
• Mutually Exclusive: Results can only be included in one category (e.g. mode of transport – only can choose one out of car, bus, walk, cycle etc.)
• Please note that 'Ordinal data' should be arranged in order
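
| Result | Frequency | Cumulative Frequency | Relative Frequency |
|--------|-----------|----------------------|--------------------|
| 2      | 6         | 6                    | 30%                |
| 3      | 4         | 10                   | 20%                |
| 4      | 10        | 20                   | 50%                |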
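
The three columns of the table above can be computed from raw observations. The following is a minimal sketch using only the Python standard library. The list `results` is reconstructed from the table's frequencies (6 twos, 4 threes, 10 fours) and is not the original data set.

```python
from collections import Counter
from itertools import accumulate

# Raw observations that reproduce the table above: 6 twos, 4 threes, 10 fours.
results = [2] * 6 + [3] * 4 + [4] * 10

counts = Counter(results)
categories = sorted(counts)                    # ordered categories
frequencies = [counts[c] for c in categories]  # how often each result occurs
cumulative = list(accumulate(frequencies))     # running total of the frequencies
total = sum(frequencies)
relative = [f / total for f in frequencies]    # proportion of the total

rows = list(zip(categories, frequencies, cumulative, relative))
for result, freq, cum, rel in rows:
    print(f"{result:>6} {freq:>9} {cum:>10} {rel:>8.0%}")
```

The last cumulative frequency always equals the total number of observations, and the relative frequencies sum to 1.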

#### Bar Charts

A Bar Chart is a visual representation of data presenting the frequency of each category. It is typically used for qualitative data (hence the gaps between the bars for each category).

#### Pie Charts

Pie Charts create a visual representation of data presenting the relative frequency. Again, they are typically used for qualitative data.

### Describing the Relationship between Two Nominal Variables (Bivariate)

Bivariate analysis shows the relationship between two variables.

The following tables are to be used for nominal data.

#### Cross Tabulation/Classification Table

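| Occupation   | G&M | POST | STAR | SUN | Total |
|--------------|-----|------|------|-----|-------|
| Blue Collar  | 27  | 18   | 38   | 37  | 120   |
| White Collar | 29  | 43   | 21   | 15  | 108   |
| Professional | 33  | 51   | 22   | 20  | 126   |
| Total        | 89  | 112  | 81   | 72  | 354   |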

This table is taken from the textbook. It shows the frequency of observations for each combination of the two variables (occupation and newspaper). For example, 33 respondents are professionals who read the G&M.

To get the totals on the right-hand side, we sum the frequencies across each row. For example, 27 + 18 + 38 + 37 = 120. The totals along the bottom are the column sums, e.g. 27 + 29 + 33 = 89.

#### Row Relative Frequency Table

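| Occupation   | G&M | POST | STAR | SUN | Total |
|--------------|-----|------|------|-----|-------|
| Blue Collar  | .23 | .15  | .32  | .31 | 1     |
| White Collar | .27 | .40  | .19  | .14 | 1     |
| Professional | .26 | .40  | .17  | .16 | 1     |
| Total        | .25 | .32  | .23  | .20 | 1     |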

This row relative frequency table shows each frequency as a proportion of its row total. For example, 33 of the 126 professionals read the G&M, so the entry is 33/126 = .26. Because of rounding, the entries in a row may not sum to exactly 1.
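
The totals and row relative frequencies can be reproduced with a few lines of code. This is a sketch using plain Python dictionaries. The counts are copied from the cross-classification table above, and the variable names are illustrative only.

```python
newspapers = ["G&M", "POST", "STAR", "SUN"]

# Counts copied from the cross-classification table.
table = {
    "Blue Collar": [27, 18, 38, 37],
    "White Collar": [29, 43, 21, 15],
    "Professional": [33, 51, 22, 20],
}

# Row totals: sum across the columns of each row.
row_totals = {occ: sum(counts) for occ, counts in table.items()}

# Column totals: sum down the rows of each column.
column_totals = [sum(col) for col in zip(*table.values())]
grand_total = sum(row_totals.values())

# Row relative frequencies: each count divided by its row total.
row_relative = {
    occ: [count / row_totals[occ] for count in counts]
    for occ, counts in table.items()
}

for occ, shares in row_relative.items():
    print(occ, dict(zip(newspapers, (f"{s:.2f}" for s in shares))))
```

Comparing the rows of `row_relative` shows whether the two variables are related: if occupation had no bearing on newspaper choice, the rows would look roughly alike.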

#### Side By Side Bar Charts

Side by side bar charts are generally used for nominal data.

## Graphical Descriptive Techniques II

### Graphical Techniques to Describe a Set of Interval Data

#### Histogram (Cross Sectional)

##### Create BINS/Classes to categorize numbers
• Example - if we were looking at how students get to UNSW, examples of classes could be Car, Bicycle, Train, Walk
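• Too many classes - doesn't summarize the data enough
• Too few classes - not enough information is shown
• Classes must be mutually exclusive & exhaustive
• Sturges' Formula – no. of bins = 1 + 3.3 log10(n), where n is the number of observations and the logarithm is base 10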
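
As a quick worked example, Sturges' formula can be evaluated directly. The sketch below assumes the common textbook form with a base-10 logarithm, 1 + 3.3 log10(n). The function name `sturges_classes` and the sample size of 200 are chosen for illustration.

```python
import math

def sturges_classes(n: int) -> float:
    """Suggested number of classes for n observations (base-10 logarithm)."""
    return 1 + 3.3 * math.log10(n)

n = 200  # illustrative number of observations
raw = sturges_classes(n)
print(f"n = {n}: {raw:.2f} -> about {round(raw)} classes")
```

The formula gives only a starting point. The result is rounded, and the final number of classes is usually adjusted so that the class widths are convenient round numbers.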
##### Key Features of Histograms
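• Symmetry: A histogram is symmetric if a vertical line drawn through the middle separates it into 2 identical (mirror-image) sides
• Skewness: A long tail extending to the left (negatively skewed) or to the right (positively skewed)
• Number of Modal Classes: Unimodal (one peak) or bimodal (two peaks)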
• Is it Bell Shaped?
• Outliers, Clusters Etc.

#### Stem & Leaf

Stem & Leaf as a description tool has a key advantage over the histogram as we can see actual observations and don’t lose potentially useful information.
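
A stem-and-leaf display is simple to build by hand or in code. The sketch below uses made-up exam marks and assumes two-digit values, so that the tens digit is the stem and the units digit is the leaf.

```python
from collections import defaultdict

# Hypothetical exam marks (made-up data).
marks = [52, 67, 61, 75, 78, 73, 84, 58, 66, 91, 70]

# Group the units digit (leaf) of each mark under its tens digit (stem).
stems = defaultdict(list)
for mark in sorted(marks):
    stems[mark // 10].append(mark % 10)

lines = [
    f"{stem} | {' '.join(str(leaf) for leaf in leaves)}"
    for stem, leaves in sorted(stems.items())
]
print("\n".join(lines))
```

Each row plays the role of a histogram class (50-59, 60-69, and so on), but every original observation can still be read off the display.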

### Describing Time Series Data

#### Line Chart / Time Series Plot

A line chart plots the value of the variable (vertical axis) against time (horizontal axis).

### Describing the Relationship between Two Interval Variables - Bivariate Relationships

#### Scatter Diagram/Plot

• Data for 2 variables, Independent (X) / Dependent (Y)
• E.g. House size / House Price
##### Key Features of Scatter Plots
• Linearity
• Positive/Negative Linear Relationship. NB Correlation is not causation
• Non-Linear Relationship
• No Relationship
• Clusters
• Outliers
