Topic 1: An Introduction to Statistics
This article is a topic within the subject Business & Economic Statistics.
Required Reading
Gerald Keller (2011), Statistics for Management and Economics (Abbreviated), 9th Edition, pp. 196.
What is Statistics?
Key Concepts
^{[1]}
 Statistics: involves collecting, analysing & interpreting data
 Descriptive Statistics: Organizing, Summarizing & Presenting data in an informative way
 Inferential Statistics: A body of methods to draw conclusions or inferences about characteristics of populations based on sample data
 Population: The group of all items of interest (e.g. people, animals, plants or things)
 Population Parameter: The descriptive measure of a population
 Used to represent certain population characteristics
 Usually represents the information we need (e.g. the mean number of drinks consumed at the Uni; page 5 of textbook)
 Sample: A set of data drawn from the studied population
 Use small & manageable samples to draw conclusions about the larger group (population)
 Example: computing the mean number of soft drinks consumed by a sample of 500 Uni students to infer the population mean
 Statistic: A descriptive measure of a sample, i.e. a quantity calculated from sample data
 Statistical Inference: The process of making an estimate, prediction or decision about a population based on sample data
 Confidence Level: The proportion of times the statistical inference will be correct
 Significance Level: How frequently the conclusion will be wrong
 Exit Poll Example: A random sample of voters is asked who they voted for (statistic) & through statistical inference we estimate the results of the population
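The difference between a parameter and a statistic can be illustrated with a small simulation. This is only a sketch: the population below (50,000 students drinking between 0 and 20 soft drinks) is made up for illustration and is not data from the textbook. The sample size of 500 follows the example above.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: soft drink counts for 50,000 students (invented data).
population = [random.randint(0, 20) for _ in range(50_000)]

# Parameter: a descriptive measure of the whole population (usually unknown in practice).
population_mean = statistics.mean(population)

# Statistic: the same measure calculated from a sample of 500 students.
sample = random.sample(population, 500)
sample_mean = statistics.mean(sample)

print(f"population mean (parameter): {population_mean:.2f}")
print(f"sample mean (statistic):     {sample_mean:.2f}")
```

In practice only the sample mean is observed, and statistical inference uses it to estimate the population mean.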
Graphical Descriptive Techniques
^{[2]}
Type of Data & Information
A variable is some characteristic of a population or sample. For example, a variable could refer to the mark received on a statistics exam.
 Values: Possible observations of the variable
 Data: The observed values of a variable
 Discrete Variables: Whole numbers or categories, e.g. you cannot get 2.5 heads when flipping coins
 Cannot take on all values between the max/min value
 Continuous Variables: Can take any value between the max/min value, including fractions
 E.g. time taken to finish an exam
 Interval/Quantitative/Numerical Data: Real numbers
 All calculations permitted
 Nominal/Qualitative/Categorical Data: Categories
 No calculations permitted, only frequency or % occurrence
 Ordinal Data: Appears to be nominal, but the order of the values has meaning
 E.g. poor = 1, fair = 2, good = 3, very good = 4, excellent = 5 (the numbers could equally be 6, 18, 23, 45, 88, so long as the order is maintained) (textbook page 14)
 Only calculations based on ranking or ordering the data are permitted, e.g. the median (pg 16)
 The mean is not a valid calculation for ordinal data
 Time Series: Data referring to measurements at different points in time
 Cross Sectional: Data measured at a single point in time
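Why the mean is not valid for ordinal data can be shown with the two codings mentioned above (1 to 5, and 6, 18, 23, 45, 88). The sketch below uses two small made-up groups of ratings. The group names and responses are hypothetical, chosen only to illustrate the point.

```python
import statistics

labels = ["poor", "fair", "good", "very good", "excellent"]
codes_a = dict(zip(labels, [1, 2, 3, 4, 5]))       # usual coding
codes_b = dict(zip(labels, [6, 18, 23, 45, 88]))   # different numbers, same order

# Hypothetical ratings from two groups (invented for illustration).
group_x = ["fair", "fair", "excellent"]
group_y = ["good", "good", "very good"]

def mean_code(group, codes):
    """Mean of the numeric codes: depends on the arbitrary numbers chosen."""
    return statistics.mean([codes[r] for r in group])

def median_label(group, codes):
    """Median category: depends only on the ordering of the codes."""
    middle = statistics.median_low([codes[r] for r in group])
    return next(label for label, code in codes.items() if code == middle)

for name, codes in (("coding A", codes_a), ("coding B", codes_b)):
    print(name,
          "| mean X:", round(mean_code(group_x, codes), 2),
          "| mean Y:", round(mean_code(group_y, codes), 2),
          "| median X:", median_label(group_x, codes),
          "| median Y:", median_label(group_y, codes))
```

Under coding A group Y has the higher mean, while under coding B group X does, even though the responses are unchanged. The median category is the same under both codings.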
Nominal Data
Frequency Distribution Tables
 Frequency: How often the event occurs.
 Relative Frequency: Lists the categories/bins and the proportion with which they occur (of the total)
 Cumulative Frequency: Accumulating total of each category
 Mutually Exclusive: Results can only be included in one category (e.g. mode of transport – only can choose one out of car, bus, walk, cycle etc.)
 Please note that 'Ordinal data' should be arranged in order
Result  Frequency  Cumulative Frequency  Relative Frequency 

2  6  6  30% 
3  4  10  20% 
4  10  20  50% 
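The three columns of the table above can be computed from raw observations. The sketch below assumes a raw data list that simply reproduces the table's counts (six 2s, four 3s and ten 4s), since the original observations are not given.

```python
from collections import Counter

# Assumed raw observations, built to match the frequencies in the table.
observations = [2] * 6 + [3] * 4 + [4] * 10

counts = Counter(observations)
total = len(observations)

rows = []
running_total = 0
for result in sorted(counts):
    frequency = counts[result]
    running_total += frequency  # cumulative frequency so far
    rows.append((result, frequency, running_total, frequency / total))

print("Result  Frequency  Cumulative Frequency  Relative Frequency")
for result, frequency, cumulative, relative in rows:
    print(f"{result}  {frequency}  {cumulative}  {relative:.0%}")
```

The last cumulative frequency always equals the total number of observations, and the relative frequencies sum to 100%.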
Bar Charts
A Bar Chart is a visual representation of data presenting the frequency of events. It is typically used for qualitative data (hence the spaces between the bars for each category).
Pie Charts
Pie Charts create a visual representation of data presenting the relative frequency. Again, they are typically used for qualitative data.
Describing the Relationship between Two Nominal Variables (Bivariate)
Bivariate analysis shows the relationship between two variables.
The following tables are to be used for nominal data.
Cross Tabulation/Classification Table
Occupation  G&M  POST  STAR  SUN  Total 

Blue Collar  27  18  38  37  120 
White Collar  29  43  21  15  108 
Professional  33  51  22  20  126 
Total  89  112  81  72  354 
This table is taken from the textbook. It shows the frequency of observations for each combination of the 2 variables. For example, 33 respondents are 'professionals' and read G&M.
To get the totals on the right hand side, we simply sum the numbers in each row. For example, 27 + 18 + 38 + 37 = 120.
Row Relative Frequency Table
Occupation  G&M  POST  STAR  SUN  Total 

Blue Collar  .23  .15  .32  .31  1 
White Collar  .27  .40  .19  .14  1 
Professional  .26  .40  .17  .16  1 
Total  .25  .32  .23  .20  1 
This row relative frequency table shows each value as a proportion of its row total. For example, we know 33 professionals read G&M, out of 126 professionals in total, so the entry is 33/126 = .26.
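The row relative frequencies can be reproduced by dividing each count by its row total. The sketch below uses the counts from the cross tabulation table. The variable names are illustrative only.

```python
columns = ["G&M", "POST", "STAR", "SUN"]
counts = {
    "Blue Collar": [27, 18, 38, 37],
    "White Collar": [29, 43, 21, 15],
    "Professional": [33, 51, 22, 20],
}

# Divide every count by the total of its own row.
row_relative = {}
for occupation, row in counts.items():
    row_total = sum(row)
    row_relative[occupation] = [value / row_total for value in row]

# The "Total" row uses the column totals divided by the grand total.
column_totals = [sum(column) for column in zip(*counts.values())]
grand_total = sum(column_totals)
row_relative["Total"] = [value / grand_total for value in column_totals]

print("Occupation  " + "  ".join(columns))
for occupation, shares in row_relative.items():
    print(occupation + "  " + "  ".join(f"{share:.2f}" for share in shares))
```

Each row of proportions sums to 1, which is why the Total column of the row relative frequency table is always 1.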
Side By Side Bar Charts
Side by side bar charts are generally used for nominal data.
Graphical Descriptive Techniques II
^{[3]}
Graphical Techniques to Describe a Set of Interval Data
Histogram (Cross Sectional)
Create BINS/Classes to categorize numbers
 Example: if we were looking at how long students take to get to UNSW, examples of classes could be 0 to under 15 minutes, 15 to under 30 minutes, 30 to under 45 minutes
 Too many bins: doesn't summarize the data enough
 Too few bins: not enough information
 Bins must be mutually exclusive & exhaustive
 Sturges' Formula: no. of bins = 1 + 3.3 log10(n), where n is the number of observations
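Sturges' formula is usually stated with a base-10 logarithm. The sketch below applies it to a few sample sizes. Rounding to the nearest whole number is an assumption made here, as the formula is only a guide and the final number of bins is a judgment call.

```python
import math

def sturges_bins(n):
    """Suggested number of bins for n observations: 1 + 3.3 * log10(n), rounded."""
    return round(1 + 3.3 * math.log10(n))

for n in (50, 200, 1000):
    print(f"n = {n}: about {sturges_bins(n)} bins")
```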
Key Features of Histograms
 Symmetry: A vertical line drawn through the middle separates the histogram into 2 identical (mirror-image) sides
 Skewness: A long tail extending to the left (negatively skewed) or to the right (positively skewed)
 Modal Classes: Unimodal (one peak) or Bimodal (two peaks)
 Is it Bell Shaped?
 Outliers, Clusters Etc.
Ogive (Cumulative Relative Frequency Graph)
Stem & Leaf
As a descriptive tool, the Stem & Leaf display has a key advantage over the histogram: we can see the actual observations, so we don't lose potentially useful information.
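A stem & leaf display can be built by splitting each observation into a stem (here the tens digit) and a leaf (the units digit). The sketch below uses made-up exam marks. Both the data and the choice of tens as stems are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical exam marks (invented data).
marks = [52, 58, 61, 64, 64, 67, 70, 73, 75, 75, 78, 81, 89, 93]

stems = defaultdict(list)
for mark in sorted(marks):
    stem, leaf = divmod(mark, 10)  # tens digit is the stem, units digit is the leaf
    stems[stem].append(leaf)

lines = [
    f"{stem} | {' '.join(str(leaf) for leaf in leaves)}"
    for stem, leaves in sorted(stems.items())
]
print("\n".join(lines))
```

Every original mark can be read back from the display (for example, stem 6 with leaf 4 is a mark of 64), which is the information a histogram would lose.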
Describing Time Series Data
Line Chart / Time Series Plot
A line chart plots the values of a variable over time.
Describing the Relationship between Two Interval Variables (Bivariate Relationships)
Scatter Diagram/Plot
 Data for 2 variables, Independent (X) / Dependent (Y)
 E.g. House size / House Price
^{[4]}
Key Features of Scatter Plots
 Linearity
 Positive/Negative Linear Relationship. NB Correlation is not causation
 Non-Linear Relationship
 No Relationship
 Clusters
 Outliers
References
Textbook refers to Gerald Keller (2011), Statistics for Management and Economics (Abbreviated), 9th Edition.