The assignment data file (LGAData.xls) contains LGA data (sourced from the ABS) for a population of 400 LGAs in Australia. We are required to select a random sample of 50 LGAs from this population.
To create our sample of 50 LGAs, we use the student ID of one group member, Kapil Yogi (MIT170802). The last two digits of this ID are 02, so our dataset contains LGAs 02 to 51 (this range holds 50 LGAs, counting both ends).
The soft copy of our sample LGA data is named LGA_sample.
Task 1
Figure 1: Histogram showing the type of transport of the 50 sampled LGAs
The variable “V21” gives the type of transport for each LGA. Here “0” denotes car and “1” denotes public transport.
Using Excel, we find that 28 LGAs in our sample (from the frequency chart) indicate use of public transport.
We store the values of “V21” in column B.
[ Excel Formula: =COUNTIF(B2:B51,0) ]
Figure 2: Pie diagram showing the type of occupation of the 50 sampled LGAs
The variable “V11” gives the type of occupation for each LGA. The codes are as follows: Managers = 1; Professionals = 2; Sales workers = 3; Administration = 4.
Using Excel, we find that the occupation type that occurs most frequently in our sample is Administration, coded “4” (this is also evident from the pie diagram).
We store the values of “V11” in column C.
[ Excel Formula: =MODE(C2:C51) ]
Figure 3: Pie diagram showing the age group of the 50 sampled LGAs
The variable “V3” gives the age group for each LGA. The codes are as follows: 30-34 = 1; 35-39 = 2; 39-44 = 3; 45-50 = 4.
Of the 50 sampled LGAs, 8 fall in the 30-34 years age group.
Hence, the required proportion = 8/50 = 0.16 (from the relative frequency pie diagram).
We store the values of “V3” in column E.
[ Excel Formula: =COUNTIF(E2:E51,1) ]
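The counting steps in Task 1 can also be sketched outside Excel. The snippet below is only an illustration: the list `age_codes` is hypothetical and is not the LGA_sample data. It mirrors what COUNTIF and MODE do on a coded column.

```python
from collections import Counter

# Hypothetical age-group codes for illustration only (not the LGA_sample data).
age_codes = [1, 2, 2, 3, 1, 4, 2, 3, 3, 2]

counts = Counter(age_codes)                      # frequency of every code
count_30_34 = counts[1]                          # analogue of =COUNTIF(E2:E51,1)
proportion_30_34 = count_30_34 / len(age_codes)  # relative frequency of code 1
most_common_code = counts.most_common(1)[0][0]   # analogue of =MODE(...)

print(count_30_34, proportion_30_34, most_common_code)

# The proportion reported for the real sample: 8 LGAs out of 50.
sample_proportion = 8 / 50
print(sample_proportion)
```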
Task 2
a)
In our sample, V15 gives the information about Occupation 4, i.e. the number of people in administration.
We first have to sort our sample “occupation 4” data.
Using Excel, we sorted the data via the following path: Home->Sort & Filter
The percentile location formula is given by \( i = \frac{P}{100}(n + 1) \).
Here P represents the percentile rank and n denotes the number of observations under consideration.
Here n = 50.
For the 70th percentile, P = 70, so we get \( i = \frac{70}{100}(50 + 1) = 35.7 \).
So the integer portion is 35 and the fractional part is 0.7. The 35th and 36th observations of the ordered data are 11 and 13 respectively, so the 70th percentile is 11 + (13 - 11) * 0.7 = 12.4.
For the first quartile, P = 25, so we get \( i = \frac{25}{100}(50 + 1) = 12.75 \).
So the integer portion is 12 and the fractional part is 0.75. The 12th and 13th observations of the ordered data are 0 and 0 respectively, so the first quartile is 0 + (0 - 0) * 0.75 = 0.
For the third quartile, P = 75, so we get \( i = \frac{75}{100}(50 + 1) = 38.25 \). So the integer portion is 38 and the fractional part is 0.25. The 38th and 39th observations of the ordered data are 17 and 18 respectively, so the third quartile is 17 + (18 - 17) * 0.25 = 17.25.
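The location rule used above can be written as a small function. This is a minimal sketch: the function name `percentile_by_location` and the test list `values` are hypothetical, and the list is not the LGA_sample data.

```python
def percentile_by_location(sorted_values, p):
    """Percentile via the location i = (p / 100) * (n + 1), with linear interpolation."""
    n = len(sorted_values)
    location = p * (n + 1) / 100
    whole = int(location)        # integer portion of the location
    frac = location - whole      # fractional part of the location
    # Locations outside 1..n fall back to the smallest or largest value.
    if whole < 1:
        return sorted_values[0]
    if whole >= n:
        return sorted_values[-1]
    lower = sorted_values[whole - 1]   # the whole-th ordered observation (1-based)
    upper = sorted_values[whole]       # the next ordered observation
    return lower + (upper - lower) * frac


# Hypothetical ordered values for illustration only.
values = [2, 4, 6, 8, 10, 12, 14, 16, 18]
print(percentile_by_location(values, 25))
print(percentile_by_location(values, 75))

# The interpolation step from the text: 35th value 11, 36th value 13, fraction 0.7.
p70_from_text = 11 + (13 - 11) * 0.7
print(p70_from_text)
```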
b)
The 70th percentile that we have determined tells us that, in our sample data, 70% of the LGAs have 12.4 (12 if rounded off) or fewer people with occupation “4”, i.e. administration.
c)
Let Q1 and Q3 denote the first and third quartiles.
The inter-quartile range is given by IQR = Q3 - Q1 = 17.25 - 0 = 17.25; it gives the spread of the middle 50% of our data.
So the middle 50% of the LGAs have between 0 and 17.25 people in administration. This measure of spread is not influenced by extremely small or large values.
We store the values of V15 in column A.
[ Excel Formula: = QUARTILE(A2:A51,3)-QUARTILE(A2:A51,1) ]
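As a cross-check on the quartile arithmetic, Python's standard library offers `statistics.quantiles`; with `method="exclusive"` it places each quartile at position p * (n + 1), the same location rule used in part (a). The sketch below runs on a hypothetical list, not on the LGA_sample data.

```python
from statistics import quantiles

# Hypothetical ordered values for illustration only (not the LGA_sample data).
values = [2, 4, 6, 8, 10, 12, 14, 16, 18]

# n=4 asks for the three quartile cut points; "exclusive" uses the (n + 1) rule.
q1, q2, q3 = quantiles(values, n=4, method="exclusive")
iqr = q3 - q1   # spread of the middle 50% of the values

print(q1, q3, iqr)
```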
Task 3
a)
Using Excel we find out the following descriptive statistics table of our sample “occupation 4” data.
Statistic | Value
Mean | 8.46
Standard Error | 1.190167
Median | 6.5
Mode | 0
Standard Deviation | 8.415753
Sample Variance | 70.8249
Kurtosis | -0.94489
Skewness | 0.666345
Range | 25
Minimum | 0
Maximum | 25
Sum | 423
Count | 50
Table 1: Descriptive Statistics of Occupation 4 sample data
[Excel Path: Data->Data Analysis->Descriptive Statistics ]
b)
From Task 2 we have IQR = 17.25, Q3 = 17.25 and Q1 = 0.
Now,
The upper inner fence limit: IFUL = Q3 + 1.5 x IQR = 17.25 + 1.5 x 17.25 = 43.125
The lower inner fence limit: IFLL = Q1 - 1.5 x IQR = 0 - 1.5 x 17.25 = -25.875
For our sample “occupation 4” data, considering all the measures computed previously:
The minimum and maximum values in the data are 0 and 25 respectively. Both lie within the limits IFLL and IFUL, so there are no outliers.
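The fence check above can be reproduced in a few lines. The sketch uses only the figures quoted in the text (Q1, Q3, minimum and maximum); the variable names are illustrative.

```python
# Values quoted in the text for the "occupation 4" sample.
q1, q3 = 0.0, 17.25
minimum, maximum = 0, 25

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr   # IFLL
upper_fence = q3 + 1.5 * iqr   # IFUL

# If even the extremes lie inside the fences, no observation is an outlier.
has_outliers = minimum < lower_fence or maximum > upper_fence

print(lower_fence, upper_fence, has_outliers)
```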
An appropriate measure of central tendency is the median, because the skewness is 0.666345, i.e. the data are skewed. Here the median = 6.5.
An appropriate measure of dispersion is the interquartile range, again because the data are skewed. Here IQR = 17.25.
d)
The variable under consideration is the number of people in administration. Here the mean is 8.46, the median is 6.5, the first quartile is 0, the third quartile is 17.25, the standard deviation is approximately 8.4 and IQR = 17.25.
Here Q3 - median = 17.25 - 6.5 = 10.75 and median - Q1 = 6.5 - 0 = 6.5.
Since Q3 - median is larger than median - Q1, the data are positively skewed, i.e. the longer tail is towards larger values of the variable under consideration.
The mean of 8.46 indicates that if we pick an LGA at random, its number of people in administration would be 8.46 (approximately 8) on average.
The standard deviation of 8.4 is a measure of spread that takes all values of the variable into account; it measures how far the data deviate from the mean.
The IQR, i.e. the inter-quartile range, gives the range that contains the middle 50% of the values. Here that range is [0, 17.25].
Task 4
a)
From Table 1, the kurtosis is -0.94489, so the data are platykurtic, i.e. the tails are thinner than those of the normal distribution.
The skewness is 0.666345, so the data are positively skewed.
For a normal distribution, mean = median = mode, but here mean = 8.46, median = 6.5 and mode = 0.
Based on these three pieces of evidence, our sample “occupation 4” data do not appear to have been obtained from a normally distributed population.
b)
According to the standard normal table, P(Z < 1.5) = 0.9332, where Z follows the standard normal distribution.
Hence P(-1.5 < Z < 1.5) = 2(0.9332 - 0.5) = 0.8664 (as Z is symmetric about 0).
So 50 * 0.8664 = 43.32, i.e. 43.
So, if the data were normally distributed, approximately 43 of the 50 values should lie within 1.5 standard deviations of the mean.
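The table lookup above can be checked with `statistics.NormalDist` from the Python standard library. This is a sketch of the same calculation, not the method used in the report (which reads the value from a standard normal table).

```python
from statistics import NormalDist

z = 1.5
n = 50

std_normal = NormalDist()   # mean 0, standard deviation 1
# Probability that a standard normal value falls within z standard deviations.
p_within = std_normal.cdf(z) - std_normal.cdf(-z)
expected_count = n * p_within

print(round(p_within, 4), round(expected_count, 2))
```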
c)
According to the descriptive statistics table,
mean = 8.46 and standard deviation = 8.415753.
The bounds for a spread of 1.5 standard deviations from the mean are 8.46 ± 1.5 * 8.415753, i.e. [-4.1636295, 21.0836295].
Going through the data, we observe that 44 of the 50 observations lie within this interval, which agrees with the result in (b) (a difference of only one observation is negligible). Hence this result does not confirm our conclusion in (a).
[Ross, Sheldon (2010). Introductory Statistics, Academic Press, USA.]
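The bounds and the counting step in part (c) can be sketched as follows. The mean and standard deviation are the values from Table 1; the helper `count_within` and the short list `demo_values` are hypothetical and stand in for the real column of 50 observations.

```python
mean = 8.46
sd = 8.415753
k = 1.5   # number of standard deviations

lower = mean - k * sd
upper = mean + k * sd


def count_within(values, lo, hi):
    """Count the observations that lie in the closed interval [lo, hi]."""
    return sum(lo <= v <= hi for v in values)


# Hypothetical observations for illustration only (not the LGA_sample data).
demo_values = [0, 5, 12, 21, 25]
print(lower, upper, count_within(demo_values, lower, upper))
```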
Task 5
Using Excel, we obtain the following descriptive statistics for our sample “occupation 4” data.
Here we have kept only those statistics that are required to compute the confidence interval.
Statistic | Value
Mean | 8.46
Standard Error | 1.190167
Standard Deviation | 8.415753
Sample Variance | 70.8249
Count | 50
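Hence:
A point estimate of the mean “Occupation 4” of the population is given by the sample mean \( \bar{x} \),
i.e. 8.46.
A 90% confidence interval estimate of the mean “Occupation 4” of the population is given by
\( \left[ \bar{x} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} \right] \)
Here \( \bar{x} = 8.46 \), n = 50, s = 8.415753 and \( \alpha = 0.10 \), so the standard error is \( s/\sqrt{n} = 1.190167 \).
\( t_{\alpha/2,\,n-1} \) = upper \( 100(\alpha/2) \)% point of a t distribution with (n-1) degrees of freedom.
\( t_{0.05,\,49} \) = 1.676551 [ Excel Formula: =T.INV(0.95,49) ]
Here, Upper CI = 8.46 + 1.676551 * 1.190167 = 10.45537601
Lower CI = 8.46 - 1.676551 * 1.190167 = 6.464623986
Hence the 90% confidence interval is given by [6.464623986, 10.45537601], i.e. [6.46, 10.46] (up to 2 decimal places).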
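The interval arithmetic above can be reproduced as a short sketch. The critical value 1.676551 is taken directly from the text (the Excel result of =T.INV(0.95,49)) rather than recomputed, since the Python standard library has no t-distribution quantile function.

```python
import math

mean = 8.46
s = 8.415753
n = 50
t_crit = 1.676551   # quoted from the text: =T.INV(0.95,49)

standard_error = s / math.sqrt(n)
margin = t_crit * standard_error   # half-width of the interval

lower = mean - margin
upper = mean + margin

print(round(standard_error, 6), round(lower, 2), round(upper, 2))
```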
iii)
In the context of the variable in this task, if we repeatedly drew samples from the population and built a 90% confidence interval from each, about 90% of those intervals would contain the population mean number of people in administration. For our sample the interval is [6.46, 10.46], i.e. [6, 10] (rounded off).
The 90% confidence interval for the mean number of people in administration is [6.46, 10.46], i.e. [6, 10]. It does not contain the value 59, so we would not consider the interval estimate obtained in (a) to be satisfactory.
Task 6
(a)
Here we are interested in the values of “V8”, i.e. the income category. In our data the codes are as follows: $650-$800 = 0; $801-$850 = 1.
In this case we focus on the $650-$800 income earners.
Using Excel, we find that 24 of the 50 LGAs are $650-$800 income earners.
We store the data of “V8” in column F.
[ Excel Formula: =COUNTIF(F2:F51,0) ]
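(i)
A point estimate of the proportion of $650-$800 income earners in the population is obtained as \( \hat{p} = 24/50 = 0.48 \).
(ii)
A 99% confidence interval estimate of the proportion of $650-$800 income earners in the population is given by
\( \left[ \hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\ \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \right] \)
Here \( \hat{p} \) is the observed proportion of $650-$800 income earners in our sample.
\( z_{\alpha/2} \) is the upper \( 100(\alpha/2) \)% point of a standard normal distribution, and n is the sample size, i.e. 50.
For a 99% confidence interval, \( \alpha = 0.01 \), n = 50, \( \hat{p} = 0.48 \) and \( z_{\alpha/2} = 2.575829 \).
The value of \( z_{\alpha/2} \) is obtained using the following [ Excel Formula: =NORM.INV(0.995,0,1) ]
So, with \( \sqrt{\hat{p}(1-\hat{p})/n} = \sqrt{0.48 \times 0.52 / 50} = 0.070654086 \),
Upper CI = 0.48 + 2.575829 * 0.070654086 = 0.661992846
Lower CI = 0.48 - 2.575829 * 0.070654086 = 0.298007153
Hence the 99% confidence interval is given by
[0.298007153, 0.661992846], i.e. [0.30, 0.66] (up to 2 decimal places).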
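The 99% interval for the proportion can be checked with the standard library alone. In this sketch `NormalDist().inv_cdf(0.995)` plays the role of the Excel call =NORM.INV(0.995,0,1); the variable names are illustrative.

```python
import math
from statistics import NormalDist

successes = 24   # LGAs coded as $650-$800 income earners
n = 50
alpha = 0.01

p_hat = successes / n
z = NormalDist().inv_cdf(1 - alpha / 2)             # upper alpha/2 point of N(0, 1)
standard_error = math.sqrt(p_hat * (1 - p_hat) / n)

lower = p_hat - z * standard_error
upper = p_hat + z * standard_error

print(round(z, 6), round(lower, 6), round(upper, 6))
```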
(b)
Let P denote the population proportion of $650-$800 income earners.
The sample proportion \( \hat{p} \) is approximately normally distributed, with mean estimated by \( \hat{p} = 0.48 \) and standard deviation estimated by \( \sqrt{\hat{p}(1-\hat{p})/n} = 0.070654086 \).
By the empirical rule for the normal distribution, about 95% of the values of a normal distribution lie within 2 standard deviations of the mean.
So the 95% confidence interval (based on the empirical rule) for P is [0.48 - 2 * 0.070654086, 0.48 + 2 * 0.070654086], i.e. [0.338692, 0.621308], i.e. [0.34, 0.62] (up to 2 decimal places).
[Akobeng AK. Confidence intervals and p-values in clinical decision making. Acta Paediatr. 2008;97:1004–1007]
c)
The 99% confidence interval for the proportion of $650-$800 income earners in the population is [0.30, 0.66] (up to 2 decimal places), whereas the 95% confidence interval (based on the empirical rule) is [0.34, 0.62] (up to 2 decimal places).
Hence the length of the 95% confidence interval is 0.62 - 0.34 = 0.28 and the length of the 99% confidence interval is 0.66 - 0.30 = 0.36, so the 99% interval is longer than the 95% interval. This is to be expected: if we repeatedly collect samples from our population, 99% confidence intervals will contain the population proportion 99% of the time, whereas 95% confidence intervals will contain it only 95% of the time. So the higher the confidence level, the wider the interval; to be more confident that an interval estimate captures the population proportion, we must widen it.
[Altman D, Bland JM. Confidence intervals illuminate absence of evidence. BMJ. 2004;328:1016–1017]
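The comparison of interval lengths can be sketched in code. The multiplier 2 for the 95% interval follows the empirical rule used in this task, and the 99% multiplier comes from the standard normal distribution; the variable names are illustrative.

```python
import math
from statistics import NormalDist

p_hat = 0.48
n = 50
standard_error = math.sqrt(p_hat * (1 - p_hat) / n)

z_99 = NormalDist().inv_cdf(0.995)   # multiplier for the 99% interval
z_95_empirical = 2                   # empirical-rule multiplier for about 95%

# The length of a symmetric interval is twice its half-width.
width_99 = 2 * z_99 * standard_error
width_95 = 2 * z_95_empirical * standard_error

print(round(width_95, 2), round(width_99, 2), width_99 > width_95)
```

A larger multiplier always gives a longer interval for the same standard error, which is why raising the confidence level widens the estimate.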
References
· Ross, Sheldon (2010). Introductory Statistics. Academic Press, USA.
· Hoel, P. G. (1971). Introduction to Mathematical Statistics, Fourth Edition, USA.
· Feller, William (2013). An Introduction to Probability Theory and Its Applications, Volume I, Third Edition, U.K.
· Du Prel, J.-B., Hommel, G., Röhrig, B., & Blettner, M. (2009). Confidence Interval or P-Value?: Part 4 of a Series on Evaluation of Scientific Publications. Deutsches Ärzteblatt International, 106(19), 335–339.
· Akobeng AK. Confidence intervals and p-values in clinical decision making. Acta Paediatr. 2008;97:1004–1007.
· Altman D, Bland JM. Confidence intervals illuminate absence of evidence. BMJ. 2004;328:1016–1017.