Though there is a slight difference in the age distribution between the two genders among the prisoners, the difference is not very significant. This is illustrated by the bar plots. To plot the bar plots, we make use of the following R code:
>prisoners = cbind(Males = c(920, 4684, 5624, 5407, 4609, 3756, 2408, 1541, 975, 595, 681),
                   Females = c(65, 316, 475, 483, 404, 349, 234, 135, 72, 32, 27))
>rownames(prisoners) = c("19 years and under", "20 to 24 years", "25 to 29 years", "30 to 34 years",
                         "35 to 39 years", "40 to 44 years", "45 to 49 years", "50 to 54 years", "55 to 59 years",
                         "60 to 64 years", "65 years and over")
>barplot(prisoners, main="Bar plot of Gender vs age of Prisoners", col=colors(),
         xlab="gender of prisoners", legend=rownames(prisoners))
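Because there are far more male than female prisoners, raw counts make the two age distributions hard to compare. As a minimal sketch, the counts above can be converted to within-gender percentages with `prop.table`; the name `age_share` is introduced here only for illustration.

```r
# Counts copied from the prisoners matrix defined above.
prisoners <- cbind(
  Males   = c(920, 4684, 5624, 5407, 4609, 3756, 2408, 1541, 975, 595, 681),
  Females = c(65, 316, 475, 483, 404, 349, 234, 135, 72, 32, 27)
)
rownames(prisoners) <- c("19 years and under", "20 to 24 years", "25 to 29 years",
                         "30 to 34 years", "35 to 39 years", "40 to 44 years",
                         "45 to 49 years", "50 to 54 years", "55 to 59 years",
                         "60 to 64 years", "65 years and over")

# margin = 2 divides every column by its own total, so each gender sums to 1.
age_share <- prop.table(prisoners, margin = 2)

# Percentage of each gender falling in each age group.
print(round(100 * age_share, 1))
```

Comparing the two columns of percentages row by row shows how far the age profiles differ once the difference in group size is removed.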
The resulting bar plot is illustrated in Figure 1.
The rate (ratio) of car insurance claims to policies in Stockholm is 23174/32614 = 0.0711, whereas the rate of car insurance claims to policies in rural areas is 31913/846957 = 0.0377. Thus, from the rates, it is clear that there are more claims per policy in Stockholm than in rural areas.
Yes, the Body Mass Index (BMI) differs between those PIMA females who have been pregnant and those who have never been pregnant. This is seen in the box plots of BMI by pregnancy status. The R code to produce the box plots is as follows:
>Sheet=read.csv("/Downloads/new_file.csv",sep=",",header = TRUE)
>boxplot(Sheet$BMI~Sheet$EVER.PREGNANT, main="Difference in BMI with State of Pregnancy",
         xlab="State of Pregnancy", ylab="BMI")
Figure 1: Bar plot of data
In the above code, 'new_file.csv' is the modified PIMA.csv file, in which the first column is split into four columns, AGE, DIASTOLIC, BMI and EVER.PREGNANT, for ease of analysis.
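A numeric summary per group can complement the box plots. The sketch below uses a small invented data frame in place of the real file; the values and the "Yes"/"No" coding of EVER.PREGNANT are assumptions made only for illustration, and `demo` and `group_summary` are hypothetical names.

```r
# Hypothetical stand-in for the data read from new_file.csv.
# The BMI values and the Yes/No coding are invented for this sketch.
demo <- data.frame(
  BMI = c(31.2, 28.4, 35.1, 26.9, 33.0, 29.7),
  EVER.PREGNANT = c("No", "Yes", "No", "Yes", "No", "Yes")
)

# Median BMI within each pregnancy group, one row per group.
group_summary <- aggregate(BMI ~ EVER.PREGNANT, data = demo, FUN = median)
print(group_summary)
```

Applied to the real data, the same call gives the medians that the box plots draw as their centre lines.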
The plot clearly shows that females who have never been pregnant have a higher BMI than those who have been pregnant.
The scatter plot clearly shows a non-zero correlation between diastolic blood pressure and age among the PIMA people. We calculate the coefficient of correlation by Pearson's method using R's built-in 'cor' function. We also calculate the correlation coefficient using Spearman's rank correlation. The confidence interval at the 95% level is calculated by bootstrapping the data, and permuting the data gives the correlations expected under the null hypothesis. Alternatively, the 95% confidence interval is also calculated using the function 'CIr' from the package "psychometric". The exact R code is as follows:
(i) For plotting the scatter plot and histograms:
>Sheet=read.csv("/Downloads/new_file.csv",sep=",",header = TRUE)
>plot(Sheet$AGE, Sheet$DIASTOLIC)
>par(mfrow=c(2,1))
>hist(Sheet$AGE)
>hist(Sheet$DIASTOLIC)
The scatter plot is drawn by the 'plot' call above. The code for calculating the coefficient of correlation is:
>Sheet=read.csv("/Downloads/new_file.csv",sep=",",header = TRUE)
>cor(Sheet$AGE,Sheet$DIASTOLIC, method="pearson")
>cor(Sheet$AGE,Sheet$DIASTOLIC,method="spearman")
The coefficient of correlation was found to be 0.3259467 (Pearson's method) and 0.3676724 (Spearman's method).
The code for calculating the confidence interval at the 95% level is given below. It also provides the evidence that the correlation is not due to chance. Under the null hypothesis that the true correlation is zero, the ages are permuted and the correlation is recalculated for each permutation, and the observed value is compared with these. The observed correlation coefficient is greater than the correlations obtained from the permuted data. Moreover, the bootstrap confidence interval contains the observed correlation coefficient and excludes zero. Hence, the null hypothesis can be rejected, and there is a correlation between age and diastolic blood pressure.
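>Sheet=read.csv("/home/prajnan/Downloads/new_file.csv",sep=",",header = TRUE)
>obs.cor = cor(Sheet$DIASTOLIC,Sheet$AGE)
># Permutation: shuffling AGE breaks any association, giving correlations under the null
>x= replicate(1000, {
    post.perm = sample(Sheet$AGE)
    cor(Sheet$DIASTOLIC, post.perm)
 })
>hist(x,col=colors(),xlab="correlation coefficient assuming null hypo")
># Bootstrap: resample rows with replacement; n is the number of rows
>n = nrow(Sheet)
>x= replicate(1000, {
    samp = sample(1:n, replace=TRUE, size=n)
    cor(Sheet$DIASTOLIC[samp], Sheet$AGE[samp])
 })
># The 5% and 95% quantiles bound the central 90% of the bootstrap values;
># c(0.025, 0.975) would bound the central 95%
>quantile(x,c(0.050,0.950))
>hist(x,col=colors(),xlab="correlation coefficient of the simulations")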
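The comparison between the observed correlation and the permuted ones can be summarised as a permutation p-value. The sketch below runs on synthetic data, not the PIMA file: the variables `age` and `diastolic`, their parameters and the seed are all assumptions made for illustration.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Synthetic stand-ins for the AGE and DIASTOLIC columns (invented values).
n <- 200
age <- rnorm(n, mean = 35, sd = 10)
diastolic <- 60 + 0.3 * age + rnorm(n, sd = 10)

obs_cor <- cor(diastolic, age)

# Null distribution: correlations after shuffling age.
null_cor <- replicate(1000, cor(diastolic, sample(age)))

# Two-sided p-value: share of shuffles at least as extreme as the observed
# value; the +1 terms count the observed arrangement itself.
p_value <- (sum(abs(null_cor) >= abs(obs_cor)) + 1) / (length(null_cor) + 1)
print(c(observed = obs_cor, p_value = p_value))
```

A small p-value means that shuffled data rarely produce a correlation as large as the observed one, which is the basis for rejecting the null hypothesis of zero correlation.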
The bootstrap confidence interval was obtained to be (0.2692707, 0.3780436).
[Figure: histogram of correlation coefficients assuming the null hypothesis]
[Figure: histogram of simulated correlation coefficients showing the confidence interval]
The confidence interval at the 95% level was also calculated using the package "psychometric" as follows:
>library(psychometric)
>CIr(r=0.3259467, n=732, level=0.95)
In the above code, 'r' corresponds to the coefficient of correlation, 'n' to the sample size and 'level' to the confidence level.
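An interval of this kind can also be computed by hand with the Fisher z-transformation, which is the standard analytic approach for a correlation coefficient. The sketch below uses base R only and takes 'r' and 'n' from the call above; the names `z`, `se` and `ci` are introduced for illustration.

```r
r <- 0.3259467   # observed Pearson correlation
n <- 732         # sample size
level <- 0.95

# Fisher z-transformation: atanh(r) is approximately normal
# with standard error 1 / sqrt(n - 3).
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
crit <- qnorm(1 - (1 - level) / 2)

# Transform the limits back to the correlation scale.
ci <- tanh(z + c(-1, 1) * crit * se)
print(ci)
```

Because the transformation is non-linear, the resulting interval is not symmetric about 'r'.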
The output of 'CIr' was the interval with lower limit = 0.2596146 and upper limit = 0.3892176. The two methods give noticeably different intervals.
The linear regression between age and diastolic blood pressure is fitted using the data provided in the 'csv' files and the R functions 'lm' and 'abline'. The null hypothesis that the slope of the regression line is zero is then tested by the same permutation approach applied to the correlation coefficient earlier: the ages are shuffled and the slope is recomputed for each shuffle. Since the observed slope lies outside the range expected from the slopes simulated under the null hypothesis, the null hypothesis is rejected. The exact code is as follows:
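># Fitted as AGE ~ DIASTOLIC, so the reported slope is labelled Sheet$DIASTOLIC
>lm(Sheet$AGE~Sheet$DIASTOLIC)
>plot(Sheet$AGE,Sheet$DIASTOLIC)
># The line is drawn from DIASTOLIC ~ AGE so that it matches the plot axes
>abline(lm(Sheet$DIASTOLIC~Sheet$AGE))
>slope = coef(lm(Sheet$DIASTOLIC~Sheet$AGE))[2]
># Permutation: shuffling AGE gives slopes expected under the null hypothesis
>x= replicate(1000, {
    age.perm = sample(Sheet$AGE)
    coef(lm(Sheet$DIASTOLIC~age.perm))[2]
 })
>hist(x,col=colors(),xlab="slope assuming null hypo")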
[Figure: scatter plot with the fitted regression line]
[Figure: histogram of the slope assuming the null hypothesis]
The intercept and the slope of the regression were found to be: Intercept = 10.9172 and slope (Sheet$DIASTOLIC) = 0.3095. Because the coefficient is attached to Sheet$DIASTOLIC, the model was fitted with age as the response, so the slope implies that, on average, the age of a PIMA female increases by 0.3095 years per unit increase in diastolic blood pressure.
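The direction of the formula passed to 'lm' matters when interpreting a slope. The sketch below, on synthetic data rather than the PIMA file, fits the regression both ways; the data, the seed and the names `demo`, `b_d_on_a` and `b_a_on_d` are assumptions made for illustration.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible

# Synthetic stand-ins for AGE and DIASTOLIC (invented values).
age <- runif(100, min = 21, max = 70)
diastolic <- 60 + 0.3 * age + rnorm(100, sd = 10)
demo <- data.frame(age, diastolic)

# Response on the left of ~, predictor on the right.
fit_d_on_a <- lm(diastolic ~ age, data = demo)   # pressure per year of age
fit_a_on_d <- lm(age ~ diastolic, data = demo)   # years of age per unit pressure

b_d_on_a <- unname(coef(fit_d_on_a)["age"])
b_a_on_d <- unname(coef(fit_a_on_d)["diastolic"])
r <- cor(demo$age, demo$diastolic)

# The two slopes differ, and their product equals the squared correlation.
print(c(b_d_on_a = b_d_on_a, b_a_on_d = b_a_on_d, r_squared = r^2))
```

The slope coefficient is always named after the predictor, so the label in the 'lm' output indicates which variable was treated as the response.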
To calculate the 95% confidence interval for the mean diastolic blood pressure of 40-year-old females, we first filtered the 'csv' file to include only those rows with 40 in the AGE column. Then, we used the AVERAGE() and STDEV() functions to calculate the mean and standard deviation of the diastolic blood pressure of the 12 females aged 40. Lastly, we used R's built-in function 'qnorm' to calculate the 95% confidence interval as mean ± qnorm(0.975) × s/√n, using the following code:
> a=75
> s=8.50668
> n=12
> error=qnorm(0.975)*s/sqrt(n)
> left=a-error
> right=a+error
In the above code, the error is first assumed to be normally distributed. The variables 'a' and 's' are the mean and standard deviation of the diastolic blood pressure of the 12 females aged 40, and 'n' is the sample size, which is 12. 'left' refers to the lower confidence limit and 'right' to the upper confidence limit. The 95% level is used, which corresponds to qnorm(0.975). The result of the above code was: lower limit = 70.18698 and upper limit = 79.81302, which implies that the mean diastolic blood pressure of 40-year-old females lies between 70.18698 and 79.81302 with 95% confidence.
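With only 12 observations and a standard deviation estimated from the sample, a t-based interval is a common alternative to the normal-based one. The sketch below is not part of the original analysis; it reuses 'a', 's' and 'n' from the code above and introduces `z_int` and `t_int` for illustration.

```r
a <- 75        # sample mean
s <- 8.50668   # sample standard deviation
n <- 12        # sample size

se <- s / sqrt(n)

# Normal-based interval, as computed above with qnorm.
z_int <- a + c(-1, 1) * qnorm(0.975) * se

# t-based interval with n - 1 degrees of freedom.
t_int <- a + c(-1, 1) * qt(0.975, df = n - 1) * se

print(rbind(normal = z_int, t = t_int))
```

The t quantile exceeds the normal quantile for small samples, so the t-based interval is somewhat wider than the normal-based one.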