Analysis Of Prisoners, Car Insurance Claims And Female Pima People

Question 1 – Analysis of Prisoners in Australia

Though there is a slight difference in the age distribution between the two genders among the prisoners, the difference is not very significant. This is illustrated by the bar plots. To produce the bar plots, we use the following R code:

>prisoners = cbind(Males = c(920, 4684, 5624, 5407, 4609, 3756, 2408, 1541, 975, 595, 681),
                   Females = c(65, 316, 475, 483, 404, 349, 234, 135, 72, 32, 27))
>rownames(prisoners) = c("19 years and under", "20 to 24 years", "25 to 29 years", "30 to 34 years",
                         "35 to 39 years", "40 to 44 years", "45 to 49 years", "50 to 54 years", "55 to 59 years",
                         "60 to 64 years", "65 years and over")
>barplot(prisoners, main="Bar plot of Gender vs age of Prisoners", col=colors(),
         xlab="gender of prisoners", legend=rownames(prisoners))
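Before looking at the plot, the two age distributions can also be compared numerically. The following is a minimal sketch, not part of the original analysis, that converts the counts above into within-gender shares with 'prop.table'; the name 'age_share' is introduced here for illustration only.

```r
# Counts copied from the code above (rows = age groups, columns = gender).
prisoners <- cbind(
  Males   = c(920, 4684, 5624, 5407, 4609, 3756, 2408, 1541, 975, 595, 681),
  Females = c(65, 316, 475, 483, 404, 349, 234, 135, 72, 32, 27)
)
rownames(prisoners) <- c("19 years and under", "20 to 24 years", "25 to 29 years",
                         "30 to 34 years", "35 to 39 years", "40 to 44 years",
                         "45 to 49 years", "50 to 54 years", "55 to 59 years",
                         "60 to 64 years", "65 years and over")

# Share of each age group within each gender (every column sums to 1).
age_share <- prop.table(prisoners, margin = 2)
print(round(100 * age_share, 1))

# Side-by-side bars of the shares make the two age profiles directly comparable.
barplot(t(age_share), beside = TRUE, las = 2, cex.names = 0.6,
        legend.text = colnames(age_share), ylab = "share within gender")
```

Working with shares rather than raw counts removes the effect of the much larger number of male prisoners, so any remaining difference reflects the shape of the age distribution.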

The resulting bar plot is illustrated in Figure 1.

The rate (ratio) of car insurance claims to policies in Stockholm is 23174/32614 = 0.0711, whereas the rate of car insurance claims to policies in rural areas is 31913/846957 = 0.0377. Thus, from the rates, it is clear that claims are more frequent in Stockholm than in rural areas.

Yes, the Body Mass Index (BMI) differs between those PIMA females who have been pregnant and those who have never been pregnant. This is seen in the box plots of the Body Mass Index with respect to pregnancy status. The R code to produce the box plots is as follows:

>Sheet=read.csv("/Downloads/new_file.csv",sep=",",header = T)
>boxplot(Sheet$BMI~Sheet$EVER.PREGNANT, main="Difference in BMI with State of Pregnancy",
         xlab="State of Pregnancy", ylab="BMI")
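The group summaries that a box plot draws can also be printed directly. The sketch below uses a small hypothetical data frame in place of new_file.csv; the BMI values and the "Yes"/"No" coding of EVER.PREGNANT are assumptions made for illustration and are not taken from the PIMA data.

```r
# Hypothetical toy data standing in for new_file.csv; values are illustrative only.
toy <- data.frame(
  BMI = c(31.2, 28.4, 35.0, 26.1, 33.3, 29.9, 38.5, 27.0),
  EVER.PREGNANT = c("Yes", "Yes", "No", "Yes", "No", "Yes", "No", "Yes")
)

# Quartiles of BMI within each pregnancy group,
# i.e. the numbers a box plot draws as its centre line and box edges.
bmi_by_group <- tapply(toy$BMI, toy$EVER.PREGNANT, quantile,
                       probs = c(0.25, 0.5, 0.75))
print(bmi_by_group)

# Medians alone, for a quick comparison of the two groups.
group_medians <- tapply(toy$BMI, toy$EVER.PREGNANT, median)
print(group_medians)
```

With the real file, replacing 'toy' by 'Sheet' would give the medians behind the box plots.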

Figure 1: Bar plot of data

In the above code, 'new_file.csv' is the modified PIMA.csv file, in which the first column is split into the four columns AGE, DIASTOLIC, BMI and EVER.PREGNANT for ease of analysis.

The box plots produced clearly show that females who have never been pregnant tend to have a higher BMI than those who have been pregnant.

The scatter plot clearly shows a non-zero correlation between the diastolic blood pressure and the age of the PIMA people. We calculate the coefficient of correlation by Pearson's method using R's built-in 'cor' function. We also calculate the correlation coefficient using Spearman's rank correlation. The confidence interval at the 95% level is calculated by bootstrapping the data, and a permutation of the data is used to simulate the null hypothesis. Alternatively, the 95% confidence interval is also calculated using the function 'CIr' from the package "psychometric". The exact R code is as follows:

(i) For plotting the scatter plot and histograms:

>Sheet=read.csv("/Downloads/new_file.csv",sep=",",header = T)
>plot(Sheet$AGE, Sheet$DIASTOLIC)
>par(mfrow=c(2,1))
>hist(Sheet$AGE)
>hist(Sheet$DIASTOLIC)

Question 2 – Analysis of Car Insurance Claims in Sweden

The code for calculating the coefficient of correlation is:

>Sheet=read.csv("/Downloads/new_file.csv",sep=",",header = T)
>cor(Sheet$AGE,Sheet$DIASTOLIC, method="pearson")
>cor(Sheet$AGE,Sheet$DIASTOLIC, method="spearman")

The coefficient of correlation was found to be 0.3259467 (Pearson's method) and 0.3676724 (Spearman's method).
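The two coefficients differ because Spearman's method works on ranks rather than raw values. As a minimal sketch of that relationship, using hypothetical toy vectors rather than the PIMA data, Spearman's coefficient can be reproduced by applying Pearson's method to the ranks:

```r
# Hypothetical toy vectors; the relationship is monotone-ish but not linear.
age <- c(21, 25, 30, 34, 41, 47, 52, 60)
bp  <- c(60, 62, 61, 70, 72, 71, 90, 120)

pearson  <- cor(age, bp, method = "pearson")
spearman <- cor(age, bp, method = "spearman")

# Spearman's coefficient is Pearson's coefficient computed on the ranks.
spearman_by_hand <- cor(rank(age), rank(bp), method = "pearson")

print(c(pearson = pearson, spearman = spearman, by_hand = spearman_by_hand))
```

Because ranks ignore the size of the gaps between values, Spearman's coefficient is less sensitive to outliers and to curvature in the relationship.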

The code for calculating the confidence interval at the 95% level is given below. In addition, we also give the code for the evidence that the correlation is not due to chance. The correlation under the null hypothesis, that the true correlation is zero, is simulated by permuting the data and compared with the observed value. The observed correlation coefficient is greater than the values obtained when the data set is permuted. Moreover, the confidence interval obtained by bootstrapping contains the observed correlation coefficient and excludes zero. Hence, the null hypothesis can be rejected, and there is a correlation between age and diastolic blood pressure.

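>Sheet=read.csv("/home/prajnan/Downloads/new_file.csv",sep=",",header = T)
>obs.cor = cor(Sheet$DIASTOLIC,Sheet$AGE)
>n = nrow(Sheet)   # sample size, needed for the bootstrap below
># permutation: correlations under the null hypothesis of zero correlation
>x= replicate(1000, {
   post.perm = sample(Sheet$AGE)
   cor(Sheet$DIASTOLIC, post.perm)
 })
>hist(x,col=colors(),xlab="correlation coefficient assuming null hypo")
># bootstrap: resample the rows with replacement
>x= replicate(1000, {
   samp = sample(1:n, replace=TRUE, size=n)
   cor(Sheet$DIASTOLIC[samp], Sheet$AGE[samp])
 })
>quantile(x,c(0.050,0.950))   # central 90% of the bootstrap values; c(0.025,0.975) would bound the central 95%
>hist(x,col=colors(),xlab="correlation coefficient of the simulations")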
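The permutation and bootstrap steps above can be condensed into an empirical p-value and a percentile interval. The following is a minimal sketch on simulated stand-in data (the vectors 'age' and 'bp' and all their parameters are hypothetical); it uses the 2.5% and 97.5% quantiles, which is the usual choice for a 95% percentile interval.

```r
set.seed(1)  # fixed seed so the sketch is reproducible

# Simulated stand-in data (hypothetical); the real analysis uses
# Sheet$AGE and Sheet$DIASTOLIC.
n   <- 200
age <- runif(n, 20, 70)
bp  <- 60 + 0.3 * age + rnorm(n, sd = 10)

obs_cor <- cor(bp, age)

# Null distribution: shuffling age breaks any link with bp.
null_cor <- replicate(1000, cor(bp, sample(age)))

# Two-sided empirical p-value, with +1 so it is never exactly zero.
p_value <- (sum(abs(null_cor) >= abs(obs_cor)) + 1) / (length(null_cor) + 1)

# Percentile bootstrap: the 2.5% and 97.5% quantiles bound the central 95%.
boot_cor <- replicate(1000, {
  idx <- sample(n, replace = TRUE)
  cor(bp[idx], age[idx])
})
ci_95 <- quantile(boot_cor, c(0.025, 0.975))

print(p_value)
print(ci_95)
```

A small p-value says the observed correlation is unusual under the null hypothesis; a bootstrap interval that excludes zero points to the same conclusion from the other direction.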

The confidence interval was obtained to be (0.2692707, 0.3780436).

Figure: Histogram of the correlation coefficients assuming the null hypothesis

Figure: Histogram of the simulated correlation coefficients showing the confidence interval

The confidence interval at the 95% level was also calculated using the package "psychometric" as follows:

>library(psychometric)

>CIr(r=0.3259467, n=732, level=0.95)

In the above code, 'r' corresponds to the coefficient of correlation, 'n' to the sample size and 'level' to the confidence level.
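For intuition about where such an interval comes from, the sketch below computes a confidence interval for a correlation with the Fisher z-transformation in base R. That 'CIr' works this way is our understanding of the psychometric package, not something stated in this report; the inputs are the same 'r', 'n' and 'level' as in the call above.

```r
r <- 0.3259467   # Pearson correlation reported above
n <- 732         # sample size passed to CIr above
level <- 0.95

z      <- atanh(r)                    # Fisher z-transform of r
se_z   <- 1 / sqrt(n - 3)             # approximate standard error of z
z_crit <- qnorm(1 - (1 - level) / 2)  # about 1.96 for a 95% interval

# Back-transform the interval for z to the correlation scale.
ci <- tanh(c(lower = z - z_crit * se_z, upper = z + z_crit * se_z))
print(ci)
```

The interval is not symmetric around 'r' because the back-transformation with 'tanh' is non-linear.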

The output of 'CIr' was the interval with lower limit = 0.2596146 and upper limit = 0.3892176. The two methods clearly give slightly different intervals.

The linear regression between age and diastolic blood pressure is fitted using the data provided in the 'csv' files and the R functions 'lm' and 'abline'. The null hypothesis that the slope of the regression line is zero is then tested by the same permutation method that was applied to the correlation coefficient earlier. Since the observed slope lies outside the range of slopes simulated under the null hypothesis, the null hypothesis is rejected at the 5% significance level. The exact code is as follows:

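>lm(Sheet$AGE~Sheet$DIASTOLIC)   # AGE regressed on DIASTOLIC
>plot(Sheet$DIASTOLIC,Sheet$AGE)   # predictor on the x-axis so that the fitted line matches the plot
>abline(lm(Sheet$AGE~Sheet$DIASTOLIC))
>slope = coef(lm(Sheet$DIASTOLIC~Sheet$AGE))[2]   # reverse direction; it is zero only if the slope above is zero
>x= replicate(1000, {
   age.perm = sample(Sheet$AGE)   # permuting AGE simulates the null hypothesis of zero slope
   coef(lm(Sheet$DIASTOLIC~age.perm))[2]
 })
>hist(x,col=colors(),xlab="slope assuming null hypo")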

Figure: Regression line of the fitted model

Figure: Histogram of the slope assuming the null hypothesis

The intercept and the slope of the regression were found to be: Intercept = 10.9172 and slope (Sheet$DIASTOLIC) = 0.3095. Since the fitted model regresses AGE on DIASTOLIC, the slope implies that, on average, the age of a PIMA female is 0.3095 years higher per unit increase in diastolic blood pressure.
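The direction of a regression matters when its slope is interpreted. The sketch below, on simulated toy data (the vectors 'x' and 'y' are hypothetical and only loosely mimic diastolic pressure and age), shows that regressing y on x and x on y gives two different slopes, each equal to the correlation times a ratio of standard deviations.

```r
set.seed(42)  # reproducible toy data (hypothetical, not the PIMA file)
x <- rnorm(100, mean = 70, sd = 12)        # stands in for diastolic pressure
y <- 10 + 0.3 * x + rnorm(100, sd = 10)    # stands in for age

slope_y_on_x <- coef(lm(y ~ x))[["x"]]   # change in y per unit of x
slope_x_on_y <- coef(lm(x ~ y))[["y"]]   # change in x per unit of y

r <- cor(x, y)

# Each slope equals r times sd(response) / sd(predictor).
print(c(fitted = slope_y_on_x, formula = r * sd(y) / sd(x)))
print(c(fitted = slope_x_on_y, formula = r * sd(x) / sd(y)))
```

Both slopes are zero exactly when the correlation is zero, so a test of zero slope gives the same verdict in either direction, but the size of the slope must be read in the units of the model that was actually fitted.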

To calculate the 95% confidence interval for the mean diastolic blood pressure of 40-year-old females, we first filtered the 'csv' file to include only those rows with 40 in the AGE column. Then, we used the AVERAGE() and STDEV() functions to calculate the mean and standard deviation of the diastolic blood pressure of the twelve 40-year-old females. Lastly, we used R's built-in function 'qnorm' to calculate the 95% confidence interval using the formula mean ± z(0.975) × s/√n and the following code:

> a=75

> s=8.50668

> n=12

> error=qnorm(0.975)*s/sqrt(n)

> left=a-error

> right=a+error

In the above code, the error is assumed to be normally distributed. The variables 'a' and 's' are the mean and standard deviation of the diastolic blood pressure of the 12 females aged 40, and 'n' is the sample size, which is 12. 'left' refers to the lower confidence limit and 'right' to the upper confidence limit. The default confidence level of 95% is used. The result of the above code was: lower limit = 70.18698 and upper limit = 79.81302, which implies that the mean diastolic blood pressure of 40-year-old females lies between 70.18698 and 79.81302 with 95% confidence.
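With only 12 observations, a t-based interval is a common alternative to the normal-based one. The sketch below is not part of the original analysis; it reuses the values of 'a', 's' and 'n' from the code above and swaps 'qnorm' for 'qt' with n - 1 degrees of freedom, so that the two intervals can be compared.

```r
a <- 75        # sample mean from the text
s <- 8.50668   # sample standard deviation from the text
n <- 12        # number of 40-year-old females

se <- s / sqrt(n)   # standard error of the mean

# Normal-based interval, as in the code above.
z_ci <- a + c(-1, 1) * qnorm(0.975) * se

# t-based interval with n - 1 degrees of freedom (wider for small n).
t_ci <- a + c(-1, 1) * qt(0.975, df = n - 1) * se

print(round(z_ci, 5))
print(round(t_ci, 5))
```

The t-based interval is somewhat wider because it accounts for the extra uncertainty of estimating the standard deviation from a small sample.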

