This paper discusses the concepts of statistics and data analysis. It covers statistical terminology, the tools used in the analysis of data, and general statistical methods. Statistics is defined as the methodology that mathematicians and statisticians use for collecting, analyzing, and interpreting data and for making inferences from a sample of data or other information (Aberson, 2010). It is clear from this definition that statistics is more than the tabulation of numbers and the graphical presentation of information. In detail, statistical methods are used in coming up with:
In conclusion, statistics provides the methodology for,
Population and sample are basic concepts in statistics. A population is the set of all individuals, subjects, or objects that an investigator is interested in during a study. A sample is the subset of individuals from the population that is actually involved in the study (Agarwal, n.d.).
Descriptive and inferential statistics are the two major branches of statistics. Descriptive statistics is devoted to summarizing and describing data, while inferential statistics is concerned with making inferences about a population (Fraser, 2012). In general, descriptive statistics consists of methods for organizing and summarizing information, while inferential statistics consists of methods for drawing conclusions about the population under study and measuring the reliability of those conclusions (Brase, 2013). Descriptive statistics includes measures of central tendency (mean, median, and mode) and measures of dispersion (range, minimum and maximum values, variance, and standard deviation). It also includes the construction of tables, charts, and graphs. Inferential statistics consists of methods such as point estimation, interval estimation, and hypothesis testing, all of which are based on probability theory (Friedman, 2010).
Features of the population under investigation are summarized as numerical parameters, so the research problem becomes an investigation of the values of those parameters (Givens, 2013). Population parameters are usually unknown, and sample statistics are used to make inferences about them (Daniel, 2010).
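The distinction between a parameter and a statistic can be sketched in a few lines of Python. The population below is simulated and purely hypothetical (the mean of 55, the spread of 8, the population size, and the seed are all assumptions chosen for illustration, not values from the study data).

```python
import random
import statistics

# Hypothetical population: 10,000 simulated ages (illustrative values only,
# not the data analysed in this paper).
rng = random.Random(0)
population = [rng.gauss(55, 8) for _ in range(10_000)]

# Parameter: the population mean, which is normally unknown in practice.
population_mean = statistics.mean(population)

# Statistic: the mean of a simple random sample of 100 subjects.
sample = rng.sample(population, 100)
sample_mean = statistics.mean(sample)

print(f"population mean (parameter): {population_mean:.2f}")
print(f"sample mean (statistic):     {sample_mean:.2f}")
```

The sample mean will usually be close to, but not exactly equal to, the population mean, which is why inferential methods attach a measure of reliability to the estimate.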
The main objective of statistics is to understand what the data contains. Below are the steps to be followed in any data analysis:
A variable is any measurable characteristic that varies from one member of the population to another. The main types of variables in statistics are quantitative and qualitative variables. Quantitative variables include height, weight, length, and width, and may be classified as continuous or discrete. Qualitative variables include eye color, marital status, sex, and hair color, and may be classified as either nominal or ordinal (Field, 2014).
The data used in this paper come from an experimental study that investigated the relationship between age, gender, type of chest pain, blood sugar, and the class of the subject, whether sick or healthy (Knopov, 2012). The data were obtained from a web resource: https://mercury.webster.edu/aleshunas/Data%20Sets/Supplemental%20Excel%20Data%20Sets.htm
The dataset comprises 100 subjects with the following variables: age, gender, chest pain type, blood pressure, whether the fasting blood sugar is less than 120, and the class of the patient. Age and blood pressure are quantitative variables, while gender, chest pain type, the fasting blood sugar indicator, and the class of the subject are qualitative variables.
Table 1

| | age | blood pressure |
| --- | --- | --- |
| Valid | 100 | 100 |
| Missing | 0 | 0 |
Table 1 above shows the sample size of the study: there were 100 subjects, with no missing values for age or blood pressure.
Table 2

Descriptive Statistics

| | N | Range | Minimum | Maximum | Mean | Std. Deviation | Variance | Kurtosis (Statistic) | Kurtosis (Std. Error) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| age | 100 | 34 | 37 | 71 | 54.76 | 8.316 | 69.154 | -.882 | .478 |
| blood pressure | 100 | 76 | 104 | 180 | 132.37 | 15.048 | 226.437 | .098 | .478 |
| Valid N (listwise) | 100 | | | | | | | | |
Table 2 above presents the descriptive statistics of the quantitative variables age and blood pressure. The youngest subject was 37 years old and the oldest was 71. The mean age in the study was 54.76 years, which is approximately 55 years, with a standard deviation of 8.316. The highest blood pressure recorded was 180 and the lowest was 104. The mean blood pressure was 132.37, with a standard deviation of 15.048.
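As a quick sanity check on the SPSS output, the entries of Table 2 can be tested against two identities: the range equals the maximum minus the minimum, and the variance equals the square of the standard deviation. The sketch below copies the values from Table 2; the helper name `is_consistent` and the rounding tolerance are assumptions made for illustration.

```python
# Values copied from Table 2 (SPSS output).
table2 = {
    "age": {"min": 37, "max": 71, "range": 34,
            "sd": 8.316, "variance": 69.154},
    "blood pressure": {"min": 104, "max": 180, "range": 76,
                       "sd": 15.048, "variance": 226.437},
}

def is_consistent(row, tol=0.01):
    """Check range = max - min and variance = sd**2 (within rounding)."""
    range_ok = row["max"] - row["min"] == row["range"]
    variance_ok = abs(row["sd"] ** 2 - row["variance"]) < tol
    return range_ok and variance_ok

for name, row in table2.items():
    print(name, is_consistent(row))
```

Both rows pass the check, so the reported range and variance agree with the reported extremes and standard deviation up to rounding.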
Table 3

Statistics

| | | age | blood pressure |
| --- | --- | --- | --- |
| N | Valid | 100 | 100 |
| | Missing | 0 | 0 |
| Mean | | 54.76 | 132.37 |
| Median | | 56.00 | 130.00 |
| Mode | | 44 (a) | 130 |
| Std. Deviation | | 8.316 | 15.048 |
| Variance | | 69.154 | 226.437 |
| Range | | 34 | 76 |

a. Multiple modes exist. The smallest value is shown.
Table 3 above shows the measures of central tendency and dispersion of the quantitative variables. Age had a median of 56, a mode of 44, and a range of 34. The mode is the most frequently occurring value, not the value held by a majority of subjects, and because multiple modes exist for age, 44 is only the smallest of several equally frequent ages. Blood pressure had a median of 130 and a mode of 130, so 130 was the most commonly recorded blood pressure.
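The footnote "Multiple modes exist. The smallest value is shown" can be illustrated with Python's standard `statistics` module. The ages below are hypothetical and were chosen only so that two values tie for most frequent; they are not the study data.

```python
import statistics

# Hypothetical ages, chosen so that two values tie for most frequent.
ages = [44, 44, 52, 52, 56, 60, 63]

modes = statistics.multimode(ages)   # every value with the highest count
reported_mode = min(modes)           # SPSS-style: report the smallest mode
median = statistics.median(ages)     # middle value of the sorted data

print(modes, reported_mode, median)
```

Here two ages share the highest frequency, yet neither accounts for a majority of the observations, which is why a mode should be read as "most frequent" rather than "most subjects".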
Fig 1 and Fig 2 below show histograms of age and blood pressure, respectively. Neither variable appears markedly skewed in the histograms, so both distributions look approximately normal. Blood pressure has two outlying values, while age has none.
Fig 1 Age histogram
Fig 2 Blood Pressure histogram
Table 4

sex

| | | Frequency | Percent | Valid Percent | Cumulative Percent |
| --- | --- | --- | --- | --- | --- |
| Valid | Female | 29 | 29.0 | 29.0 | 29.0 |
| | Male | 71 | 71.0 | 71.0 | 100.0 |
| | Total | 100 | 100.0 | 100.0 | |
Table 4 above shows the distribution of subjects by gender. This is illustrated in Fig 3 below.
Fig 3
Fig 3 indicates that males were the majority (71%), while females made up 29% of the subjects.
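The frequency table for gender can be rebuilt from the two counts alone. The following is a minimal sketch using the counts from Table 4; the variable names are illustrative.

```python
# Counts taken from Table 4.
counts = {"Female": 29, "Male": 71}
total = sum(counts.values())

rows = []
cumulative = 0.0
for label, count in counts.items():
    percent = round(100 * count / total, 1)       # share of all subjects
    cumulative = round(cumulative + percent, 1)   # running total of shares
    rows.append((label, count, percent, cumulative))

for row in rows:
    print(row)
```

Each tuple holds the category, its frequency, its percentage, and the cumulative percentage, matching the columns of Table 4.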
The inferential statistics discussed in this study include the Chi-Square test of independence, linear regression, and tests of means. Before embarking on inferential statistics, it is wise to test whether the data are normally distributed in order to decide whether to use parametric or non-parametric techniques in the analysis (Lee, n.d.).
The hypotheses for testing normality are as follows:
H0: The data follow a normal distribution
H1: The data do not follow a normal distribution
Table 5

Tests of Normality

| | Kolmogorov-Smirnov (a): Statistic | Kolmogorov-Smirnov (a): df | Kolmogorov-Smirnov (a): Sig. | Shapiro-Wilk: Statistic | Shapiro-Wilk: df | Shapiro-Wilk: Sig. |
| --- | --- | --- | --- | --- | --- | --- |
| blood pressure | .113 | 100 | .423 | .971 | 100 | .526 |

a. Lilliefors Significance Correction
Table 5 above shows the results of the normality tests for blood pressure. The Shapiro-Wilk p-value (0.526) is greater than the 0.05 level of significance. Therefore, we fail to reject the null hypothesis and conclude that the data are consistent with a normal distribution. The Shapiro-Wilk test is used instead of the Kolmogorov-Smirnov test because of the sample size: since the sample size is greater than 25, we use the Shapiro-Wilk test (Machin, 2010).
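The decision rule applied to the normality tests can be written out explicitly. The sketch below only encodes the comparison of a p-value with the significance level; the function name `decide` is an assumption, and the p-values are those reported in Table 5.

```python
ALPHA = 0.05  # significance level used in this paper

def decide(p_value, alpha=ALPHA):
    """Return the decision for H0 at the given significance level."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

# p-values as reported in Table 5 for blood pressure.
p_values = {"Kolmogorov-Smirnov": 0.423, "Shapiro-Wilk": 0.526}
decisions = {test: decide(p) for test, p in p_values.items()}

print(decisions)
```

Both reported p-values exceed 0.05, so either test leads to the same decision. Failing to reject H0 does not prove normality; it only means the sample gives no strong evidence against it.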
Chi-Square Test
The Chi-Square test is a statistical test used to test the association between two categorical variables (Paulk, 2012). The hypotheses used in testing for association are as follows:
H0: Gender and fasting blood sugar status are independent, that is, there is no significant association between gender and whether the fasting blood sugar is less than 120.
H1: Gender and fasting blood sugar status are not independent, that is, there is a significant association between gender and whether the fasting blood sugar is less than 120.
Table 6

sex * Fasting blood sugar <120 Crosstabulation

| sex | | Fasting blood sugar <120: False | Fasting blood sugar <120: True | Total |
| --- | --- | --- | --- | --- |
| Female | Count | 26 | 3 | 29 |
| | % within sex | 89.7% | 10.3% | 100.0% |
| | % within Fasting blood sugar <120 | 29.9% | 23.1% | 29.0% |
| | % of Total | 26.0% | 3.0% | 29.0% |
| Male | Count | 61 | 10 | 71 |
| | % within sex | 85.9% | 14.1% | 100.0% |
| | % within Fasting blood sugar <120 | 70.1% | 76.9% | 71.0% |
| | % of Total | 61.0% | 10.0% | 71.0% |
| Total | Count | 87 | 13 | 100 |
| | % within sex | 87.0% | 13.0% | 100.0% |
| | % within Fasting blood sugar <120 | 100.0% | 100.0% | 100.0% |
| | % of Total | 87.0% | 13.0% | 100.0% |
Table 6 above indicates that most subjects of both sexes fell in the "False" category of the indicator "Fasting blood sugar <120": 89.7% of females and 85.9% of males.
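The three kinds of percentage in the cross-tabulation differ only in the denominator used. The sketch below recomputes them from the four observed counts in Table 6; the variable and function names are illustrative.

```python
# Observed counts from Table 6: rows are sex, columns are the
# "Fasting blood sugar <120" indicator in the order (False, True).
observed = {"Female": [26, 3], "Male": [61, 10]}

row_totals = {sex: sum(cells) for sex, cells in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
n = sum(row_totals.values())

def pct(part, whole):
    """Percentage of `whole` represented by `part`, to one decimal."""
    return round(100 * part / whole, 1)

# Denominator = row total, column total, and grand total respectively.
within_sex = {s: [pct(c, row_totals[s]) for c in cells]
              for s, cells in observed.items()}
within_fbs = {s: [pct(c, t) for c, t in zip(cells, col_totals)]
              for s, cells in observed.items()}
of_total = {s: [pct(c, n) for c in cells] for s, cells in observed.items()}

print(within_sex)
print(within_fbs)
print(of_total)
```

"% within sex" answers how each sex splits across the indicator, while "% within Fasting blood sugar <120" answers how each indicator category splits across the sexes.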
Chi-Square Test Table
Table 7

Chi-Square Tests

| | Value | df | Asymp. Sig. (2-sided) | Exact Sig. (2-sided) | Exact Sig. (1-sided) |
| --- | --- | --- | --- | --- | --- |
| Pearson Chi-Square | .255 (a) | 1 | .614 | | |
| Continuity Correction (b) | .031 | 1 | .860 | | |
| Likelihood Ratio | .265 | 1 | .607 | | |
| Fisher’s Exact Test | | | | .751 | .444 |
| N of Valid Cases | 100 | | | | |

a. 1 cells (25.0%) have expected count less than 5. The minimum expected count is 3.77.
b. Computed only for a 2×2 table
Table 7 presents the various tests reported under the chi-square procedure; our interest is in the “Pearson Chi-Square” row. The Pearson Chi-Square value is 0.255 with a p-value of 0.614. Since the p-value is greater than the 0.05 level of significance, we fail to reject the null hypothesis and conclude that there is no statistically significant association between gender and whether the fasting blood sugar is less than 120 (Pons, n.d.).
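The Pearson statistic in Table 7 can be reproduced by hand from the observed counts in Table 6. The sketch below uses only the standard library; the closed-form p-value via `math.erfc` is a shortcut that holds only for one degree of freedom, and it is a stand-in for whatever routine SPSS uses internally.

```python
import math

# Observed counts from Table 6 (rows: Female, Male; columns: False, True).
observed = [[26, 3], [61, 10]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: row total * column total / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson chi-square statistic: sum of (O - E)^2 / E over all cells.
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# For 1 degree of freedom the upper-tail p-value has a closed form:
# P(X > x) = erfc(sqrt(x / 2)). This shortcut is valid only for df = 1.
p_value = math.erfc(math.sqrt(chi2 / 2))

min_expected = min(min(row) for row in expected)
print(round(chi2, 3), round(p_value, 3), round(min_expected, 2))
```

Rounded to the precision of the SPSS output, the statistic, p-value, and minimum expected count agree with the values 0.255, .614, and 3.77 reported in Table 7.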
Table 8

Symmetric Measures

| | | Value | Approx. Sig. |
| --- | --- | --- | --- |
| Nominal by Nominal | Phi | .050 | .614 |
| | Cramer’s V | .050 | .614 |
| N of Valid Cases | | 100 | |
Both Phi and Cramer’s V measure the strength of association between two nominal variables. In Table 8, the strength of association between the two variables (0.050) is very weak.
Table 9

blood pressure * sex Crosstabulation

| blood pressure | | sex: Female | sex: Male | Total |
| --- | --- | --- | --- | --- |
| 104 | Count | 0 | 1 | 1 |
| | % within blood pressure | .0 | 1.0 | 1.0 |
| | % within sex | .0 | .0 | .0 |
| | % of Total | .0 | .0 | .0 |
| 105 | Count | 1 | 0 | 1 |
| | % within blood pressure | 1.0 | .0 | 1.0 |
| | % within sex | .0 | .0 | .0 |
| | % of Total | .0 | .0 | .0 |
| 108 | Count | 1 | 0 | 1 |
| | % within blood pressure | 1.0 | .0 | 1.0 |
| | % within sex | .0 | .0 | .0 |
| | % of Total | .0 | .0 | .0 |
| 110 | Count | 0 | 7 | 7 |
| | % within blood pressure | .0 | 1.0 | 1.0 |
| | % within sex | .0 | .1 | .1 |
| | % of Total | .0 | .1 | .1 |
| 112 | Count | 0 | 2 | 2 |
| | % within blood pressure | .0 | 1.0 | 1.0 |
| | % within sex | .0 | .0 | .0 |
| | % of Total | .0 | .0 | .0 |
| 115 | Count | 0 | 1 | 1 |
| | % within blood pressure | .0 | 1.0 | 1.0 |
| | % within sex | .0 | .0 | .0 |
| | % of Total | .0 | .0 | .0 |
| Total | Count | 29 | 71 | 100 |
| | % within blood pressure | .3 | .7 | 1.0 |
| | % within sex | 1.0 | 1.0 | 1.0 |
| | % of Total | .3 | .7 | 1.0 |
Table 9 above cross-tabulates blood pressure by sex and suggests that the distribution of blood pressure readings differs between males and females.
Chi-Square Tests Table
The Chi-Square test below indicates that there is no statistically significant association between gender and blood pressure.
Table 10

Chi-Square Tests

| | Value | df | Asymp. Sig. (2-sided) |
| --- | --- | --- | --- |
| Pearson Chi-Square | 27.510 (a) | 25 | .331 |
| Likelihood Ratio | 33.753 | 25 | .113 |
| N of Valid Cases | 100 | | |

a. 48 cells (92.3%) have expected count less than 5. The minimum expected count is .29.

Table 11

Symmetric Measures

| | | Value | Approx. Sig. |
| --- | --- | --- | --- |
| Nominal by Nominal | Phi | .525 | .331 |
| | Cramer’s V | .525 | .331 |
| N of Valid Cases | | 100 | |

Table 11 appears to tell a different story about the association between gender and blood pressure. The Cramer’s V value (0.525) indicates a strong association between the two variables in the sample (Vogt, 2012). However, the association is not statistically significant (p = .331), and because 92.3% of the cells have an expected count below 5, the chi-square approximation is unreliable for this table.
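Phi and Cramer’s V are both derived from the chi-square statistic, which is why a large table with many sparse cells can yield a large V without a significant test. A minimal sketch follows, using the chi-square values reported in the two Chi-Square Tests tables; the function name `cramers_v` is illustrative, and the figure of 26 blood pressure categories is inferred from df = 25 rather than stated in the paper.

```python
import math

def cramers_v(chi2, n, n_rows, n_cols):
    """Cramer's V = sqrt(chi2 / (n * min(rows - 1, cols - 1)))."""
    return math.sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))

# Sex by fasting blood sugar: 2 x 2 table, chi-square = 0.255.
v_fbs = cramers_v(0.255, 100, 2, 2)

# Sex by blood pressure: chi-square = 27.510 with df = 25, which for two
# sex categories implies 26 distinct blood pressure values (an inference
# from the degrees of freedom, not a figure stated in the paper).
v_bp = cramers_v(27.510, 100, 26, 2)

print(v_fbs, v_bp)
```

Because one variable has only two categories in both tables, Cramer’s V reduces to Phi, the square root of chi-square divided by n. The computed values agree with the tabulated .050 and .525 to within the rounding of the chi-square inputs.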
Conclusion
The study reveals interesting facts about the relationship between gender, age, blood pressure levels, and the classification of subjects as either sick or healthy. However, more studies should be conducted in order to obtain substantial evidence for these findings.
References
Aberson. (2010). Applied power analysis for the behavioral sciences. New York: Routledge Academic.
Agarwal, B. L. (n.d.). Basic Statistics.
Brase, C. H. (2013). Understanding basic statistics. Australia: Cole Cengage Learning.
Daniel, W. W. (2010). Biostatistics. Chichester: John Wiley.
Field, A. P. (2014). Discovering statistics using R. London: Sage.
Fraser. (2012). Business Statistics for competitive advantage with Excel 2010. New York: Springer.
Friedman, L. M. (2010). Fundamentals of clinical trials. New York: Springer.
Givens, G. H. (2013). Computational statistics. Hoboken: Wiley.
Knopov, P. S. (2012). Regression Analysis Under A Priori Parameter Restrictions. New York: Springer-Verlag.
Lee, E. T. (n.d.). Statistical methods for survival data analysis.
Machin, D. A. (2010). Randomized clinical trials. West Sussex: Wiley-Blackwell.
Paulk, A. (2012). Understanding regression analysis. New Delhi: Orange Apple.
Pons, O. (n.d.). Inequalities in analysis and probability.
Vogt, W. P. (2012). Correlation and regression analysis. Los Angeles: Sage.