Statistics And Math: Exploring Measures, Variables, And Distributions

Summary and Explanation of Descriptive Statistics

Discuss the Statistics and Math.

An appropriate height and weight are essential for an appropriate BMI (body mass index). Age is also an important factor that affects a person's weight. Both men and women want to reach an ideal weight for their height and age. For this purpose they go to the gym, and some also exercise regularly outside it. Customers also hold differing views about the gym's equipment.

This assignment explains various statistical measures and applies them to gym data that include height, weight, BMI, age and customers' responses about their gym visits. Graphs and charts are provided to illustrate these measures in the context of the data.

A variable whose values are obtained by counting is called a discrete random variable. A variable that can take any value within the range between two specified values, and can therefore take infinitely many, uncountable values, is called a continuous random variable (Vogt and Barta 2013). The discrete variables in this research are “Do you do regular exercise outside of the gym” and “Is it important to have a high variety of equipment”. The continuous variables are “height” and “weight”.

The mean and median were used to study the data set, and both were calculated for the two continuous variables, height and weight. The mean is the average of the values of a variable: the sum of all the values divided by the number of values (Kock 2013). Three common types of mean are used in statistics: the arithmetic mean, the geometric mean and the harmonic mean. The arithmetic mean is the one generally used in descriptive statistics, so it was applied to the height and weight of the 100 respondents in the survey to give their average height and weight. The mean height was found to be 170.55 units, while the mean weight was found to be 76.36 units.
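
The three kinds of mean can be compared with Python's standard-library statistics module. This is a minimal sketch on made-up heights (the survey's 100 records are not reproduced here). For any set of positive values the harmonic mean never exceeds the geometric mean, which never exceeds the arithmetic mean.

```python
import statistics

# Hypothetical heights for illustration only; these are NOT the survey data.
heights = [165.0, 172.0, 158.0, 181.0, 170.0, 176.0]

arithmetic = statistics.mean(heights)           # sum of values / number of values
geometric = statistics.geometric_mean(heights)  # n-th root of the product (positive values only)
harmonic = statistics.harmonic_mean(heights)    # n / sum of reciprocals

# The arithmetic mean is exactly the "sum divided by count" definition in the text.
manual = sum(heights) / len(heights)

print(f"arithmetic={arithmetic:.2f} (manual {manual:.2f}) "
      f"geometric={geometric:.2f} harmonic={harmonic:.2f}")
```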

Discrete and Continuous Variables

The median is the middle value of the data set when the values are arranged in ascending or descending order; for an even number of values, such as the 100 in this sample, it is the average of the two middle values. The median is the second quartile of the data set and separates the upper half of the values from the lower half (Vogt and Barta 2013). The median has an advantage over the mean in that it is robust: it is much less affected by skewness and by extremely high or low values. The median of “height” was found to be 170 units, while the median of “weight” was found to be 77 units.
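
The robustness of the median can be seen by replacing one value with an extreme one. In this sketch the weights are hypothetical; with six values the median is the average of the third and fourth sorted values, and it stays the same while the mean jumps.

```python
import statistics

# Hypothetical weights for illustration; not the survey data.
weights = [62, 70, 74, 77, 80, 85]
with_outlier = weights[:-1] + [185]  # replace the largest value with an extreme one

for label, data in [("original", weights), ("with outlier", with_outlier)]:
    print(label, "mean =", round(statistics.mean(data), 2),
          "median =", statistics.median(data))
# original mean = 74.67 median = 75.5
# with outlier mean = 91.33 median = 75.5
```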

Figure 1: scatter plot of the values of “height”

(Source: created by author)

Figure 2: scatter plot of the values of “weight”

(Source: created by author)

Two measures of variation were used: the standard deviation and the range. The standard deviation measures how far the values of a variable typically lie from its mean; it is the square root of the variance, the average squared deviation from the mean (Allen 2013). A low standard deviation indicates that the values lie close to the mean, while a high value indicates that they are widely spread (Statistics 2013). The standard deviation of “height” was found to be 12.81, which shows that the heights in the sample are moderately spread around the mean. The standard deviation of “weight” was found to be 15.22, which also indicates a moderate spread.
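
A minimal sketch of the standard deviation on hypothetical values follows. The hand-written `sample_sd` helper (a name introduced here for illustration) divides by n − 1, as `statistics.stdev` does; `statistics.pstdev` divides by n instead. The text does not state which denominator produced 12.81 and 15.22; for n = 100 the two versions differ by only about 0.5%.

```python
import math
import statistics

# Hypothetical heights for illustration; not the survey data.
heights = [165.0, 172.0, 158.0, 181.0, 170.0, 176.0]

def sample_sd(values):
    """Sample standard deviation: sqrt(sum of squared deviations / (n - 1))."""
    n = len(values)
    mean = sum(values) / n
    squared_dev = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_dev / (n - 1))

print("sample SD (n - 1):", round(sample_sd(heights), 4))
print("statistics.stdev:  ", round(statistics.stdev(heights), 4))
print("population SD (n): ", round(statistics.pstdev(heights), 4))
```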

Another measure of variation is the range: the difference between the maximum and minimum values of a variable, which shows the overall span of the data. The range of “height” was found to be 49, while the range of “weight” was found to be 63 (Lake 2013). The maximum and minimum heights in the data set are 194 units and 145 units respectively (194 − 145 = 49), while the maximum and minimum weights are 109 units and 46 units respectively (109 − 46 = 63).
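
The range is simply maximum minus minimum. This short check uses only the extremes reported above (the full data set is not reproduced), and the helper name `value_range` is introduced here for illustration.

```python
def value_range(values):
    """Range of a variable: maximum value minus minimum value."""
    return max(values) - min(values)

# Only the reported extremes are used; the other 98 values cannot change the range.
height_extremes = [145, 194]
weight_extremes = [46, 109]
print(value_range(height_extremes), value_range(weight_extremes))  # 49 63
```

Because the range depends on just two observations, a single unusual record changes it directly, which is why it is usually reported alongside the standard deviation.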

For the discrete variables “Do you do regular exercise outside of the gym” and “Is it important to have a high variety of equipment”, the number of respondents giving a particular answer can be modeled as a binomial random variable with n = 100 trials, mean np and variance npq, where q = 1 − p (Hong 2013). The proportion of respondents who do not exercise regularly outside the gym was found to be 0.54, so the mean (np) is 54 and the variance (npq) is 24.84. The proportion who said that it is not important to have a high variety of equipment was 0.3, so the mean (np) is 30 and the variance (npq) is 21. The corresponding standard deviations are √24.84 ≈ 4.98 and √21 ≈ 4.58 respondents, which quantify how much these counts would be expected to vary across repeated samples of 100 respondents.
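
The binomial figures above can be reproduced from n and p alone. This sketch assumes n = 100 respondents, as stated for the survey; `binomial_summary` is a helper name introduced for illustration.

```python
import math

def binomial_summary(n, p):
    """Mean np, variance npq and standard deviation sqrt(npq) of a Binomial(n, p) count."""
    q = 1 - p
    mean = n * p
    variance = n * p * q
    return mean, variance, math.sqrt(variance)

# n = 100 respondents; proportions as reported in the text.
results = {}
for question, p in [("no regular exercise outside the gym", 0.54),
                    ("equipment variety not important", 0.30)]:
    mean, var, sd = binomial_summary(100, p)
    results[question] = (mean, var, sd)
    print(f"{question}: mean={mean:.2f}, variance={var:.2f}, sd={sd:.2f}")
# no regular exercise outside the gym: mean=54.00, variance=24.84, sd=4.98
# equipment variety not important: mean=30.00, variance=21.00, sd=4.58
```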

Descriptive Statistics

“Height” and “weight” are the two continuous variables considered in the research, and both appear to follow a normal distribution. In a normal distribution the mean, median and mode coincide, so values of these three measures that almost coincide are consistent with normality, although they do not prove it. The central limit theorem states that the distribution of the mean of independently drawn random variables with finite variance approaches a normal distribution as the sample size grows, which is why the normal distribution is widely used for sampling distributions. For “height” and “weight”, the mean, median and mode almost coincide, the skewness of each variable is close to zero and the excess kurtosis is slightly negative. Thus, “height” and “weight” can be treated as approximately normally distributed.
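
Skewness and excess kurtosis can be computed from the central moments m2, m3 and m4; both are 0 for a normal distribution. The sketch below uses the simple moment-based (uncorrected) formulas on hypothetical, symmetric toy data; spreadsheet functions such as Excel's SKEW and KURT apply small-sample corrections, so their values differ slightly from these.

```python
def central_moment(values, k):
    """k-th central moment: average of (x - mean)**k."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** k for x in values) / n

def skewness(values):
    """Moment-based skewness m3 / m2**1.5 (0 for a symmetric distribution)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Moment-based excess kurtosis m4 / m2**2 - 3 (0 for a normal distribution)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

# Hypothetical toy data, symmetric about 170; not the survey data.
data = [160, 165, 168, 170, 170, 172, 175, 180]
print(f"skewness = {skewness(data):.3f}, excess kurtosis = {excess_kurtosis(data):.3f}")
# skewness = 0.000, excess kurtosis = -0.442
```

A negative excess kurtosis, as here, means the data have lighter tails than a normal distribution with the same variance.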

A hypothesis test was performed on the two continuous random variables “height” and “weight” at the 95% confidence level, that is, at the 5% level of significance, to check whether height is independent of weight. A two-tailed t-test was used for this purpose (Kruschke 2013). The null and alternative hypotheses are as follows:

H₀: height and weight are independent of each other

H₁: height and weight are dependent on each other

On performing the two-tailed t-test at the 5% level of significance, the p-value was found to be 6.6459 × 10⁻¹¹⁰ (de Winter 2013). Since this value is far below 0.05, the result is statistically significant. The null hypothesis is therefore rejected, and it can be concluded that height and weight are dependent on each other.

Regression analysis was performed with “BMI” as the dependent variable and “height” and “weight” as the independent variables. The multiple correlation coefficient R between the dependent variable and the independent variables was found to be 0.98579 (Draper and Smith 2014), so R² ≈ 0.972: about 97% of the variation in BMI is explained by height and weight together. This indicates a strong linear relationship, with the independent variables influencing “BMI” to a large extent. The regression equation is as follows:

BMI = 51.37692 – 0.30174 * height + 0.342475 * weight

The regression equation shows that, holding weight constant, each one-unit increase in height decreases the predicted BMI by 0.30174 units, while, holding height constant, each one-unit increase in weight increases the predicted BMI by 0.342475 units (Montgomery et al. 2015). The intercept, 51.37692, is the predicted BMI when height and weight are both zero; since such values lie far outside the observed data, the intercept has no practical meaning on its own and only positions the fitted plane.
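
The coefficient interpretation can be checked by evaluating the equation directly. This sketch hard-codes the coefficients from the regression equation in the text; `predicted_bmi` is a helper name introduced for illustration. If the equation was fitted by ordinary least squares with an intercept on these 100 records, the fitted plane passes through the point of means, so the value at (170.55, 76.36) should be close to the sample's mean BMI, apart from rounding of the reported figures.

```python
# Coefficients as reported in the regression equation in the text.
INTERCEPT = 51.37692
B_HEIGHT = -0.30174
B_WEIGHT = 0.342475

def predicted_bmi(height, weight):
    """Predicted BMI from the fitted linear equation (meaningful only near the observed data)."""
    return INTERCEPT + B_HEIGHT * height + B_WEIGHT * weight

base = predicted_bmi(170.55, 76.36)                   # at the sample means
print(round(base, 2))                                 # about 26.07
print(round(predicted_bmi(171.55, 76.36) - base, 5))  # +1 unit of height: -0.30174
print(round(predicted_bmi(170.55, 77.36) - base, 6))  # +1 unit of weight: 0.342475
```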

Random Variable and its Probability Distribution

Figure 3: line fit plot of height

(Source: created by author)

The line fit plot shows “BMI” on the y-axis against “height” on the x-axis. The actual and predicted values of BMI lie close to each other across the observed heights (Kleinbaum et al. 2013). This indicates that the model fits the data well and can be used for interpolation within the observed range; extrapolation beyond that range should be treated with caution.

Figure 4: line fit plot of weight

(Source: created by author)

The line fit plot shows “BMI” on the y-axis against “weight” on the x-axis. The actual and predicted values of BMI lie close to each other across the observed weights. This indicates that the model fits the data well and can be used for interpolation within the observed range; extrapolation beyond that range should be treated with caution.
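
The line fit plots compare actual and predicted BMI. The sketch below fits the same kind of two-predictor model by ordinary least squares on made-up records, computes the predicted values, and derives R² and its square root, the multiple correlation coefficient R. Assumptions: height in cm and weight in kg (the text only says "units"), BMI generated from the standard formula weight/height², and hypothetical helper names `fit_two_predictor_ols` and `r_squared`. Because BMI is not linear in height, the linear model is only an approximation, which is why comparing actual with predicted values is worthwhile.

```python
import math

def fit_two_predictor_ols(x1, x2, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 via centred normal equations (Cramer's rule)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2  # zero if the two predictors are perfectly collinear
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    b0 = my - b1 * m1 - b2 * m2  # the fitted plane passes through the point of means
    return b0, b1, b2

def r_squared(y, fitted):
    """R^2 = 1 - SSE/SST; its square root is the multiple correlation coefficient R."""
    my = sum(y) / len(y)
    sse = sum((a - f) ** 2 for a, f in zip(y, fitted))
    sst = sum((a - my) ** 2 for a in y)
    return 1 - sse / sst

# Hypothetical records (height assumed in cm, weight in kg); not the survey data.
heights = [150, 158, 163, 168, 172, 175, 180, 186, 191]
weights = [50, 72, 58, 80, 66, 90, 75, 98, 84]
bmi = [w / (h / 100) ** 2 for h, w in zip(heights, weights)]  # standard BMI formula

b0, b1, b2 = fit_two_predictor_ols(heights, weights, bmi)
fitted = [b0 + b1 * h + b2 * w for h, w in zip(heights, weights)]
r2 = r_squared(bmi, fitted)
print(f"BMI = {b0:.3f} + ({b1:.4f})*height + ({b2:.4f})*weight")
print(f"R^2 = {r2:.4f}, multiple R = {math.sqrt(r2):.4f}")
for actual, pred in zip(bmi, fitted):
    print(f"actual {actual:6.2f}  predicted {pred:6.2f}")  # the pairs a line fit plot compares
```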

Conclusion

The analysis considered two continuous variables, “height” and “weight”, and two discrete variables, “Do you do regular exercise outside of the gym” and “Is it important to have a high variety of equipment”. The mean values of “height” and “weight” were found to be 170.55 units and 76.36 units respectively, and the median values 170 units and 77 units respectively. The standard deviations of “height” and “weight” were 12.81 units and 15.22 units respectively, while the ranges were 49 units and 63 units respectively. The discrete random variables were modeled with a binomial distribution. A hypothesis test conducted on “height” and “weight” indicated that height and weight are dependent on each other. Regression analysis was performed with “BMI” as the dependent variable and “height” and “weight” as the independent variables, giving the regression equation BMI = 51.37692 – 0.30174 * height + 0.342475 * weight. There is a strong correlation between the dependent variable and the independent variables.

It is recommended that gym instructors encourage customers to exercise regularly both in the gym and outside it, which would help them stay fit and healthy. Customers should also be informed about the ideal BMI and guided to exercise accordingly, so that they can achieve a healthy BMI and stay fit.

References

Allen, D., 2013. Measures of Central Tendency.

de Winter, J.C., 2013. Using the Student’s t-test with extremely small sample sizes. Practical Assessment, Research & Evaluation, 18(10), pp.1-12.

Draper, N.R. and Smith, H., 2014. Applied regression analysis. John Wiley & Sons.

Hong, Y., 2013. On computing the distribution function for the Poisson binomial distribution. Computational Statistics & Data Analysis, 59, pp.41-51.

Kleinbaum, D.G., Kupper, L.L., Nizam, A. and Rosenberg, E.S., 2013. Applied regression analysis and other multivariable methods. Nelson Education.

Kock, N., 2013. Using WarpPLS in E-Collaboration Studies: Descriptive Statistics, Settings. Interdisciplinary Applications of Electronic Collaboration Approaches and Technologies, 62.

Kruschke, J.K., 2013. Bayesian estimation supersedes the t test. Journal of Experimental Psychology: General, 142(2), p.573.

Lake, L., 2013. Basic Descriptive Statistics: Measures of Central Tendency.

Montgomery, D.C., Peck, E.A. and Vining, G.G., 2015. Introduction to linear regression analysis. John Wiley & Sons.

Statistics, A.E.R.D., 2013. Measures of Central Tendency.

Vogt, A. and Barta, J., 2013. The making of tests for index numbers: Mathematical methods of descriptive statistics. Springer Science & Business Media.
