Data can be described as recorded information about objects, business activity and employees. Data generated by a business provide insight into how the business functions, and managers can base business decisions on them. In this report we, as a data analysis team, analyse data from the food and dairy industry.
The poultry (egg) data come from a survey of chicken types. For each chicken, the data record a health check and the weights of sample eggs. The dairy (milk) data have four main parameters: cow type, cow weight in kg, weekly average milk fat in percentage and “PrePasteurisation pathogen”. In the first table, eggs for public consumption are classified by weight: the weights in grams of the 10 most recent eggs, plus other data, are given for a sample of the manager's chickens. The second table gives the cow weights in kg and the weekly average milk fat percentages for a small dairy herd. The milk of each cow is also checked prior to pasteurisation, and an indication is provided if any specific pathogen is found.
In the sampled data, the egg variables are Chicken Id, Chicken type, Health Check and Sample egg weights (10 samples). Chicken Id and Health Check are not used in the analysis; only Chicken type (nominal) and Sample egg weights (numerical) are used for calculation. The milk variables are Cow Id, Cow type, Cow weight (kg), Weekly average milk fat in percentage and PrePasteurisation pathogen. Cow Id and PrePasteurisation pathogen are not used in the analysis; only Cow type (nominal), Cow weight in kg (numeric) and Weekly average milk fat in percentage (numeric) are used for calculation and data analysis.
The report sets out the statistical methods and analyses needed to use these tools effectively. Finally, inferences are drawn and recommendations are made for business decisions.
1. Discrete Probability Distribution:
We first consider how discrete probability distributions are used in statistics. In probability and statistics, a discrete probability distribution is the distribution of a random variable that takes countably many values, and it is specified by a probability mass function f(x). The distribution of a random variable X is discrete, and X is called a discrete random variable, if and only if
$$\sum_{u} P(X = u) = 1$$
Here, u runs over the set of all possible values of X.
A discrete random variable can take only a finite or countably infinite number of values. The most commonly used discrete probability distributions are the Binomial, Poisson, Bernoulli and Geometric distributions. A discrete distribution therefore gives the probability of occurrence of each individual value of a discrete random variable. An example of a discrete probability distribution is the distribution of the number of errors found on each page of a book.
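The errors-per-page example can be sketched in a few lines of Python. This is only an illustration: the page counts below are hypothetical, and `empirical_pmf` is a helper name introduced here, not part of the original analysis.

```python
from collections import Counter
from fractions import Fraction


def empirical_pmf(observations):
    """Estimate a probability mass function from observed discrete values."""
    counts = Counter(observations)
    total = len(observations)
    # Fractions keep the probabilities exact, so they sum to exactly 1.
    return {value: Fraction(count, total) for value, count in sorted(counts.items())}


# Hypothetical number of errors found on each of ten pages of a book.
errors_per_page = [0, 1, 0, 2, 1, 0, 0, 3, 1, 0]
pmf = empirical_pmf(errors_per_page)

for value, prob in pmf.items():
    print(f"P(X = {value}) = {float(prob):.1f}")

# Defining property of a discrete distribution: the probabilities sum to 1.
print("Total probability:", sum(pmf.values()))
```

Each distinct value receives its relative frequency as its probability, which is the simplest way to build a discrete distribution from sample data.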
In the current analysis, the egg weights sampled from the various types of chicken are recorded as a finite set of values and are treated as discrete variables, so their distribution is treated as a discrete probability distribution. Discrete values are also found in the milk data: the weights of the different types of cow and their average milk fat levels are treated in the same way. A discrete distribution can be used to find trends in the egg weights of different chickens, which would help us to identify the better performing chickens. It is likewise helpful for estimating cow weight and milk quality (in terms of milk fat), which would help us to keep more of the better performing cows. The business could become more profitable with the support of discrete random sample analysis.
2. Inferential Statistics:-
Inferential statistics is crucial for business decisions based on analysed data. It is the method of drawing inferences about a population from a sample. The ideas of inferential statistics can also be used to check the assumptions made about a sample. Hypothesis testing is an integral part of inferential statistics, and a bivariate or multivariate analysis is often beneficial. Inferential statistics can also be used to compare the distribution of the sampled data and to estimate population parameters. It can also be used to check the behaviour predicted by the central limit theorem (CLT) (Lomax and Hahs-Vaughn 2012). The following steps are necessary to meet the aim of inferential statistics:
3.1. Normal Distribution:
The distribution of the data is a crucial aspect of data analysis. When the normality assumption holds, the data can be taken to represent a phenomenon as it occurs in nature. The normal distribution, also known as the Gaussian distribution, has a bell-shaped curve. The probability density function of a normal distribution is
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \quad -\infty < x < \infty$$
Here, µ is the mean and σ² is the variance of the normal distribution.
The central limit theorem (CLT) justifies the use of the normal distribution for sample means. According to the CLT, if a large sample is taken from a population with finite variance, the distribution of the sample mean is approximately normal and centred on the population mean, whatever the shape of the population distribution. Additionally, the mean, median and mode are equal for a normal distribution. The normal distribution is symmetric: the bell-shaped curve has two equal halves on either side of its mean. The total area under the curve is equal to 1, and the spread of the curve is determined by the standard deviation of the data (Rosenblatt 1956).
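A small simulation can illustrate the central limit theorem. The sketch below is not part of the original analysis: it draws from a hypothetical skewed population (an exponential distribution with mean 1 and standard deviation 1) and uses only the Python standard library (Python 3.8 or later is assumed).

```python
import random
import statistics


def sample_means(draw, sample_size, n_samples, seed=0):
    """Return the means of n_samples independent samples of size sample_size."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [
        statistics.fmean(draw(rng) for _ in range(sample_size))
        for _ in range(n_samples)
    ]


# Hypothetical skewed population: exponential with mean 1 and standard deviation 1.
means = sample_means(lambda rng: rng.expovariate(1.0), sample_size=50, n_samples=2000)

# The sample means cluster around the population mean (1.0) ...
print("mean of sample means:", round(statistics.fmean(means), 3))
# ... with a spread close to sigma / sqrt(n) = 1 / sqrt(50), about 0.141.
print("std dev of sample means:", round(statistics.stdev(means), 3))
```

Even though the parent distribution is strongly skewed, the sample means are centred on the population mean and their spread shrinks with the square root of the sample size.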
The distribution of the data in a sample is tested for normality using the properties of the normal distribution. The sampling distribution is used to investigate the shape of the data; here, the weekly average milk fat is analysed. The central limit theorem supports the normality check of the sampled data. Below, the distribution of the residuals is tested by a normal probability plot and a one-sample t-test.
3.2. Sampling Distribution and Normal Distribution:
The probability distribution of a statistic is called its sampling distribution. A continuous random variable may follow, for example, an exponential, gamma or Weibull distribution. Whatever the parent distribution, the central limit theorem implies that the standardised mean of a large sample is approximately standard normal (mean = 0 and variance = 1), provided the parent distribution has finite variance. The normal distribution is symmetric and easy to handle. Even when the probability density function (pdf) f(x) of a continuous distribution cannot be integrated in closed form over an arbitrary range, its integral over the full range is
$$\int_{-\infty}^{\infty} f(x)\,dx = 1$$
The probability that X lies in a range dx around x is f(x)dx. This probability must be the same when it is written in terms of another variable y = y(x) with density g(y) (Lange, Little and Taylor 1989). Hence,
$$f(x)\,|dx| = g(y)\,|dy|$$
The figure shows the area plot of the standard normal distribution of the egg weights of Black Minorcas chickens. The central limit theorem is applied to find the normalised distribution of the egg weights of Black Minorcas chickens.
3.3. Sampling Work by Continuous Random Variable:
We are interested in how statistical analysis of the dairy data (eggs and milk) can be built on sampling from a continuous random variable. A continuous random variable can take values measured on a continuous scale, such as height and weight. Here, we took the average weight of the sampled eggs of Black Minorcas chickens as the continuous random variable and transformed it to the standard normal distribution, as in the following table.
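The transformation to the standard normal scale can be sketched as follows. The egg weights used here are hypothetical placeholders, not the Black Minorcas values from the report, and `standardise` is a helper name introduced for illustration (Python 3.8 or later is assumed for `statistics.NormalDist`).

```python
from statistics import NormalDist, fmean, stdev


def standardise(values):
    """Convert raw values to z-scores: z = (x - mean) / standard deviation."""
    mean = fmean(values)
    sd = stdev(values)
    return [(value - mean) / sd for value in values]


# Hypothetical egg weights in grams (placeholders, not the report's data).
egg_weights = [52.1, 48.7, 60.3, 55.0, 49.9, 57.4]

z_scores = standardise(egg_weights)
standard_normal = NormalDist()  # mean 0, standard deviation 1

for weight, z in zip(egg_weights, z_scores):
    # cdf(z) is the area under the standard normal curve to the left of z.
    print(f"{weight:6.1f} gm  z = {z:+.3f}  P(Z <= z) = {standard_normal.cdf(z):.3f}")
```

After standardisation the values have mean 0 and standard deviation 1, so probabilities can be read from the standard normal curve.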
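The Type I error, α, is the probability of the critical region under the standardised distribution (Lieberman and Cunningham 2009). Here, Type I error = α = 0.498918547, so the probability of not rejecting a true null hypothesis is (1 - α) = (1 - 0.498918547) = 0.501081. The Type II error, β, is the probability of failing to reject a false null hypothesis, and the power of the test is (1 - β), which is not in general equal to (1 - α).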
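The relationship between the Type I error, the Type II error and power can be illustrated with a one-sided, one-sample z-test. This is a generic sketch under stated assumptions (known standard deviation, upper-tail alternative); the numbers are hypothetical and it is not the calculation used in the report.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(effect, sd, n, alpha=0.05):
    """Power (1 - beta) of an upper-tail one-sample z-test.

    effect: true mean minus the mean under the null hypothesis
    sd:     population standard deviation (assumed known)
    n:      sample size
    alpha:  Type I error probability
    """
    standard_normal = NormalDist()
    z_critical = standard_normal.inv_cdf(1 - alpha)  # edge of the critical region
    shift = effect * sqrt(n) / sd                    # standardised true effect
    beta = standard_normal.cdf(z_critical - shift)   # Type II error probability
    return 1 - beta


# Hypothetical values: a true shift of 2 gm, sd of 10 gm, 30 eggs.
print("power:", round(z_test_power(effect=2.0, sd=10.0, n=30), 3))
# With no true effect, the rejection probability equals alpha itself.
print("power at zero effect:", round(z_test_power(effect=0.0, sd=10.0, n=30), 3))
```

The sketch makes the distinction explicit: α is fixed by the analyst, while β and the power depend on the true effect, the variability and the sample size.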
4. Analysis:-
4.1. Descriptive Statistics:
4.1.1. Measure of central tendency:-
In statistics, the main measures of central tendency are the mean, median and mode. These measures indicate the location and concentration of the samples. As a data analysis team, we are interested in the performance of the different chickens and cows in terms of the mean, median and mode of egg weights and milk fat.
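Mean (X-bar): for n observations x_1, ..., x_n,
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
Median (md): for a discrete probability distribution, the middle value of the ordered observations x_(1) ≤ ... ≤ x_(n),
$$md = \begin{cases} x_{((n+1)/2)} & \text{if } n \text{ is odd} \\ \frac{1}{2}\left(x_{(n/2)} + x_{(n/2+1)}\right) & \text{if } n \text{ is even} \end{cases}$$
Mode: for a discrete probability distribution, the value at which the probability mass function f(x) is largest,
$$\text{Mode} = \arg\max_{x} f(x)$$
Our queries are: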
Which cow type has the higher weekly average milk fat?
Which chicken type lays the heavier eggs?
Which egg weight has the maximum frequency?
What is the median value of the egg weights?
The mean weight of the sampled eggs is lowest for Rhode Island Red chickens (39.3576 gm) and highest for Hybrid Commercial chickens (56.85825 gm). The median weight of the sampled eggs is highest for Hybrid Commercial chickens (60.647 gm) and lowest for Light Sussex chickens (35.902 gm).
The mean weight of the cows is 430.6942 kg. The mean weekly average milk fat of Jersey cows (4.1%) is lower than that of Kerry cows (4.978148148%). Jersey cows are heavier than Kerry cows in terms of both mean and median, but their milk has a lower average fat content than that of Kerry cows, again in terms of both mean and median.
The mode of weekly average milk fat for Jersey cows is 4.85%. The mode of egg weights is 36.13 gm in the first sample, 35.37 gm in the third sample, 34.55 gm in the ninth sample and 33.26 gm in the tenth sample.
The mode of egg weights is 36.8 gm for Black Minorcas, 37.02 gm for Buff Sussex, 32.85 gm for Rhode Island Red and 67.46 gm for Welsummer chickens.
These results suggest that Jersey cows and Rhode Island Red chickens are the less profitable choices for the business. Conversely, Kerry cows and Hybrid Commercial chickens are the more profitable choices.
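The three measures of central tendency used in this subsection can be computed with Python's standard `statistics` module. The weights below are hypothetical and serve only to show the calculation; `central_tendency` is a helper name introduced here.

```python
import statistics


def central_tendency(values):
    """Return the mean, median and list of modes of a sample."""
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        # multimode returns every most-frequent value, so ties are not hidden.
        "modes": statistics.multimode(values),
    }


# Hypothetical egg weights in grams for one chicken type.
weights = [40.0, 45.5, 40.0, 52.3, 61.0, 40.0, 48.2]

summary = central_tendency(weights)
print("mean  :", round(summary["mean"], 2))
print("median:", summary["median"])
print("modes :", summary["modes"])
```

Applying the same function to each chicken type or cow type gives the group-wise comparison described above.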
4.1.2. Measures of Dispersion:-
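The common measures of dispersion are the range, standard deviation and variance. The inter-quartile range (IQR) and the mean deviation (about the mean or median) are also standard measures of dispersion. Greater dispersion indicates greater variability in egg weights and weekly average milk fat, and vice versa. For n observations x_1, ..., x_n with mean X-bar,
$$\text{Range} = x_{\max} - x_{\min}, \qquad s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{X})^2, \qquad s = \sqrt{s^2}, \qquad \text{IQR} = Q_3 - Q_1$$
where Q3 is the third quartile and Q1 is the first quartile of the discrete dataset.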
Our queries are:
Which egg weights and milk fat values have the maximum or minimum range?
What is the variability of the sampled egg weights and of weekly milk fat?
What is the standard deviation of the egg weights and of the milk fat of the cows?
The standard deviation of egg weights is highest for Light Sussex chickens (17.89262 gm) and lowest for Rhode Island Red chickens (9.263249 gm). The range of weekly average milk fat is 1.71% for Jersey cows and 2.88% for Kerry cows. The variance and standard deviation of average milk fat are higher for Kerry cows: the variance is 0.2224 for Jersey cows and 0.4037 for Kerry cows (in squared percentage points), and the standard deviation is 0.4715% for Jersey cows and 0.6354% for Kerry cows. Across all cows, the standard deviation of weekly average milk fat is 0.713797233% and the standard deviation of cow weight is 53.83966833 kg. Correspondingly, the mean of weekly average milk fat is 4.5742%.
Greater variability is observed for Light Sussex chicken eggs and for the milk fat of Kerry cows. Some variability is acceptable, but too much variability is not good for the profit of the business.
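The dispersion measures discussed in this subsection can be sketched with the standard library as well. The milk fat values are hypothetical, `dispersion_summary` is a helper name introduced here, and the quartiles use the default method of `statistics.quantiles` (Python 3.8 or later), which may differ slightly from a spreadsheet's quartile function.

```python
import statistics


def dispersion_summary(values):
    """Return the range, sample variance, sample standard deviation and IQR."""
    q1, _median, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return {
        "range": max(values) - min(values),
        "variance": statistics.variance(values),  # divides by n - 1
        "std_dev": statistics.stdev(values),      # square root of the variance
        "iqr": q3 - q1,
    }


# Hypothetical weekly average milk fat values in percent.
milk_fat = [4.2, 4.8, 5.1, 4.5, 5.6, 4.9, 4.4, 5.3]

for name, value in dispersion_summary(milk_fat).items():
    print(f"{name:8s} = {value:.4f}")
```

Running the function separately for each cow type or chicken type allows the variability of the groups to be compared directly.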
ANOVA Test for equality of means and Brief Discussion:-
The analysis of variance (ANOVA) is carried out to test the equality of the mean egg weights of the different types of chicken, using the 10 samples. Therefore, our null hypothesis is:
H0: µ1 = µ2 = µ3 = … = µ8
HA: the means differ in at least one case. That is, (µi - µj) is not equal to 0 for at least one pair of different values of i and j (Dixon and Massey 1950).
The p-value is 0.678723033 (> 0.05), with F = 0.691395. Therefore, we fail to reject the null hypothesis at the 5% level of significance: the data give no evidence that the mean egg weights differ between chicken types.
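The F statistic of a one-way ANOVA can be computed by hand from the between-group and within-group sums of squares. The sketch below uses small hypothetical groups and does not reproduce the report's figures; `one_way_anova_f` is a helper name introduced here, and the p-value (which needs the F distribution) is left to a statistics package or spreadsheet.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(group) for group in groups)
    grand_mean = fmean(x for group in groups for x in group)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical egg weights (gm) for three chicken types.
groups = [[51.0, 48.5, 55.2, 60.1], [47.3, 52.8, 58.0, 49.9], [53.4, 50.2, 56.7, 61.5]]
f_stat, df_b, df_w = one_way_anova_f(groups)
print(f"F({df_b}, {df_w}) = {f_stat:.4f}")
```

A small F value, as in the report, means the variation between group means is small relative to the variation within groups.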
The average weight of the sampled eggs is 56.85825 gm for Hybrid Commercial chickens, which is greater than for any other chicken type. Rhode Island Red chickens give the lightest eggs, with an average of 39.3576 gm.
The stacked bar plot shows the sample-wise egg weights of the different chickens. The total weight of eggs is highest for Black Minorcas chickens, followed by Buff Sussex chickens, and lowest for Rhode Island Red chickens. Black Minorcas chickens are the most numerous in the sample (9), whereas Rhode Island Red chickens are the least numerous (4). For the milk data, the Pearson correlation coefficient (r) is -0.256. This value indicates a weak negative association between cow weight and weekly average milk fat (Benesty et al. 2009).
4.2. Confidence Interval
In inferential statistics, a 95% confidence interval is interpreted as follows: if similar samples were taken repeatedly, about 95% of the intervals computed in this way would contain the true value (Nakagawa and Cuthill 2007). The interval of 4.420007 to 7.6485 reported alongside the insignificant effect of cow weight is centred on the fitted regression intercept (6.03425935), so it appears to describe that intercept rather than the range of weekly average milk fat itself. According to the descriptive statistics, the 95% confidence interval for the mean weight of Jersey cows is (459.6022 - 26.13995) kg to (459.6022 + 26.13995) kg, that is, about 433.46 kg to 485.74 kg. Similarly, the interval for the mean weight of Kerry cows is (406.0689 - 12.52268) kg to (406.0689 + 12.52268) kg, about 393.55 kg to 418.59 kg. The interval for the mean weekly average milk fat of Jersey cows is (4.1 - 0.203911283)% to (4.1 + 0.203911283)%, about 3.90% to 4.30%, and for Kerry cows it is (4.978148148 - 0.251348027)% to (4.978148148 + 0.251348027)%, about 4.73% to 5.23%. For the egg data, the weights taken from the 10 samples provide the 95% weight ranges of the eggs of the eight types of chicken. As an example, the confidence interval for the difference of two sample means based on the Z-test is
$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
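A confidence interval for the difference of two means under the Z-test is the difference of the sample means plus or minus a normal quantile times the standard error. The sketch below assumes known (or large-sample) standard deviations; all input values are hypothetical and `two_sample_z_interval` is a helper name introduced for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_interval(mean1, sd1, n1, mean2, sd2, n2, confidence=0.95):
    """Confidence interval for (mean1 - mean2) using the normal distribution."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    difference = mean1 - mean2
    half_width = z * sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return difference - half_width, difference + half_width


# Hypothetical milk fat summaries (percent) for two cow types.
lower, upper = two_sample_z_interval(
    mean1=5.0, sd1=0.6, n1=30,
    mean2=4.2, sd2=0.5, n2=30,
)
print(f"95% CI for the difference: {lower:.3f} to {upper:.3f}")
```

If the resulting interval excludes zero, the two means differ at the corresponding significance level.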
The data analysis team observed that Jersey cows are heavier than Kerry cows. However, the quality of milk in terms of milk fat is higher for Kerry cows. Therefore, Kerry cows are more profitable for the business.
4.3. Hypothesis Testing
In statistics, hypothesis testing is used to test whether a stated condition is supported by the sample data. Every hypothesis test weighs a null hypothesis against an alternative hypothesis. For the current data, we examined whether the weight of cows has a significant effect on weekly average milk fat in percentage. The null hypothesis is:
H0: Weight of cows does not have a significant effect on weekly average milk fat.
HA: Weight of cows has a significant effect on weekly average milk fat.
To test the null hypothesis, we used the linear regression model and Pearson correlation coefficient.
4.4. Discrete Probability Distribution:
The discrete probability distribution is applied to estimate and describe the distribution of egg weights in the ten samples, of cow weights and of the weekly average milk fat of the sampled cows.
The data analysis team found that the discrete probability distribution helped considerably in assessing the performance of the chickens and cows. We applied the binomial probability distribution, which is a discrete probability distribution, and produced a probability histogram of frequencies for the milk data.
A binomial probability distribution arises from “n” repeated trials. The probability of success (p) is the same in every trial, and the trials are independent. A binomial random variable (discrete) is the number of successes X in “n” repeated trials of a binomial experiment, and the binomial distribution is the probability distribution of this random variable. The binomial probability is the probability that a binomial experiment results in exactly X successes. The binomial probability distribution has the following properties:
The mean of the distribution is = n*p
The variance of the distribution is = n*p*(1-p)
The standard deviation of the distribution is = SQRT(n*p*(1-p))
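The three properties listed above can be checked numerically. The sketch below is illustrative only: the values of n and p are hypothetical, `binomial_pmf` and `binomial_summary` are helper names introduced here, and `math.comb` requires Python 3.8 or later.

```python
from math import comb, sqrt


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def binomial_summary(n, p):
    """Mean, variance and standard deviation of a binomial distribution."""
    mean = n * p
    variance = n * p * (1 - p)
    return mean, variance, sqrt(variance)


# Hypothetical example: 10 cows, each with probability 0.3 of exceeding a fat threshold.
n, p = 10, 0.3
mean, variance, std_dev = binomial_summary(n, p)
print(f"mean = {mean:.2f}, variance = {variance:.2f}, std dev = {std_dev:.4f}")

# The probabilities over k = 0..n form a complete discrete distribution.
probabilities = [binomial_pmf(k, n, p) for k in range(n + 1)]
print("total probability:", round(sum(probabilities), 10))
```

The list `probabilities` is exactly what a probability histogram of the binomial distribution would plot.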
We provide the histogram of probabilities of the frequencies obtained from the binomial trials of weekly milk fat of Kerry cows and Jersey cows in the appendix.
4.5. Regression Analysis:
Regression analysis is used in data analysis to test the association between dependent and independent variables. In a two-variable linear regression model, the dependent variable is the response and the independent variable is the predictor. Regression analysis determines how much of the dependent variable can be predicted with the help of the independent variable (McCullagh 1984). The regression model is represented as
$$Y_i = a + bX_i, \quad i = 1, 2, \ldots, n$$
Here, Yi are the values of dependent variables
Xi are the values of independent variables
a is the intercept,
b is the slope of the regression equation
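The intercept a and slope b defined above are usually estimated by least squares. The following sketch fits a line to a small hypothetical dataset and also returns the Pearson correlation coefficient; it is not the report's calculation, and `fit_line` is a helper name introduced here.

```python
from math import sqrt
from statistics import fmean


def fit_line(xs, ys):
    """Least-squares estimates (a, b) for Y = a + b * X, plus Pearson's r."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    r = sxy / sqrt(sxx * syy)
    return a, b, r


# Hypothetical cow weights (kg) and weekly average milk fat (%).
cow_weight = [380.0, 410.0, 435.0, 460.0, 490.0, 520.0]
milk_fat = [5.1, 4.6, 4.9, 4.3, 4.5, 4.0]

a, b, r = fit_line(cow_weight, milk_fat)
print(f"intercept a = {a:.4f}, slope b = {b:.6f}, r = {r:.3f}, r^2 = {r * r:.3f}")
```

In a simple linear regression, the square of r equals the coefficient of determination (R-square).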
The dependent (response) variable is weekly average milk fat in percentage, and the single independent variable (predictor) is cow weight in kg. The linear regression model is:
Weekly average milk fat (%) = a + b * Cow weight (Kg) (Lomax and Hahs-Vaughn 2012)
The fitted linear regression model is-
Weekly average milk fat (%) = 6.03425935 – 0.003390014 * Cow weight (Kg)
The regression equation can be used for interpolation: given a value of one of the two variables, we can find the value of the other. If we put in a value for cow weight in kg, we get the weekly average milk fat (%), and vice versa. In this way, one known value lets us predict and estimate the other. Suppose the weekly average milk fat is 5%; then the weight of the cow is [(5 - 6.03425935) / (-0.003390014)] kg = 305.090 kg. Conversely, suppose the weight of the cow is 500 kg; then the predicted weekly average milk fat is (6.03425935 - 0.003390014 * 500)% = 4.34%.
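The two calculations above can be written as a pair of small functions. The coefficients are the fitted values reported in this section; the function names are introduced here for illustration only.

```python
# Fitted coefficients reported in this section.
INTERCEPT = 6.03425935
SLOPE = -0.003390014  # change in milk fat (%) per kg of cow weight


def predict_milk_fat(cow_weight_kg):
    """Predicted weekly average milk fat (%) for a given cow weight (kg)."""
    return INTERCEPT + SLOPE * cow_weight_kg


def weight_for_milk_fat(milk_fat_percent):
    """Cow weight (kg) at which the fitted line gives the stated milk fat (%)."""
    return (milk_fat_percent - INTERCEPT) / SLOPE


print(f"fat at 500 kg   : {predict_milk_fat(500):.2f} %")
print(f"weight at 5% fat: {weight_for_milk_fat(5.0):.2f} kg")
```

Because the slope is negative, the division in `weight_for_milk_fat` must keep the sign of the slope for the result to be a positive weight.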
Multiple R-square = 0.06538197. Only 6.54% of the variability of weekly average milk fat in percentage is explained by cow weight in kg. So the “coefficient of determination” is very low in this case, and the linear relationship in this model is very weak. The p-value of the F-statistic is 0.073091 (> 0.05). Therefore, we fail to reject the null hypothesis of no significant association between these two variables.
We conclude that, from the viewpoint of both the correlation coefficient and simple linear regression, there is no significant association between weekly average milk fat and cow weight. The residual scatter plot shows the residual values of the linear regression model.
The mean weight of Jersey cows (459.6022 kg) is greater than that of Kerry cows (406.0689 kg). The variance and standard deviation of weight are also higher for Jersey cows. Both are positively distributed.
4. Conclusion:
In this assignment, we analysed the distribution of the sampled data in two tables (eggs and milk). Weekly average milk fat in percentage can be estimated from cow weight using the fitted regression, although the association is weak and not statistically significant. Hypothesis testing extends the regression analysis. The ANOVA table tests the differences between the mean egg weights of the different chicken types and finds no statistically significant difference at the 5% level. Rhode Island Red chickens show the lowest average egg weight, whereas Hybrid Commercial and Black Minorcas chickens have the highest. Jersey cows are heavier, but Kerry cows produce milk with a higher fat content. We conclude that the dairy company could improve its profit by increasing the number of Sussex and Hybrid Commercial chickens for eggs and Kerry cows for milk.
5. Recommendation:
From the analysis, we find differences in egg weights between the different chickens, and appropriate measures need to be taken. Rhode Island Red chickens and Jersey cows show the weakest performance among all types of chickens and cows respectively. Therefore, these two breeds need special care, food and nutrition (Black 2016). According to the data analysis of the quality and measurements of the investigated company, we recommend that Hybrid Commercial and Black Minorcas chickens be farmed in greater numbers than other chickens. Correspondingly, Kerry cows should be farmed in greater numbers than Jersey cows for better profit, according to our team's analysis.
6. References
Benesty, J., Chen, J., Huang, Y. and Cohen, I., 2009. Pearson correlation coefficient. In Noise reduction in speech processing (pp. 1-4). Springer Berlin Heidelberg.
Black, K., 2016. Business Statistics. John Wiley.
Dixon, W.J. and Massey, F.J., 1950. Introduction to Statistical Analysis. New York: McGraw-Hill Book Company, Inc.
Lange, K.L., Little, R.J. and Taylor, J.M., 1989. Robust statistical modeling using the t distribution. Journal of the American Statistical Association, 84(408), pp.881-896.
Levine, D.M., Berenson, M.L. and Stephan, D., 1999. Statistics for managers using Microsoft Excel (Vol. 660). Upper Saddle River, NJ: Prentice Hall.
Lieberman, M.D. and Cunningham, W.A., 2009. Type I and Type II error concerns in fMRI research: re-balancing the scale. Social cognitive and affective neuroscience, 4(4), pp.423-428.
Lomax, R. and Hahs-Vaughn, D., 2012. An introduction to statistical concepts. 1st ed. New York: Routledge.
McCullagh, P., 1984. Generalized linear models. European Journal of Operational Research, 16(3), pp.285-292.
Mendenhall, W., Beaver, R. and Beaver, B., 2012. Introduction to Probability and Statistics. 14th ed. Cengage Learning.
Nakagawa, S. and Cuthill, I.C., 2007. Effect size, confidence interval and statistical significance: a practical guide for biologists. Biological reviews, 82(4), pp.591-605.
Oja, H., 1983. Descriptive statistics for multivariate distributions. Statistics & Probability Letters, 1(6), pp.327-332.
Rosenblatt, M., 1956. A central limit theorem and a strong mixing condition. Proceedings of the National Academy of Sciences, 42(1), pp.43-47.
Salkind, N.J., 2015. Excel Statistics: A Quick Guide. 3rd ed. Sage Publications.