The present analysis is based on a dataset retrieved from the World Bank. It examines health and population statistics for the East Asia and Pacific region over the period 2001 to 2015. The aim is to investigate how the crude death rate relates to total health expenditure (as a percentage of GDP) and to Gross National Income (GNI) per capita, with health expenditure and GNI per capita treated as the explanatory variables.
The analysis is significant because governments and planners in the East Asia and Pacific region can use the findings to improve population health and thereby reduce mortality.
The analysis is divided into four segments. The first is a one-variable analysis, in which individual variables are examined through box plots and a histogram. The second is a two-variable analysis, which again uses box plots to visualise the data. The third is a cluster analysis, in which k-means clustering groups the countries on the basis of selected variables. Finally, regression analysis is used to relate pairs of variables.
The analysis is quantitative. The data are drawn from the World Bank for the period 2001 to 2015, and only the East Asia and Pacific region is considered.
Before the dataset can be analysed, the required R libraries must be available. The script first checks whether each library is installed: libraries that are present are loaded, and those that are missing are installed first.
Once the necessary libraries are loaded, the data file is imported into R. The dataset is supplied in CSV format, so the R command for importing a CSV file is used. The command opens a file-selection window, and choosing the location of the CSV loads the data into R. The data file contains many different health and population attributes, and some values are missing. For this analysis three attributes are selected: crude death rate, total health expenditure as a percentage of GDP, and GNI per capita. Data for 2015 are absent, so that year is omitted from further analysis.
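The import and selection steps can be sketched as follows. This is an illustrative Python sketch rather than the R code used for the report. The column names, the miniature CSV and its values are hypothetical, and the GNI indicator code `NY.GNP.PCAP.CD` is an assumption, since the report does not state which GNI series was used.

```python
import csv
import io

# Hypothetical miniature of a World Bank style export: one row per
# country and indicator, one column per year. All values are made up.
SAMPLE_CSV = """Country Name,Indicator Code,2013,2014,2015
Country A,SP.DYN.CDRT.IN,6.1,6.0,
Country A,SH.XPD.TOTL.ZS,4.8,,
Country A,SP.POP.TOTL,1000,1010,1020
Country B,SP.DYN.CDRT.IN,7.4,7.5,
"""

# Indicators kept for the analysis. The first two codes appear in the
# report's cluster table; the GNI per capita code is an assumption.
SELECTED = {"SP.DYN.CDRT.IN", "SH.XPD.TOTL.ZS", "NY.GNP.PCAP.CD"}
DROPPED_YEARS = {"2015"}  # the report omits 2015 because it has no data


def load_selected(text):
    """Return rows for the selected indicators, without the dropped years.

    Empty cells become None so that missing values stay explicit.
    """
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        if record["Indicator Code"] not in SELECTED:
            continue
        cleaned = {}
        for key, value in record.items():
            if key in DROPPED_YEARS:
                continue
            if key.isdigit():  # year columns hold numeric values
                cleaned[key] = float(value) if value else None
            else:
                cleaned[key] = value
        rows.append(cleaned)
    return rows


rows = load_selected(SAMPLE_CSV)
missing = sum(v is None for r in rows for v in r.values())
print(len(rows), "rows kept;", missing, "missing values")
```

Keeping missing cells as `None` rather than dropping rows leaves the decision about how to treat them to the later analysis steps.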
The first one-variable analysis is a box plot of the crude death rate (per 1,000 people) in the region, which shows the level and spread of mortality across the region. The mean death rate is 6.33 per 1,000 people, with a standard deviation of 1.44. The median is also 6.33. Because the mean and median coincide, the distribution appears symmetric and is treated here as approximately normal. The minimum death rate is a little under 2 and the maximum is above 10.
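The figures quoted alongside a box plot can be computed as in the sketch below. It is an illustrative Python example (the report itself used R), and the sample values are hypothetical rather than the report's data.

```python
import statistics


def describe(values):
    """Summarise a sample with the statistics usually read off a box plot."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.fmean(values),
        "sd": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


# Hypothetical crude death rates (per 1,000 people), not the report's data.
rates = [4.0, 5.0, 6.0, 7.0, 8.0]
summary = describe(rates)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

When the mean and the median agree, as they do for this symmetric sample, the box plot's median line sits in the middle of the box. Agreement of the two is a sign of symmetry, not a formal test of normality.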
The total health expenditure of the region (as a percentage of each country's GDP) is analysed through a histogram covering 2001 to 2014. The histogram shows the distribution of expenditure. The mean health expenditure is 6.6% of GDP with a standard deviation of 4.2, and the median is 5.14%. The histogram shows that the distribution is skewed to the right. Because the median lies below the mean, at least half of the observations fall below the regional average.
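The link between a skewed histogram and a median below the mean can be illustrated with a short Python sketch (not the R code used for the report). The spending values and the bin width are hypothetical.

```python
import statistics


def histogram_counts(values, bin_width):
    """Count observations per bin; a bin starting at b covers [b, b + width)."""
    counts = {}
    for v in values:
        left_edge = int(v // bin_width) * bin_width
        counts[left_edge] = counts.get(left_edge, 0) + 1
    return dict(sorted(counts.items()))


# Hypothetical health expenditure shares (% of GDP), not the report's data.
spending = [2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 6.0, 9.0, 16.5]

counts = histogram_counts(spending, bin_width=5)
mean = statistics.fmean(spending)
median = statistics.median(spending)
share_below_mean = sum(v < mean for v in spending) / len(spending)

print(counts)
print(f"mean={mean:.2f}, median={median:.2f}, below mean={share_below_mean:.0%}")
```

A few large values pull the mean upwards while the median stays with the bulk of the data, so more than half of the sample ends up below the mean.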
GNI per capita is the gross national income of a country divided by its population, so it represents the average income of the country's residents. A box plot is used to represent GNI per capita. The analysis shows wide variation in average income across the countries of the region over the 14 years from 2001 to 2014. The mean is $11,522 with a standard deviation of $15,406. The median is $3,570, so half of the observations have a GNI per capita at or below that figure. The minimum and maximum are $310 and $76,300 respectively.
For the first two-variable analysis, the distribution of GNI per capita is examined country by country. For most countries, GNI per capita stays below $10,000 throughout the 14 years (2001 to 2014). For countries such as Macao and Australia, income varies widely over the study period. The plots also show that, within individual countries, the distribution of income over the period is skewed.
For the second two-variable analysis, the crude death rate is examined country by country. Most countries show wide variation in the crude death rate over the 14 years (2001 to 2014), and for most of them the rate is above 6 per 1,000. The plots also show that, for most countries, the distribution of the crude death rate over the period is skewed.
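The per-country box plots rest on grouping the observations by country before summarising them. A minimal Python sketch of that grouping is given below (the report itself used R). The country names and values are hypothetical.

```python
from collections import defaultdict
import statistics

# Hypothetical (country, year, crude death rate) observations.
observations = [
    ("Country A", 2001, 6.8), ("Country A", 2002, 6.6), ("Country A", 2003, 6.4),
    ("Country B", 2001, 4.1), ("Country B", 2002, 4.3), ("Country B", 2003, 4.8),
]


def summarise_by_country(rows):
    """Group values by country and return min, median and max for each."""
    grouped = defaultdict(list)
    for country, _year, value in rows:
        grouped[country].append(value)
    return {
        country: {
            "min": min(values),
            "median": statistics.median(values),
            "max": max(values),
        }
        for country, values in grouped.items()
    }


per_country = summarise_by_country(observations)
above_six = [c for c, s in per_country.items() if s["median"] > 6]
print(per_country)
print("median above 6 per 1,000:", above_six)
```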
Clustering is the process of dividing a dataset into groups such that the members of each group have similar properties. The objective is to locate a centroid for each group of data points (Guha and Mishra 2016). As more data are added the centroid is re-evaluated, and once all the data have been added the final position of the centroid is obtained. K-means clustering takes place in two stages. In the first stage, an estimate is made of the initial number of clusters. The second stage consists of several steps. First, the distance between each data point and each cluster centre is calculated. Next, each data point is assigned to the cluster whose centre is nearest. Finally, the cluster centres are recalculated. The process is repeated until the values converge (Kodinariya and Makwana 2013).
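The assignment and update steps described above can be sketched in a few lines of Python. This is a minimal illustration rather than the R routine used for the report. The points and the starting centres are hypothetical, and the number of clusters is fixed by the number of starting centres supplied.

```python
import math


def kmeans(points, centres, max_iter=100):
    """Plain k-means on tuples of floats, starting from the given centres.

    Returns the final centres and the clusters from the last assignment step.
    """
    centres = [tuple(c) for c in centres]
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centre.
        clusters = [[] for _ in centres]
        for p in points:
            nearest = min(range(len(centres)),
                          key=lambda i: math.dist(p, centres[i]))
            clusters[nearest].append(p)
        # Update step: move each centre to the mean of its members.
        # A centre with no members is left where it is.
        new_centres = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else centre
            for members, centre in zip(clusters, centres)
        ]
        if new_centres == centres:  # nothing moved, so the run has converged
            break
        centres = new_centres
    return centres, clusters


# Hypothetical two-dimensional points and starting centres.
points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
centres, clusters = kmeans(points, centres=[(0.0, 0.0), (10.0, 10.0)])
print(centres)
```

The result depends on the starting centres, which is why the number of clusters and their initial positions are usually tried with several values before one solution is kept.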
The countries of the region are clustered on crude death rate (per 1,000 people) and total health expenditure. Initial calculations indicate that the countries are best divided into four groups. The clusters account for 85.0% of the total sum of squares.
The means of the clusters are:
Cluster | SP.DYN.CDRT.IN (crude death rate, per 1,000 people) | SH.XPD.TOTL.ZS (health expenditure, % of GDP)
1 | 7.164452 | 9.974887
2 | 4.304804 | 3.526437
3 | 6.500958 | 5.237244
4 | 7.729946 | 2.392542
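If the 85.0% figure is read as the ratio of the between-cluster sum of squares to the total sum of squares (an assumption, since the report does not define it), the calculation can be sketched as below. This is an illustrative Python example with hypothetical points, not the report's data or its R code.

```python
def mean_point(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def sum_of_squares(points, centre):
    """Sum of squared Euclidean distances from each point to the centre."""
    return sum(sum((x - c) ** 2 for x, c in zip(p, centre)) for p in points)


def explained_ratio(clusters):
    """Between-cluster sum of squares divided by the total sum of squares."""
    all_points = [p for members in clusters for p in members]
    total_ss = sum_of_squares(all_points, mean_point(all_points))
    within_ss = sum(sum_of_squares(m, mean_point(m)) for m in clusters)
    return (total_ss - within_ss) / total_ss


# Hypothetical clusters of two-dimensional points.
clusters = [[(1.0, 1.0), (1.0, 3.0)], [(9.0, 1.0), (9.0, 3.0)]]
ratio = explained_ratio(clusters)
print(f"{ratio:.1%}")
```

A ratio close to 100% means that most of the variation lies between the clusters rather than inside them.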
The relationship between two variables is investigated through linear regression (Rucker et al., 2015). The independent variable is used to predict changes in the dependent variable. The relationship is represented by the formula $Y = mX + c$, where $Y$ is the dependent variable, $X$ is the independent variable and $c$ is the intercept. The value of $m$ represents the change in $Y$ for a one-unit change in $X$ (Kass et al., 2014).
In the first regression analysis, the crude death rate (per 1,000 people) is related to the gross national income of the countries of the region for 2014.
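A least-squares fit of a straight line Y = mX + c can be sketched as follows. This is an illustrative Python example rather than the R code used for the report. The intercept symbol c and the data values are assumptions made for the illustration.

```python
import statistics


def fit_line(x, y):
    """Least-squares slope m and intercept c for the line y = m*x + c."""
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    m = sxy / sxx          # change in y per one-unit change in x
    c = y_bar - m * x_bar  # value of y where the line crosses x = 0
    return m, c


# Hypothetical predictor and response values, not the report's data.
income = [1.0, 2.0, 3.0, 4.0]
death_rate = [9.0, 7.0, 5.0, 3.0]

m, c = fit_line(income, death_rate)
print(f"slope={m:.2f}, intercept={c:.2f}")
```

A negative slope, as in this made-up sample, means the response falls as the predictor rises.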
The crude death rate can be expressed as:
Thus, an increase in Gross National Income is associated with a decrease in the crude death rate. The plot shows the same downward trend.
In the second regression analysis, the crude death rate (per 1,000 people) is related to the health expenditure of the countries of the region.
The crude death rate can be expressed as:
The plot shows a small increase in the crude death rate as a country's health expenditure increases.
Conclusion
The analysis shows that, while the crude death rate of the region is approximately normally distributed, total health expenditure and gross national income are not. Moreover, there are wide variations in the crude death rate and gross national income of the countries of the region. The countries can be grouped into four clusters based on crude death rate and health expenditure. Further, the crude death rate decreases as gross national income increases. On the other hand, the crude death rate increases slightly as total health expenditure increases.
Several difficulties were faced during the analysis. The primary difficulty was selecting the correct attributes, which had to be chosen so that they could be related to one another. This was compounded by the presence of missing values. Moreover, because the years were stored as columns, the data had to be recast so that each year became a row, with the attributes as columns holding the values for each year. The melt and dcast functions from the R libraries made this reshaping easier. All in all, the work was a good learning experience in the importance of pre-processing and in the use of linear regression, clustering, and one-variable and two-variable plots.
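The reshaping described above, from years in columns to years in rows with one column per attribute, can be sketched in Python as follows. The report used the R functions melt and dcast. The helper names `melt` and `cast`, the key names and the values below are hypothetical stand-ins for illustration.

```python
from collections import defaultdict

# Hypothetical wide rows: one row per country and indicator,
# one key per year. Values are made up.
wide = [
    {"country": "Country A", "indicator": "SP.DYN.CDRT.IN", "2013": 6.1, "2014": 6.0},
    {"country": "Country A", "indicator": "SH.XPD.TOTL.ZS", "2013": 4.8, "2014": 4.9},
]


def melt(rows, id_keys=("country", "indicator")):
    """Wide to long: one (country, indicator, year, value) record per cell."""
    return [
        (row["country"], row["indicator"], int(key), value)
        for row in rows
        for key, value in row.items()
        if key not in id_keys
    ]


def cast(long_rows):
    """Long to tidy: one record per (country, year), one key per indicator."""
    table = defaultdict(dict)
    for country, indicator, year, value in long_rows:
        table[(country, year)][indicator] = value
    return dict(table)


tidy = cast(melt(wide))
for key, values in tidy.items():
    print(key, values)
```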
References
Guha, S. and Mishra, N., 2016. Clustering data streams. In Data Stream Management (pp. 169-187). Springer, Berlin, Heidelberg.
Kass, R.E., Eden, U.T. and Brown, E.N., 2014. Generalized Linear and Nonlinear Regression. In Analysis of Neural Data (pp. 391-412). Springer, New York, NY.
Kodinariya, T.M. and Makwana, P.R., 2013. Review on determining number of Cluster in K-Means Clustering. International Journal, 1(6), pp.90-95.
Rucker, D.D., McShane, B.B. and Preacher, K.J., 2015. A researcher’s guide to regression, discretization, and median splits of continuous variables. Journal of Consumer Psychology, 25(4), pp.666-678.