The provided dataset is a sample from the Australian Taxation Office (ATO) for the 2013-2014 financial year. It records the method by which individual tax returns were lodged. The lodgment method is either a tax agent or self-preparation: each taxpayer lodges the return for a financial year using one of these two methods. The assignment analyses the lodgment method by gender and by age group, using the information from individual tax returns lodged after the end of the financial year.
There are two types of data that can be used for analysis. The first is primary data, which is collected directly from the customers of the organization by means of a questionnaire. The other is secondary data, which is collected from the official website of the organization or from other official websites related to it (Goodwin, 2012, p.130). Dataset 1 is a subset of Australian Taxation Office data collected from the official ATO website. The data was collected by a government site rather than by the individual user, so it is secondary data. Data can be categorized as qualitative or quantitative. Qualitative data can be further categorized as binary, nominal or ordinal level measurements, and quantitative data as interval or ratio level measurements (Morgan, 2013, p.9). The variable gender has two categories (0=male and 1=female), so it is a nominal level variable. It indicates the gender of a taxpayer.
The variable age_range contains integer values, so it is treated as a quantitative variable. It indicates the age group of a taxpayer.
The variable lodgment_method has two categories (A=Tax agent and S=Self-prepared), so it is a nominal level variable. It indicates the lodgment category of a taxpayer.
The variable tot_inc_amt contains integer values, so it is a quantitative variable. It indicates the total income amount of a taxpayer, on which the taxpayer pays tax.
The variable tot_ded_amt contains integer values, so it is a quantitative variable. It indicates the total deduction amount of a taxpayer, that is, the total amount deducted from the actual income.
The first five cases of dataset 1 are shown below:
Gender | age_range | Lodgment_method | Tot_inc_amt | Tot_ded_amt
0 | 5 | A | 49612 | 8184
0 | 6 | A | 131313 | 7686
1 | 6 | S | 53320 | 1201
0 | 9 | S | 56748 | 95
1 | 6 | A | 84863 | 2016
Statistical methods are essentially a process of collecting, summarizing and analysing data and interpreting the results. A questionnaire is built around the factors that matter to the study of the organization. The study is characterized by a specific plan and design structure for obtaining answers from the respondents. The questionnaire contains open-ended and closed-ended questions and variables measured at the nominal, ordinal, interval and ratio levels. The analysis of the data collected from the questionnaire indicates the strengths, weaknesses, opportunities and threats of the factors under study. The statistical output gives a summary of the analysis, containing a graphical representation of each factor, a numerical summary of each factor and the final principal components of the study (Brace, 2008, p.45).
The procedure of the data collection and analysis can be derived as follows:
The survey includes following variables:
Section 2:
The variable lodgment method is a qualitative variable with two categories, "A=Tax agent" and "S=Self-prepared". So, a pie chart is suitable for the lodgment method. The percentage of returns for each lodgment method in dataset 1 is shown below:
So, the number of people who hired a tax agent to lodge their return is 740 and the number of people who self-prepared their return is 260.
The sample size is 1000 and 740 people hired a tax agent to lodge their return. The sample proportion $\hat{p}$ is a point estimate of the population proportion $p$. So, the point estimate of the proportion is:
$$\hat{p} = \frac{x}{n} = \frac{740}{1000} = 0.74$$
Now use the Z-statistic to calculate the 95% confidence interval. The formula for the 95% confidence interval is shown below:
$$\hat{p} \pm Z\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
Here, $\hat{p}$ is the sample proportion, $n$ is the sample size and $Z$ is the critical value at the specified level of significance. The critical value at the 5% level of significance is 1.96. So, the confidence interval is calculated as:
$$0.74 \pm 1.96\sqrt{\frac{0.74(1-0.74)}{1000}} \approx 0.74 \pm 0.027$$
Hence, the 95% confidence interval for the proportion of taxpayers who lodge the tax return through a tax agent is (0.712, 0.767).
The lower limit of the confidence interval is 0.712 and the upper limit is 0.767. The interval is centred on the sample proportion (0.74) by construction. So, provided the sample of 1000 people is representative of the population, it can be said with 95% confidence that the population proportion lies between 0.712 and 0.767.
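The calculation above can be sketched in a few lines of Python. This is a minimal illustration, assuming the normal-approximation (Wald) interval described in the text; the function name `proportion_ci` is introduced here for illustration and is not part of the original analysis.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Return (p_hat, lower, upper) for a normal-approximation interval.

    z=1.96 is the critical value for a 95% confidence level.
    """
    p_hat = successes / n                                # point estimate
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)  # SE of the proportion
    margin = z * standard_error                          # margin of error
    return p_hat, p_hat - margin, p_hat + margin

# Dataset 1: 740 of the 1000 sampled returns were lodged through a tax agent.
p_hat, lower, upper = proportion_ci(740, 1000)
print(f"p_hat = {p_hat:.3f}, 95% CI = ({lower:.4f}, {upper:.4f})")
```

The limits agree with the reported interval (0.712, 0.767) up to rounding in the third decimal place.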
Section 3:
For dataset 2, the lodgment method is again a qualitative variable with two categories, "A=Tax agent" and "S=Self-prepared". So, a pie chart is suitable for the lodgment method. The percentage of returns for each lodgment method in dataset 2 is shown below:
So, the number of people who hired a tax agent to lodge their return is 245 and the number of people who self-prepared their return is 258.
The sample size is 503 and 245 people hired a tax agent to lodge their return. The sample proportion $\hat{p}$ is a point estimate of the population proportion $p$. So, the point estimate of the proportion is:
$$\hat{p} = \frac{x}{n} = \frac{245}{503} \approx 0.487$$
Now use the Z-statistic to calculate the 95% confidence interval. The formula for the 95% confidence interval is shown below:
$$\hat{p} \pm Z\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
Here, $\hat{p}$ is the sample proportion, $n$ is the sample size and $Z$ is the critical value at the specified level of significance. The critical value at the 5% level of significance is 1.96. So, the confidence interval is calculated as:
$$0.487 \pm 1.96\sqrt{\frac{0.487(1-0.487)}{503}} \approx 0.487 \pm 0.044$$
Hence, the 95% confidence interval for the proportion of taxpayers who lodge the tax return through a tax agent is (0.443, 0.530).
The lower limit of the confidence interval is 0.443 and the upper limit is 0.530. The interval is centred on the sample proportion (48.7%) by construction. So, provided the sample of 503 people is representative of the population, it can be said with 95% confidence that the population proportion lies between 0.443 and 0.530.
Thus, the proportion of people who hire a tax agent is greater in dataset 1 than in dataset 2. Dataset 2 indicates that almost equal numbers of people hire a tax agent and self-prepare their return, while dataset 1 indicates that most people hire a tax agent rather than self-prepare.
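One way to check the comparison is to compute both intervals side by side and see whether they overlap. The sketch below assumes the same normal-approximation interval as in the text; the helper name `wald_interval` and the overlap check are illustrative additions, not part of the original analysis.

```python
import math

def wald_interval(successes, n, z=1.96):
    """95% normal-approximation interval for a proportion, as (lower, upper)."""
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

ci_dataset1 = wald_interval(740, 1000)  # agent lodgments in dataset 1
ci_dataset2 = wald_interval(245, 503)   # agent lodgments in dataset 2

# Two intervals overlap when each one starts before the other one ends.
intervals_overlap = (ci_dataset1[0] <= ci_dataset2[1]
                     and ci_dataset2[0] <= ci_dataset1[1])

print("Dataset 1:", tuple(round(v, 4) for v in ci_dataset1))
print("Dataset 2:", tuple(round(v, 4) for v in ci_dataset2))
print("Intervals overlap:", intervals_overlap)
```

The lower limit for dataset 1 lies well above the upper limit for dataset 2, which is consistent with the conclusion that agent lodgment is more common in dataset 1.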
The age group is treated as a quantitative variable and the lodgment method is a qualitative variable. The histogram and the frequency of each lodgment method within each age group, obtained using Excel, are shown below:
Count of Lodgment_method (observed frequencies)
Age group | Agent | Self Prepared | Grand Total
0 | 41 | 16 | 57
1 | 34 | 15 | 49
2 | 57 | 11 | 68
3 | 78 | 16 | 94
4 | 85 | 22 | 107
5 | 86 | 15 | 101
6 | 75 | 17 | 92
7 | 82 | 27 | 109
8 | 74 | 30 | 104
9 | 61 | 38 | 99
10 | 51 | 41 | 92
11 | 16 | 12 | 28
Grand Total | 740 | 260 | 1000
The histogram for row percentages is shown below:
So, age group 5 has the highest row percentage of agent lodgments (86 of 101, about 85%).
The chi-square test is applied to test the association between categorical variables. The analysis of age group against lodgment method was carried out in the dataset 1 Excel worksheet:
The formula for the test statistic is given below:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
Here, $E$ is the expected frequency and $O$ is the observed frequency. The chi-square test can be used if every expected frequency is greater than or equal to 5. The formula to calculate the expected frequencies is shown below:
$$E = \frac{\text{row total} \times \text{column total}}{\text{grand total}}$$
The calculated expected frequencies for all the age groups by lodgment method are shown below:
Count of Lodgment_method (expected frequencies)
Age group | Agent | Self Prepared | Grand Total
0 | 42.18 | 14.82 | 57
1 | 36.26 | 12.74 | 49
2 | 50.32 | 17.68 | 68
3 | 69.56 | 24.44 | 94
4 | 79.18 | 27.82 | 107
5 | 74.74 | 26.26 | 101
6 | 68.08 | 23.92 | 92
7 | 80.66 | 28.34 | 109
8 | 76.96 | 27.04 | 104
9 | 73.26 | 25.74 | 99
10 | 68.08 | 23.92 | 92
11 | 20.72 | 7.28 | 28
Grand Total | 740 | 260 | 1000
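The expected frequencies can be reproduced from the observed counts with a short Python sketch. It assumes only the row-total times column-total over grand-total rule stated above; the variable names are chosen here for illustration.

```python
# Observed counts per age group: (Agent, Self Prepared), from the pivot table.
observed = {
    0: (41, 16), 1: (34, 15), 2: (57, 11), 3: (78, 16),
    4: (85, 22), 5: (86, 15), 6: (75, 17), 7: (82, 27),
    8: (74, 30), 9: (61, 38), 10: (51, 41), 11: (16, 12),
}

grand_total = sum(agent + self_prep for agent, self_prep in observed.values())
column_totals = (
    sum(agent for agent, _ in observed.values()),          # Agent column
    sum(self_prep for _, self_prep in observed.values()),  # Self Prepared column
)

# Expected count = row total * column total / grand total.
expected = {}
for age_group, counts in observed.items():
    row_total = sum(counts)
    expected[age_group] = tuple(
        row_total * column_total / grand_total for column_total in column_totals
    )

# The chi-square test assumes every expected count is at least 5.
all_at_least_five = all(value >= 5 for row in expected.values() for value in row)

for age_group, (agent, self_prep) in expected.items():
    print(f"{age_group:>2} | {agent:6.2f} | {self_prep:6.2f}")
print("All expected counts >= 5:", all_at_least_five)
```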
All of the expected frequencies are greater than 5, so the chi-square test for association can be used for the analysis. The null and alternative hypotheses are:
Null hypothesis: There is no association between age group and lodgment method.
Alternative hypothesis: There is an association between age group and lodgment method.
The chi-square statistic calculations are shown below:
Contribution of each cell to the chi-square statistic, (O - E)^2 / E
Age group | Agent | Self Prepared | Total
0 | 0.033 | 0.094 | 0.127
1 | 0.141 | 0.401 | 0.542
2 | 0.887 | 2.524 | 3.411
3 | 1.024 | 2.915 | 3.939
4 | 0.428 | 1.218 | 1.645
5 | 1.696 | 4.828 | 6.525
6 | 0.703 | 2.002 | 2.705
7 | 0.022 | 0.063 | 0.086
8 | 0.114 | 0.324 | 0.438
9 | 2.052 | 5.839 | 7.891
10 | 4.285 | 12.196 | 16.481
11 | 1.075 | 3.060 | 4.135
Total | 12.460 | 35.464 | 47.924
The degrees of freedom for the test are:
$$df = (r - 1)(c - 1) = (12 - 1)(2 - 1) = 11$$
The p-value for the chi-square test is less than 0.0005.
According to the results obtained, the value of the chi-square test statistic is 47.92. The p-value of the test is less than the 0.05 level of significance, so the null hypothesis is rejected. Hence, it can be concluded that there is an association between age group and lodgment method.
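The test statistic and degrees of freedom can be checked with a short Python sketch using only the standard library. Rather than computing a p-value, it compares the statistic with the 5% critical value for 11 degrees of freedom (about 19.675, taken here from a standard chi-square table); the variable names are illustrative assumptions.

```python
# Observed counts per age group: (Agent, Self Prepared), from the pivot table.
observed = [
    (41, 16), (34, 15), (57, 11), (78, 16), (85, 22), (86, 15),
    (75, 17), (82, 27), (74, 30), (61, 38), (51, 41), (16, 12),
]

grand_total = sum(sum(row) for row in observed)
column_totals = [sum(row[j] for row in observed) for j in range(2)]

# Sum (O - E)^2 / E over every cell of the 12 x 2 table.
chi_square = 0.0
for row in observed:
    row_total = sum(row)
    for j, observed_count in enumerate(row):
        expected_count = row_total * column_totals[j] / grand_total
        chi_square += (observed_count - expected_count) ** 2 / expected_count

degrees_of_freedom = (len(observed) - 1) * (len(column_totals) - 1)

# Tabulated critical value of chi-square for df = 11 at the 5% level.
CRITICAL_VALUE_DF11_5PCT = 19.675
reject_null = chi_square > CRITICAL_VALUE_DF11_5PCT

print(f"chi-square = {chi_square:.2f}, df = {degrees_of_freedom}")
print("Reject null hypothesis at the 5% level:", reject_null)
```

Comparing against the critical value leads to the same decision as the p-value approach: the statistic far exceeds the threshold, so the null hypothesis is rejected.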
The total income is a quantitative variable and the lodgment method is a qualitative variable. The boxplot obtained using StatKey is shown below:
The above boxplot indicates outliers in the total income data for both lodgment methods, agent and self-prepared.
The obtained dot plot is shown below:
So, the largest number of people who hire a tax agent have a total income between 0 and 50000.
The obtained summary statistics are shown below:
Statistics | A | S | Overall
Sample Size | 740 | 260 | 1000
Mean | 60601.249 | 43878.846 | 56253.424
Standard Deviation | 70226.303 | 42013.481 | 64495.602
Minimum | -7752 | 0 | -7752
Q1 | 25320.50 | 18216.00 | 23017.50
Median | 46077.50 | 37318.00 | 44113.50
Q3 | 73555.00 | 57724.50 | 70593.00
Maximum | 1052414 | 352377 | 1052414
The average total income of people who hire a tax agent is 60601.25 and the maximum total income is 1052414.
The average total income of people who self-prepare their return is 43878.85 and the maximum total income is 352377.
The distribution of income for both lodgment methods is positively skewed: most incomes lie at the lower (left) end, with a long tail to the right, and in both groups the mean exceeds the median. So, it can be said that the total income data is skewed and not normally distributed. The boxplot also shows outliers in the dataset, which further indicates that the income data is non-normally distributed.
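The outlier and skewness remarks can be checked against the summary statistics alone. The sketch below assumes the usual boxplot convention of flagging values beyond 1.5 times the interquartile range from the quartiles; the function name `tukey_fences` and the dictionary layout are illustrative. From a summary table it can only confirm that the extreme values fall outside the fences, not count the outliers.

```python
# Summary statistics of total income by lodgment method, from the table above.
summary = {
    "A": {"min": -7752, "q1": 25320.50, "median": 46077.50,
          "q3": 73555.00, "max": 1052414, "mean": 60601.249},
    "S": {"min": 0, "q1": 18216.00, "median": 37318.00,
          "q3": 57724.50, "max": 352377, "mean": 43878.846},
}

def tukey_fences(q1, q3, k=1.5):
    """Return the (lower, upper) fences at k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

results = {}
for method, stats in summary.items():
    lower_fence, upper_fence = tukey_fences(stats["q1"], stats["q3"])
    results[method] = {
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "low_outliers": stats["min"] < lower_fence,    # minimum below lower fence
        "high_outliers": stats["max"] > upper_fence,   # maximum above upper fence
        "mean_above_median": stats["mean"] > stats["median"],  # hint of right skew
    }

for method, result in results.items():
    print(method, result)
```

For both methods the maximum lies far above the upper fence while the minimum stays inside the lower fence, which matches a right-skewed distribution with high-income outliers.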
A scatterplot is a way to represent visually the relationship between two quantitative variables. The plot indicates the strength of the relationship between the variables, or how they are associated. One variable can be considered the explanatory variable and the other the response variable. A positive trend in the scatterplot indicates a positive association between the variables: as the value of one variable increases, the corresponding value of the other variable also increases (Rubin, 2009, p.209). A negative trend in the scatterplot indicates a negative association between the variables: as the value of one variable increases, the corresponding value of the other variable decreases.
The absence of a trend in the scatterplot indicates no association between the variables. Correlation is a measure of the relationship between two variables. It measures the strength of the relationship between two or more normally distributed interval or ratio level variables. The coefficient of correlation is denoted by r and lies between +1 and −1 inclusive, where +1 is total positive correlation, 0 is no correlation, and −1 is total negative correlation.
In general, if r lies between 0 and 0.19, the relationship between the two variables is very weak. If r lies between 0.20 and 0.39, the relationship is weak. If r lies between 0.40 and 0.59, the relationship is moderate. If r lies between 0.60 and 0.79, the relationship is strong. And if r lies between 0.80 and 1.00, the relationship is very strong (Israel, 2009, p.111). The scatterplots of total income amount against total deduction amount for people who hire a tax agent and for people who self-prepare are shown below:
In both scatterplots, as the value of the variable on the horizontal axis increases, the corresponding value of the variable on the vertical axis tends to increase, but only weakly. Thus, there is a weak positive association between the variables for both lodgment methods.
The correlation coefficient for the relationship between total income amount and total deduction amount is 0.385 for people who hire a tax agent and 0.396 for people who self-prepare.
Both values fall in the weak range (0.20 to 0.39), so there is a weak positive association between total income amount and total deduction amount for people who hire a tax agent and for people who self-prepare.
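The correlation coefficient and its verbal label can be sketched in Python as follows. The function names `pearson_r` and `strength_label` are illustrative, and the band edges are treated as half-open intervals (for example, weak means 0.20 up to but not including 0.40), which is an assumption made to close the small gaps between the ranges quoted above. The five data points are the first five cases of dataset 1, used only to show the call; they mix both lodgment methods and do not reproduce the reported coefficients.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / math.sqrt(variation_x * variation_y)

def strength_label(r):
    """Map |r| to the verbal bands quoted in the text."""
    size = abs(r)
    if size < 0.20:
        return "very weak"
    if size < 0.40:
        return "weak"
    if size < 0.60:
        return "moderate"
    if size < 0.80:
        return "strong"
    return "very strong"

# First five cases of dataset 1 (illustration only).
total_income = [49612, 131313, 53320, 56748, 84863]
total_deduction = [8184, 7686, 1201, 95, 2016]
r_first_five = pearson_r(total_income, total_deduction)
print(f"r (first five cases) = {r_first_five:.3f}")

# Labels for the coefficients reported in the text.
print("Tax agent (r = 0.385):", strength_label(0.385))
print("Self-prepared (r = 0.396):", strength_label(0.396))
```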
In dataset 1, the number of people who hired a tax agent to lodge their return is 740 and the number of people who self-prepared is 260. The 95% confidence interval for the proportion of taxpayers who lodge the tax return through a tax agent is (0.712, 0.767), so, provided the sample of 1000 people is representative, the population proportion lies in this range with 95% confidence.
In dataset 2, the number of people who hired a tax agent to lodge their return is 245 and the number of people who self-prepared is 258. The 95% confidence interval for the proportion of taxpayers who lodge the tax return through a tax agent is (0.443, 0.530), so, provided the sample of 503 people is representative, the population proportion lies in this range with 95% confidence.
There is an association between age group and lodgment method.
The largest number of people who hire a tax agent have a total income between 0 and 50000, and the largest number of people who self-prepare also have a total income between 0 and 50000. The average total income of people who hire a tax agent is 60601.25 and the maximum is 1052414, while the average total income of people who self-prepare is 43878.85 and the maximum is 352377.
The correlation coefficient between total income amount and total deduction amount is 0.385 for people who hire a tax agent and 0.396 for people who self-prepare.
Thus, there is a weak positive association between total income amount and total deduction amount for both groups.
The proportion of people who hire a tax agent is greater in dataset 1 than in dataset 2. The distribution of income for both lodgment methods is positively skewed, as most incomes lie at the lower (left) end. So, the total income data in dataset 1 is skewed and not normally distributed, and analyses that assume normality may lead to wrong findings. Thus, the researcher should collect the data again before carrying out further research.
References:
Goodwin, S. (2012) SAGE Secondary Data Analysis. India: SAGE Publications Pvt. Ltd.
Morgan, D. (2013) Integrating Qualitative and Quantitative Methods: A Pragmatic Approach. India: SAGE Publications Pvt. Ltd.
Bethlehem, J. (2010) Applied Survey Methods. United States of America: John Wiley & Sons, Inc.
Brace, I. (2008) Questionnaire Design: How to Plan, Structure and Write Survey Material for Effective Market Research. Second edition. USA: Kogan Page Publishers.
Rubin, A. (2009) Statistics for Evidence-Based Practice and Evaluation. Second edition. Canada: Cengage Learning.
Israel, D. (2009) Data Analysis in Business Research. India: SAGE Publications Pvt. Ltd.