The aim of this assignment is to test skills in collecting and analyzing data to answer a specific business problem. The assignment also presents an opportunity to apply the theories learnt during the course, such as finding numerical summaries, displaying data with appropriate graphs and using statistical inference to solve business problems, including constructing hypotheses, testing them and interpreting the findings (Ryabko, Stognienko, & Shokin, 2004).
We are presented with data for the NSW transport system in order to come up with evidence-based recommendations aimed at improving the public transport system. The project presents a series of research questions which need to be answered based on the knowledge gained in the course of the study.
The first dataset (dataset 1) is secondary data provided by the NSW transport system. The data has a total of 1000 observations with six variables. The description of the variables is given below.
Table 1: Description of the variables
Variable | Description | Values | Variable Type
mode | Type of public transport | Bus, Train, Ferry and Light Rail | Nominal variable (qualitative)
date | Date on which the tap on/off took place | Date/month/year | Nominal variable (qualitative)
tap | Whether the record is a tap on or a tap off | On and Off | Nominal variable (qualitative)
loc | Location of the stop: postcodes for buses, station names for the other modes | Postcodes and names of the stations | Nominal variable (qualitative)
count | Total number of taps on or off at the given location on the given date | Number | Scale variable (quantitative)
The cases used in this study are the 1000 observations in dataset 1.
The second dataset (dataset 2) is primary data collected by the researcher. A random sample of 50 individuals was selected and interviewed about their gender, age and the mode of transport they use most. The data has a total of 50 observations with three variables. Give a description of cases you consider for this data set.
For dataset 2, random sampling was employed to collect data from individuals so as to understand the mode of transport they use most frequently. This is primary data since it was collected directly from the subjects. The main limitation of this data is its small sample size of only 50 cases. The description of the variables is given below.
Table 2: Description of the variables
Variable | Description | Values | Variable Type
Mode | Type of public transport | Bus, Train, Ferry and Light Rail | Nominal variable (qualitative)
Age | Age of the respondent | Number | Scale variable (quantitative)
Gender | Gender of the respondent | Male and female | Nominal variable (qualitative)
In this section, we attempt to answer the research questions posed. To answer the research questions, we use dataset 1.
To answer this research question, we computed a frequency distribution of the mode of transport. Table 3 below gives the results.
Table 3: Frequency table for the mode of transport used
Mode | Count of mode | Percent
Bus | 467 | 46.7%
Ferry | 25 | 2.5%
Light-rail | 24 | 2.4%
Train | 484 | 48.4%
Grand Total | 1000 | 100.0%
As can be seen, bus and train were by far the most used modes. The train was the most frequently used, accounting for 48.4% (n = 484) of the observations in the last week, followed by the bus with 46.7% (n = 467). Ferry and light rail were the least used, with only 2.5% (n = 25) and 2.4% (n = 24) of the observations respectively.
Figure 1: Bar chart on mode of transport used
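The percentages in Table 3 follow directly from the counts. A minimal Python sketch, assuming only the four counts reported in Table 3 (the variable names are illustrative and not part of the original analysis):

```python
# Counts copied from Table 3 (dataset 1, 1000 observations).
mode_counts = {"Bus": 467, "Ferry": 25, "Light-rail": 24, "Train": 484}

total = sum(mode_counts.values())

# Share of each mode as a percentage of all observations.
percentages = {mode: round(100 * n / total, 1) for mode, n in mode_counts.items()}

# Mode with the largest number of observations.
most_used = max(mode_counts, key=mode_counts.get)

# Print the table sorted from most to least used.
for mode, n in sorted(mode_counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{mode:<10} {n:>5} {percentages[mode]:>5.1f}%")
print(f"Most used mode: {most_used}")
```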
To answer the given research question, the following hypothesis was tested.
H0: The proportion of transport users who use train is not significantly different from 50%.
HA: The proportion of transport users who use train is significantly different from 50%.
To test this, a one-sample t-test on the train indicator (1 = train, 0 = other mode) was used at the 5% level of significance. The results are given below.
Table 4: One-Sample Statistics
Variable | N | Mean | Std. Deviation | Std. Error Mean
Train | 1000 | .4840 | .49999 | .01581
Table 5: One-Sample Test
Test Value = 0.5
Variable | t | df | Sig. (2-tailed) | Mean Difference | 95% CI of the Difference (Lower) | 95% CI of the Difference (Upper)
Train | -1.012 | 999 | .312 | -.01600 | -.0470 | .0150
A one-sample t-test was run to determine whether the proportion of NSW transport users who rely on the train differs from 50%. The observed proportion of train use (M = 0.484, SD = 0.500) was not significantly different from 50% (mean difference = -0.016, 95% CI of the difference, -0.047 to 0.015), t(999) = -1.012, p = .312. We therefore fail to reject the null hypothesis.
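The test statistic can be reproduced from the summary figures alone. The sketch below assumes the train variable is coded 1 for train and 0 otherwise, and it uses the standard normal distribution as an approximation to the t distribution with 999 degrees of freedom for the p-value and confidence interval, so the last digits may differ slightly from the software output.

```python
import math
from statistics import NormalDist

n = 1000          # observations in dataset 1
successes = 484   # observations where mode == Train (Table 3)
p0 = 0.5          # hypothesised proportion

p_hat = successes / n

# Sample standard deviation of a 0/1 variable (n - 1 in the denominator).
sd = math.sqrt(p_hat * (1 - p_hat) * n / (n - 1))
se = sd / math.sqrt(n)

t_stat = (p_hat - p0) / se

# Two-sided p-value and 95% CI via the normal approximation (df = 999 is large).
z = NormalDist()
p_value = 2 * (1 - z.cdf(abs(t_stat)))
margin = z.inv_cdf(0.975) * se
ci_low, ci_high = (p_hat - p0) - margin, (p_hat - p0) + margin

print(f"t = {t_stat:.3f}, p = {p_value:.3f}, 95% CI = ({ci_low:.3f}, {ci_high:.3f})")
```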
The NSW Government needs to decide whether to build an underground railway line to Central from Parramatta, Bankstown or Gosford. The analysis below is used to prepare a recommendation on this.
In this section we first consider the number of train observations recorded at the three locations mentioned. This information is given in the table below.
Table 6: Frequency of train from the three locations
Station | Count | Percent
Parramatta Station | 7 | 53.8%
Gosford Station | 2 | 15.4%
Bankstown Station | 4 | 30.8%
Figure 2: Bar chart for the count of times the train leaves the stations
Across the 1000 observations summarised in Table 7, the mean of the variable count was 103.38 with a standard deviation of 226.14.
Table 7: Descriptive statistics for the variable count
Statistic | count
Mean | 103.379
Standard Error | 7.151282
Median | 53
Mode | 18
Standard Deviation | 226.1434
Sample Variance | 51140.84
Kurtosis | 238.9731
Skewness | 13.04214
Range | 4955
Minimum | 18
Maximum | 4973
Sum | 103379
Count | 1000
The mode of the counts was 18 and the median was 53, well below the mean of 103.38. The skewness value of 13.04 indicates that the data are very heavily skewed. This is consistent with the minimum count being 18 while the maximum is 4973, a very wide range that suggests the presence of outliers, which in turn drive the observed skewness.
The histogram presented below further shows that the data are skewed to the right (longer tail to the right).
Figure 3: Histogram of the variable count
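Several entries in Table 7 are linked by simple identities, which gives a quick consistency check on the reported output. The sketch below uses only figures copied from Table 7; the variable names are illustrative.

```python
import math

# Figures copied from Table 7.
n = 1000
total = 103379
sd = 226.1434
median = 53
minimum, maximum = 18, 4973

mean = total / n                     # mean = sum / count
variance = sd ** 2                   # sample variance = SD squared
standard_error = sd / math.sqrt(n)   # SE of the mean = SD / sqrt(n)
value_range = maximum - minimum

# A mean well above the median is a simple indicator of right skew.
right_skewed = mean > median

print(f"mean = {mean:.3f}, variance = {variance:.2f}, SE = {standard_error:.4f}")
print(f"range = {value_range}, mean > median: {right_skewed}")
```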
To answer this, the following hypothesis was tested at the 5% level of significance.
H0: There is no significant difference in the mean counts of taps on and taps off.
HA: There is a significant difference in the mean counts of taps on and taps off.
To test this, an independent samples t-test was used. The results are given below.
Table 8: Group Statistics
Variable | Tap | N | Mean | Std. Deviation | Std. Error Mean
count | On | 481 | 106.65 | 269.081 | 12.269
count | Off | 519 | 100.35 | 177.530 | 7.793
Table 9: Independent Samples Test
Columns F and Sig. report Levene’s Test for Equality of Variances; the remaining columns report the t-test for Equality of Means.
count | F | Sig. | t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | 95% CI of the Difference (Lower) | 95% CI of the Difference (Upper)
Equal variances assumed | .083 | .774 | .440 | 998 | .660 | 6.296 | 14.319 | -21.802 | 34.394
Equal variances not assumed | | | .433 | 821.5 | .665 | 6.296 | 14.535 | -22.233 | 34.825
We performed an independent samples t-test to compare the average count for taps on and taps off. Results showed that the average count for taps on (M = 106.65, SD = 269.08, N = 481) did not differ significantly from the average count for taps off (M = 100.35, SD = 177.53, N = 519), t(998) = 0.440, p = .660, two-tailed. The observed mean difference of 6.30 was not significant at the 5% level of significance. Essentially, the results indicate that the average count does not depend on whether the record is a tap on or a tap off.
We concluded that there is no significant difference in the average count for taps off and taps on. The three stations under consideration also did not show much traffic in the data (Table 6). On this evidence, the government’s plan to build an underground railway line to Central from Parramatta, Bankstown or Gosford is not strongly supported.
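Both rows of Table 9 can be reproduced from the group statistics in Table 8. The following sketch computes the pooled-variance t statistic and the Welch (unequal variances) version with its Satterthwaite degrees of freedom. It starts from the rounded means in Table 8, so the mean difference comes out as 6.30 rather than 6.296.

```python
import math

# Group statistics copied from Table 8.
n_on, mean_on, sd_on = 481, 106.65, 269.081
n_off, mean_off, sd_off = 519, 100.35, 177.530

diff = mean_on - mean_off

# Equal variances assumed: pooled variance, df = n1 + n2 - 2.
df_pooled = n_on + n_off - 2
pooled_var = ((n_on - 1) * sd_on**2 + (n_off - 1) * sd_off**2) / df_pooled
se_pooled = math.sqrt(pooled_var * (1 / n_on + 1 / n_off))
t_pooled = diff / se_pooled

# Equal variances not assumed (Welch): separate variance terms.
v_on, v_off = sd_on**2 / n_on, sd_off**2 / n_off
se_welch = math.sqrt(v_on + v_off)
t_welch = diff / se_welch
# Welch-Satterthwaite approximation to the degrees of freedom.
df_welch = (v_on + v_off) ** 2 / (v_on**2 / (n_on - 1) + v_off**2 / (n_off - 1))

print(f"pooled: t({df_pooled}) = {t_pooled:.3f}, SE = {se_pooled:.3f}")
print(f"Welch:  t({df_welch:.1f}) = {t_welch:.3f}, SE = {se_welch:.3f}")
```

Both statistics are far below the critical value of roughly 1.96, which agrees with the non-significant result reported above.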
You are interested in finding whether there is a difference in preference between genders in terms of their transport mode (Bus, Train, Ferry and Light Rail). By considering an appropriate number of cases and variables, give a proper graphical display and use it to write comments.
The results for this section are presented below;
Percentage of respondents by mode, within each gender
Mode | Female | Male | Grand Total
Bus | 16.7% | 42.3% | 30.0%
Ferry | 20.8% | 7.7% | 14.0%
Light Rail | 8.3% | 11.5% | 10.0%
Train | 54.2% | 38.5% | 46.0%
Grand Total | 100.00% | 100.00% | 100.00%
As can be seen, the largest share of male commuters (42.3%, n = 11) reported using the bus, while the majority of female commuters (54.2%, n = 13) reported using the train.
Chi-Square test
A Chi-square test was performed to determine whether there is significant association between gender and the preferred mode of transport (Bagdonavicius & Nikulin, 2011). The hypothesis tested is given below;
H0: There is no significant association between gender and preferred mode of transport
HA: There is significant association between gender and preferred mode of transport
This was tested at 5% level of significance and the results are given below;
Table 10: Chi-Square Tests
Statistic | Value | df | Asymp. Sig. (2-sided)
Pearson Chi-Square | 5.072a | 3 | .167
Likelihood Ratio | 5.239 | 3 | .155
N of Valid Cases | 50 | |
a. 4 cells (50.0%) have expected count less than 5. The minimum expected count is 2.40.
The p-value for the test is 0.167, which is greater than the 5% level of significance. We therefore fail to reject the null hypothesis and conclude that there is no evidence of a significant association between gender and preferred mode of transport. Because half of the cells have an expected count below 5, the chi-square approximation should be interpreted with some caution.
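The Pearson statistic can be recomputed by hand from the cross-tabulation. The cell counts below are not reported directly in the text; they are reconstructed from the column percentages on the assumption of 24 female and 26 male respondents (consistent with n = 13 and n = 11 quoted above), so they should be read as an assumption. The p-value uses the closed-form survival function of the chi-square distribution with 3 degrees of freedom.

```python
import math

# Cell counts reconstructed from the column percentages (assumed, not reported).
observed = {
    "Bus":        {"Female": 4,  "Male": 11},
    "Ferry":      {"Female": 5,  "Male": 2},
    "Light Rail": {"Female": 2,  "Male": 3},
    "Train":      {"Female": 13, "Male": 10},
}

genders = ["Female", "Male"]
col_totals = {g: sum(row[g] for row in observed.values()) for g in genders}
grand_total = sum(col_totals.values())

chi2 = 0.0
expected_counts = []
for mode, row in observed.items():
    row_total = sum(row.values())
    for g in genders:
        # Expected count under independence: row total * column total / N.
        expected = row_total * col_totals[g] / grand_total
        expected_counts.append(expected)
        chi2 += (row[g] - expected) ** 2 / expected

df = (len(observed) - 1) * (len(genders) - 1)

# Survival function of the chi-square distribution, valid for df = 3 only.
p_value = math.erfc(math.sqrt(chi2 / 2)) + math.sqrt(2 * chi2 / math.pi) * math.exp(-chi2 / 2)

small_cells = sum(1 for e in expected_counts if e < 5)
print(f"chi2({df}) = {chi2:.3f}, p = {p_value:.3f}")
print(f"cells with expected count < 5: {small_cells}, minimum = {min(expected_counts):.2f}")
```

With these reconstructed counts the statistic, the number of small cells and the minimum expected count all agree with Table 10.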
Section 5: Discussion & Conclusion
The main purpose of this study was to present an analysis of the NSW transport system. We were provided with a secondary dataset (dataset 1) comprising 1000 cases with six variables. Apart from the provided secondary data on the NSW transport system, we also surveyed 50 individuals. We sought to find out the most commonly used mode of transport. Results showed that the most commonly used mode was the train, followed by the bus. Although people also used the ferry and light rail, their usage was minimal compared with that of the bus and train. Comparing the preferred mode of transport of males and females using dataset 2, we noted that the majority of female respondents preferred the train, while the largest share of male commuters preferred the bus; the difference, however, was not statistically significant. In light of the findings, we would like to make the following recommendation to the NSW government.
Future research should be broad enough to capture the motivation behind the preference for the various modes of transport. This would help management and the government to fully understand the needs and desires of commuters.
References
Bagdonavicius, V., & Nikulin, M. S. (2011). Chi-squared goodness-of-fit test for right censored data. The International Journal of Applied Mathematics and Statistics, 30–50.
Ryabko, B. Y., Stognienko, V. S., & Shokin, Y. I. (2004). A new test for randomness and its application to some cryptographic problems. Journal of Statistical Planning and Inference, 123, 365–376. doi:10.1016/s0378-3758(03)00149-6