Analyzing NSW Transport Data: A Report On Public Transport Preferences And Recommendations For NSW Government

The aim of this assignment is to test skills in collecting and analyzing data to answer a specific business problem. It also provides an opportunity to apply the theory learnt during the course, such as computing numerical summaries, choosing appropriate graphical displays and using statistical inference to solve business problems, including constructing hypotheses, testing them and interpreting the findings (Ryabko, Stognienko, & Shokin, 2004).

We are presented with data on the NSW transport system in order to develop data-driven recommendations aimed at improving the public transport system. The project poses a series of research questions, which are answered using the knowledge gained during the course.

Dataset 1:

The first dataset (dataset 1) is secondary data provided by the NSW transport system. It has a total of 1000 observations with six variables. The variables are described below.

Table 1: Description of the variables

Variable	Description	Values	Variable Type
mode	Type of the public transport	Bus, Train, Ferry and Light Rail	Nominal Variable (qualitative)
date	Date of the tap on/off	Day/month/year	Nominal Variable (qualitative)
tap	Whether the record is a tap on or a tap off	On and Off	Nominal Variable (qualitative)
loc	Location of the stop: postcode for buses, station name for other modes	Postcodes and names of the stations	Nominal Variable (qualitative)
count	Total number of taps on or off at the given location on the given date	Number	Scale variable (quantitative)

This study uses all 1000 cases (observations) in the dataset.

Dataset 2:

The second dataset (dataset 2) is primary data collected by the researcher. A random sample of 50 individuals was selected and interviewed about their gender, age and the mode of transport they use most. The data has a total of 50 observations with three variables.

Give a description of cases you consider for this data set.

For dataset 2, random sampling was used to collect data from individuals in order to understand the mode of transport they use most frequently. This is primary data because it was collected directly from the subjects. Its main limitation is the small sample size of only 50 cases. The variables are described below.

Table 2: Description of the variables

Variable	Description	Values	Variable Type
Mode	Type of the public transport	Bus, Train, Ferry and Light Rail	Nominal Variable (qualitative)
Age	Age of the respondent	Number	Scale variable (quantitative)
Gender	Gender of the respondent	Male and female	Nominal Variable (qualitative)

Section 2: Analysis of single variable in Dataset 1

In this section, we attempt to answer the research questions posed. To answer the research questions, we use dataset 1.

Which type of public transport was most used by the NSW people during 8th to 14th of August 2016?

To answer this research question, we computed a frequency distribution of the variable mode. Table 3 below gives the results.

Table 3: Frequency table for the mode of transport used

Row Labels	Count of mode	Percent
Bus	467	46.7%
Ferry	25	2.5%
Light-rail	24	2.4%
Train	484	48.4%
Grand Total	1000	100.0%

As can be seen, bus and train were by far the most used modes. Train was the most frequently used, accounting for 48.4% (n = 484) of the 1000 observations for the week, followed by bus with 46.7% (n = 467). Ferry and light rail were the least used, with only 2.5% (n = 25) and 2.4% (n = 24) of the observations respectively.
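The percentages and the ranking in Table 3 can be reproduced from the raw counts. The snippet below is a minimal sketch that assumes only the four counts reported in the table; the variable names are illustrative and are not part of the original analysis.

```python
# Counts per mode, copied from Table 3.
mode_counts = {"Bus": 467, "Ferry": 25, "Light-rail": 24, "Train": 484}

total = sum(mode_counts.values())

# Share of all observations for each mode, as a percentage with one decimal.
mode_percent = {mode: round(100 * n / total, 1) for mode, n in mode_counts.items()}

# Modes ranked from most used to least used.
ranking = sorted(mode_counts, key=mode_counts.get, reverse=True)

for mode in ranking:
    print(f"{mode:<10} {mode_counts[mode]:>5} {mode_percent[mode]:>5.1f}%")
```

Sorting by count rather than by name makes the ordering in the discussion (train, bus, ferry, light rail) explicit.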

Figure 1: Bar chart on mode of transport used

To answer the research question of whether the proportion of those using the train is greater than 50%, the following hypotheses were tested.

H₀: The proportion of transport users who use train is not significantly different from 50%.

H_A: The proportion of transport users who use train is significantly different from 50%.

To test this, a one-sample t-test was run at the 5% level of significance. The results are given below.

Table 4: One-Sample Statistics

	N	Mean	Std. Deviation	Std. Error Mean
Train	1000	.4840	.49999	.01581

Table 5: One-Sample Test

	Test Value = 0.5
	t	df	Sig. (2-tailed)	Mean Difference	95% CI of the Difference (Lower)	95% CI of the Difference (Upper)
Train	-1.012	999	.312	-.01600	-.0470	.0150

A one-sample t-test was run to determine whether the proportion of NSW transport users who rely on the train differs from 50%. The proportion who used the train (M = 0.484, SD = 0.50) was not significantly different from 50% (95% CI of the difference, -0.047 to 0.015), t(999) = -1.012, p = .312, two-tailed. Because the sample proportion is below 50%, there is no evidence that more than half of the users travel by train.
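The figures in Tables 4 and 5 can be checked by hand from the sample proportion alone. The sketch below assumes the train variable was coded as a 0/1 indicator, which is what the reported mean and standard deviation suggest, and it uses a normal approximation to the t distribution for the p-values (an assumption that is reasonable with 999 degrees of freedom but is not the exact calculation a statistics package performs).

```python
import math
from statistics import NormalDist

n = 1000        # number of observations (Table 4)
p_hat = 0.484   # sample proportion using the train (Table 4, Mean)
p0 = 0.5        # hypothesised proportion (Test Value in Table 5)

# Sample standard deviation of a 0/1 indicator, with n - 1 in the denominator.
sd = math.sqrt(p_hat * (1 - p_hat) * n / (n - 1))
se = sd / math.sqrt(n)

# One-sample t statistic for H0: proportion = p0.
t_stat = (p_hat - p0) / se

# Assumption: normal approximation to the t distribution with 999 df.
std_normal = NormalDist()
p_two_sided = 2 * (1 - std_normal.cdf(abs(t_stat)))
# One-sided p-value for the alternative "proportion is greater than 50%".
p_greater = 1 - std_normal.cdf(t_stat)

print(f"SD = {sd:.5f}, SE = {se:.5f}, t = {t_stat:.3f}")
print(f"two-sided p ~ {p_two_sided:.3f}, one-sided p ~ {p_greater:.3f}")
```

The one-sided p-value is well above 0.5 because the sample proportion lies below the test value, so the data cannot support the claim that the proportion exceeds 50%.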

Section 3: Analysis of two variables in Dataset 1

The NSW Government needs to decide whether to build an underground railway line from either Parramatta, Bankstown or Gosford to Central. To prepare a recommendation for this:

Give a numerical summary and an appropriate graphical display for the variable location, by only considering those three stations; and the variable count by considering the data with trains only.

In this section we first consider the number of times the train left each of the three stations. This information is given in the table below.

Table 6: Frequency of train from the three locations

	Count	Percent
Parramatta Station	7	53.8%
Gosford Station	2	15.4%
Bankstown Station	4	30.8%

Figure 2: Bar chart for the count of times the train leaves the stations

The mean count was 103.38 with a standard deviation of 226.14. Although the question asks for trains only, Table 7 is based on all 1000 observations (Count = 1000) rather than on the 484 train records.

Table 7: Descriptive statistics for the variable count

count

Mean	103.379
Standard Error	7.151282
Median	53
Mode	18
Standard Deviation	226.1434
Sample Variance	51140.84
Kurtosis	238.9731
Skewness	13.04214
Range	4955
Minimum	18
Maximum	4973
Sum	103379
Count	1000

The mode of the counts was 18 and the median was 53. The skewness of 13.04 indicates that the data are strongly right-skewed, which is consistent with the mean (103.38) being almost twice the median. The minimum count was 18 while the maximum was 4973, a very wide range that suggests the presence of outliers, which in turn would explain the observed skewness.

The histogram presented below further shows that the data is skewed. The shape of the histogram indicates that the data is skewed to the right (longer tail to the right).

Figure 3: Histogram of the variable count
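Several entries in Table 7 are tied together by simple identities, so the table can be cross-checked without access to the raw data. The following is a minimal sketch that uses only the values printed in Table 7; the variable names are illustrative.

```python
import math

# Values copied from Table 7.
n = 1000
total = 103379
std_dev = 226.1434
minimum, maximum = 18, 4973
median = 53

mean = total / n                      # mean = sum / count
std_error = std_dev / math.sqrt(n)    # standard error of the mean
variance = std_dev ** 2               # sample variance = SD squared
value_range = maximum - minimum       # range = max - min

# A mean well above the median is a simple indicator of right skew.
mean_exceeds_median = mean > median

print(f"mean = {mean:.3f}, SE = {std_error:.4f}, variance = {variance:.2f}")
print(f"range = {value_range}, mean > median: {mean_exceeds_median}")
```

The recomputed values agree with the reported mean, standard error, sample variance and range up to rounding of the published standard deviation.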

Perform a suitable hypothesis test at a 5% level of significance to test whether there is difference between mean counts of taps on and off.

To answer this, the following hypotheses were tested at the 5% level of significance.

H₀: There is no significant difference in the mean counts of taps on and taps off

H_A: There is significant difference in the mean counts of taps on and taps off.

To test this, an independent samples t-test was used. The results are given below;

Table 8: Group Statistics

	Tap	N	Mean	Std. Deviation	Std. Error Mean
count	On	481	106.65	269.081	12.269
	Off	519	100.35	177.530	7.793

Table 9: Independent Samples Test

		Levene's Test F	Levene's Test Sig.	t	df	Sig. (2-tailed)	Mean Difference	Std. Error Difference	95% CI of the Difference (Lower)	95% CI of the Difference (Upper)
count	Equal variances assumed	.083	.774	.440	998	.660	6.296	14.319	-21.802	34.394
	Equal variances not assumed			.433	821.5	.665	6.296	14.535	-22.233	34.825

We performed an independent samples t-test to compare the mean counts for taps on and taps off. The mean count for taps on (M = 106.65, SD = 269.08, N = 481) did not differ significantly from the mean count for taps off (M = 100.35, SD = 177.53, N = 519), t(998) = 0.440, p = .660, two-tailed. The observed mean difference of 6.30 was not significant at the 5% level of significance. In other words, the mean count does not depend on whether the record is a tap on or a tap off.
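Both rows of Table 9 can be recomputed from the group statistics in Table 8. The sketch below works from the rounded means and standard deviations printed in Table 8, so small rounding differences from the package output are expected, and it again assumes a normal approximation for the p-value rather than the exact t distribution.

```python
import math
from statistics import NormalDist

# Group statistics copied from Table 8.
n_on, mean_on, sd_on = 481, 106.65, 269.081
n_off, mean_off, sd_off = 519, 100.35, 177.530

diff = mean_on - mean_off

# Row 1: equal variances assumed (pooled variance).
df_pooled = n_on + n_off - 2
var_pooled = ((n_on - 1) * sd_on**2 + (n_off - 1) * sd_off**2) / df_pooled
se_pooled = math.sqrt(var_pooled * (1 / n_on + 1 / n_off))
t_pooled = diff / se_pooled

# Row 2: equal variances not assumed (Welch's t-test).
v_on, v_off = sd_on**2 / n_on, sd_off**2 / n_off
se_welch = math.sqrt(v_on + v_off)
t_welch = diff / se_welch
df_welch = (v_on + v_off) ** 2 / (v_on**2 / (n_on - 1) + v_off**2 / (n_off - 1))

# Assumption: normal approximation to the t distribution (df is large).
p_pooled = 2 * (1 - NormalDist().cdf(abs(t_pooled)))

print(f"pooled: t({df_pooled}) = {t_pooled:.3f}, SE = {se_pooled:.3f}, p ~ {p_pooled:.3f}")
print(f"Welch:  t({df_welch:.1f}) = {t_welch:.3f}, SE = {se_welch:.3f}")
```

The two rows give almost the same t statistic, so the conclusion does not depend on whether equal variances are assumed.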

Use the conclusion of the test in part b and the outputs in part a to write a recommendation to NSW government.

We concluded that there is no significant difference between the mean counts for taps on and taps off. The three chosen stations also did not show much traffic. On this evidence, the government's plan to build an underground railway line from either Parramatta, Bankstown or Gosford to Central is not well supported by the data.

Section 4: Collection and analysis of Dataset 2

You are interested in finding whether there is a difference in preference between genders in terms of their transport mode (Bus, Train, Ferry and Light Rail). By considering an appropriate number of cases and variables, give a proper graphical display and use it to write comments.

The results for this section are presented below;

Mode	Female	Male	Grand Total
Bus	16.7%	42.3%	30.0%
Ferry	20.8%	7.7%	14.0%
Light Rail	8.3%	11.5%	10.0%
Train	54.2%	38.5%	46.0%
Grand Total	100.00%	100.00%	100.00%

As can be seen, the largest share of male commuters (42.3%, n = 11) reported using the bus, while most female commuters (54.2%, n = 13) reported using the train.

Chi-Square test

A chi-square test was performed to determine whether there is a significant association between gender and the preferred mode of transport (Bagdonavicius & Nikulin, 2011). The hypotheses tested are given below.

H₀: There is no significant association between gender and preferred mode of transport

H_A: There is significant association between gender and preferred mode of transport

This was tested at 5% level of significance and the results are given below;

Table 10: Chi-Square Tests

	Value	df	Asymp. Sig. (2-sided)
Pearson Chi-Square	5.072^a	3	.167
Likelihood Ratio	5.239	3	.155
N of Valid Cases	50
a. 4 cells (50.0%) have expected count less than 5. The minimum expected count is 2.40.

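The p-value for the test is 0.167, which is greater than the 5% level of significance. We therefore fail to reject the null hypothesis and conclude that there is no evidence of a significant association between gender and preferred mode of transport. Because 4 cells (50.0%) have an expected count below 5, the chi-square approximation may be unreliable and this result should be interpreted with caution.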
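The Pearson chi-square statistic in Table 10 can be reproduced from a table of counts. The report gives only percentages plus two cell counts (n = 11 and n = 13), so the counts below are an assumption: they were reconstructed from the column percentages with 24 female and 26 male respondents. The p-value uses the closed-form survival function of the chi-square distribution that holds for 3 degrees of freedom only.

```python
import math

# Assumed counts (female, male), reconstructed from the column percentages.
observed = {
    "Bus": (4, 11),
    "Ferry": (5, 2),
    "Light Rail": (2, 3),
    "Train": (13, 10),
}

col_totals = [sum(row[j] for row in observed.values()) for j in range(2)]
n = sum(col_totals)

chi2 = 0.0
expected_counts = []
for row in observed.values():
    row_total = sum(row)
    for j, obs in enumerate(row):
        # Expected count under independence: row total * column total / n.
        exp = row_total * col_totals[j] / n
        expected_counts.append(exp)
        chi2 += (obs - exp) ** 2 / exp

df = (len(observed) - 1) * (len(col_totals) - 1)

# Survival function of the chi-square distribution, valid for df = 3 only.
p_value = math.erfc(math.sqrt(chi2 / 2)) + math.sqrt(2 * chi2 / math.pi) * math.exp(-chi2 / 2)

small_cells = sum(1 for e in expected_counts if e < 5)
print(f"chi2 = {chi2:.3f}, df = {df}, p = {p_value:.3f}")
print(f"cells with expected count < 5: {small_cells}, minimum: {min(expected_counts):.2f}")
```

The number of cells with an expected count below 5 and the minimum expected count computed here can be compared with footnote a of Table 10.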

Section 5: Discussion & Conclusion

The main purpose of this study was to present an analysis of the NSW transport system. We were provided with a secondary dataset (dataset 1) comprising 1000 cases with six variables. In addition, we surveyed 50 individuals (dataset 2). We sought to find out the most commonly used mode of transport. Results showed that the most commonly used mode was the train, followed by the bus; ferry and light rail were also used, but far less than bus and train. Comparing modes of transport by gender using dataset 2, we noted that the majority of female respondents preferred the train, while the largest share of male respondents preferred the bus, although this difference was not statistically significant. Based on these findings, we make the following recommendations to the NSW government.

The train is the most commonly used mode among commuters; it would therefore be prudent to improve this mode of transport and make it more effective. Building an underground railway line from either Parramatta, Bankstown or Gosford to Central could benefit commuters, although the records for these three stations analysed in Section 3 showed limited traffic.

Future research should be broad enough to examine the motivation behind the preference for the various modes of transport. This would help management and the government to fully understand the needs and desires of the people.

References

Bagdonavicius, V., & Nikulin, M. S. (2011). Chi-squared goodness-of-fit test for right censored data. The International Journal of Applied Mathematics and Statistics, 30–50.

Ryabko, B. Y., Stognienko, V. S., & Shokin, Y. I. (2004). A new test for randomness and its application to some cryptographic problems. Journal of Statistical Planning and Inference, 123, 365–376. doi:10.1016/s0378-3758(03)00149-6
