Many Holmes Institute instructors believe that students need to spend at least 2 hours studying outside of class for every hour of lecture. They believe that the number of hours students study to prepare for the exam significantly affects their marks. By contrast, a few lecturers believe that the number of preparation hours does not materially affect students’ marks and that other factors should be considered. To study the relationship between the preparation time spent by each student (in hours) for the exam and the reported mark, a sample of 100 students was selected randomly from a large statistics class. The data are stored in the file named “ASSIGNMENTDATA” on the course website. Answer the 9 questions below:
A cross-sectional survey: the researcher collects data from the respondents at a single point in time.
Simple random sampling could be used. This method gives every student an equal chance of being included in the study and so reduces the risk of selection bias.
The dependent variable is the student’s mark and the independent variable is the number of hours the student spends preparing for the exam. Preparation time is believed to influence the mark, so it is treated as the independent (explanatory) variable, while the mark is the dependent (response) variable.
Using 8 classes and intervals of 20 – 30, 30 – 40, etc. for both of the variables selected in question 3, develop a distribution table including class intervals, frequency, relative frequency and cumulative relative frequency for each variable. Then, draw a frequency histogram, a relative frequency histogram and a cumulative relative frequency histogram for each variable. Also, comment on the shape of the frequency histogram for each variable and provide reason(s) for your comment.
| Class Interval | Frequency | Relative Frequency | Cumulative relative frequency |
|---|---|---|---|
| 20-30 | 1 | 0.01 | 0.01 |
| 30-40 | 8 | 0.08 | 0.09 |
| 40-50 | 16 | 0.16 | 0.25 |
| 50-60 | 20 | 0.20 | 0.45 |
| 60-70 | 20 | 0.20 | 0.65 |
| 70-80 | 17 | 0.17 | 0.82 |
| 80-90 | 12 | 0.12 | 0.94 |
| 90-100 | 6 | 0.06 | 1.00 |

| Class Interval | Frequency | Relative Frequency | Cumulative relative frequency |
|---|---|---|---|
| 20-30 | 1 | 0.01 | 0.01 |
| 30-40 | 5 | 0.05 | 0.06 |
| 40-50 | 10 | 0.10 | 0.16 |
| 50-60 | 17 | 0.17 | 0.33 |
| 60-70 | 21 | 0.21 | 0.54 |
| 70-80 | 22 | 0.22 | 0.76 |
| 80-90 | 14 | 0.14 | 0.90 |
| 90-100 | 10 | 0.10 | 1.00 |
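The relative and cumulative relative frequency columns follow directly from the frequency column. The short Python sketch below rebuilds them for the first table above; the function name `distribution_table` is only an illustrative choice, and the counts are copied from the table rather than recomputed from the raw data file.

```python
# Class labels and frequencies copied from the first distribution table.
classes = ["20-30", "30-40", "40-50", "50-60", "60-70", "70-80", "80-90", "90-100"]
frequencies = [1, 8, 16, 20, 20, 17, 12, 6]


def distribution_table(freqs):
    """Return (relative, cumulative relative) frequencies for a list of counts."""
    total = sum(freqs)
    relative = [f / total for f in freqs]
    cumulative = []
    running = 0
    for f in freqs:
        running += f                         # accumulate counts, then divide once
        cumulative.append(running / total)
    return relative, cumulative


relative, cumulative = distribution_table(frequencies)
for label, f, r, c in zip(classes, frequencies, relative, cumulative):
    print(f"{label:>6}  {f:>3}  {r:.2f}  {c:.2f}")
```

Swapping in the frequencies of the second table gives its relative and cumulative columns in the same way.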
In the next three figures, we present the frequency histogram, the relative frequency histogram and the cumulative relative frequency histogram for the preparation time. The histograms help to visualize the distribution of the data.
Figure 1: Frequency Histogram for the preparation time
Figure 2: Relative Frequency Histogram for the preparation time
Figure 3: Cumulative Relative Frequency Histogram for the preparation time
The histograms (both frequency and relative frequency) of the preparation time show that the distribution is slightly left skewed (it has a longer tail to the left), which is consistent with the mean lying below the median.
The next three figures present the frequency histogram, the relative frequency histogram and the cumulative relative frequency histogram for the student marks.
Figure 4: Frequency Histogram for the student marks
Figure 5: Relative Frequency Histogram for the student marks
Figure 6: Cumulative Relative Frequency Histogram for the student marks
The histogram for the students’ marks shows that the distribution is skewed to the left (it has a longer tail to the left).
Draw and use an appropriate scatter plot to investigate the relationship between the two variables. Also, briefly explain which variable is placed on the X axis and which on the Y axis, and why. Finally, draw the fitted line for the plotted observations.
Figure 7: A scatter plot of student’s marks against preparation time (number of hours)
As can be seen from the above plot, the X-axis shows the preparation time and the Y-axis shows the student’s marks. Preparation time is the independent variable, so it is placed on the X-axis; the student’s mark is the dependent variable, so it is placed on the Y-axis.
The scatter plot shows evidence of a positive linear relationship between the two variables (preparation time and student marks). Students who spend more hours preparing for the exam tend to obtain higher marks, and students who spend fewer hours tend to obtain lower marks.
In the fitted line, 28.984 is the intercept rather than the coefficient of preparation time: it is the mark predicted for a student with zero hours of preparation. The coefficient (slope) implied by the summary statistics in Tables 3 and 4 is about 0.583, since 0.5466 × 17.41 / 16.32 ≈ 0.583 and 65.74 − 0.583 × 63.04 ≈ 28.98. This means that a one-hour increase in preparation time is associated with an increase of about 0.58 in the predicted mark, and a one-hour decrease with a decrease of the same size.
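As a quick check, the slope and intercept of a simple least-squares line can be recomputed from summary statistics alone, using slope = r × (s_y / s_x) and intercept = ȳ − slope × x̄. The sketch below applies these formulas to the figures reported in Tables 3 and 4 of this report; the variable names are illustrative, and because the inputs are rounded the results are approximate.

```python
# Summary statistics as reported in Table 3 and Table 4 of this report.
r = 0.546556                   # correlation between preparation time and mark
mean_x, sd_x = 63.04, 16.32    # preparation time (hours)
mean_y, sd_y = 65.74, 17.41    # mark

# Least-squares slope and intercept for a simple linear regression of y on x.
slope = r * sd_y / sd_x
intercept = mean_y - slope * mean_x

print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```

Under these inputs the intercept comes out at roughly 28.98 and the slope at roughly 0.58, which suggests that the reported value of 28.984 is the intercept of the fitted line.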
Table 3: Descriptive (summary) statistics for the preparation time and student marks
| | PREPARATION TIME | MARK |
|---|---|---|
| Mean | 63.04 | 65.74 |
| Median | 64 | 68 |
| Standard Deviation | 16.32 | 17.41 |
| Sample Variance | 266.36 | 303.12 |
| Range | 65 | 75 |
| Minimum | 25 | 25 |
| Maximum | 90 | 100 |
| 1st Quartile | 51 | 54 |
| 3rd Quartile | 76.25 | 78 |
| Interquartile range | 25.25 | 24 |
| 30th percentile | 54 | 58 |
Table 3 above presents the descriptive statistics for both the preparation time and the student marks. The average preparation time for the 100 sampled students was 63.04 hours, with a median of 64 hours. The shortest preparation time was 25 hours and the longest was 90 hours. The standard deviation was 16.32 hours, about 26% of the mean, which indicates a moderate spread around the mean.
The average student mark was 65.74, with the highest score being 100 and the lowest being 25. The median mark was 68. The standard deviation of 17.41 is again about 26% of the mean, so the marks are also moderately spread around the mean.
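The spread measures in Table 3 can be rechecked from the other entries in the same table. The sketch below recomputes the range and interquartile range, and adds the coefficient of variation (standard deviation divided by the mean) as one possible way to judge how large the spread is relative to the mean. The dictionary layout and the function name `spread_summary` are illustrative assumptions, not part of the original analysis.

```python
# Values copied from Table 3 of this report.
table3 = {
    "preparation time": {"mean": 63.04, "sd": 16.32, "min": 25, "max": 90,
                         "q1": 51, "q3": 76.25},
    "mark": {"mean": 65.74, "sd": 17.41, "min": 25, "max": 100,
             "q1": 54, "q3": 78},
}


def spread_summary(s):
    """Return range, interquartile range and coefficient of variation."""
    return {
        "range": s["max"] - s["min"],
        "iqr": s["q3"] - s["q1"],
        "cv": s["sd"] / s["mean"],   # unit-free measure of relative spread
    }


for name, s in table3.items():
    out = spread_summary(s)
    print(f"{name}: range={out['range']}, IQR={out['iqr']}, CV={out['cv']:.1%}")
```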
Compute a numerical measurement which measures the strength and direction of the linear relationship between the two variables. Also, interpret this value.
Table 4: Correlation coefficient table
| | PREPARATION TIME | MARK |
|---|---|---|
| PREPARATION TIME | 1 | |
| MARK | 0.546556 | 1 |
As can be seen from the above table, there is a moderate positive linear relationship between the two variables (preparation time and student’s marks). The correlation coefficient is 0.5466. The positive sign means that students who spend more hours preparing for the exam tend to obtain higher marks, while students who spend fewer hours tend to obtain lower marks. Correlation measures association only; on its own it does not show that additional preparation time causes higher marks.
To determine whether or not the height of sons is related to father’s height (x1) and mother’s height (x2), data were gathered and part of the multiple regression Excel output is shown below. Fill in the table and answer the following questions.
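For reference, the Pearson correlation coefficient is the sum of cross-products of deviations divided by the square root of the product of the two sums of squared deviations. The sketch below implements that definition directly; the `hours` and `marks` lists are small made-up values used purely to exercise the function and are not taken from the ASSIGNMENTDATA file.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical illustration data (NOT the assignment data).
hours = [30, 45, 50, 62, 70, 85]
marks = [40, 52, 49, 66, 71, 80]
print(round(pearson_r(hours, marks), 3))
```

The result always lies between −1 and 1; values near 0.5, such as the 0.5466 reported in Table 4, are usually described as a moderate linear association.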
The missing values in the table have been filled in red colour.
SUMMARY OUTPUT

| Regression Statistics | |
|---|---|
| Multiple R | 0.5169 |
| R Square | 0.2672 |
| Adjusted R Square | 0.2635 |
| Standard Error | 8.0683 |
| Observations | 400 |

| ANOVA | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 2 | 9421.58 | 4710.79 | 72.366 | 0.0000 |
| Residual | 397 | 25843.41 | 65.097 | | |
| Total | 399 | 35264.98 | | | |

| | Coefficients | Standard Error | t Stat | P-value |
|---|---|---|---|---|
| Intercept | 93.8993 | 8.0072 | 11.7269 | 0.0000 |
| X1 | 0.4849 | 0.0412 | 11.7772 | 0.0000 |
| X2 | -0.0229 | 0.0395 | -0.5811 | 0.5615 |
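Most entries in the regression output are tied together by standard identities, so the filled-in values can be checked from the two sums of squares, the number of observations and the number of predictors. The sketch below does this in Python; the variable names are illustrative, and small differences from the printed output come from rounding in the reported sums of squares.

```python
from math import sqrt

n, k = 400, 2                 # observations and number of independent variables
ss_regression = 9421.58       # from the ANOVA table
ss_residual = 25843.41        # from the ANOVA table

# Degrees of freedom.
df_regression = k
df_residual = n - k - 1
df_total = n - 1

# Sums of squares, mean squares and the F statistic.
ss_total = ss_regression + ss_residual
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual

# Goodness-of-fit statistics.
r_squared = ss_regression / ss_total
adj_r_squared = 1 - (1 - r_squared) * df_total / df_residual
multiple_r = sqrt(r_squared)
standard_error = sqrt(ms_residual)

print(f"df = {df_regression}, {df_residual}, {df_total}")
print(f"MS regression = {ms_regression:.2f}, MS residual = {ms_residual:.3f}")
print(f"F = {f_stat:.3f}")
print(f"R^2 = {r_squared:.4f}, adjusted R^2 = {adj_r_squared:.4f}")
print(f"Multiple R = {multiple_r:.4f}, standard error = {standard_error:.4f}")
```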
The standard error of the estimate is 8.0683. This statistic measures the typical distance between the observed heights of the sons and the heights predicted by the regression line, in the same units as height; the smaller it is, the more accurate the predictions. Whether 8.0683 is small has to be judged against the scale of the heights, and because the model explains only about 27% of the variation in the sons’ heights, its predictions are only moderately precise.
The coefficient of determination is 0.2672; this statistic tells us that 26.72% of the variation in the dependent variable (height of son) is explained by the two independent variables (father’s height (x1) and mother’s height (x2)).
The adjusted coefficient of determination is 0.2635. It adjusts the coefficient of determination for the degrees of freedom, that is, for the number of independent variables and the sample size, so it rises only when an added variable improves the fit by more than would be expected by chance. Both statistics describe the proportion of variation in the dependent variable that is explained by the independent variables, and the larger they are, the better the model fits the data. Here the two values are close, so the adjustment for two predictors and 400 observations is small.
As can be seen from the ANOVA table, the overall model is statistically significant at the 5% level of significance [F(2, 397) = 72.366, p < 0.001].
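The coefficient of father’s height (x1) is 0.4849; this means that, holding the mother’s height constant, a one-unit increase in the father’s height is associated with an increase of 0.4849 units in the predicted height of the son.
The coefficient of mother’s height (x2) is -0.0229; this means that, holding the father’s height constant, a one-unit increase in the mother’s height is associated with a decrease of 0.0229 units in the predicted height of the son. This coefficient is not statistically significant (p = 0.5615), so the estimated effect cannot be distinguished from zero.
The intercept is 93.8993; this is the predicted height of the son when both the father’s height and the mother’s height are zero. Because a height of zero lies far outside the range of any realistic data, the intercept has no practical interpretation on its own.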
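Putting the three coefficients together gives the estimated regression equation ŷ = 93.8993 + 0.4849·x1 − 0.0229·x2. The sketch below wraps it in a small Python function; the function name and the example heights are hypothetical, and the units are assumed to be whatever units the original data used (they are not stated in the output).

```python
# Coefficients taken from the regression output.
INTERCEPT = 93.8993
B_FATHER = 0.4849    # coefficient of x1, father's height
B_MOTHER = -0.0229   # coefficient of x2, mother's height


def predict_son_height(father_height, mother_height):
    """Predicted son's height from the estimated regression equation."""
    return INTERCEPT + B_FATHER * father_height + B_MOTHER * mother_height


# Hypothetical inputs, chosen only to illustrate the calculation.
base = predict_son_height(175, 165)
taller_father = predict_son_height(176, 165)
print(f"prediction: {base:.2f}")
print(f"effect of one extra unit of father's height: {taller_father - base:.4f}")
```

The difference between the two predictions equals the father’s coefficient, which is what “holding the mother’s height constant” means in practice.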
Yes, the data allow the statistics practitioner to infer that the heights of the sons and the fathers are linearly related. The father’s height (x1) is statistically significant in the model (t = 11.7772, p < 0.001).
No, the data do not allow the statistics practitioner to infer that the heights of the sons and the mothers are linearly related. The mother’s height (x2) is not statistically significant in the model (t = -0.5811, p = 0.5615), so there is insufficient evidence of a linear relationship once the father’s height is taken into account.
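Each t statistic in the output is the coefficient divided by its standard error, and the p-value is the two-sided tail probability of that statistic. The sketch below reproduces the calculation using only the Python standard library. As an explicit simplification it uses the standard normal distribution in place of the t distribution with 397 degrees of freedom, which is a close approximation at this sample size but not what Excel computes; the rounded coefficients also make the t values differ slightly from the printed ones.

```python
from statistics import NormalDist

# (coefficient, standard error) pairs taken from the regression output.
coefficients = {
    "Intercept": (93.8993, 8.0072),
    "X1": (0.4849, 0.0412),
    "X2": (-0.0229, 0.0395),
}


def t_and_p(coef, se):
    """Return the t statistic and an approximate two-sided p-value.

    The p-value uses the standard normal as an approximation to the
    t distribution with 397 degrees of freedom.
    """
    t = coef / se
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p


results = {name: t_and_p(c, se) for name, (c, se) in coefficients.items()}
for name, (t, p) in results.items():
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{name}: t = {t:.3f}, p ~ {p:.4f} ({verdict} at the 5% level)")
```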