Statistical analysis is an important component of business intelligence. It helps in examining the data samples drawn from a data set (Chen et al., 2012, p.16). The process of statistical analysis entails describing the nature of the data chosen for analysis, investigating how the data relate to the underlying population, creating models that summarize those relationships, establishing the validity of the data, and employing predictive analytics to run different scenarios that help anticipate upcoming events.
A data set is a collection of discrete items of related data that can be accessed individually or in combination, or managed as a complete unit (Johnson & Wichern, 2014, p.5). Data sets are organized into some type of data structure, such as a collection of business data.
On the other hand, a variable is a number, quantity, or characteristic that can increase or decrease over time (Johnson & Wichern, 2014, p.5). Moreover, it takes different values in different situations.
How to summarize a variable and the relationship between two variables
Based on its nature, a variable can be summarized in various ways. A variable can be either discrete or continuous. Discrete variables can be summarized using frequency distribution tables and graphical summaries such as bar charts and histograms. Continuous variables, on the other hand, can be summarized using descriptive statistics, which include the mean, median, mode, range, variance, standard deviation, and standard error, among others.
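The two kinds of summary described above can be sketched in a few lines of Python. This is only an illustration: the sample values and the helper names `summarize_continuous` and `frequency_table` are invented for the example and do not come from the data analyzed in this report.

```python
import statistics
from collections import Counter

# Hypothetical sample values, invented purely for illustration.
heights_cm = [160.0, 165.5, 170.2, 170.2, 182.3]   # continuous variable
shoe_sizes = [38, 40, 40, 42, 40, 38]               # discrete variable


def summarize_continuous(values):
    """Return common descriptive statistics for a continuous variable."""
    n = len(values)
    sd = statistics.stdev(values)                   # sample standard deviation
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "variance": statistics.variance(values),    # sample variance
        "std_dev": sd,
        "std_error": sd / n ** 0.5,                 # standard error of the mean
        "range": max(values) - min(values),
    }


def frequency_table(values):
    """Return a frequency distribution {value: count} for a discrete variable."""
    return dict(sorted(Counter(values).items()))


continuous_summary = summarize_continuous(heights_cm)
discrete_summary = frequency_table(shoe_sizes)

for name, value in continuous_summary.items():
    print(f"{name}: {value:.3f}")
print(discrete_summary)
```

The frequency table is the natural input for a bar chart, while the dictionary of descriptive statistics covers the measures listed for continuous variables.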
How to summarize the relationship between two variables depends on the nature of each variable, that is, whether it is categorical or quantitative. When both variables are categorical, such as class standing and gender, the relationship is analyzed by assessing conditional probabilities, and the data are laid out in a contingency table. When both variables are quantitative, the analysis looks at how one variable, the response variable, changes with respect to changes in the other, the explanatory variable. Scatter plots are used to show this relationship graphically. The last scenario is when one variable is categorical and the other is quantitative, for example gender and height. In this scenario, the comparison is best made using side-by-side box plots, which show any similarities or differences in the center and the variability of the quantitative variable across the categories.
Why it is important to be able to find patterns in a dataset using a computer
Finding patterns in a dataset is vital in numerous ways. For starters, patterns reveal inherent regularities in a data set. Patterns also serve as a foundation for various essential tasks. Such tasks include association, correlation, and causality analysis; mining sequential and structural patterns; pattern analysis in stream and time series data; classification through pattern-based discriminative analysis; and cluster analysis through pattern-based subspace clustering. As a result, pattern finding in a dataset using computers has broad application in cross-marketing, catalog design, market basket analysis, web log analysis, sale campaign analysis, and biological sequence analysis.
From Figure 1 above, it can be seen that the base selling price of size 10 cars is $19,253. For every additional kilometre traveled, the selling price of the size 10 cars decreases by about $0.158, all other factors held constant. Thus, there is a negative relationship between the selling price and the distance traveled.
Predicted selling price = -0.1585 × 30,000 + 19,253 = $14,498
Thus, when a car has traveled 30,000 km, its predicted selling price is $14,498.
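The prediction above is a direct evaluation of the fitted line. A minimal sketch, using the intercept and slope quoted in the text (the function name `predict_price` is chosen here for illustration):

```python
# Coefficients of the fitted line as reported in the text.
INTERCEPT = 19_253.0   # predicted price at 0 km, in dollars
SLOPE = -0.1585        # change in price per additional km, in dollars


def predict_price(distance_km: float) -> float:
    """Predicted selling price for a car that has traveled `distance_km`."""
    return INTERCEPT + SLOPE * distance_km


predicted = predict_price(30_000)
print(f"Predicted selling price: ${predicted:,.0f}")
```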
The average of all the 10,000 estimates is 14,000, with a standard deviation of 392.
So the z-score for this sample's estimate is (14,498 - 14,000) / 392 = 1.270
Using wolframalpha.com, P(Z < 1.270) = 0.8980
So if you compare sample 107 to the 10,000 samples, then
Predicted rank = P(Z < z-score) × 10,000 = 0.8980 × 10,000 = 8,980
Thus, the estimate would rank at about 8,980 out of 10,000.
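The z-score and rank calculation can also be reproduced without an online calculator. The sketch below uses `statistics.NormalDist` from the Python standard library and assumes, as the text does, that the 10,000 estimates are approximately normally distributed. The function names are illustrative only.

```python
from statistics import NormalDist


def z_score(estimate: float, mean: float, std_dev: float) -> float:
    """Number of standard deviations the estimate lies from the mean."""
    return (estimate - mean) / std_dev


def predicted_rank(estimate: float, mean: float, std_dev: float,
                   n_samples: int) -> float:
    """Approximate rank of the estimate among n_samples normal estimates."""
    p_below = NormalDist().cdf(z_score(estimate, mean, std_dev))
    return p_below * n_samples


# Inputs stated in the text: estimate, mean and standard deviation
# of the 10,000 estimates.
z = z_score(14_498, 14_000, 392)
rank = predicted_rank(14_498, 14_000, 392, 10_000)

print(f"z-score: {z:.3f}")
print(f"predicted rank: {rank:,.0f} out of 10,000")
```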
Which sample? | 107

Count of "Do they like it?" (y = yes, n = no)

Row Labels | n | y | Grand Total
A | 11 | 67 | 78
B | 24 | 90 | 114
Grand Total | 35 | 157 | 192
Which sample? | 107

Count of "Do they like it?" (y = yes, n = no), as a percentage of each row

Row Labels | n | y | Grand Total
A | 14.10% | 85.90% | 100.00%
B | 21.05% | 78.95% | 100.00%
Grand Total | 18.23% | 81.77% | 100.00%
So the z-score for that estimate in sample 107 is (0.0695 - 0.1) / 0.0505 = -0.60
P(Z < z-score) = P(Z < -0.60) = 0.274
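The estimate 0.0695 is the difference between the proportions of "yes" answers in groups A and B. The sketch below recomputes it from the counts in the sample 107 table and then standardizes it. It assumes that 0.1 and 0.0505 are the mean and standard deviation of the estimates across all samples, which is an interpretation of the figures quoted above rather than something the text states outright.

```python
from statistics import NormalDist

# Counts from the sample 107 table: {group: (no, yes)}.
counts = {"A": (11, 67), "B": (24, 90)}


def proportion_yes(no: int, yes: int) -> float:
    """Conditional proportion of 'yes' answers within one group."""
    return yes / (no + yes)


p_a = proportion_yes(*counts["A"])
p_b = proportion_yes(*counts["B"])
difference = p_a - p_b            # estimate of p1 - p2

# Assumed reference distribution of the estimate (values quoted in the text).
REFERENCE_MEAN = 0.1
REFERENCE_SD = 0.0505

z = (difference - REFERENCE_MEAN) / REFERENCE_SD
p_below = NormalDist().cdf(z)     # P(Z < z)

print(f"p_A = {p_a:.4f}, p_B = {p_b:.4f}, difference = {difference:.4f}")
print(f"z = {z:.2f}, P(Z < z) = {p_below:.3f}")
```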
H0: There is no difference between the proportions, p1 = p2
H1: There is a difference between the proportions, p1 ≠ p2
Which sample? | 107

Row Labels | Count of which machine? (A or B) | Average of $ Casino profit from bet | StdDev of $ Casino profit from bet
A | 105 | 0.142857143 | 4.539206495
B | 95 | -0.010526316 | 1.425413242
Grand Total | 200 | 0.07 | 3.425458925
From Table 2 above, it can be seen that machine A made an average profit of $0.14 per bet, while machine B made an average loss of $0.01 per bet. The profits from machine A also had a much larger spread, with a standard deviation of 4.54, compared with 1.43 for machine B.
So for sample 107, the estimate of the difference in the population means is the difference in the sample means: 0.143 - (-0.011) = 0.154
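The difference in sample means can be recomputed from the unrounded values in the table, which gives about 0.153 rather than the 0.154 obtained from the rounded means. The sketch below also computes an unpooled standard error for that difference. That step is not part of the original analysis and is included only as an assumption about how the comparison might be taken further.

```python
from math import sqrt

# Summary statistics from the sample 107 table: (count, mean, std dev).
machine_a = (105, 0.142857143, 4.539206495)
machine_b = (95, -0.010526316, 1.425413242)


def mean_difference(group_1, group_2) -> float:
    """Difference between the two sample means (group_1 minus group_2)."""
    return group_1[1] - group_2[1]


def unpooled_standard_error(group_1, group_2) -> float:
    """Standard error of the difference, without assuming equal variances."""
    n1, _, s1 = group_1
    n2, _, s2 = group_2
    return sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)


diff = mean_difference(machine_a, machine_b)
se = unpooled_standard_error(machine_a, machine_b)

print(f"difference in means: {diff:.3f}")
print(f"standard error of the difference: {se:.3f}")
```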
H1: There is a difference in the sample proportions, p1 ≠ p2
Description of each variable
The figure above is a back-to-back histogram showing the frequency of terms used by students and the administration. The x-axis is a dummy variable that shows whether the participant is a student or an administrator. The x-axis answers the question, “is the participant a student or in administration?” Thus, it is categorical in nature.
The y-axis is the frequency of the terms that are used by either the administration or the students. It answers the question, “How often do you use the following term?” Thus, it is quantitative in nature.
Relationship between the variables
Based on the bars on the right side of the back-to-back histogram, one can see a curved pattern. On the left side, a jagged pattern is observed. Thus, it is evident that the terms used by the students and the administration staff do not quite match in frequency. However, the long bars are at the bottom while the short bars are at the top, implying that the discrepancies are not very big.
Would a business be able to use a back-to-back histogram?
No. A business would be unlikely to use a back-to-back histogram. Although back-to-back histograms are visually strong and can be compared against normal curves, exact values of the variables cannot be read from them because the data are grouped into categories. As a result, it would be difficult to compare two data sets precisely.
Section 6
Sample | 107

Row Labels | Count of "Do you support proposed change?"
No | 80
Yes | 112
Grand Total | 192
So the z-score of the sample 107 estimate is (0.5833 - 0.6) / 0.0357 = -0.468
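The sample proportion 0.5833 comes from the 112 "Yes" answers out of 192 responses in the table. A minimal sketch of the calculation follows, assuming that 0.6 and 0.0357 are the reference proportion and the standard deviation of the estimate, as the formula above suggests. With the unrounded proportion the z-score comes out at about -0.467.

```python
def standardized_proportion(successes: int, total: int,
                            reference: float, std_dev: float) -> float:
    """z-score of a sample proportion against a reference value."""
    sample_proportion = successes / total
    return (sample_proportion - reference) / std_dev


YES_COUNT = 112                 # "Yes" answers in the table
TOTAL = 192                     # all answers in the table
REFERENCE_PROPORTION = 0.6      # value quoted in the text
REFERENCE_SD = 0.0357           # value quoted in the text

sample_proportion = YES_COUNT / TOTAL
z = standardized_proportion(YES_COUNT, TOTAL,
                            REFERENCE_PROPORTION, REFERENCE_SD)

print(f"sample proportion: {sample_proportion:.4f}")
print(f"z-score: {z:.3f}")
```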
References:
Chen, H., Chiang, R.H. and Storey, V.C., 2012. Business intelligence and analytics: From big data to big impact. MIS Quarterly, 36(4).
Johnson, R.A. and Wichern, D.W., 2014. Applied multivariate statistical analysis (Vol. 4). New Jersey: Prentice-Hall.