Business intelligence and data visualization play a central role in modern business. Data visualization covers the techniques used to explore data with graphical and statistical tools. In this study, we analyse a data set on fund collection for different types of projects. The aim is to identify the facts that help a creator obtain money through crowdfunding for a creative project, to offer advice on what makes a project succeed, and to give a general picture of crowdfunding across project types. The findings should be useful to anyone planning a similar crowdfunding project. In particular, we want to find the attributes in the data set that matter most for project success. We use software such as BigML and SPSS, and we apply basic descriptive statistics, graphical analysis, and inferential statistical analysis.
For this research study, we analyse the crowdfunding data set with several statistical software packages. The data were downloaded from Blackboard. The data set PleaseFundThis.xlsx has many variables, such as project name, date launched, duration in days, goal ($), percent raised, project state, amount pledged ($), major category, and minor category. All variables and their measurement scales are listed below:
No. | Variable | Scale
1 | project_name | Nominal
2 | date_launched | Nominal
3 | duration_days | Ratio
4 | goal_$ | Ratio
5 | percent_raised | Ratio
6 | project_state | Nominal
7 | amt_pledged_$ | Ratio
8 | major_category | Nominal
9 | minor_category | Nominal
10 | project_updated_count | Ratio
11 | city | Nominal
12 | region | Nominal
13 | number_of_pledgers | Ratio
14 | comments_count | Ratio
15 | avg_amt$_per_pledger | Ratio
16 | project_has_video | Nominal
17 | project_has_facebook_page | Nominal
18 | facebook_friends_count | Ratio
19 | project_has_pledge_rewards | Nominal
20 | lowest_pledge_level_$ | Ratio
21 | highest_pledge_level_$ | Ratio
22 | total_count_of_pledge_levels | Ratio
23 | success | Nominal
We use descriptive statistics and inferential statistical techniques to analyse the variables listed above.
In this section, we look at the graphical analysis of the variables in the crowdfunding data set, starting with histograms. The first histogram shows the project duration in days.
The histogram for project duration in days shows that most projects run for 30 days, and the median duration is also 30 days. We therefore recommend a duration of 30 days, or close to it, for a new project. A 30-day duration is the most popular choice, and creators prefer it for short as well as long projects. The data cover many kinds of projects, including short films and dramas, collected from all over the world, so this finding should apply broadly. No project ran for more than 60 days, so creators are bound to complete their fund collection within one or two months.
Next, we look at the histogram for the variable project update count.
This histogram shows that low update counts are the most frequent and that the frequency falls as the update count rises. The variable is right-skewed: the highest frequencies occur at the lowest update counts, and only a few projects post many updates.
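The right skew can also be checked numerically. The sketch below computes Pearson's second skewness coefficient, 3 × (mean − median) / standard deviation, from the summary values that the descriptive statistics in this report give for project_update_count (mean 3.219, median 1.000, standard deviation 5.228). The function name is chosen here for illustration only and does not come from any package used in the study.

```python
def pearson_median_skewness(mean: float, median: float, stdev: float) -> float:
    """Pearson's second skewness coefficient: 3 * (mean - median) / stdev."""
    if stdev <= 0:
        raise ValueError("stdev must be positive")
    return 3.0 * (mean - median) / stdev


# Summary values reported for project_update_count in this report.
skew = pearson_median_skewness(mean=3.219, median=1.000, stdev=5.228)

# A positive coefficient means the mean lies above the median (right skew).
print(f"Pearson median skewness: {skew:.2f}")
```

A clearly positive value agrees with the shape described for the histogram.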
In this section, we present the statistical analysis of the crowdfunding data set. We begin with frequency distributions for the categorical variables, which give a general idea of how projects are distributed across the categories of each variable. The frequency distribution for the major category of the project is given below:
Tally for Discrete Variables: major_category
major_category Count
Art 2577
Comics 886
Dance 378
Design 1475
Fashion 1265
Film & Video 5967
Food 1334
Games 2091
Music 6160
Photography 775
Publishing 3672
Technology 705
Theater 1162
N= 28447
The frequency distribution shows that the most common major categories are Music, Film & Video, Publishing, and Art. A new project in one of these categories would join the most active areas of crowdfunding, which may improve its chances of success.
Next, we look at the frequency distribution for the variable minor category:
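To make the ranking concrete, the following sketch converts the major_category counts from the tally above into percentage shares and picks out the four largest categories. The variable names are illustrative; only the counts come from the report.

```python
# Counts copied from the major_category tally in this report.
major_category_counts = {
    "Art": 2577, "Comics": 886, "Dance": 378, "Design": 1475,
    "Fashion": 1265, "Film & Video": 5967, "Food": 1334, "Games": 2091,
    "Music": 6160, "Photography": 775, "Publishing": 3672,
    "Technology": 705, "Theater": 1162,
}

total = sum(major_category_counts.values())

# Percentage share of each category in the whole data set.
shares = {name: 100.0 * count / total
          for name, count in major_category_counts.items()}

# Category names ordered from most to least frequent, truncated to four.
top_four = sorted(major_category_counts,
                  key=major_category_counts.get, reverse=True)[:4]
top_four_share = sum(shares[name] for name in top_four)

for name in top_four:
    print(f"{name:<14}{shares[name]:5.1f}%")
print(f"Top four combined: {top_four_share:.1f}% of {total} projects")
```

The shares describe how common each category is. They say nothing by themselves about the success rate within a category, which would need a cross-tabulation against the success variable.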
Tally for Discrete Variables: minor_category
minor_category Count
Animation 268
Art 565
Art Book 272
Board & Card Games 294
Children’s Book 651
Classical Music 305
Comics 886
Conceptual Art 103
Country & Folk 589
Crafts 242
Dance 378
Design 195
Digital Art 80
Documentary 1634
Electronic Music 180
Fashion 1265
Fiction 1022
Film & Video 1135
Food 1334
Games 266
Graphic Design 166
Hardware 229
Hip-Hop 353
Illustration 154
Indie Rock 813
Jazz 261
Journalism 149
Mixed Media 290
Music 1885
Narrative Film 754
Nonfiction 887
Open Hardware 44
Open Software 115
Painting 272
Performance Art 296
Periodical 169
Photography 775
Poetry 134
Pop 462
Product Design 1114
Public Art 381
Publishing 388
Rock 1075
Sculpture 194
Short Film 1465
Tabletop Games 555
Technology 317
Theater 1162
Video Games 976
Webseries 711
World Music 237
N= 28447
Frequency distributions for the remaining categorical variables are summarised below:
project_has_video Count
FALSE 4440
TRUE 24007
N= 28447
project_has_facebook_page Count
No 7969
Yes 20478
N= 28447
project_has_pledge_rewards Count
Yes 28447
N= 28447
project_success Count
FALSE 14368
TRUE 14079
N= 28447
Of the 28447 projects, 4440 (15.6%) do not have a video, while 24007 (84.4%) do. Similarly, 7969 projects (28.0%) do not have their own Facebook page, while 20478 (72.0%) do. Since most projects have a Facebook page, we recommend creating one for a new project, along with profiles on other social media sites, to stay in contact with potential backers. All projects offer pledge rewards. Finally, 14368 projects (50.5%) are categorized as failed, while 14079 (49.5%) are categorized as successful.
Next, we look at descriptive statistics for the variables in the crowdfunding data set, starting with the duration in days:
Variable N Mean Median TrMean StDev SE Mean
duration 28447 32.750 30.000 32.383 10.980 0.065
Variable Minimum Maximum Q1 Q3
duration 1.000 60.000 30.000 35.000
The average project duration is 32.75 days, with a standard deviation of 10.98 days.
Descriptive statistics for the variable goal amount in $ are given below:
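The SE Mean column in the output above follows from the standard deviation and the sample size: the standard error of the mean is the standard deviation divided by the square root of N. The short sketch below reproduces the reported value for duration; the function name is illustrative.

```python
import math


def standard_error(stdev: float, n: int) -> float:
    """Standard error of the mean: stdev / sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return stdev / math.sqrt(n)


# Values reported for duration in this report.
n_projects = 28447
duration_stdev = 10.980

duration_se = standard_error(duration_stdev, n_projects)
print(f"SE Mean for duration: {duration_se:.3f}")
```

With N as large as 28447, the standard error is small compared with the standard deviation, so the sample mean is estimated precisely even though individual durations vary widely.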
Descriptive Statistics: goal_$
Variable N Mean Median TrMean StDev SE Mean
goal_$ 28447 20575 5000 9186 241016 1429
Variable Minimum Maximum Q1 Q3
goal_$ 1 21474836 2000 12000
Some more descriptive statistics for the variables included in the given data set are summarised below:
Descriptive Statistics: percent_raised
Variable N Mean Median TrMean StDev SE Mean
percent_ 28447 121 73 68 1758 10
Variable Minimum Maximum Q1 Q3
percent_ 0 240716 5 113
Descriptive Statistics: amt_pledged_$
Variable N Mean Median TrMean StDev SE Mean
amt_pled 28447 10196 1710 3999 91367 542
Variable Minimum Maximum Q1 Q3
amt_pled 0 8596475 290 5675
Descriptive Statistics: project_update_count
Variable N Mean Median TrMean StDev SE Mean
project_ 28447 3.219 1.000 2.467 5.228 0.031
Variable Minimum Maximum Q1 Q3
project_ 0.000 147.000 0.000 4.000
Descriptive Statistics: number_of_pledgers
Variable N Mean Median TrMean StDev SE Mean
number_o 28447 133.2 28.0 53.6 1124.9 6.7
Variable Minimum Maximum Q1 Q3
number_o 0.0 91584.0 6.0 80.0
Descriptive Statistics: comments_count
Variable N Mean Median TrMean StDev SE Mean
comments 28447 30.3 0.0 2.1 740.1 4.4
Variable Minimum Maximum Q1 Q3
comments 0.0 59463.0 0.0 3.0
Descriptive Statistics: facebook_friends_count
Variable N N* Mean Median TrMean StDev
facebook 17886 10561 479.22 221.00 354.24 777.86
Variable SE Mean Minimum Maximum Q1 Q3
facebook 5.82 0.00 5358.00 0.00 596.00
Descriptive Statistics: total_count_of_pledge_levels
Variable N Mean Median TrMean StDev SE Mean
total_co 28447 9.2036 8.0000 8.7629 5.2298 0.0310
Variable Minimum Maximum Q1 Q3
total_co 1.0000 31.0000 6.0000 11.0000
Next, we use inferential statistics to check some claims about the variables in the data set. First, we check whether the average goal amount in $ is the same for different duration periods in days. To test this claim we use the one-way analysis of variance (one-way ANOVA) F test. The null and alternative hypotheses for this test are:
Null hypothesis: H0: There is no statistically significant difference between the average goal amounts for the different duration periods in days.
Alternative hypothesis: Ha: There is a statistically significant difference between the average goal amounts for the different duration periods in days.
We use a 5% level of significance for this test. The ANOVA table is given below:
One-way ANOVA: goal_$ versus duration_days
Analysis of Variance for goal_$
Source DF SS MS F P
duration 59 6.836E+12 1.159E+11 2.00 0.000
The p-value for this ANOVA test is reported as 0.000, which is less than the significance level of 0.05, so we reject the null hypothesis that there is no statistically significant difference between the average goal amounts for the different duration periods in days.
There is sufficient evidence to conclude that the average goal amount differs across the duration periods.
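The F value in an ANOVA table is the ratio of the between-group mean square to the within-group (error) mean square, and each mean square is a sum of squares divided by its degrees of freedom. The sketch below shows this calculation in plain Python on a small made-up data set; the toy groups and the function name are assumptions for illustration and are not taken from PleaseFundThis.xlsx. The last lines only check the arithmetic of the duration row reported above (MS = SS / DF).

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more values than groups")
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [mean(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical toy data: three groups of three observations each.
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_b, df_w = one_way_anova_f(toy_groups)
print(f"F = {f_stat:.2f} with ({df_b}, {df_w}) degrees of freedom")

# Arithmetic check of the reported duration row: MS = SS / DF.
ms_duration = 6.836e12 / 59
print(f"MS for duration: {ms_duration:.3e}")
```

The p-value is then read from the F distribution with the two degrees of freedom, which statistical packages compute automatically.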
Next, we test whether the average number of pledge levels is the same for failed and successful projects. For this we use the two-sample t test for population means. The null and alternative hypotheses for this test are:
Null hypothesis: H0: The average number of pledge levels is the same for failed and successful projects.
Alternative hypothesis: Ha: The average number of pledge levels is not the same for failed and successful projects.
We use a 5% level of significance for this test.
The output for this test is given below:
Two-Sample T-Test and CI: total_count_of_pledge_levels, project_success
Two-sample T for total_count_of_pledge_levels
project_ N Mean StDev SE Mean
FALSE 14368 8.37 4.76 0.040
TRUE 14079 10.06 5.54 0.047
Difference = mu (FALSE) – mu (TRUE )
Estimate for difference: -1.6859
95% CI for difference: (-1.8060, -1.5657)
T-Test of difference = 0 (vs not =): T-Value = -27.50 P-Value = 0.000 DF = 27652
The p-value for this test is reported as 0.000, which is less than the significance level of 0.05, so we reject the null hypothesis that the average number of pledge levels is the same for failed and successful projects.
There is sufficient evidence to conclude that the average number of pledge levels differs between failed and successful projects. On average, successful projects offer more pledge levels (10.06) than failed projects (8.37).
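The T-value in the output above can be approximated from the group summaries alone. The sketch below assumes the unpooled (Welch) form of the two-sample t test, which is an inference from the reported DF of 27652 being lower than the pooled value N1 + N2 − 2 = 28445. The function name is illustrative. Because the means and standard deviations are entered as rounded in the output, the result matches the reported values only approximately.

```python
import math


def welch_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Welch two-sample t statistic and approximate degrees of freedom."""
    v1 = sd1 ** 2 / n1  # squared standard error of the first mean
    v2 = sd2 ** 2 / n2  # squared standard error of the second mean
    t_stat = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_stat, df


# Summary values from the output: FALSE (failed) first, TRUE (successful) second.
t_stat, df = welch_t_from_summary(8.37, 4.76, 14368, 10.06, 5.54, 14079)
print(f"t = {t_stat:.2f}, df = {df:.0f}")
```

The negative sign reflects the order of subtraction, mu (FALSE) minus mu (TRUE): failed projects have the lower mean.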
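The analysis of the data set yields several findings. The main results are as follows: most projects run for about 30 days and none for more than 60 days; Music, Film & Video, Publishing, and Art are the most common major categories; most projects have a video and a Facebook page, and all offer pledge rewards; failed and successful projects are almost equal in number; the average goal amount differs across duration periods; and successful projects offer more pledge levels on average than failed projects.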
Benefits of Statistical Data Analysis Using Different Software
People and organizations commonly use Excel spreadsheets to maintain their data. Great Eastern University, a very large university located in Melbourne, Australia, also uses Excel spreadsheets to maintain the data of its 20,000 current students and millions of past students. The university should use statistical and analytics tools such as Power Pivot, Power BI, Tableau, BigML, SPSS, geospatial tools, Google Analytics, Minitab, Matlab, SAS, R, and IBM Watson. These tools make statistical data analysis more reliable. Excel does not provide advanced analysis on its own and needs add-ons or extensions for advanced work. Its outputs and tables are less suitable for reporting, whereas statistical software produces well-formatted output tables. Excel also cannot perform many advanced statistical tests directly, and the analysis often has to be completed with manual commands.
BigML, SPSS, and similar packages are premium software products used for a wide variety of statistical analysis, including data compilation, preparation, graphics, modelling, and analysis. They play an important role in market research, surveying, healthcare, and the social sciences. If your business or organization uses Microsoft Excel for market research or any other business-related research, you should consider using SPSS instead. Compared with Excel, statistical software products give easier and quicker access to basic functions, such as descriptive statistics, through pull-down menus. They offer a wide range of charts and graphs and faster access to statistical tests. They also make machine learning easier to apply.
Advantages of using advanced data analytics tool at Great Eastern University
Using advanced statistical software products such as Power Pivot, Power BI, Tableau, BigML, SPSS, geospatial tools, Google Analytics, Minitab, Matlab, SAS, R, and IBM Watson at Great Eastern University would bring many benefits. The university would save time and cost. It could present results in a clear and attractive way. Data analysis would become easier and more reliable, and the tables needed for an analytical study would be readily available. Data keeping and handling would be easier than with spreadsheets, and the university could present all types of information in a click.
With these statistical software products, the university could increase its number of students. It could also analyse results from social media, analytics tools, and geospatial tools to improve the experience of all students. Data analysis would reveal facts about students that the university can use when making decisions. This kind of analytics work would therefore help to increase student retention at the university.
Ken Rudin, the Director of Analytics at Facebook, stated that organizations must “focus on impacts, not insights”. This statement stresses the importance of impact over insight. During data analytics work, the focus should be on the impact of the factors or variables under study, rather than on insights about treatments, factors, or variables for their own sake.
Implementation of Ken’s suggestion at Great Eastern University
According to Ken Rudin, organizations must focus on impacts and not insights. To implement Ken’s suggestion at Great Eastern University, the university should use advanced statistical software products for data analysis, and the management team or administration should focus on the impact of this analysis and avoid extended discussion that goes beyond the results obtained.