Answer:
Big data visualization plays a very important role in today's world. We can visualize data according to the scenarios a business or organization needs; a picture of data is worth a thousand words.
Data visualization helps big data by giving a pictorial view of the data and its values. Relational data visualization can be integrated with applications so that work is done in real time.
Data visualization is currently expected to produce good results across both applications and technology.
The rate of data generation across a myriad of disciplines is increasing rapidly, typically faster than the techniques available to manage and use the resultant data.
Computer-based visualization is changing rapidly over time. The tools and systems that support it have typically evolved rather than been formally designed.
To survey the current state of research on relational data visualization, we have reviewed work in various related fields: data visualization, information visualization, statistics, graphic design and human-computer interaction.
Data visualization is a rapidly growing industry in the current decade, turning data into information and graphics.
Visualization is not confined to any single industry; it is used in every scenario, including education, social media, information technology, data science, artificial intelligence, automobiles, construction and communication (Agrawal & Ailamaki, 2008). People visualize and explore data on dashboards and in reports.
Data visualization techniques have changed over time, and visualization now involves the integration of graphics, images, data management and human perception.
Relational Model : Structure is the most important ingredient in any data model. One of the major contributions of Codd's relational model is its focus on the importance of functional dependencies (Anthes, 2010). In fact, normalization is driven by a modeler's desire for relations in which strict functional dependencies apply.
Dataset :
Importing the dataset into Gephi shows 120 nodes and 978 edges, i.e. 120 nodes connected to each other by 978 edges.
The dataset has multiple columns in relational data format.
Gephi gives a whole overview of the BigMart sales dataset; all relationships in this view are based on nodes connected together.
Used Tools and Techniques :
The technique used in this report is Gephi.
Gephi is data visualization software for relational data, built on Java and the NetBeans platform. The application is open source and is used for network analysis and visualization.
The graph below shows the Big Mart sales relational data as nodes and branches.
Gephi is a specialized visualization tool that places linked nodes close together and unlinked nodes far apart.
The black dots show the highest-selling market stores, which are directly connected to each other. Stores are connected by light branches, while the dark black branches connect the highest-selling stores.
Gephi provides filters and scaling parameters, so we can filter the nodes by connection and degree.
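Gephi applies these filters interactively in its GUI. As a rough sketch of what a degree filter does, here is the same idea in plain Python; the store names and edges below are made up for illustration, not taken from the actual BigMart graph:

```python
# Illustrative sketch of a degree filter: keep only nodes whose
# degree (number of incident edges) meets a threshold.
edges = [("store_A", "store_B"), ("store_A", "store_C"),
         ("store_B", "store_C"), ("store_D", "store_A")]

degree = {}
for u, v in edges:
    degree[u] = degree.get(u, 0) + 1
    degree[v] = degree.get(v, 0) + 1

# Degree filter: keep nodes connected to at least 2 others.
kept = {n for n, d in degree.items() if d >= 2}
print(kept)  # store_D has degree 1, so it is filtered out
```

In Gephi the same filter is applied from the Filters panel (Topology > Degree Range), and the layout then pulls the surviving linked nodes together.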
Sales View : The user model is a sub-part of human-computer communication that describes the process of connecting relations and refining a basic understanding of the user. It is the model through which we interact with user details and data activities.
The graph above shows two types of market: Supermarket 1 and Supermarket 2, where Supermarket 1 has higher sales than Supermarket 2. The red dots represent Supermarket 2, whereas the blue dots represent Supermarket 1.
Computation Sales View : This takes the form of an algorithm, that is, a precise description of the steps that are carried out.
The algorithm takes a set of inputs and eventually turns them into an output. A computation model can be implemented in Python, C, C++, Fortran and many other languages (Thomsen, 2006).
We just need to write the algorithm that processes the job workflow in the system.
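As a minimal illustration of such a computation model (a hypothetical example, not code from the report), here is an algorithm that turns a set of inputs, item prices, into a single output, the sales total:

```python
def total_sales(prices):
    """A computation model: precise steps turning inputs into an output."""
    total = 0.0
    for p in prices:   # step through each input value
        total += p     # accumulate the running sum
    return total

# Inputs in, output out.
print(total_sales([249.8, 48.3, 141.6]))
```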
The graph shows the visibility of the stores based on location and sales. Here, filters and parameters are applied to the dataset to get actual sales values per outlet or store.
Dataset for Visualization technique :
We take the “BigMart sales prediction” dataset. Below we discuss the following topics:
Hypothesis generation : This is a very important step in analyzing data. It involves understanding the problem and forming hypotheses about what we need to change to have a good impact on the outcome.
So we need to perform hypothesis testing on our dataset and extract insights from it.
We have collected sales data for the year 2013, covering 1600 products across roughly 10 stores in different cities.
The target is to build a predictive model that can estimate the sales of each product at a particular store, using visualization techniques.
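A hedged sketch of what testing one such hypothesis could look like, for example "one outlet type sells more than another". The sales figures below are made up for illustration; the test itself is a simple permutation test in plain Python:

```python
import random

# Hypothetical per-store sales for two outlet types (illustrative numbers).
type1 = [3200, 2900, 3500, 3100, 3300]
type2 = [2100, 2400, 1900, 2300, 2200]

observed = sum(type1) / len(type1) - sum(type2) / len(type2)

# Permutation test: shuffle the labels and count how often a difference
# at least as large as the observed one arises by chance.
random.seed(0)
pooled = type1 + type2
trials = 10_000
count = 0
for _ in range(trials):
    random.shuffle(pooled)
    diff = sum(pooled[:5]) / 5 - sum(pooled[5:]) / 5
    if diff >= observed:
        count += 1
p_value = count / trials
print(observed, p_value)  # a small p-value supports the hypothesis
```

The same idea carries over to the real dataset once the variables below are defined.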
The dataset variables we define are:
City type : Where is the store located? In an urban or Tier 1 city.
Population density : Stores located in densely populated areas should have higher sales because of greater demand.
Store capacity : Stores that are very big in size should have higher sales.
Competitors : Stores should have lower sales when there are more competitors in the market.
Product marketing : Stores with a good marketing division should have good sales.
Location : Stores located in popular marketplaces should have higher sales because of better connectivity with customers.
Customer behavior : Stores can design their product range based on customer behavior.
Policy : Stores managed with sound rules and policies, and politeness toward people, should have higher sales.
Brand : Good-quality products should have good sales.
Packaging : Products with good packaging can attract customers.
Utility : Routine, everyday products should have higher sales.
Display area : Fast-selling products should be displayed prominently to catch customers' attention and drive more purchases.
Visibility of store : The location of a store should impact its sales.
Advertisement : Better product advertisement gives higher sales.
Promotional offers : Discounts on selected products will increase sales.
Data Exploration :
In this phase we do some data exploration to draw inferences about the data.
Data exploration is the technique of identifying predictor and target variables in the data (Stolte & Tang, 2009).
Now we look for the features we hypothesized. We combine the training and testing data into one data frame, then perform feature extraction (Mansmann & Scholl, 2007).
The procedure below combines the data:
training['source'] = 'training'
testing['source'] = 'testing'
data = pd.concat([training, testing], ignore_index=True)
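To make this concrete, here is a self-contained run of the same steps with toy stand-ins for the real training and testing frames (the column names match the dataset, but the values are made up):

```python
import pandas as pd

# Toy stand-ins for the real training/testing frames (values are illustrative).
training = pd.DataFrame({"Item_MRP": [249.8, 48.3],
                         "Item_Outlet_Sales": [3735.1, 443.4]})
testing = pd.DataFrame({"Item_MRP": [141.6]})  # no target column in the test set

# Tag each row with its origin, then stack the two frames.
training["source"] = "training"
testing["source"] = "testing"
data = pd.concat([training, testing], ignore_index=True)
print(data.shape)  # (3, 3)
```

Tagging the origin in a `source` column lets the combined frame be split back apart after feature extraction.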
The main challenge in any dataset is missing values, which can impact the sales figures and target customers.
Now we check for missing values using a simple function:
data.apply(lambda x: sum(x.isnull()))
In BigMart sales our target variable is Item_Outlet_Sales, and its missing values appear only in the testing set. So we will impute the missing values in Item_Weight and Outlet_Size during the data cleaning process.
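One common way to impute these two columns is mean imputation for the numeric Item_Weight and mode imputation for the categorical Outlet_Size. This sketch uses a toy frame with illustrative values, not the real dataset:

```python
import pandas as pd

# Toy frame with the two columns the text says need imputation.
data = pd.DataFrame({
    "Item_Weight": [9.3, None, 17.5, None],
    "Outlet_Size": ["Medium", "Small", None, "Medium"],
})

# Numeric column: fill gaps with the mean.
data["Item_Weight"] = data["Item_Weight"].fillna(data["Item_Weight"].mean())
# Categorical column: fill gaps with the mode (most frequent value).
data["Outlet_Size"] = data["Outlet_Size"].fillna(data["Outlet_Size"].mode()[0])

print(data["Item_Weight"].tolist())  # no NaN values remain
```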
Data plays an important role in every relational database system for predicting future results (Bhattacharya & Getoor, 2006).
Now we check the basic statistics of our data. The variables used in the dataset will drive the prediction and output.
The table below gives a clear picture of the sales and target variables:
data.describe()
From this summary we can observe the basic statistics of each numeric variable in the dataset.
Data Cleaning :
In this phase we impute the missing values and handle outliers. Outlier removal is important in visualization techniques.
Since our dataset has some missing values, we first apply the data cleaning technique, then use a machine learning model to extract insights from the data, and finally visualize the data on a dashboard.
Model Building :
Now that the data is ready, we can apply a machine learning algorithm to it and get output based on model selection. Here we use a well-suited machine learning model, the random forest.
The model identifies the most informative features in the dataset. We can see that Item_MRP is the most insightful feature for explaining the data and sales.
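A minimal sketch of this step using scikit-learn's random forest; the data here is synthetic (generated so that the first column, standing in for Item_MRP, dominates the target), not the real BigMart data:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the prepared features: columns play the roles of
# [Item_MRP, Outlet_Type, Outlet_Location_Type]; sales depend mostly on the first.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(500, 3))
y = 10 * X[:, 0] + 1 * X[:, 1] + rng.normal(0, 0.1, 500)

model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X, y)

# Feature importances sum to 1; the Item_MRP stand-in should rank first.
print(model.feature_importances_)
```

On the real dataset, `feature_importances_` is what would justify the claim that Item_MRP is the most informative feature.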
Advantages of Data Visualization :
According to one study, managers in companies that use visualization techniques get 30% more accurate and timely information (Schulz, 2011).
Conclusion :
From model selection we find that the most informative features for predicting higher sales are Item_MRP, Outlet_Type and Outlet_Location_Type. This means that if we build relationships on these fields, we can find where sales are higher.
We could get higher sales if a store is in a top location of the city.
If stores are in top locations, we can increase the MRP of the products, which gives higher sales.
Data management can be defined as the deliberate application of data mining techniques for the purpose of data management and improvement (Hipp & Grimmer, 2001).
Higher sales also depend on the Outlet_Type.
So from the BigMart sales prediction we can analyze higher sales in terms of revenue and feature selection.
References
Agrawal, R. & Ailamaki, A. (2008). The Claremont report on database research. University of California at Berkeley, 12 May, p. 16.
Anthes, G. (2010). Happy birthday, RDBMS. Communications of the ACM, p. 20.
Thomsen, E. (2006). OLAP Solutions: Building Multidimensional Information Systems, 2nd ed. New York, USA, p. 145.
Stolte, C. & Tang, D. (2009). Multiscale visualization using data cubes. p. 187.
Bhattacharya, I. & Getoor, L. (2006). Mining Graph Data.
Heer, J., Card, S.K. & Landay, J.A. (2005). Prefuse: a toolkit for interactive information visualization. Human Factors in Computing Systems, p. 421.
Mansmann, S. & Scholl, M.H. (2007). Exploring OLAP aggregates with hierarchical visualization techniques. In ACM SAC, p. 1067.
Schulz, H.-J. (2011). Treevis.net: a tree visualization reference. IEEE CGA, p. 11.
Chaudhuri, S. & Ganjam, K. (2007). Robust and efficient fuzzy match for online data cleaning. Proc. ACM SIGMOD, p. 213.
Hipp, J. & Grimmer, U. (2001). Exploratory Data Mining and Data Cleaning. p. 342.