What is Data Science?
Nowadays, data science is one of the fastest-growing fields, with significant demand for data professionals across public and private organizations. The limited supply of people who can work with data at scale is also reflected in the rapidly rising salaries of data analysts, statisticians, data engineers, and similar roles. Because this is a newly emerging field, the main challenge is finding techniques to use all of this data more effectively.
According to IBM's estimate, what percentage of the data in the world today has been created in the past two years?
According to a 2017 report, more than 90% of the data in the world today was created in the last two years. The report also notes that new sensor technologies and devices will accelerate the data growth rate further. The main challenge faced by marketers is the increasing demand from customers that their needs, expectations, and preferences be understood in every interaction and transaction.
What is the value of petabyte storage?
In enterprise storage, systems have started to leave the terabyte behind, moving to the petabyte and on towards exabyte storage. One petabyte (PB) of storage is 10^15 bytes of data, which is 1,000 terabytes (TB) or 1,000,000 gigabytes (GB). Vendors that sell storage systems at this scale include IBM Scale Out Network Attached Storage (SONAS), Hitachi NAS Platform (HNAS), and Panasas ActiveStor.
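The unit relationships above can be checked with a few lines of arithmetic. The following is a minimal sketch that assumes decimal (SI) units; the constant and function names are illustrative only.

```python
# Decimal (SI) storage units, expressed in bytes.
GIGABYTE = 10**9
TERABYTE = 10**12
PETABYTE = 10**15


def convert(size_in_bytes: int, unit_in_bytes: int) -> float:
    """Express a size given in bytes as a multiple of another unit."""
    return size_in_bytes / unit_in_bytes


# One petabyte expressed in terabytes and in gigabytes.
print(convert(PETABYTE, TERABYTE))  # 1000.0
print(convert(PETABYTE, GIGABYTE))  # 1000000.0
```

Some vendors and operating systems report sizes in binary units instead (1,024 rather than 1,000 per step), so the figures can differ slightly depending on the convention used.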
For each course you find, both foundation and advanced, state (in 2 to 3 lines) what it offers, based on the given course description as well as the video. The purpose of this question is to understand the different streams available in Data Science.
According to this article, the foundation courses mainly offer knowledge of and proficiency in object-oriented programming, along with the other units of foundation coursework. The advanced courses mainly offer a better understanding of causality and experience-based knowledge, as well as knowledge related to the human and values aspects of data. They also cover statistical techniques for time series, panel data, and discrete responses.
Read the following research paper from IEEE Xplore Digital Library
Ali-ud-din Khan, M.; Uddin, M.F.; Gupta, N., “Seven V’s of Big Data understanding Big Data to extract value,” American Society for Engineering Education (ASEE Zone 1), 2014 Zone 1 Conference of the, pp.1,5, 3-5 April 2014, and answer the following questions:
Summarise the motivation of the author (in one paragraph)
In this article, Khan, Uddin and Gupta (2014) explain that their motivation for the paper is to outline the main arguments surrounding Big Data. They also discuss how to derive better results from the raw material of big data in both the Internet and technology worlds. In addition, large amounts of data have to be processed and analysed in response to queries, alongside traditional techniques such as SQL.
What are the 7 V’s mentioned in the paper? Briefly describe each V in one paragraph.
The 7 V’s mentioned in the article are discussed below.
Volume: In the context of big data, volume mainly refers to the size of the data, including audio, video, records of natural disasters, weather forecasting data, and so on. The importance of big data is discussed in terms of how it differs from traditional data, which can be accessed with a proper SQL query.
Explore the author’s future work by using the reference [4] in the research paper. Summarise your understanding of how Big Data can improve the healthcare sector in 300 words.
The discussion below provides an overview of how implementing big data can improve the healthcare sector:
Big data is generating considerable interest in the healthcare sector, since technology is ever-changing and the volume of stored data grows exponentially every day. It has become necessary to implement big data and data analytics in healthcare as well. Many of the use cases prevailing in the healthcare industry are well suited to big data and data analytics. For example, electronic medical records (EMRs) alone collect an enormous amount of data, and that data is highly varied. Since the amount of data is likely to get out of control, big data and data analytics are needed to detect patterns and group similar data objects.
Without big data, many healthcare organizations get swamped by pedestrian problems such as regulatory reporting and operational dashboards. Before the basic use cases are even addressed, newer ones pile up. Implementing big data is therefore essential in the healthcare industry to improve the situation and to organize the huge pile of data much better, saving time and resources.
Exercise 3: Big Data Platform (1 mark)
In order to build a big data platform, one has to acquire, organize and analyse the big data. Go through the following links and answer the questions that follow the links:
Please note: You are encouraged to watch all the videos in the series from Oracle.
How to acquire big data for enterprises and how it can be used?
Before big data can be implemented, the acquisition phase is one of the main challenges for infrastructure development. Big data is typically streamed with greater variety and at greater velocity than traditional data. The infrastructure therefore has to deliver low, predictable latency both when capturing data and when executing short, simple queries. In addition, NoSQL databases are mainly used to collect and store social media data from different sources.
How to organize and handle the big data?
Organizing and handling big data involves filtering, transforming, and sorting data from data warehouses. Oracle also enables end-to-end control of both unstructured and structured content.
What are the analyses that can be done using big data?
The infrastructure needed to analyse big data must be able to support complex analyses such as large-scale data mining and a range of core statistical analyses. It also helps to automate the decision process and to shorten response times.
Part B (4 Marks)
Part B answers should be based on well-cited articles/videos; name the references used in your answer. For more information, read the guidelines given in Assignment 1.
Exercise 4: Big Data Products (1 mark)
Google is a master at creating data products. Below are a few examples from Google. Describe the products below and explain how large-scale data is used effectively in these products.
Google uses the PageRank algorithm to order websites in the results returned by its search engine. The algorithm measures the importance of web pages. PageRank is not the only algorithm Google uses to rank pages by importance, but it is the first and best-known one. For large-scale data and full-text search, the ranks are computed with a simple iterative algorithm, and they correspond to the principal eigenvector of the normalised link matrix of the web.
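The iterative computation described above can be sketched on a toy link graph. This is a minimal illustration, not Google's production algorithm: the damping factor of 0.85, the number of iterations, the handling of pages without outgoing links, and the four-page graph are all assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by repeated redistribution of rank (power iteration).

    links maps each page to the list of pages it links to.
    Every linked page must also appear as a key.
    """
    pages = list(links)
    n = len(pages)
    ranks = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_ranks = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            # Pages with no outgoing links spread their rank over all pages.
            targets = outgoing if outgoing else pages
            share = damping * ranks[page] / len(targets)
            for target in targets:
                new_ranks[target] += share
        ranks = new_ranks
    return ranks


# Hypothetical toy web: page C is linked to by A, B and D.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_web)
for page, rank in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(rank, 3))
```

In this toy graph the page with the most incoming links (C) ends up with the highest rank, and the ranks always sum to 1, which is what makes them interpretable as an eigenvector of the normalised link matrix.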
Google Spell Checker is a tool that identifies and corrects misspelled words based on user behaviour. Google uses both indexing and query processing algorithms to work out the refinement required for a typed word. However, spell checking is not tied to the search index. It is capable of searching more than 107,000 results at one go. The query processing algorithm is based on the query rank (QR), the query frequency (QF), and the user satisfaction (US) with the query.
Google Flu Trends is a tool that provides influenza activity trends for around 25 countries. It uses a CDC monitor that helps collect data from multiple sources in the FluSurv-NET surveillance system. The usable information is broken into five categories: Viral Surveillance, Mortality, Hospitalizations, Outpatient Illness Surveillance, and Geographic Spread of Illness.
The online search tool Google Trends helps a user find out how frequently specific keywords, phrases, or subjects have been queried over a specified period of time. However, it does not present absolute search numbers and works best in combination with Keyword Planner. It shows a ‘normalized’, or relative, level of interest for a phrase or keyword, and allows that level of interest to be compared across the phrases and keywords that could become target phrases.
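The idea of a normalized level of interest can be illustrated with a small sketch. It assumes a simplified scheme in which each value is scaled relative to the peak of the series so that the peak becomes 100; the function name and the weekly counts are hypothetical, and the real tool's processing is more involved.

```python
def normalize_interest(counts):
    """Scale raw counts to a 0-100 range relative to the largest value."""
    peak = max(counts)
    if peak == 0:
        return [0 for _ in counts]
    return [round(100 * count / peak) for count in counts]


# Hypothetical weekly query counts for one keyword.
weekly_counts = [5, 10, 20]
print(normalize_interest(weekly_counts))  # [25, 50, 100]
```

Because only relative values are reported, two keywords can be compared by shape and proportion even though their absolute search volumes are never shown.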
Like Google, Facebook and LinkedIn also use large-scale data effectively. How?
Facebook and LinkedIn can also use large-scale data effectively, just like Google. However, the primary difference is that Google uses large-scale data based on each user's individual searches and on ‘best guess’ data inferred from the sites the user visits and the search terms used. Facebook and LinkedIn, by contrast, do not infer users' browsing habits; they ask users directly about their areas of interest and other specific details.
Exercise 5: Big Data Tools
Briefly explain why a traditional relational database (RDBMS) is not effectively used to store big data.
The Relational Database Management System (RDBMS) has long served as the solution for database needs, but the rapid change in the volume and velocity of business data has limited its usefulness for storing the enormous volumes involved in big data. Big data volumes range into petabytes, where one petabyte is roughly 1,000 terabytes (1,024 in binary units). An RDBMS is not well suited to handling petabytes, because it scales vertically: it requires adding more central processing units (CPUs) and more memory to a single system.
What is NoSQL Database?
A NoSQL database, or Not Only SQL database, follows a database design approach that accommodates a wide variety of data models, including document, key-value, graph, and columnar formats. It is an alternative to traditional relational databases, where data is placed in tables and the data schema is carefully designed before the database is built. It is mainly used to work with large sets of distributed data.
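The four data models named above can be pictured with plain Python structures. This is only an illustrative sketch of how each model shapes data, not the API of any particular NoSQL product; all keys, names, and values are hypothetical.

```python
# Key-value: an opaque value looked up by a single key.
key_value_store = {"session:42": "user=alice;theme=dark"}

# Document: self-describing nested records; fields may differ per document.
document_store = [
    {"_id": 1, "name": "Alice", "emails": ["alice@example.com"]},
    {"_id": 2, "name": "Bob", "address": {"city": "Sydney"}},
]

# Columnar (wide-column): row key -> column family -> columns.
column_store = {
    "user:1": {
        "profile": {"name": "Alice"},
        "activity": {"last_login": "2017-01-01"},
    }
}

# Graph: nodes plus labelled edges between them.
graph_store = {
    "nodes": {"alice", "bob", "carol"},
    "edges": [("alice", "FOLLOWS", "bob"), ("carol", "FOLLOWS", "bob")],
}


def followers_of(graph, person):
    """Return everyone with a FOLLOWS edge pointing at the given person."""
    return sorted(
        source
        for source, relation, target in graph["edges"]
        if relation == "FOLLOWS" and target == person
    )


print(key_value_store["session:42"])
print([doc["name"] for doc in document_store])
print(column_store["user:1"]["profile"]["name"])
print(followers_of(graph_store, "bob"))
```

Note that none of these structures requires a fixed schema up front, which is the main contrast with the table-based relational approach.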
Name and briefly describe at least 5 NoSQL Databases
What is MapReduce and how it works?
MapReduce is a processing framework within Apache Hadoop that processes large data sets in parallel across a Hadoop cluster.
It works in two steps, ‘map’ and ‘reduce’. The analysis logic is supplied through the job configuration, and the Hadoop framework provides the parallelization, distribution, and scheduling services.
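The two steps can be sketched with a word count, the usual introductory example. This is a single-process illustration of the programming model in plain Python, not Hadoop code; the function names and input sentences are assumptions, and a real cluster would run the map and reduce calls in parallel on different machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between steps."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    mapped = [pair for document in documents for pair in map_phase(document)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data tools", "Big data platform"])
print(counts)  # {'big': 2, 'data': 2, 'tools': 1, 'platform': 1}
```

Because each map call looks at only one document and each reduce call at only one key, the work can be split across many machines without the steps needing to coordinate with each other.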
Briefly describe some notable MapReduce products (at least 5)
Apache Hadoop: Open-source software for Big Data that provides distributed and scalable computing.
CouchDB: An open-source database that uses MapReduce and has a scalable architecture.
Disco Project: A lightweight, open-source framework for distributed computing based on MapReduce.
Infinispan: Developed by Red Hat, this software can store large amounts of key-value NoSQL data in a distributed cache.
Riak: This is a NoSQL database. It is scalable, highly available, and simple to operate in a distributed environment.
Amazon’s S3 service lets users store large chunks of data on an online service. List 5 features of Amazon’s S3 service.
1. Using BitTorrent with Amazon S3
Getting concise, valuable information from a sea of data can be challenging. We need statistical analysis tools to deal with Big Data. Name and describe some (at least 3) statistical analysis tools.
KNIME: It is a leading open solution for data-driven innovation, used to discover the potential hidden in data.
OpenRefine: It deals with messy data: cleaning it, transforming it, and extending it with web services and external data.
Orange: It is a data visualization tool that aims to be user-friendly for novice users. It provides a large toolbox for building interactive workflows to analyse and visualize data.
Exercise 6: Big Data Application (1 mark)
Name 3 industries that should use Big Data – justify your claim in 250 words for each industry using proper references.
There are a few industries that have not yet included big data or data analytics in their business processes. Three of them are listed below, with justifications for the claims. These are: