This briefing paper reviews the ICT topic ‘big data’ based on the literature. Big data is the IT buzzword that refers to the voluminous data processed by different business processes across industries. The volume of structured and unstructured data has grown explosively in recent decades. In the early days of information technology, organizations implemented databases to work with data related to their business processes; in the last few years, however, the emergence of social media and e-commerce has accelerated the growth of data generated outside organizations and by individuals. For example, on social media platforms like Facebook, people upload and share heavy volumes of images, text and videos from different parts of the world, while their location details, device details and so on are also circulated. Analysis of such data reveals interesting information about people’s lifestyles, choices and so on, and businesses are very interested in this kind of information.
Working with such data using typical database management software and information technologies is difficult. ‘Big data’ and its associated technologies have opened a new dimension here: the term covers all technologies that help address the volume and complexity of such voluminous data. (Madden, 2012)
This briefing paper discusses the technology, current research and trends in big data in detail.
As already noted, processing large volumes of structured and unstructured data using traditional database management or data processing systems is difficult. This is the problem that led to the big data concept and its related technologies. (Madden, 2012)
There is common confusion around the term ‘big data’: does it refer to a technology or to a volume of data? When vendors use the term, they generally mean technologies, that is, processes and tools that help in working with large volumes of data efficiently. The term thus encompasses any collection of data sets so large and complex that they are difficult to handle with typical database management or data processing applications. It also covers the collection of tools that support processing such data through operations like search, curation, sharing, transfer, storage and visualization. (Zikopoulos, 2011)
Certain characteristics make a data set a ‘big data’ data set. These characteristics are as follows.
Volume refers to the quantity of data generated by a process or system. The potential value of a data set tends to grow with its volume, and volume is the first criterion for classifying a data set as big data; the concept is reflected in the term ‘big data’ itself. (Marz & Warren, 2014)
Velocity refers to the rate at which data is generated, or how fast data must be generated and processed to provide the desired outcome.
Variety means that big data takes data from heterogeneous sources into consideration while processing it. Variety in data sets helps in analyzing them from different aspects and in deriving different outcomes.
Veracity refers to the quality of the captured data. The veracity of a data set plays a significant role in the accuracy of the outcomes of its analysis. (Zikopoulos, 2011)
Variability refers to the level of inconsistency present in the data, which can show up at any time during processing. This can be problematic for data analysts.
Managing the processing of big data is a complex process in itself. It becomes more complex when the data comes from heterogeneous sources and is large in volume. Such data needs to be interlinked, correlated and connected; otherwise it is difficult to work with.
Computation power and storage for large data sets are not a serious problem nowadays. Advances in electronics and digital technologies have made these solutions more efficient, more easily available and cheaper, which has helped big data emerge. There has been a ‘paradigm shift’ in emphasis from computer architecture to the mechanisms of data processing, and there is growing demand for data mining and analysis applications for big data. (Barlow, 2013)
A wide range of tools and technologies support the concept of big data and its analysis and processing: crowdsourcing, A/B testing and data fusion, along with machine learning, natural language processing, time series analysis, integration, simulation, genetic algorithms, signal processing, visualization and so on.
Tensors represent multidimensional big data; tensor-based technologies and computation methods such as multilinear subspace learning help in this case.
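A basic building block of multilinear subspace learning is the mode-n unfolding, which flattens a tensor into a matrix along one chosen mode. The sketch below implements it in plain Python for a small 2×3×4 tensor stored as a flat row-major list; the tensor shape and values are illustrative, not from any particular data set.

```python
from itertools import product

def flat_index(idx, shape):
    """Row-major flat index for a multi-index."""
    f = 0
    for i, n in zip(idx, shape):
        f = f * n + i
    return f

def unfold(data, shape, mode):
    """Mode-n unfolding of a tensor stored as a flat row-major list:
    rows index the chosen mode, columns index all remaining modes."""
    other = [d for m, d in enumerate(shape) if m != mode]
    rows = []
    for i in range(shape[mode]):
        row = []
        for rest in product(*(range(d) for d in other)):
            idx = list(rest)
            idx.insert(mode, i)
            row.append(data[flat_index(idx, shape)])
        rows.append(row)
    return rows

# A hypothetical 2x3x4 tensor with entries 0..23.
T = list(range(24))
M0 = unfold(T, (2, 3, 4), 0)  # 2 rows of 12 columns
M1 = unfold(T, (2, 3, 4), 1)  # 3 rows of 8 columns
print(len(M0), len(M0[0]), len(M1), len(M1[0]))
```

Subspace learning methods then operate on these unfolded matrices, one mode at a time, instead of on the raw multidimensional array.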
Beyond that, database-related technologies, parallel processing, search-based applications, distributed file systems and databases, data mining, cloud computing and the Internet all support the big data revolution.
Big data analytics processes big data and helps find different ‘patterns’ in it. These patterns give critical insights into the data sets.
Storage is an important issue for big data. A proposed solution is distributed and shared storage; storage area networks (SANs) and network-attached storage (NAS) fall into this category, although big data practitioners are not especially interested in these solutions. There are also RDBMS-based storage solutions for big data that are capable of storing petabytes of data. (Madden, 2012)
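One common idea behind distributed storage is to partition records across nodes by hashing a record key. The sketch below shows the idea with hypothetical node names; real systems add replication and rebalancing on top of this.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical storage nodes

def shard_for(key: str) -> str:
    """Pick a node by hashing the record key; a stable hash keeps
    the key-to-node mapping consistent across processes and restarts."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return NODES[h % len(NODES)]

# Every record with the same key always lands on the same node.
placement = {k: shard_for(k) for k in ("user:1", "user:2", "order:9")}
print(placement)
```

Note that simple modulo placement reshuffles most keys when a node is added; production systems usually use consistent hashing to limit that movement.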
All these technologies support big data in the analysis of web data, network monitoring logs, click streams and so on. There are also ‘data science’ applications such as simulations for massive-scale data analysis and sensor deployments.
Parallel database systems like Vertica, Teradata and Greenplum are powerful but expensive and hard to administer, and they offer limited fault tolerance for long-running queries. Hadoop, by contrast, is a popular big data technology accepted worldwide. (Roebuck, 2011)
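Hadoop's programming model is MapReduce: mappers emit key-value pairs and reducers aggregate them per key. The classic word-count example can be sketched in plain Python (no Hadoop cluster involved, just the shape of the computation):

```python
from collections import defaultdict
from itertools import chain

def map_phase(doc):
    # Emit (word, 1) pairs, as a Hadoop mapper would.
    return [(word.lower(), 1) for word in doc.split()]

def reduce_phase(pairs):
    # Group pairs by key and sum the counts, as a reducer would.
    groups = defaultdict(int)
    for word, count in pairs:
        groups[word] += count
    return dict(groups)

docs = ["big data needs big tools", "data tools"]
counts = reduce_phase(chain.from_iterable(map_phase(d) for d in docs))
print(counts["big"], counts["data"], counts["tools"])  # 2 2 2
```

In a real Hadoop job, the framework distributes the map tasks across nodes, shuffles pairs by key, and restarts failed tasks, which is where the fault tolerance mentioned above comes from.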
Data processing in big data has a number of phases. These are explained as follows.
Big data takes in data arising from different industries, scientific research, demographics, social media and e-commerce. However, not all data is equally important for a particular goal, so after collection the data is filtered. Data is collected from systems, social media and numerous other sources; it can be operational or transactional, structured or unstructured. In big data, all types of data are collected irrespective of format and type, and later filtered and compressed before processing.
The most challenging part of data acquisition is filtering out the unnecessary data. It must be done in a way that ensures useful information does not get discarded.
Data science deals with numerous issues that help define filters ensuring the accuracy and relevancy of collected data. (Marz & Warren, 2014)
For data streaming from online sources, it is not always possible to store the data first and filter it later. Rather, an ‘on the fly’ approach is needed for working on such streams from the web. Online data analytics applications and systems help in filtering and collecting data from online streams.
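The ‘on the fly’ approach can be sketched with a generator, which inspects each record exactly once and never materializes the whole stream in memory. The click-stream records below are hypothetical.

```python
def stream_filter(records, predicate):
    """Filter a stream lazily: each record is inspected once and
    either forwarded or dropped, without storing the full stream."""
    for rec in records:
        if predicate(rec):
            yield rec

# Hypothetical click-stream: keep only events from logged-in users.
events = ({"user": u, "action": "click"} for u in (None, "alice", None, "bob"))
kept = list(stream_filter(events, lambda e: e["user"] is not None))
print(len(kept))  # 2
```

Because both the source and the filter are lazy, the same pattern works whether the stream has a hundred events or a hundred million.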
The next big challenge is to create metadata from the acquired data, which is not easy. Metadata should give details about the sources and structure of the data. There are metadata acquisition systems that can record metadata automatically, without human intervention.
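Automatic metadata recording can be as simple as deriving the source, arrival time, field names and record count from each incoming batch. The sketch below assumes hypothetical dictionary-shaped records and a made-up source name.

```python
import json
import time

def record_metadata(source, records):
    """Derive simple metadata (source, arrival time, field names,
    record count) from a batch, without human intervention."""
    fields = sorted({k for rec in records for k in rec})
    return {
        "source": source,
        "acquired_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "fields": fields,
        "count": len(records),
    }

batch = [{"id": 1, "text": "hello"}, {"id": 2, "lang": "en"}]
meta = record_metadata("social-feed", batch)
print(json.dumps(meta["fields"]))  # ["id", "lang", "text"]
```

Storing such metadata alongside the data is what lets later pipeline stages know where a record came from and what shape to expect.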
However, there is much to do with metadata after recording it correctly. Big data analysis forms a pipeline, and metadata is required at every stage of that pipeline.
Thus, data acquisition refers to the collection of technologies, tools and processes for collecting data, filtering it and recording its metadata at the same time, without having to store and process the data at every step.
Data analysis needs some level of uniformity in the data. Thus, after acquisition, the data needs to be cleaned and made ready for processing. Data analysis requires data in the correct format; otherwise the results of the analysis will not be accurate and effective.
This requires an information extraction process that brings out the required information from piles of data from heterogeneous sources and then presents the extracted data in a structured form. The process is technically challenging: for data such as images and videos, extracting information and presenting it in a structured format is genuinely hard.
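For text, a minimal form of information extraction is pulling named fields out of semi-structured lines into records. The log lines and field names below are invented for illustration; real extractors handle far messier input.

```python
import re

# Hypothetical semi-structured log lines.
lines = [
    "2021-03-01 12:00:01 user=alice action=login",
    "2021-03-01 12:00:05 user=bob action=purchase item=42",
]

PATTERN = re.compile(
    r"(?P<date>\S+) (?P<time>\S+) user=(?P<user>\w+) action=(?P<action>\w+)"
)

def extract(line):
    """Pull named fields out of free text into a structured record."""
    m = PATTERN.match(line)
    return m.groupdict() if m else None

records = [extract(line) for line in lines]
print(records[1]["user"], records[1]["action"])  # bob purchase
```

The structured records can then be loaded into a table or fed to later pipeline stages, which is exactly the ‘structured form’ the paragraph above asks for.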
A common misconception is that big data always provides ‘truth’. This is not always the case. The ‘truthfulness’ of big data and its analysis depends on these extraction steps, that is, on how effectively ‘truth’ is extracted from the raw data.
There are well-recognized constraints on valid data and well-recognized error models. However, in many domains of big data, such constraints are still not available.
As already discussed, data comes from heterogeneous sources and is often neither structured nor in the right format. It is not enough to acquire and clean data and then store it in data repositories; the data must also be integrated and aggregated, and then represented in the right format for storage and future processing.
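Integration and aggregation can be sketched as mapping two sources with different field names onto one common schema, keyed by a shared identifier. The two sources (a CRM export and shop transactions) and their field names are hypothetical.

```python
# Two hypothetical heterogeneous sources with different field names.
crm = [{"customer_id": 1, "full_name": "Alice"}]
shop = [{"uid": 1, "spend": 30.0}, {"uid": 2, "spend": 12.5}]

def integrate(crm_rows, shop_rows):
    """Map both sources onto one schema keyed by customer id,
    aggregating spend per customer."""
    merged = {}
    for r in crm_rows:
        merged[r["customer_id"]] = {
            "id": r["customer_id"], "name": r["full_name"], "spend": 0.0,
        }
    for r in shop_rows:
        rec = merged.setdefault(
            r["uid"], {"id": r["uid"], "name": None, "spend": 0.0}
        )
        rec["spend"] += r["spend"]
    return merged

out = integrate(crm, shop)
print(out[1]["name"], out[1]["spend"])  # Alice 30.0
```

Note how customer 2 appears in the shop data but not the CRM: the integrated record is kept with a missing name rather than dropped, a typical design choice when sources only partially overlap.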
Data analysis is a complex process. Large-scale data analysis needs to be both effective and automated. In the analysis process, semantics and data structures need to be expressed in formats that are machine-readable and can be ‘resolved’ automatically.
Data integration is important, and additional work is needed to make the data error-free using automated systems.
There are alternative solutions for storing data other than databases, each with its own advantages and disadvantages. Designing a database or choosing the correct storage solution needs to be done very carefully; many decision-making tools can assist in designing databases.
Querying traditional databases and query processing in big data are fundamentally different. Big data contains volumes of dynamic, interrelated, heterogeneous data, which forms large networks of interrelated data with a high level of redundancy. These redundancies can be exploited through validation, cross-checking and so on. There are also inherent clusters, and these clusters reveal relationships among collections of data. (Roebuck, 2011)
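Exploiting redundancy for validation can be sketched as majority voting: when overlapping sources report different values for the same key, keep the value reported most often. The records below are invented for illustration.

```python
from collections import Counter

# Hypothetical records arriving from overlapping sources.
records = [("alice", "NYC"), ("alice", "NYC"), ("alice", "LA"), ("bob", "SF")]

def crosscheck(rows):
    """Use redundancy for validation: for each key, keep the value
    reported most often across sources."""
    votes = {}
    for key, value in rows:
        votes.setdefault(key, Counter())[value] += 1
    return {k: c.most_common(1)[0][0] for k, c in votes.items()}

print(crosscheck(records))  # {'alice': 'NYC', 'bob': 'SF'}
```

Here the two ‘NYC’ reports outvote the single ‘LA’ one, so the redundancy that makes big data bulky also makes it more trustworthy.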
Data mining is a related topic here. It requires cleaned, integrated, trustworthy, easily accessible and effective data that supports declarative queries through data mining interfaces and computing environments.
Big data supports interactive data analysis in real-time applications, and scaling of complex queries is also supported.
However, there is a problem with big data analysis: a lack of coordination between the systems that store data and support SQL queries, and the analytics systems that perform non-SQL data processing such as statistical analysis and data mining.
Obtaining results from analysis is not enough by itself. Enough explanatory detail must accompany the results so that someone can interpret them; visualizations are used for this purpose. (Marz & Warren, 2014)
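Even the crudest visualization makes analysis results easier to interpret than raw numbers. As a minimal sketch, the function below renders invented event counts as a text bar chart; real pipelines would use a plotting library instead.

```python
def bar_chart(counts, width=20):
    """Render value counts as a crude text bar chart, the kind of
    summary view that helps readers interpret analysis results."""
    peak = max(counts.values())
    lines = []
    for label, value in sorted(counts.items()):
        bar = "#" * round(width * value / peak)
        lines.append(f"{label:10s} {bar} {value}")
    return "\n".join(lines)

# Hypothetical event counts from an earlier analysis step.
print(bar_chart({"clicks": 120, "buys": 30, "views": 240}))
```

The point is not the chart itself but the pairing: each result ships with a presentation a human can read at a glance.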
There are a number of challenges in big data, some of which have already been explained in context. The most prevalent among them are the heterogeneity of data sources, filtering without discarding useful information, metadata management, data cleaning and integration, scalable query processing, and interpretation of results.
Conclusion
This briefing paper has discussed an emerging topic in ICT called big data. After the introduction, it presented the problem statement that gave rise to the concept of big data. The subsequent sections discussed the characteristics and technologies related to big data, followed by a detailed description of the phases of big data processing and, finally, a summary of the challenges in big data.
References
Barlow, M., 2013. Real-Time Big Data Analytics: Emerging Architecture. s.l.: O’Reilly Media, Inc.
Boyd, D. & Crawford, K., 2011. Six Provocations for Big Data, s.l.: SSRN.
Tene, O. & Polonetsky, J., 2013. Big Data for All: Privacy and User Control in the Age of Analytics. Northwestern Journal of Technology and Intellectual Property, XI(5).
Leskovec, J., Rajaraman, A. & Ullman, J. D., 2014. Mining of Massive Datasets. s.l.:Cambridge University Press.
Madden, S., 2012. From Databases to Big Data. IEEE Computer Society, 16(3), pp. 4-6.
Marz, N. & Warren, J., 2014. Big Data: Principles and Best Practices of Scalable Realtime Data Systems. s.l.:Manning Publications Company.
Roebuck, K., 2011. Storing and Managing Big Data – NoSQL, Hadoop and More: High-impact Strategies – What You Need to Know: Definitions, Adoptions, Impact, Benefits, Maturity, Vendors. s.l.:Emereo Pty Limited.
Zikopoulos, P., 2011. Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data. s.l.:McGraw Hill Professional.