Big Data Analytics And High Performance Computing: Challenges And Opportunities

The Importance of Technology in the Development of Digital Data

Technology has played an important role in the development of digital data and serves as the medium for data communication over the internet. It is used at large scale across sectors and industries in digital form. Sources such as sensors, mobile devices, log files, remote sensing technologies, wireless sensors and social networks generate digital data to a large extent (Hashem et al. 2015). The internet gives the world access to information within a few seconds, so IT systems have come to depend on its speed and versatility. Advanced technology keeps devices connected to the internet so that they can share data in digital form (Wu et al. 2014). Data analytics has therefore become important in the market for maintaining and analyzing data.

High-speed data analytics is associated with high performance computing (HPC). There is a huge flow of data and information from sources such as social media, mobile devices and internet websites, and this huge volume of data can be recognized as Big Data (Assunção et al. 2015). Both structured and unstructured data arrive in digital form, which makes it difficult for conventional systems to store and analyze them. Big Data Analytics was introduced to address this problem (Reed and Dongarra 2015). Big Data Analytics is an analytical approach that helps in differentiating structured and unstructured data within large volumes of data so that they can be analyzed properly.

Big data technologies help in storing huge volumes of data and information from various data sources. However, there are flaws in the algorithms and programs that big data analytics uses for analyzing data (Kambatla et al. 2014), and there are various problems in analyzing and searching data held in storage. High Performance Computing has faced problems in working together with big data analytics. High Performance Data Analytics faces the challenge of obtaining actionable patterns that can support data-driven decisions at a higher level. So far, big data analytics and high performance computing have not been integrated on a single platform.

Big data has become a buzzword in the IT industry. Digital data used to be stored in traditional database systems, which are slow to search and poorly suited to complex search criteria (Reyes-Ortiz, Oneto and Anguita 2015). New database systems have since been developed that help in storing huge amounts of data and information in a structured manner. Even so, data storage and analysis remain a major problem in the IT industry. Moreover, traditional big data analysis has been neither robust nor secure.

In the modern scenario, the volume of data has been growing steadily. According to the IDC Digital Universe Study reported by EMC, 130 exabytes of data had been created and stored in 2005, and this number increased to 7,910 exabytes in 2015 (Diamantoulakis, Kapinas and Karagiannidis 2015). The amount of data and information collected from various sources has therefore increased sharply (Jin et al. 2015), which creates problems for the systems that handle it. Around 95% of digital data is unstructured and cannot be handled and analysed by traditional approaches. Handling such a huge amount of data has therefore become a problem for researchers.
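The growth figures quoted above can be turned into a growth factor and an implied annual growth rate. The following is a minimal sketch in Python. It assumes that the 130 exabyte figure refers to 2005, so that the interval to 2015 is ten years, and the function names are illustrative only.

```python
def growth_factor(start, end):
    """Return how many times larger `end` is than `start`."""
    return end / start


def compound_annual_growth_rate(start, end, years):
    """Return the constant yearly growth rate that turns `start` into `end`."""
    return (end / start) ** (1 / years) - 1


START_EB = 130   # exabytes, baseline figure quoted in the text
END_EB = 7910    # exabytes, 2015 figure quoted in the text
YEARS = 10       # assumption: the baseline year is 2005

factor = growth_factor(START_EB, END_EB)
cagr = compound_annual_growth_rate(START_EB, END_EB, YEARS)

print(f"Growth factor: {factor:.1f}x")
print(f"Implied annual growth: {cagr:.1%}")
```

Under these assumptions the data volume grows roughly sixty-fold, which corresponds to about fifty percent growth per year.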

Big Data and Data Analytics

This research examines various dimensions of Big Data Analytics together with High Performance Computing (HPC) (Tsai et al. 2015). It discusses the evolution of big data, including the concept of deep learning on big data, and is intended to help in mitigating the challenges faced during big data analysis.

This research aims to analyze the impact of high performance computing on big data computing.

The objectives of research are mentioned below:

To identify challenges faced by big data analytics in the IT field
To analyze the impact of high performance computing on big data analytics

The research questions are as follows:

What are challenges faced by Big Data Analytics in the IT field?
What is the impact of high performance computing on big data analytics?

Big data has been gaining attention in the IT market. Traditional database management systems have not been able to store the huge amounts of data and information that companies hold. Database technology has gone through several stages of evolution, including SQL, MySQL and spatial databases. NoSQL databases such as MongoDB provide suitable ways of representing unstructured data and information. Data Analytics and Data Science draw on mathematics to meet the information and knowledge expectations of different domains (Wang et al. 2016). Various architectures have been created in the process for analyzing different approaches to the big data domain. Big data helps organizations analyze and store huge amounts of data and information. It is a collection of large sets of information gathered from various sources.

Big data has several aspects, including volume, velocity and variety. Large organizations have huge data sources that create problems in database management (Chen and Zhang 2014). Data warehouse based solutions do not necessarily have the ability to process and analyze such data, because they lack a parallel processing architecture. Data flows all over the organization in digital form. Spatial and temporal data consume additional storage space. Big data has been gaining momentum in the market as companies increase their use of data storage (Cui, Yu and Yan 2016). Big data tools help in formulating data queries, and these queries can be stored in business intelligence software.

Data analytics can be categorized into three levels: descriptive, predictive and prescriptive. Descriptive analytics provides details about data distribution and visualization. Predictive analytics focuses on predicting possibilities and trends. Prescriptive analytics aims to provide insights from data that can be used for decision making (Wang et al. 2016). Descriptive analysis is based on visualization and is used in business intelligence. Predictive analysis relies on statistical models. Prescriptive analysis, on the other hand, can apply sophisticated machine learning techniques to provide optimal solutions.
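The three levels of analytics described above can be illustrated on a small data set. The sketch below is only an illustration under stated assumptions: the monthly data volumes and the storage tiers are hypothetical values, the predictive step is a simple least-squares trend line rather than a full statistical model, and the prescriptive step is a simple rule rather than a machine learning technique.

```python
from statistics import mean, pstdev

# Hypothetical monthly data volumes in terabytes (illustrative values only).
volumes_tb = [10.0, 12.0, 14.0, 16.0, 18.0]


def describe(values):
    """Descriptive level: summarize the distribution of the data."""
    return {
        "mean": mean(values),
        "std": pstdev(values),
        "min": min(values),
        "max": max(values),
    }


def fit_linear_trend(values):
    """Fit y = slope * x + intercept by ordinary least squares (x = 0, 1, ...)."""
    xs = range(len(values))
    x_bar, y_bar = mean(xs), mean(values)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar


def predict_next(values):
    """Predictive level: extrapolate the fitted trend one step ahead."""
    slope, intercept = fit_linear_trend(values)
    return slope * len(values) + intercept


def prescribe_tier(forecast_tb, tiers_tb):
    """Prescriptive level: pick the smallest storage tier covering the forecast."""
    for tier in sorted(tiers_tb):
        if tier >= forecast_tb:
            return tier
    return None  # no available tier is large enough


summary = describe(volumes_tb)
forecast = predict_next(volumes_tb)
decision = prescribe_tier(forecast, [16, 24, 32])  # hypothetical tiers in TB

print(summary)
print(f"Forecast for next month: {forecast:.1f} TB")
print(f"Recommended tier: {decision} TB")
```

Each function maps to one level: `describe` reports what the data look like, `predict_next` estimates what is likely to happen, and `prescribe_tier` recommends an action based on that estimate.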

Traditional analysis is becoming obsolete in the market because of the limitations of its analytical tools. Data transactions over the internet can be carried out very quickly, which supports effective communication within organizations. Data applications help in maintaining and processing data over the internet (Akusok et al. 2015). Cloud computing has become an important element of big data analytics. Many analytics computations are performed on continuous data streams. Machine learning algorithms are applied to extract patterns from data models in various challenging fields of large-scale data analytics. Building analytical models is a continuous process aimed at satisfying the needs of customers in the market. Distinctive, high performance computing therefore plays an important role in business development.
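Computation on a continuous data stream means that statistics have to be updated one value at a time, without storing the whole stream. As an illustration, the sketch below uses Welford's online algorithm for the running mean and variance. This technique is not named in the sources cited here and is chosen only as a well-known example; the class name and the sample readings are hypothetical.

```python
class RunningStats:
    """Running mean and population variance using Welford's online algorithm."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value):
        """Fold one new observation into the statistics in O(1) memory."""
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self):
        """Population variance of all values seen so far (0.0 if empty)."""
        return self._m2 / self.count if self.count else 0.0


# Hypothetical sensor readings arriving one at a time.
stream = [2, 4, 4, 4, 5, 5, 7, 9]

stats = RunningStats()
for reading in stream:
    stats.update(reading)

print(f"count={stats.count} mean={stats.mean:.2f} variance={stats.variance:.2f}")
```

Because each update needs only the previous count, mean and sum of squared deviations, the same object could in principle be fed from a stream of any length.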

Challenges Faced by Big Data Analytics

High Performance Computing is used in applications with extreme computational needs, and it supports both storage and computation in such systems. However, a proper approach to predictive analytics on HPC systems has not yet been found. The design of HPC systems can be applied to problems that involve large volumes of data. Storage is considered to be one of the most important resources in every organization (Demchenko, De Laat and Membrey 2014), and there has always been a shortage of storage space. Among traditional systems, Bigtable was one of the earliest storage models and is regarded as a distributed storage system. However, it failed to deliver the required performance.

Both HPC and Hadoop were designed to support different kinds of data and workflows. Because of the rise of data-intensive applications and their changing needs, both paradigms need to evolve. More compute-demanding workloads are being placed on Hadoop clusters, which are themselves heterogeneous in nature. This has led to changes in the Hadoop ecosystem with the introduction of YARN and Mesos (Bonomi et al. 2014). Similarly, to handle large-scale data, parallel file systems need to evolve to address the data management problem.

Parallel file systems such as the Lustre file system can be integrated with Hadoop to provide an efficient solution for data-intensive applications, which are increasing in number (Wang, Kung and Byrd 2018). This would move the local storage model of Hadoop towards a distributed file system, and it would extend the paradigm by embedding the features of distributed file systems in it. Increased throughput, practical storage performance and distributed shared memory would benefit the processing of these applications.

The parallel processing paradigm can be considered the backbone of all big data analytics. It aims at maximum resource utilization and thereby delivers large improvements in time efficiency. MapReduce is regarded as the de facto solution for big data handling and has dominated the big data world (Belle et al. 2015). Researchers are investigating the possibilities of improving this most popular parallel programming model. Massively parallel programming frameworks for big data analytics can be instrumental in providing solutions to storage and communication bottlenecks. The heterogeneity and unstructured nature of the data require suitable parallel computing models (Salehan and Kim 2016). While there are efforts to achieve parallelism when handling purely unstructured data, cost-efficient and time-efficient parallel processing for big data analytics is still far from mature.
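The structure of the MapReduce model mentioned above can be sketched with a word count, the example most often used to introduce it. The code below is a minimal single-machine illustration, not the Hadoop API: the function names and input chunks are hypothetical, and a thread pool stands in for the worker nodes that would run the map tasks on a real cluster.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle_phase(mapped_chunks):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values of each key into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(chunks, workers=2):
    """Run map tasks concurrently, then shuffle and reduce the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
    return reduce_phase(shuffle_phase(mapped))


# Hypothetical input, already split into chunks as a cluster would do.
chunks = [
    "big data needs parallel processing",
    "parallel processing needs big clusters",
]

counts = word_count(chunks)
print(counts)
```

The map tasks are independent of one another, which is what allows them to be spread across many machines. The shuffle step is where data must move between workers, and this is where the communication bottleneck described above tends to arise.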

Deep learning on big data is an emerging field that aims to use machine learning algorithms to obtain meaningful insights from big data. These applications are inherently parallel in nature. As mentioned earlier, the sheer volume of data poses many challenges for deep analysis of big data (Sandryhaila and Moura 2014). Visualization support is one of the primary goals of analytics, and it can likewise be achieved successfully only with the help of appropriate parallel processing. The design of effective massively parallel systems for high-volume, high-variety and high-velocity data is extremely complex and remains a challenge in big data analytics.

The Growth of Data and Information Collection

This chapter focuses on the methodology followed by the research in order to meet the aims and objectives of the study. Research methodology is the science of developing approaches to a study and of outlining how it will be carried out. The use of a research methodology provides a focused approach to the research questions and objectives (Wamba et al. 2017). This research will use the positivism research philosophy in order to maintain a focused, scientific approach to the study. Positivism will help in analyzing the theories and models related to the research topic, including big data analytics. The other research philosophies, interpretivism and realism, will not be used in this study. Interpretivism is the opposite of positivism: the researcher would have to depend on the answers and views of people involved in the research topic, which might be wrong or irrelevant to the context of the topic. Therefore, the researcher will only use the positivism research philosophy.

The research will follow a descriptive research design. Descriptive analysis depends on the objectives of the study and helps in fulfilling them (Hu et al. 2014). Therefore, the researcher will use a descriptive research design for analyzing different aspects of big data analytics. There are two other research designs: explanatory and exploratory. The explanatory research design focuses on explaining all aspects included in the research paper. The exploratory research design, on the other hand, helps in explaining the marketing criteria of goods in the market. Therefore, the researcher will use the descriptive design.

The two categories of data collection methods are primary and secondary data collection. Primary data collection deals with raw data gathered by conducting surveys with participants. Secondary data collection focuses on collecting data and information from online databases, journals, reports and books. In this research, the researcher will collect data and information from secondary sources, including online journals, books, online databases and articles. The online journals will be related to high performance computing in big data analytics.

Data will be analyzed using the qualitative data analysis method. Three themes will be created based on high performance computing in big data, and thematic analysis will be carried out on the collected data and information. The sample will consist of the three themes created for the research topic. A simple random sampling method will be used as the sampling technique (Wang, Kung and Byrd 2018). The research will follow all ethical considerations related to the study. Data and information about high performance computing in big data will be collected from journals that are related to the research topic, published after 2012 and properly attributed to their authors, so that all the information is current and can be analysed properly. The Data Protection Act 1998 will be followed in the study for securing its data and information. It is expected that the research will answer all research questions and meet all objectives, and all data and information will be analyzed with this purpose in mind.
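The selection procedure described above (journals published after 2012, followed by simple random sampling) can be sketched as follows. The journal records, the sample size of three and the fixed seed are hypothetical and serve only to make the illustration reproducible; they are not taken from the study.

```python
import random

# Hypothetical journal records; titles and years are placeholders.
journals = [
    {"title": "Paper A", "year": 2010},
    {"title": "Paper B", "year": 2012},
    {"title": "Paper C", "year": 2014},
    {"title": "Paper D", "year": 2015},
    {"title": "Paper E", "year": 2016},
    {"title": "Paper F", "year": 2018},
]


def eligible(records, after_year=2012):
    """Keep only records published strictly after `after_year`."""
    return [record for record in records if record["year"] > after_year]


def simple_random_sample(records, k, seed=0):
    """Draw k records without replacement, each with equal probability."""
    rng = random.Random(seed)  # fixed seed so the draw can be repeated
    return rng.sample(records, k)


candidates = eligible(journals)
sample = simple_random_sample(candidates, k=3)

for record in sample:
    print(record["year"], record["title"])
```

Filtering before sampling ensures that every eligible journal has the same chance of being selected, which is the defining property of simple random sampling.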

References

Akusok, A., Björk, K.M., Miche, Y. and Lendasse, A., 2015. High-performance extreme learning machines: a complete toolbox for big data applications. IEEE Access, 3, pp.1011-1025.

Assunção, M.D., Calheiros, R.N., Bianchi, S., Netto, M.A. and Buyya, R., 2015. Big Data computing and clouds: Trends and future directions. Journal of Parallel and Distributed Computing, 79, pp.3-15.

Belle, A., Thiagarajan, R., Soroushmehr, S.M., Navidi, F., Beard, D.A. and Najarian, K., 2015. Big data analytics in healthcare. BioMed Research International, 2015.

Bonomi, F., Milito, R., Natarajan, P. and Zhu, J., 2014. Fog computing: A platform for internet of things and analytics. In Big data and internet of things: A roadmap for smart environments (pp. 169-186). Springer, Cham.

Chen, C.P. and Zhang, C.Y., 2014. Data-intensive applications, challenges, techniques and technologies: A survey on Big Data. Information Sciences, 275, pp.314-347.

Cui, L., Yu, F.R. and Yan, Q., 2016. When big data meets software-defined networking: SDN for big data and big data for SDN. IEEE Network, 30(1), pp.58-65.

Demchenko, Y., De Laat, C. and Membrey, P., 2014, May. Defining architecture components of the Big Data Ecosystem. In Collaboration Technologies and Systems (CTS), 2014 International Conference on (pp. 104-112). IEEE.

Diamantoulakis, P.D., Kapinas, V.M. and Karagiannidis, G.K., 2015. Big data analytics for dynamic energy management in smart grids. Big Data Research, 2(3), pp.94-101.

Hashem, I.A.T., Yaqoob, I., Anuar, N.B., Mokhtar, S., Gani, A. and Khan, S.U., 2015. The rise of “big data” on cloud computing: Review and open research issues. Information Systems, 47, pp.98-115.

Hu, H., Wen, Y., Chua, T.S. and Li, X., 2014. Toward scalable systems for big data analytics: A technology tutorial. IEEE Access, 2, pp.652-687.

Jin, X., Wah, B.W., Cheng, X. and Wang, Y., 2015. Significance and challenges of big data research. Big Data Research, 2(2), pp.59-64.

Kambatla, K., Kollias, G., Kumar, V. and Grama, A., 2014. Trends in big data analytics. Journal of Parallel and Distributed Computing, 74(7), pp.2561-2573.

Reed, D.A. and Dongarra, J., 2015. Exascale computing and big data. Communications of the ACM, 58(7), pp.56-68.

Reyes-Ortiz, J.L., Oneto, L. and Anguita, D., 2015. Big data analytics in the cloud: Spark on Hadoop vs MPI/OpenMP on Beowulf. Procedia Computer Science, 53, pp.121-130.

Salehan, M. and Kim, D.J., 2016. Predicting the performance of online consumer reviews: A sentiment mining approach to big data analytics. Decision Support Systems, 81, pp.30-40.

Sandryhaila, A. and Moura, J.M., 2014. Big data analysis with signal processing on graphs: Representation and processing of massive data sets with irregular structure. IEEE Signal Processing Magazine, 31(5), pp.80-90.

Tsai, C.W., Lai, C.F., Chao, H.C. and Vasilakos, A.V., 2015. Big data analytics: a survey. Journal of Big Data, 2(1), p.21.

Wamba, S.F., Gunasekaran, A., Akter, S., Ren, S.J.F., Dubey, R. and Childe, S.J., 2017. Big data analytics and firm performance: Effects of dynamic capabilities. Journal of Business Research, 70, pp.356-365.

Wang, G., Gunasekaran, A., Ngai, E.W. and Papadopoulos, T., 2016. Big data analytics in logistics and supply chain management: Certain investigations for research and applications. International Journal of Production Economics, 176, pp.98-110.

Wang, S., Wan, J., Zhang, D., Li, D. and Zhang, C., 2016. Towards smart factory for industry 4.0: a self-organized multi-agent system with big data based feedback and coordination. Computer Networks, 101, pp.158-168.

Wang, Y., Kung, L. and Byrd, T.A., 2018. Big data analytics: Understanding its capabilities and potential benefits for healthcare organizations. Technological Forecasting and Social Change, 126, pp.3-13.

Wu, X., Zhu, X., Wu, G.Q. and Ding, W., 2014. Data mining with big data. IEEE Transactions on Knowledge and Data Engineering, 26(1), pp.97-107.
