In recent years, demand for storing and processing ever-larger volumes of data has grown sharply, especially in sectors such as science, finance and government. Effective processes and methods are required to maintain and manage such data.
Big data systems are responsible for storing and processing this data, while the cloud provides a scalable and elastic environment in which big data workloads can run reliably and in a fault-tolerant manner. Big data and big data analytics are applied together in business and science to correlate data. Both technologies are therefore important: they provide competitive advantages in business and ways to aggregate and summarize scientific data.
In this paper we emphasize cloud computing and big data systems. We focus mainly on the challenges and benefits of these technologies, using the organization "Facebook" as a case study, and on an analysis of these tools as a source of business intelligence.
Cloud computing refers to an on-demand provisioning model based on distributed and virtualized computing technologies. Its architecture comprises a front end (the client devices and applications through which users access services) and a back end (the provider's servers, storage and virtualization infrastructure), connected over a network.
Cloud computing can be categorized by deployment model (public, private and hybrid clouds) and by service model (infrastructure, platform and software as a service).
Big data refers to large volumes of varied data created by diverse sources, both human and machine. It requires new technology to scale, host and analytically process the data in order to derive real-time business insights relating to customers, risk, performance, management and productivity.
All the information gathered from social media and internet-enabled devices forms part of big data. Big data is commonly characterized by four "Vs": volume, velocity, variety and veracity.
Cloud computing uses virtualized hardware to provide an elastic, scalable and fault-tolerant environment for storing and processing large volumes of data; it thus gives big data availability, scalability and fault tolerance. Big data and cloud computing therefore work hand in hand and are compatible concepts. Today big data is regarded as a valuable business opportunity. Newer companies such as Teradata and Cloudera focus on delivering database or big data as a service (DBaaS, BDaaS), while organizations such as Amazon, Google, General Electric, Microsoft and IBM also provide on-demand methods for their customers to consume big data (Neves et al., 2017).
Definitions of big data based on an online survey of 154 global executives in April 2012
Facebook is the world's largest social networking platform. According to one report it processes about 2.5 billion pieces of content and ingests more than 500 terabytes of data every day; users generate nearly 2.7 billion "Like" actions and upload about 300 million photos daily, and the system scans roughly 105 terabytes of data every half hour (Constine, 2012).
In 2012 Facebook revealed that it operates the largest Hadoop deployment, storing over 100 petabytes of data in a single Hadoop cluster.
A few facts about the Facebook mobile platform, according to 2017 sources:
More than 100 million people use Facebook, and more than one billion pages are viewed every day, resulting in the accumulation of massive amounts of data at Facebook. One of its biggest challenges was to develop a scalable way to store and process these bytes, because improving the user experience requires fully evaluating this historical data.
Another problem was data architecture: in its early phase Facebook used a centralized architecture, in which a single computer system solved all the complex problems. Such centralized systems are ineffective at processing large bundles of data and are very costly.
The variety of data types was another important drawback. Facebook generates many different kinds of data, including text, images and videos in various formats, whereas the centralized systems were built for structured data, i.e. data stored in fixed formats.
Defining relationships among such a large amount of data was not possible with traditional systems. There were several other problems as well, such as accuracy, confidentiality and scaling.
Hadoop is an open-source, Java-based framework that supports storing and processing extremely large data sets in a distributed environment. It is developed under the Apache Software Foundation (Shvachko et al., 2010).
Facebook adopted the Hadoop framework for large-scale distributed processing and for its MapReduce paradigm. Hadoop allows map-reduce programs to be written in the language of the developer's choice. Facebook also adopted SQL as a paradigm for addressing large piles of data: the data stored in Facebook's Hadoop file system is mostly published as tables, so developers can easily explore and operate on the required data sets using small subsets of SQL. Facebook operates on these data sets using map-reduce programs or standard query operators.
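To illustrate the map-reduce paradigm described above, the following is a minimal in-memory sketch in Python (not Facebook's actual code): the map phase emits key-value pairs from raw log lines, and the reduce phase aggregates the values per key, exactly the two-stage structure Hadoop distributes across a cluster.

```python
from collections import defaultdict

def map_phase(records):
    """Map: emit a (word, 1) pair for every word in every log line."""
    for line in records:
        for word in line.split():
            yield (word, 1)

def reduce_phase(pairs):
    """Reduce: sum the emitted counts for each key."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

# Toy log lines standing in for web-tier log data.
logs = ["error disk full", "error network down", "info disk ok"]
result = reduce_phase(map_phase(logs))
print(result["error"])  # 2
```

In a real Hadoop job the map and reduce functions run on different machines and the framework handles shuffling the pairs between them; the logic, however, is the same.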
(McAfee and Brynjolfsson, 2012)
Hive is an open-source, petabyte-scale data warehousing framework built on top of Hadoop and developed by the Facebook Data Infrastructure Team. Hive has been popular with Facebook users from the start and is heavily used for job summarization, machine learning and business intelligence.
Hive made analysis of large data sets scalable; scalability is a core requirement at Facebook, and several engineering and non-engineering teams work on it continuously. Analysts at Facebook perform ad hoc analysis alongside several business intelligence applications, and products such as the reporting applications for the Facebook Ad Network or Facebook's Lexicon product are based entirely on analytics (Fan et al., 2014). Technologies like Hive and Hadoop provide the flexible infrastructure needed by these diverse users and applications, and a cost-effective way to scale with the amount of data generated at Facebook at an ever-increasing rate.
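The summarization queries Hive runs are expressed in a SQL dialect (HiveQL) over warehouse tables. As a sketch of the idea, the example below runs an equivalent aggregation with Python's built-in sqlite3 module; the table name and schema are hypothetical, chosen only to resemble an ad-reporting workload.

```python
import sqlite3

# In-memory table standing in for a Hive warehouse table of ad events.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ad_events (ad_id TEXT, event TEXT)")
conn.executemany("INSERT INTO ad_events VALUES (?, ?)",
                 [("a1", "click"), ("a1", "view"), ("a2", "click"), ("a1", "click")])

# A summarization query of the kind analysts submit to Hive.
rows = conn.execute(
    "SELECT ad_id, COUNT(*) FROM ad_events "
    "WHERE event = 'click' GROUP BY ad_id ORDER BY ad_id"
).fetchall()
print(rows)  # [('a1', 2), ('a2', 1)]
```

The key point is that analysts write declarative queries like this one, and Hive compiles them into map-reduce jobs over HDFS files rather than executing them on a single database server.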
Hive system architecture
(Thusoo et al, 2010)
The figure above illustrates the flow of data from a source system to the Facebook warehouse. Facebook has two sources of data: the federated MySQL tier, which holds all data related to the Facebook site, and the web tier, which produces the log data.
The data from the web tier is placed in a set of clusters named Scribe-Hadoop (scribeh). These clusters run Scribe servers on Hadoop clusters: logs coming from the different web servers are aggregated by the Scribe servers and written into the Hadoop cluster as HDFS files. A trade-off between latency and compression arises when considering whether to compress the web-tier data before it is transferred to the scribeh clusters. Copier jobs then periodically compress this data further and place it in the associated Hive-Hadoop clusters.
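The aggregation step can be sketched as follows: a minimal Python illustration (an assumption about the flow, not Scribe's actual implementation) that merges log entries from several web servers into per-category batches, the way Scribe groups messages before they land in HDFS files.

```python
from collections import defaultdict

def aggregate_logs(server_streams):
    """Merge log entries from many web servers into per-category batches,
    mimicking how Scribe servers group messages before writing HDFS files."""
    batches = defaultdict(list)
    for server, entries in server_streams.items():
        for category, message in entries:
            batches[category].append(f"{server}: {message}")
    return dict(batches)

# Hypothetical log streams from two web servers.
streams = {
    "web-01": [("clicks", "user 7 clicked ad"), ("errors", "timeout")],
    "web-02": [("clicks", "user 9 clicked ad")],
}
batches = aggregate_logs(streams)
print(len(batches["clicks"]))  # 2
```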
Once the data is stored in these clusters, it is available for consumption by downstream processes.
There are two Hive-Hadoop clusters available: the production cluster and the ad hoc cluster.
All replication jobs performed on the data stored in these clusters rely on logging the Hive commands that were submitted to the production Hive-Hadoop cluster (Antonopoulos and Gillam, 2010).
Finally, the results of these jobs are either kept in the cluster for future analysis or loaded back into the federated MySQL tier to be used by Facebook users.
Facebook generates a large amount of data every day on top of the historical data it keeps to support historical analysis. The production cluster stores roughly one month's worth of data, and data beyond that period is stored in the ad hoc cluster. Since the data is very large, it is generally compressed, by a factor of 6-7 in most cases. Hadoop also lets users compress data with a codec of their choice (Jin et al., 2015).
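Compression factors of this magnitude are plausible for repetitive, log-like data. The following sketch measures a gzip compression ratio on a synthetic log line (an assumption for illustration, not real Facebook data):

```python
import gzip

# Highly repetitive log-like input; real warehouse data is less uniform,
# but repeated field values are what make 6-7x factors achievable.
raw = b"2017-08-16 INFO request served in 12ms\n" * 10000
compressed = gzip.compress(raw)
ratio = len(raw) / len(compressed)
print(ratio > 6)  # True: this synthetic input compresses far better than 6x
```

In Hadoop the same effect is obtained by configuring a codec (e.g. gzip) on the output files of a job, so the choice of compression is left to the user, as noted above.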
Variety of data in Facebook (Minelli, Chambers and Dhiraj, 2012)
The PAX [8] compression scheme is also used alongside the gzip method; by compressing the rows and columns of the tables within Hive, it achieves 10%-30% better compression than gzip alone.
Facebook keeps 3 copies of each HDFS file to protect the data against node failure. More recently, Facebook has been using erasure codes, which reduce the effective replication factor to about 2.2 by storing fewer full copies of the data together with error-correction blocks from which lost data can be reconstructed.
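The principle behind erasure coding can be shown with the simplest possible code, a single XOR parity block (real HDFS erasure coding uses Reed-Solomon codes, which tolerate more failures; this is only an illustration of why parity is cheaper than full replication):

```python
def xor_parity(blocks):
    """Compute one parity block as the bytewise XOR of equal-sized data blocks."""
    parity = bytes(len(blocks[0]))
    for block in blocks:
        parity = bytes(a ^ b for a, b in zip(parity, block))
    return parity

def recover(surviving_blocks, parity):
    """Recover a single lost block by XOR-ing the parity with the survivors."""
    return xor_parity(surviving_blocks + [parity])

data = [b"AAAA", b"BBBB", b"CCCC"]   # three data blocks
parity = xor_parity(data)            # one extra block instead of full copies
restored = recover([data[0], data[2]], parity)  # block 1 was "lost"
print(restored == data[1])  # True
```

Here three data blocks are protected by one parity block (4 blocks stored for 3 blocks of data, a factor of about 1.33), whereas triple replication stores 9 blocks for the same data, which is why erasure codes cut storage overhead so sharply.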
(Constine, 2012)
Big data presents both opportunities and challenges to a business. It should be processed and analyzed properly, and regularly, to extract value that can drive positive change or influence business decisions.
Definition: analytics refers to finding the meaningful patterns present in data; business analytics is defined as the extensive use of data to derive facts on which business decisions and actions are based (Gandomi and Haider, 2015).
Analytics helps optimize processes and aggregate internal data with external data. It helps a firm meet stakeholder demands, manage risk, manage large data sets and enhance the organization's overall performance by transforming its information into intelligence.
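As a small illustration of aggregating internal data with external data, the sketch below (with entirely hypothetical store and region names) joins an internal sales record against an external reference table to produce a fact-based per-region summary of the kind analytics delivers:

```python
# Internal data: raw sales records produced by the firm's own systems.
internal_sales = [("store-1", 120), ("store-2", 80), ("store-1", 60)]

# External data: a reference mapping of stores to regions.
store_regions = {"store-1": "EU", "store-2": "US"}

# Aggregate revenue by region, turning raw records into a decision-ready fact.
revenue_by_region = {}
for store, amount in internal_sales:
    region = store_regions[store]
    revenue_by_region[region] = revenue_by_region.get(region, 0) + amount

print(revenue_by_region)  # {'EU': 180, 'US': 80}
```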
(Sivarajah et al, 2017)
| Issue | Existing solutions | Advantages | Disadvantages |
|---|---|---|---|
| Security | SLAs and data encryption | Data is well encrypted | Querying encrypted data remains time-consuming |
| Heterogeneity | Big data's ability to deal with a variety of data arriving at different velocities | Most varieties of data are covered | Handling varied data at different velocities is very difficult |
| Privacy | User consent; de-identification | Users are provided with reasonable privacy | Most de-identification mechanisms can be reverse-engineered |
| Data governance | Data governance documents | Define data-access policies, specify roles and define the data life cycle | Defining the data life cycle is not easy; enforcing governance policies may have counterproductive effects |
| Disaster recovery | Recovery plans | Define the methods and location for data recovery | Generally there is only a single place where data is secured |
| Data uploading | Uploading over the internet; shipping HDDs to cloud providers | Shipping HDDs to the cloud provider is faster than uploading, though less secure | Shipped HDDs risk physical damage, while uploading is very time-consuming |
| Elasticity | Techniques such as resizing, replication and live migration | Make the system able to accommodate data peaks | Assessment of load variation is mostly manual rather than automated |
(Yang et al, 2017)
Conclusion
As data grows daily, big data systems, and analytic tools in particular, provide a way to store and exploit these huge volumes of data. The cloud supports big data by providing a fault-tolerant and scalable environment. Big data is a powerful approach that improves an organization's decision making and is a source of business intelligence, but it still faces challenges, regarding security mechanisms, the handling of various data types and the implementation of elasticity, and special efforts must be made to overcome them.
References:
Antonopoulos, N. and Gillam, L., 2010. Cloud computing. London: Springer.
Constine, J., 2012. How big is Facebook's data? 2.5 billion pieces of content and 500+ terabytes ingested every day. 22 August 2012.
Fan, J., Han, F. and Liu, H., 2014. Challenges of big data analysis. National science review, 1(2), pp.293-314.
Gandomi, A. and Haider, M., 2015. Beyond the hype: Big data concepts, methods, and analytics. International Journal of Information Management, 35(2), pp.137-144.
Hashem, I.A.T., Yaqoob, I., Anuar, N.B., Mokhtar, S., Gani, A. and Khan, S.U., 2015. The rise of “big data” on cloud computing: Review and open research issues. Information Systems, 47, pp.98-115.
Jin, X., Wah, B.W., Cheng, X. and Wang, Y., 2015. Significance and challenges of big data research. Big Data Research, 2(2), pp.59-64.
Marr, B., 2015. '7 Amazing companies that really get Big Data', Big Data Case Study Collection, accessed 16 August 2017.
McAfee, A. and Brynjolfsson, E., 2012. Big data: the management revolution. Harvard business review, 90(10), pp.60-68.
Minelli, M., Chambers, M. and Dhiraj, A., 2012. Big data, big analytics: emerging business intelligence and analytic trends for today’s businesses. John Wiley & Sons.
Neves, P.C., Schmerl, B., Bernardino, J. and Cámara, J., Big Data in Cloud Computing: features and issues.
Shvachko, K., Kuang, H., Radia, S. and Chansler, R., 2010, May. The hadoop distributed file system. In Mass storage systems and technologies (MSST), 2010 IEEE 26th symposium on (pp. 1-10). IEEE.
Sivarajah, U., Kamal, M.M., Irani, Z. and Weerakkody, V., 2017. Critical analysis of Big Data challenges and analytical methods. Journal of Business Research, 70, pp.263-286.
Thusoo, A., Shao, Z., Anthony, S., Borthakur, D., Jain, N., Sen Sarma, J., Murthy, R. and Liu, H., 2010, June. Data warehousing and analytics infrastructure at facebook. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of data (pp. 1013-1020). ACM.
Yang, C., Huang, Q., Li, Z., Liu, K. and Hu, F., 2017. Big Data and cloud computing: innovation opportunities and challenges. International Journal of Digital Earth, 10(1), pp.13-53.