Data analytics and management
Source: Dhar, V., 2013, ‘Data science and prediction’, Communications of the ACM 56.
Data analytics is a modern approach to data management that has grown across all fields with the increasing use of computers and information technology. Data analytics, also termed data analysis, is a process in which data is inspected, cleaned, transformed, and in some cases modeled according to the goals set by the organization. The main aim of data analytics is to obtain useful information quickly. It can be applied to almost any field, including the social sciences, science, technology, and engineering. The information must be relevant to the field of work, so one has to know what kind of data should be kept for a given task (Yu, 2013). The principles of data analytics improve the quality of the data collected from clients. The data collected initially is called raw data. The raw data is processed and then presented on a clean data sheet, after which it is analyzed. Some of the data is placed in data models and used in data algorithms. The data is then communicated to the next level to obtain feedback on both its quantity and its quality. Once the data is accepted, it is presented in the final format.
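As a rough illustration of this collect, clean, transform, and communicate cycle, the sketch below runs a handful of hypothetical records through each stage; the field names and cleaning rules are assumptions made for the example, not taken from the sources above.

```python
# A minimal sketch of the raw-data pipeline described above: raw records
# are cleaned, transformed, and a summary is communicated for feedback.
# All field names and rules here are hypothetical.

raw_records = [
    {"patient_id": "P1", "age": "34", "weight_kg": "70.5"},
    {"patient_id": "P2", "age": "", "weight_kg": "82"},      # missing age
    {"patient_id": "P1", "age": "34", "weight_kg": "70.5"},  # duplicate row
]

def clean(records):
    """The 'clean data sheet' stage: drop duplicates and incomplete rows."""
    seen, cleaned = set(), []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key in seen or any(v == "" for v in rec.values()):
            continue
        seen.add(key)
        cleaned.append(rec)
    return cleaned

def transform(records):
    """Convert text fields to numeric types so the data can be modeled."""
    return [{"patient_id": r["patient_id"],
             "age": int(r["age"]),
             "weight_kg": float(r["weight_kg"])} for r in records]

def communicate(records):
    """Report a simple summary so the next level can give feedback."""
    print(f"{len(records)} usable record(s) after cleaning and transformation")

communicate(transform(clean(raw_records)))
```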
Data Science Process
Source: Wang, H., 2012, Integrity Verification of Cloud-Hosted Data Analytics Computations, in: Proceedings of the 1st International Workshop on Cloud Intelligence (Cloud-I 2012), ACM, New York, NY, USA.
Data is used in various ways by educational institutions, research labs, the medical industry, doctors, and many other fields (Dhar, 2013). Consider the medical applications of data analytics: doctors need to maintain the records of their patients, and once they have the previous records they can handle a case more easily, which benefits the patient as well. A similar application is seen in research labs for sample collection and analysis.
References
Dhar, V., 2013, ‘Data science and prediction’, Communications of the ACM 56.
Wang, H., 2012, Integrity Verification of Cloud-Hosted Data Analytics Computations, in: Proceedings of the 1st International Workshop on Cloud Intelligence (Cloud-I 2012), ACM, New York, NY, USA.
Some of the characteristics of data analytics are discussed below: accuracy, completeness, consistency, uniqueness, and timeliness. Accurate data from a reliable source is necessary, as it improves data quality; any flaws in the data system, such as in storage or collection, will compromise quality and accuracy (Witten, Frank and Hall, 2011). Completeness of the data is also very important: data that is presented only partially, or with some portion missing, is incomplete, so one must ensure completeness. The data that is present must also be consistent, meaning it should not contain any errors; this makes the data more reliable.
The data must be of very high quality; this is one of its unique features, and a high degree of uniqueness in the data improves its efficiency. Timeliness is also very important: the data must be time-bound, and expired or outdated data must not be presented. Data input is basically divided into three types: basic, advanced, and custom sources. The basic source of data can be obtained from computers and systems or other online sources. Advanced sources are those where the data is collected from paid or contracted work; here the data is reliable (Xu, Li, Guo and Chen, 2012). The third is the custom source, where the data is user-defined. The output data is accessed by the clients who requested it; they take the data after several rounds of feedback and correction, once they are sure it is reliable. In today's globalized environment this has largely become an online or internet-based platform. Some of these quality parameters are illustrated in the sketch below.
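As a small sketch of these quality checks, the code below filters a batch of hypothetical records for completeness, uniqueness, and timeliness; the field names and the 30-day freshness window are assumptions made for the example.

```python
from datetime import datetime, timedelta

# Hypothetical records; 'collected' marks when each record was gathered.
records = [
    {"id": 1, "value": 10.0, "collected": datetime(2013, 1, 5)},
    {"id": 2, "value": None, "collected": datetime(2013, 1, 6)},   # incomplete
    {"id": 1, "value": 10.0, "collected": datetime(2013, 1, 5)},   # not unique
    {"id": 3, "value": 7.5,  "collected": datetime(2012, 6, 1)},   # outdated
]

def quality_filter(records, now, max_age_days=30):
    """Keep only complete, unique, and timely records."""
    seen_ids = set()
    kept = []
    for rec in records:
        if any(v is None for v in rec.values()):
            continue                       # completeness check
        if rec["id"] in seen_ids:
            continue                       # uniqueness check
        if now - rec["collected"] > timedelta(days=max_age_days):
            continue                       # timeliness check
        seen_ids.add(rec["id"])
        kept.append(rec)
    return kept

print(quality_filter(records, now=datetime(2013, 1, 20)))
```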
There are different kinds of inputs to the data, and some of the data connections used for input in data analytics are described here. There are many sets of data streams, and these streams depend on the input of the systems; the data source is also very important in data systems. One part of data collection, known as stream analytics, provides a stream of data to the system as input; this is also known as the job input. There are many other data streams across system variations, and the input data may also come with a subscription option, where the subscription varies from vendor to vendor. Generally, the inputs are divided into two types: the data stream and the reference data. A data stream is an unbounded sequence of data given as input to the system, and a job should contain at least one such stream. Reference data, by contrast, is a kind of auxiliary data; it may be static or dynamic, and both kinds can be used. Reference data is mostly used for correlating the stream data, as sketched below.
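The sketch below illustrates the two input types under these assumptions: a generator stands in for the unbounded data stream, and a small static dictionary plays the role of the reference data used for correlation. The sensor names and readings are hypothetical.

```python
# Static auxiliary (reference) data, keyed by sensor id.
reference = {
    "s1": {"location": "lab A"},
    "s2": {"location": "lab B"},
}

def stream():
    """Stands in for an unbounded sequence of incoming events."""
    yield {"sensor": "s1", "reading": 21.5}
    yield {"sensor": "s2", "reading": 19.8}
    yield {"sensor": "s9", "reading": 25.0}    # no reference entry

for event in stream():
    ref = reference.get(event["sensor"])
    if ref is None:
        continue                   # cannot correlate; skip the event
    print(event["sensor"], event["reading"], "from", ref["location"])
```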
Data transformation is basically a mathematical function, and there are many reasons for transforming data; there are also cases in which the data should not be transformed and must be kept as it is. The need for transformation arises because the input, or initial, data is raw data, so it must be converted into a preferred type and normalized into the desired system output. Sometimes static data has to be converted into dynamic data. In data transformation, the input data is converted into a form suited to the present application, in which some bits of data might be added or deleted. The transformed data will not look the same as the input data in its physical appearance, but the logical meaning of the input and the output remains the same. The discrete Fourier transform and the discrete wavelet transform are mathematical tools used for matrix data transformation.
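As one concrete instance of such a function, the sketch below applies min-max normalization, a common transformation that rescales raw values into the range [0, 1]; the relative ordering of the values, their logical meaning, is preserved even though their physical appearance changes. The sample values are hypothetical.

```python
def min_max_normalize(values):
    """Rescale values to [0, 1]; relative ordering (the logical meaning)
    is preserved even though the physical appearance changes."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

raw = [12.0, 45.0, 7.0, 60.0]
print(min_max_normalize(raw))         # [0.0943..., 0.7169..., 0.0, 1.0]
```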
The output of data analytics will be data in a well-arranged form. Initially, words or lines of data are taken as output after the analysis, but before reaching the output the data has to pass through many filters. There are many comparisons against the transformed data, and the data is finally compared logically before being taken as output. Since the data is transformed and analysed before reaching the output, one must compare it carefully. Sometimes new data has to be linked in for the output, and a relational database is greatly helpful here; the output data must also be interlinked.
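A rough sketch of this filter-and-compare step is given below; the tolerance and the expected reference values are hypothetical assumptions used only to show the idea of logically comparing transformed data, via an interlinked key, before emitting it as output.

```python
# Hypothetical transformed values keyed by record id, compared against
# linked reference values before being emitted as output.
transformed = {"r1": 0.91, "r2": 0.35, "r3": 0.88}
expected    = {"r1": 0.90, "r3": 0.70}        # linked reference table

TOLERANCE = 0.05          # assumed acceptable deviation

output = {}
for key, value in transformed.items():
    if key not in expected:
        continue                               # cannot interlink; filter out
    if abs(value - expected[key]) > TOLERANCE:
        continue                               # fails the logical comparison
    output[key] = value

print(output)                                  # {'r1': 0.91}
```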
The data boundary is very important in data analytics. There are many types of data; some data is boundary-oriented and some is boundary-free. In boundary-oriented data the data is limited: since the data belongs to certain domains, it is collected from within the domain itself. If the data is out of the domain, or boundary, the system will not accept it as input, and it will not produce out-of-domain data at the output either. Hence boundary data is very useful in filtering the data. The other kind is boundary-free data, where any kind of data is accepted as input and any kind of data is accepted as output.
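The sketch below shows a boundary-oriented input check under the assumption of a hypothetical 0-100 domain (say, a percentage score); out-of-domain values are simply filtered out, while a boundary-free system would accept everything.

```python
# Boundary-oriented filtering: only values inside the assumed domain pass.
DOMAIN = range(0, 101)      # hypothetical domain: integers 0..100

def accept(value):
    """Boundary-oriented check: only in-domain values pass the filter."""
    return value in DOMAIN

inputs = [42, -5, 99, 250]
print([v for v in inputs if accept(v)])    # [42, 99]
```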
The environment of data analytics is expressed in terms of data, generally as bits or bytes. Digital data is used as the input, transformed or analysed into a useful structure, and then sent to the output as modified or transformed data. Data integration is one such environment: data collected from different systems is combined into a single data unit or project. The next environment covers data transformation and data filters, and encryption of the data is also applied.
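As a minimal sketch of the data integration environment, the code below merges records from two hypothetical systems into a single unit keyed on a shared identifier; the system contents are assumptions made for the example.

```python
# Records about the same entities held in two hypothetical systems,
# merged into one data unit keyed on a shared id.
system_a = {"s1": {"name": "Ann"}, "s2": {"name": "Raj"}}
system_b = {"s1": {"grade": "A"},  "s2": {"grade": "B"}}

integrated = {
    key: {**system_a[key], **system_b.get(key, {})}
    for key in system_a
}
print(integrated)   # {'s1': {'name': 'Ann', 'grade': 'A'}, ...}
```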
Witten, I. H., Frank, E. and Hall, M. A., 2011, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann.
Xu, H., Li, Z., Guo, S. and Chen, K., 2012, CloudVista: Interactive and Economical Visual Cluster Analysis for Big Data in the Cloud, Proceedings of the VLDB Endowment 5 (12), 1886–1889.
The Feedback Loop
Source: Brumfiel, G., 2011, ‘High-energy physics: Down the petabyte highway’, Nature 469, pp. 282–283. doi:10.1038/469282a
The feedback loop in real-time data is necessary for several reasons: it improves the efficiency of the system, gives insight, and uncovers new opportunities. The data arrives at high speed, and we must be in a position to select the data and pass the next requirements to the feedback section; the signal may be data accepted, more data needed, data rejected, or out-of-bound data. The selection of this data is very important (Klette, 2014). If the data feedback is continuous, the data quality will improve and the system rate will also increase.
The data in the loop must be handled in a meaningful way. Previously, data analysis used to take many days or many hours; now the data systems are so sharp that the analysis takes a few seconds, and the feedback must be provided at the same rate. The data must have a specific path, or destination, so that the flow is quick and the feedback can be accurate, but most people fail to do this, so one needs to organize the data flow efficiently. The data flow should be visualized in all directions so that the feedback path covers all areas. Once the data is available from the output of the feedback loop, the client must give the input back, or else the data flow will slow down or run out of time (Brumfiel, 2011). The loop must always be on, for several reasons: the flow of data will be continuous, the system updates will be good, and the data will stay real-time in all cases. This improves the system efficiency in every way, as the small sketch below illustrates.
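A small sketch of the four feedback signals is given below; the domain bounds, the minimum batch size, and the stuck-value rejection rule are all hypothetical assumptions chosen only to show a loop that returns a status for every batch it receives.

```python
def feedback(batch, domain=(0, 100), min_size=3):
    """Return one of the four feedback signals for a batch of readings."""
    if any(not (domain[0] <= v <= domain[1]) for v in batch):
        return "out of bound"
    if len(batch) < min_size:
        return "need more data"
    if len(set(batch)) == 1:
        return "rejected"          # e.g. a stuck sensor repeating one value
    return "accepted"

# Every incoming batch gets a status, keeping the loop always on.
for batch in ([10, 20, 30], [5, 5, 5], [7], [10, 500]):
    print(batch, "->", feedback(batch))
```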
Klette, R., 2014, Concise Computer Vision, Springer. ISBN 978-1-4471-6320-6.
Brumfiel, G., 2011, ‘High-energy physics: Down the petabyte highway’, Nature 469, pp. 282–283. doi:10.1038/469282a
There is major confusion among data-analytics-based organizations over whether to involve outsourcing or not. No firm conclusions have been reached, and every company or institution has its own rules and principles on outsourcing. With the global demand for data analytics and big data, there is huge demand for data analytics outsourcing. Since data is very precious, it can also be obtained from external agents for data management (Wang, 2012), but confidential data cannot be outsourced at any cost. An external data source such as outsourcing can develop the company in all areas and give it competitiveness across the board, and long-term goals can be developed through outsourcing without being fixed on one topic. Outsourcing may sometimes be costlier, but reliable data can be obtained through it. The data collected from different analytical teams may be difficult to arrange in order, yet the data is very important.
Knowledge of the data is very important in most cases; if the basic knowledge is lacking, it is very difficult to arrange the collected data. Online data analytics outsourcing has also increased greatly, but choosing one reliable source is really important. At present, many outsourcing companies have emerged, and bond-type written agreements make things easier. Thanks to internet technology, people are also recruiting outsourcing providers from other countries, and BPO industries are among the best sources (Rosenblum, 2015). The core intellectual property must not be lost: it should be kept hidden while new things are obtained in the process. Offshore providers can supply different ideas than the existing in-house data providers, and new funding sources can also be identified through outsourcing.
Wang, H., 2012, Integrity Verification of Cloud-Hosted Data Analytics Computations, in: Proceedings of the 1st International Workshop on Cloud Intelligence (Cloud-I 2012), ACM, New York, NY, USA.
Rosenblum, D. S., 2015, The Pros and Cons of the ‘PACM’ Proposal: Counterpoint. Communications of the ACM, Vol. 58 No. 9, Pages 44-45, ACM.
There are also many disadvantages associated with outsourcing. The first problem is data leakage. A project needs some data that might be confidential, but to obtain unknown data through external outsourcing we have to convey our requirements, and this is one of the biggest problems in the research and technology fields (Konstan and Davidson, 2015). As new products need many modifications, one keeps generating new requirements and so relies on the external source; those requirements may give scope for data to leak to new people through the outsourcing. A good remedy is to avoid sharing the core values or core ideas, which will protect you from strangers and insiders alike.
The next disadvantage is cost. The cost of data is very high these days, and it is not easy to get data at a low price; providers charge higher rates depending on the depth of the subject and the time of its origin. If it is a technical topic, the rates will be so high that one needs to think twice before requesting the data from external sources. The next problem is associated with time: one of the most important characteristics of data is its timeliness (McKinley, 2015). Data changes completely with time, and one needs to be aware of this fact; if people receive outdated or expired data, it will not help them in any way, so the data must always be checked before it is accepted, and every data request must be made with the time factor in mind. Data sharing through outsourcing must also have a limit, beyond which one should not expose the data. Hence outsourcing has its disadvantages as well.
McKinley, K. S., 2015, The Pros and Cons of the ‘PACM’ Proposal: Point. Communications of the ACM, Vol. 58 No. 9, Pages 43-44, ACM.
Konstan, J. A. and Davidson, J. W., 2015, Should Conferences Meet Journals and Where?: A Proposal for ‘PACM’. Communications of the ACM, Vol. 58 No. 9, Page 5, ACM.
Outsourcing is best suited to the field of research. In research, one needs to publish articles or patents without plagiarism, and it is not easy to resolve this issue, so outsourcing can benefit researchers greatly (Harzing, 2014). Researchers initially need a literature survey, in which they collect new ideas and old data. The new ideas can be developed by oneself, but innovation on those ideas can only be done with the help of others; likewise, the old data can be extracted by oneself, but processing and analyzing the data needs the help of others. Hence data analytics really helps the research field, and outsourcing proves a great idea for researchers. After receiving the data from the outsourcing partner, one must analyze it; this tells us whether the data is sufficient, and if it is not, new data can be requested. After gathering all the possible information, one must make a good draft, and the draft can also be checked for quality (Khabsa and Giles, 2014). The process must take place at a quick pace, or else data expiry or duplication takes place. After quality assurance, one moves to publication or to the final product; once the product exists, data from external reviewers will help to modify the existing one or to release a new version later. The research field needs a great deal of quality assurance, which cannot be done by a single person: it needs teamwork supported by good quality data. Data analytics is always necessary for the research domain.
Khabsa, M. and Giles, C. L., 2014, The Number of Scholarly Documents on the Public Web. PLoS ONE 9(5).
Harzing, A., 2014, A longitudinal study of Google Scholar coverage between 2012 and 2013. Scientometrics, Volume 98, Issue 1, pp. 565–575.
Students need to stay updated in their field of interest. They must also know the current best topics and the emerging topics, as this will help them decide their future line of work. If students want to select their stream of work in their field of interest, they need a full picture of that field and a clear idea of its future. This cannot be done on one's own, so a student needs to get an idea of the future from an external source; one cannot rely on friends or family for such a professional activity. Students must therefore depend on good data analytics to get highly refined and processed data; if a student has a technical query, he cannot solve it on his own.
Hence data analytics is necessary. Sometimes students need to anticipate a few things, and this really helps them anticipate many aspects of a given problem. The ideas they have to implement must be realistic and practically possible; if the ideas are not worth doing in a practical way, they have to rethink. The other problem is data expiry (Vitek and Gibbons, 2015): the data a student possesses must be fresh and applicable to the idea. Many people struggle in this aspect; when they do not have fresh data, or hold outdated ideas, they will certainly struggle in the near future, and data analytics can solve such problems (Vardi, 2009). Sometimes a student needs to check his product or idea, which is related to quality: the quality of a product can be improved only when it is tested before the final delivery, so the data about the product can be checked in this respect. Quick data can also be acquired through data analytics.
Vardi, M. Y., 2009. Conferences vs. Journals in Computing Research. Communications of the ACM, Vol. 52 No. 5.
Vitek, J. and Gibbons, J., 2015, Who owns your research? Results of SIGPLAN Open Access survey, SIGPLAN Executive Committee.
The university model needs data analytics. A university is a very broad system containing various fields. The basic concern is infrastructure maintenance: the university cannot keep using the same types of materials when upgrading from time to time, so it needs to employ new staff and materials to get the best appearance and the best productivity, and it needs data on the new models in the market for infrastructure upgrading, along with a model that fits its system. If the university management takes decisions independently, without proper data, the decisions may not click, or may sometimes fail. On the teachers' side, they need to keep up with the best technology in teaching methodology and teaching resources. Classroom courses are modified from year to year, and the open course model has also made its mark, so people must follow and adopt the new technology with the assistance of fine data. Teachers also need very fine, high-quality data for the best teaching practices (McAfee and Brynjolfsson, 2012).
The administration's data maintenance is not an easy task. Data alignment and data management are handled through data analytics, which also provides facilities to manage fee payments and the audits of the university's many accounts (Shao, Anthony, Borthakur, Jain, Sarma, Murthy and Liu, 2010). The researchers in the university need the best data to extract fine results; the quality of university research improves with the best data and its implementation in the best model, which can be checked with data analytics. The lab models in universities are also changing rapidly: with the increase in computers, lab models have changed, and simulation labs have increased a great deal. These simulation labs need a lot of experimental and theoretical data, which can be supplied through data analytics.
Shao, T. Z., Anthony, S., Borthakur, D., Jain, N., Sarma, J. S., Murthy, R. and Liu, H., 2010, Data warehousing and analytics infrastructure at Facebook, in: Proceedings of the 2010 International Conference on Management of Data, ACM, New York, NY, USA, pp. 1013–1020.
McAfee, A. and Brynjolfsson, E., 2012, Big data: The management revolution, Harv. Bus. Rev, 60–68.
Data analytics has both positive and negative outcomes. The positive aspects have already been discussed, so let us now consider the negative side of data analytics for students. The primary problem for students is the very ease of getting data: everything is related to money these days, and once the money is spent, the data is available (Brumfiel, 2011). The problem here is that students will lose the ability to access data themselves. One should not resort to data analytics, or request data from others, unless one has failed to get the required data oneself. Students must be skilled in data extraction; without this skill, they will face many problems in the present-day world. The second problem associated with students and analytics is data duplication, which is a problem not only for students but also for many others, such as researchers, professors, and management; the duplication of data has caused problems in every field, and students are no exception. The next big problem is the loss of core ideas.
Students will have many valuable ideas, but they are often not aware of what to request in data analytics; if they expose their core ideas or new fields, they are on the verge of losing the idea to other people or researchers in the world. The next big problem is laziness caused by data requests: students never work for the required data and instead depend on others every time to get the data for their projects or their own ideas (Meyer, Choppy, Staunstrup and van Leeuwen, 2009). Data analytics is thus a double-edged sword: it benefits students, but it also affects their thinking. Students must be careful in choosing data analytics.
Brumfiel, G., 2011, ‘High-energy physics: Down the petabyte highway’, Nature 469, pp. 282–283. doi:10.1038/469282a
Meyer, B., Choppy, C., Staunstrup, J. and van Leeuwen, J., 2009, Research Evaluation for Computer Science. Communications of the ACM, Vol. 52 No. 4, Pages 31-34, ACM.
Ethics apply to all fields, and data analytics is not an exception: ethics should be strictly followed in data analytics for the benefit of both ends. A data request from a client basically stems from a lack of knowledge or a lack of data, and in both cases the ethical aspects must be respected by the supplier. The data must be kept secret, and the data request must not be revealed to any third person; this not only improves trust but also builds a good relationship, which is necessary for good communication. Progressive data communication can only be achieved with good ethics, and there should always be good communication with the clients to maintain the data flow in either direction; only ethical thinking and ethical principles make that possible. The identity of the person must never be disclosed, for the sake of data safety; it is necessary to follow this ethical step.
Data protection is also one of the ethical aspects of data analytics. Data protection is not an easy task (Konstan and Davidson, 2015), but if the data provider follows good ethics, good data protection is possible. Data privacy is likewise a matter of good ethics: one must not lose the trust of the clients, and privacy is a must in the field of data analytics. Another hurdle in the ethical aspects of data is the data timeline, namely the expiry of the data. Providers must not supply expired data to the persons who requested it (Khabsa and Giles, 2014); this is a completely unethical practice, so it is necessary to respect the data timeline. The ethical aspects are gaining much importance in the digital age and in the improvement of data quality; a simple form of such a timeline check is sketched below.
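As a rough sketch of such a timeline check, the code below refuses to release a record once it has aged past a hypothetical 90-day validity window; the window and the field names are assumptions made for the example.

```python
from datetime import date, timedelta

# Assumed validity window: data older than 90 days counts as expired.
VALID_FOR = timedelta(days=90)

def releasable(record, today):
    """Only data still inside its timeline may ethically be supplied."""
    return today - record["collected"] <= VALID_FOR

record = {"value": 42, "collected": date(2015, 1, 10)}
print(releasable(record, today=date(2015, 3, 1)))   # True: 50 days old
print(releasable(record, today=date(2015, 6, 1)))   # False: expired
```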
Konstan, J. A., Jack, W. and Davidson, 2015, Should Conferences Meet Journals and Where?: A Proposal for ’PACM’. Communications of the ACM, Vol. 58 No. 9, Page 5, ACM.
Khabsa, M. and Giles, C. L., 2014, The Number of Scholarly Documents on the Public Web. PLoS ONE 9(5).