Chapter 2Literature Survey and Problem DefinitionThis chapter provides the related work done so far in this section giving some preliminary definitions that will be used during this dissertation and focusing on conceptual aspects of distributed association rule mining systems. Section 2.1 gives a description of existing techniques and their methodology for distributed data mining. Section 2.2 describes the problem definition and objectives for secure distributed data mining system.2.1 Literature SurveyHere surveys of various existing approaches for secure distributed data mining are observed. Data mining technology has tremendous applications in the field of identifying patterns among tremendous quantities of data.
Data mining techniques and data ware- housing are used in almost every application of database, most of the data mining tools operate by collecting all the data into a centralize site, then applying data mining algorithm on that data. However, privacy concerns can prevent building a centralized warehouse, In case of distributed system data may be distributed among several custodians, none of which are allowed to transfer their data to another site.
Here homogeneous databases are assumed. All sites have the same schema, but each site has distributed information. The goal is to produce association rules that hold globally, while limiting the information shared from each site.Previous work in data mining has considered two settings. In first setting the data owner and the miner are two different entities, and second, which consist of data distribution among several sites or players, these players or sites jointly performs mining on the data held by those sites or players.
Kantarcioglu and Clifton [8] have proposed the protocol for securely computing union of each private subsets held by the different sites. The private subset of a given site includes the itemsets which are s-frequent in his own database. This implementation of is costly and its implementation depends upon cryptographic techniques such as commutative encryption, oblivious transfer.Yao [9] proposed the protocol for securely computing the union of private sub- sets at each site. The authors proposed a multi-party computation, which is the costly part of the system and in its implementation cryptographic techniques like encryption, decryption, commutative encryption, and hash functions are used. The use of such cryptographic techniques improves communication cost and computation cost. In the existing systems discussed so far these techniques causes some leakage of information. Thus there is need for preventing information leakage. The proposed system overcomes this problem of information leakage.In the existing systems [1], [5] the protocol for securely computing the union of private subsets at each site in the transaction is suggested. Here a multi party computation is considered and in it’s implementation cryptographic techniques like encryption, decryption, commutative encryption, and hash functions are used. In these systems it is hard to mine association rules through security assumptions in addition it reveals the data during the mining process. It is not possible to mine globally valid results from distributed data without revealing private information. Secure distributed association rule mining is costly in terms of computational cost and communication.In UNIFY-KC algorithm the fake itemset is added and then removed from item- sets [1] . It adds overhead in computation, where as this overhead is reduced in AES algorithm [6-8]. In this paper for experimentation the data has been partitioned horizon- tally so that it can be distributed on different sites. Data partitioning techniques are suitable for dealing with the problems in handling large data set. Round robin partitioning, range partitioning and hash partitioning are some of available horizontal data partitioning techniques [14]. Round robin is the partitioning strategy that partitions data set with balanced class distribution. Using RR data skew is also reduced.Kantarcioglu and Clifton [6] implemented the sub-protocol for the secure computation of the combination or union of private subsets which are held by the different players or sites. (The private subset of a given player or site, includes the itemsets that are s-frequent in his partial database.)The use of such cryptographic techniques improves communication cost and computation cost. In the existing system these cryptographic techniques are used it causes some leakage of information, therefore it is not perfectly secure. Thus the union of private subsets is not perfectly calculated, so the system is proposed to overcome with this problem.In the existing system the protocol for securely computing the union of private subsets at each site in the transaction is implemented. Thus knowledge discovery on this high volume data becomes slow. In these systems it is hard to mine association rules through security assumptions in addition it reveals the data during the mining process. As discussed earlier it is impossible to mine globally valid results from distributed data without disclosing private information. Secure distributed association rule mining is costly in terms of computational cost and communication.As per Alex and Freitas [7] the speedup of data mining system improves form O(n) to O(n/k) through implementation of distributed approach. Also due to use of distributed or parallel association rule mining techniques the speedup can be achieved. As Rakesh Agrawal [8] proposed mining of association rules. 2.2 Problem Definition2.2.1 Problem statement The aim is to design an algorithm to enable handling of large data sets using available computing facilities. To design the algorithms to accelerate a mining process on large data sets. Implementation of secure Distributed data mining algorithm. To design secure multi-party algorithms that computes the union of private subsets that each of the interacting sites hold, and to design algorithm to test the inclusion of an element held by one site in a subset held by another.2.2.2 Objective Implementation of Distributed data mining algorithm for mining on large data set. Acquiring speedup in computation process by utilizing resources available in distributed system. Provide security in distributed computing environment. To design a system to improve simplicity, efficiency and privacy. To design a system that does not depend on commutative encryption and oblivious transfer.2.2.3 Proposed SystemIn the proposed system the problem of secure computation of union of private subsets of sites is addressed. In the proposed system the database is distributed horizontally among various sites in transaction. Round robin technique is used for Horizontal distribution of Data sets to reduce the data skew. 2.2.4 Statement of the Scope Advances in computers and their use in all walks of life has resulted in tremendous growth of data. This data is available from transactions due to computerization of banking, businesses, hospitals and other government organizations. This explosion of data needs new techniques to transfer this data into valuable knowledge. The business need for tools to handle the data flood has risen piercingly in the last years to extract useful data patterns that can be used to assist doctors in better understanding effects of a treatment, analyze a scientific experimentation; improve a business process with data mining to reduce costs, enhance research and increase sales. It also helps businesses in detecting fraud, measuring and improving program performance. The proposed system is concerned with maintaining security while mining of association rules in distributed database, where input is transaction database and output will be set of association rules. Figure 2.1: Proposed system including AES encryption and decryption techniques
Remember! This is just a sample.
You can get a custom paper by one of our expert writers.
Get your custom essay
Helping students since 2015
Essay Writing Service Features
Our Experience
No matter how complex your assignment is, we can find the right professional for your specific task. Contact Essay is an essay writing company that hires only the smartest minds to help you with your projects. Our expertise allows us to provide students with high-quality academic writing, editing & proofreading services.Free Features
Free revision policy
$10Free bibliography & reference
$8Free title page
$8Free formatting
$8How Our Essay Writing Service Works
First, you will need to complete an order form. It's not difficult but, in case there is anything you find not to be clear, you may always call us so that we can guide you through it. On the order form, you will need to include some basic information concerning your order: subject, topic, number of pages, etc. We also encourage our clients to upload any relevant information or sources that will help.
Complete the order formOnce we have all the information and instructions that we need, we select the most suitable writer for your assignment. While everything seems to be clear, the writer, who has complete knowledge of the subject, may need clarification from you. It is at that point that you would receive a call or email from us.
Writer’s assignmentAs soon as the writer has finished, it will be delivered both to the website and to your email address so that you will not miss it. If your deadline is close at hand, we will place a call to you to make sure that you receive the paper on time.
Completing the order and download