For a department, the provided data contains login and access data for all 20 users. This project's aim is to develop a profile for each user in the department, that is, to profile the users for authorization and authentication. Based on the data, a user profile must be developed for each user in the department. In general, a profile captures the login, logoff, access and session-time patterns. Each profile contains the start time, duration, resources accessed and types of operations performed. Here, the resources are the printer, network, files and computer. The access pattern covers the user programs executed, files accessed for update and read, printer usage, file sizes and library programs. The user profiles are determined by using association rules with sufficient support. The development of the user profiles needs to avoid outliers and overfitting, and it uses data mining security techniques to provide effective user profiles for the department. All of these aspects are analysed and discussed in detail.
The aim of this project is to develop a profile for each user in a department, that is, to profile the users for authorization and authentication. The user profiles are developed with the Weka data mining tool; within Weka, association rules are used to build the user profiles for the department. The analysis of data mining security techniques for providing effective user profiles for a department is also covered.
The provided data set is divided into three data sets: a login pattern, an e-mail pattern and a resource usage pattern. Each data set contains the following attributes: user ID, user program ID, library program ID, library utility ID, file ID, printer ID, e-mail program ID, time, date and host machine.
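In Weka these data sets are stored as ARFF files. The fragment below is only an illustrative sketch of how the listed attributes could be declared; the attribute names, types and the sample row are assumptions for illustration, not taken from the actual files.

    % Illustrative ARFF header for one of the pattern data sets (hypothetical values).
    % Only the first three user IDs are shown in the class attribute.
    @relation login_pattern
    @attribute userID           {U01,U02,U03}
    @attribute userProgramID    string
    @attribute libraryProgramID string
    @attribute libraryUtilityID string
    @attribute fileID           string
    @attribute printerID        string
    @attribute emailProgramID   string
    @attribute loginTime        date "HH:mm:ss"
    @attribute loginDate        date "yyyy-MM-dd"
    @attribute hostMachine      string
    @data
    U02,UP01,LP03,LU02,F10,PR01,EP01,"08:55:12","2018-03-05",H03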
Weka is a platform for applying machine learning methods to extract information from data. Weka stands for Waikato Environment for Knowledge Analysis. It is open-source software released under the GNU General Public License. For data analysis and data mining, the Weka tool is preferred, mainly because it bundles tools for data pre-processing, classification, clustering, association rule mining and visualisation (Bifet, 2010).
The data mining process consists of extracting information directly from database sources. Throughout this procedure, hidden information is also recovered. Data mining tools are used for predicting future trends and behavioural patterns in the data, enabling businesses to make informed decisions (Bouckaert, 2004).
Decision tables resemble neural networks and decision trees: they are a classification algorithm used to predict data, and the model is induced in Weka alongside the other machine learning algorithms. A hierarchical table is used inside the decision table, and each entry of data is stored as a key-value pair; at the higher level of the tree, additional data attributes are stored in another table. The structure of a decision table resembles dimensional stacking. To validate the model and admit new attributes, a visualisation strategy is applied to manage the new properties, and several interaction types are used in the visualisation schemes. For this reason it is considered a more valuable visualisation technique than other, static schemes.
Based on the given conditions, the corresponding actions are laid out visually or graphically; this is what a decision table does. These rules mirror programming-language constructs such as switch-case statements and if-then-else conditions. Each decision is mapped individually to the variables, and the attribute values are combined to predict the possible outcomes under the given constraints. According to those constraints, the actions to be performed are determined, and each entry pairs the key input values with an action. When a condition is not relevant to a decision, a "don't care" symbol is used, so a cell in a decision table can be blank or a hyphen, meaning that the decision is not taken or is incomplete in the decision-making process. Some decision tables use a true or false value to represent the decision condition; such a table is considered balanced or incomplete (Kaluza, 2013).
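As a toy illustration, independent of the department data, the behaviour of such a rule set can be written out by hand as plain if-then-else logic. The attribute names and values below (hostId, printerId and the host and printer codes) are hypothetical:

    // Hypothetical illustration only: hand-written equivalent of two decision-table
    // rules plus the majority-class fallback (attribute values are invented).
    public class DecisionTableExample {
        static String predictUser(String hostId, String printerId) {
            if (hostId.equals("H03") && printerId.equals("PR01")) {
                return "U02";   // rule in which both condition entries are filled in
            }
            if (hostId.equals("H07")) {
                return "U05";   // the printer entry is a hyphen, i.e. "don't care"
            }
            return "U01";       // no rule matches: fall back to the majority class
        }

        public static void main(String[] args) {
            System.out.println(predictUser("H07", "PR02")); // prints U05
        }
    }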
The practical work applies the decision table rule learner to analyse the user data and develop the user profiles.
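The listing below is a minimal sketch of how the decision table models reported next can be built and evaluated on the training data with the Weka Java API. The file name, the class attribute name (userID) and the option values are assumptions chosen to match the reported settings (best-first forward search, leave-one-out feature-selection evaluation); the Explorer GUI produces the same kind of output.

    import weka.classifiers.Evaluation;
    import weka.classifiers.rules.DecisionTable;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class BuildUserProfileTable {
        public static void main(String[] args) throws Exception {
            // Load one of the pattern data sets (file name is illustrative; the
            // attributes are assumed to be nominal/numeric as the learner requires).
            Instances data = new DataSource("login_pattern.arff").getDataSet();
            // The user ID attribute is the class to be predicted (name assumed here).
            data.setClassIndex(data.attribute("userID").index());

            // Decision table with best-first forward search and leave-one-out
            // cross-validation for feature selection, as in the output reported below.
            DecisionTable table = new DecisionTable();
            table.setOptions(weka.core.Utils.splitOptions(
                    "-X 1 -S \"weka.attributeSelection.BestFirst -D 1 -N 5\""));
            table.buildClassifier(data);

            // Evaluate on the training set, as in the report.
            Evaluation eval = new Evaluation(data);
            eval.evaluateModel(table, data);
            System.out.println(table);
            System.out.println(eval.toSummaryString());
            System.out.println(eval.toMatrixString());
        }
    }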
Decision Table:
Number of training instances: 213
Number of Rules: 28
Non matches covered by Majority class.
Best first.
Start set: no attributes
Search direction: forward
Stale search after 5 node expansions
Total number of subsets evaluated: 45
Merit of best subset found: 73.709
Evaluation (for feature selection): CV (leave one out)
Feature set: 3,6,8,10,2
Time taken to build model: 0.11 seconds
=== Evaluation on training set ===
Time taken to test model on training data: 0.05 seconds
Correctly Classified Instances 178 83.5681 %
Incorrectly Classified Instances 35 16.4319 %
Kappa statistic 0.8265
Mean absolute error 0.0732
Root mean squared error 0.1675
Relative absolute error 73.3946 %
Root relative squared error 75.0281 %
Total Number of Instances 213
a b c d e f g h i j k l m n o p q r s <-- classified as
12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | a = U02
0 11 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | b = U03
0 0 11 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | c = U05
0 0 0 11 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | d = U07
0 0 0 0 11 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | e = U09
0 0 0 0 0 4 0 0 0 0 0 3 4 0 0 0 0 0 0 | f = U11
0 0 0 0 0 0 12 0 0 0 0 0 0 0 0 0 0 0 0 | g = U04
0 0 0 0 0 0 0 12 0 0 0 0 0 0 0 0 0 0 0 | h = U06
0 0 0 0 0 0 0 0 12 0 0 0 0 0 0 0 0 0 0 | i = U08
2 0 0 0 0 0 0 0 0 10 0 0 0 0 0 0 0 0 0 | j = U10
0 0 0 0 0 0 0 0 0 0 10 0 0 0 0 0 0 0 0 | k = U01
0 0 0 0 0 1 0 0 0 0 0 5 5 0 0 0 0 0 0 | l = U12
0 0 0 0 0 0 0 0 0 0 0 3 8 0 0 0 0 0 0 | m = U13
0 0 0 0 0 0 0 0 0 0 0 3 2 6 0 0 0 0 0 | n = U14
1 0 0 0 0 0 0 0 0 0 0 0 0 0 10 0 0 0 0 | o = U15
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 11 0 0 0 | p = U16
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10 | q = U17
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 11 0 | r = U18
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 11 | s = U19
Decision Table:
Number of training instances: 388
Number of Rules: 27
Non matches covered by Majority class.
Best first.
Start set: no attributes
Search direction: forward
Stale search after 5 node expansions
Total number of subsets evaluated: 43
Merit of best subset found: 67.784
Evaluation (for feature selection): CV (leave one out)
Feature set: 3, 2
Time taken to build model: 0.76 seconds
Time taken to test model on training data: 0.05 seconds
Correctly Classified Instances 281 72.4227 %
Incorrectly Classified Instances 107 27.5773 %
Kappa statistic 0.7079
Mean absolute error 0.0654
Root mean squared error 0.1565
Relative absolute error 65.7274 %
Root relative squared error 70.1906 %
Total Number of Instances 388
a b c d e f g h i j k l m n o p q r s <-- classified as
22 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | a = U03
0 22 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | b = U05
0 0 22 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | c = U07
0 0 0 22 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | d = U09
1 0 0 0 7 0 0 0 6 0 5 3 0 0 0 0 0 0 0 | e = U11
0 0 0 0 0 22 0 0 0 0 0 0 0 0 0 0 0 0 0 | f = U04
0 0 0 0 0 0 22 0 0 0 0 0 0 0 0 0 0 0 0 | g = U06
0 0 0 0 0 0 0 22 0 0 0 0 0 0 0 0 0 0 0 | h = U08
1 0 0 0 6 0 0 6 9 0 0 0 0 0 0 0 0 0 0 | i = U10
0 0 0 0 0 0 0 0 0 22 0 0 0 0 0 0 0 0 0 | j = U02
1 0 0 0 5 0 0 0 6 0 5 5 0 0 0 0 0 0 0 | k = U12
1 0 0 0 5 0 0 0 6 0 4 6 0 0 0 0 0 0 0 | l = U13
0 0 0 0 5 0 0 0 6 0 4 2 5 0 0 0 0 0 0 | m = U14
0 0 0 0 0 0 0 0 0 0 0 0 0 22 0 0 0 0 0 | n = U16
0 0 0 0 0 0 0 0 0 0 0 0 0 0 22 0 0 0 0 | o = U17
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 21 0 0 0 | p = U01
0 0 0 0 5 0 0 0 6 0 5 5 0 0 0 0 0 0 0 | q = U15
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 8 0 | r = U18
Decision Table:
Number of training instances: 383
Number of Rules: 28
Non matches covered by Majority class (Stahlbock, Crone & Lessmann, 2010).
Best first.
Start set: no attributes
Search direction: forward
Stale search after 5 node expansions
Total number of subsets evaluated: 49
Merit of best subset found: 72.585
Evaluation (for feature selection): CV (leave one out)
Feature set: 3, 2
Time taken to build model: 0.31 seconds
=== Evaluation on training set ===
Time taken to test model on training data: 0.02 seconds
Correctly Classified Instances 297 77.5457 %
Incorrectly Classified Instances 86 22.4543 %
Kappa statistic 0.7621
Mean absolute error 0.0642
Root mean squared error 0.1541
Relative absolute error 64.6374 %
Root relative squared error 69.181 %
Total Number of Instances 383
=== Confusion Matrix ===
a b c d e f g h i j k l m n o p q r s <-- classified as
19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | a = U05
0 19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | b = U07
0 0 19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | c = U09
0 0 0 13 0 0 0 0 0 0 0 6 0 0 0 0 0 0 0 | d = U11
0 0 0 0 28 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | e = U02
0 0 0 0 0 28 0 0 0 0 0 0 0 0 0 0 0 0 0 | f = U04
0 0 0 0 0 0 28 0 0 0 0 0 0 0 0 0 0 0 0 | g = U06
0 0 0 0 0 0 0 28 0 0 0 0 0 0 0 0 0 0 0 | h = U08
3 0 0 2 0 0 0 14 8 0 0 0 1 0 0 0 0 0 0 | i = U10
0 0 0 0 0 0 0 0 0 18 0 0 0 0 0 0 0 0 0 | j = U01
0 0 0 0 0 0 0 0 0 0 18 0 0 0 0 0 0 0 0 | k = U03
0 0 0 6 0 0 0 0 0 0 0 13 0 0 0 0 0 0 0 | l = U12
0 0 0 6 0 0 0 0 0 0 0 10 12 0 0 0 0 0 0 | m = U13
0 0 0 6 0 0 0 0 0 0 0 6 0 7 0 0 0 0 0 | n = U14
0 0 0 6 0 0 0 0 0 0 0 10 0 0 3 0 0 0 0 | o = U15
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 13 0 0 0 | p = U16
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 13 0 0 | q = U17
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10 0 | r = U18
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10 0 0 | s = U19
This project suggests an approach for developing a user profile by using association rules. Association rules derive new rules from a data set by finding connections between the data items. They are used to discover frequent associations in a given data set and produce the frequent item sets. Association rules are very useful for decision making and effective marketing, and their application areas keep growing: obtaining user profiles for web-system personalisation, knowledge extraction in software engineering and finding patterns in biological databases. They make it easy to evaluate the importance of the attributes for the classification, and they produce the most useful rules for predicting and discriminating the user profiles across the different values of the class attribute.
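A minimal sketch of mining such association rules with Weka's Apriori implementation is given below; the file name and the support, confidence and rule-count settings are assumptions chosen only for illustration.

    import weka.associations.Apriori;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class MineProfileRules {
        public static void main(String[] args) throws Exception {
            // Apriori requires nominal attributes; the pattern data is assumed
            // to have been prepared (discretised) accordingly.
            Instances data = new DataSource("email_pattern.arff").getDataSet();

            Apriori apriori = new Apriori();
            apriori.setLowerBoundMinSupport(0.1); // keep only item sets with sufficient support
            apriori.setMinMetric(0.9);            // minimum confidence of the reported rules
            apriori.setNumRules(20);              // number of rules to output
            apriori.buildAssociations(data);

            // The printed rules (e.g. a printer/file usage item set implying a user ID)
            // describe the association patterns that make up a user's profile.
            System.out.println(apriori);
        }
    }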
Results on E-mail Pattern
The data set contains 213 instances in total.
Correctly Classified Instances       178     83.5681 %
Incorrectly Classified Instances      35     16.4319 %
Out of the 213 instances, 178 are correctly classified, which is a good result, and 35 are incorrectly classified.
Results on Login Pattern
The data set contains 388 instances in total.
Correctly Classified Instances       281     72.4227 %
Incorrectly Classified Instances     107     27.5773 %
Out of the 388 instances, 281 are correctly classified, which is a good result, and 107 are incorrectly classified.
Results on Resource Pattern
The data set contains 383 instances in total.
Correctly Classified Instances       297     77.5457 %
Incorrectly Classified Instances      86     22.4543 %
Out of the 383 instances, 297 are correctly classified, which is a good result, and 86 are incorrectly classified.
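As a quick check, the reported percentages follow directly from the ratio of correctly classified instances to the total number of instances in each data set:

    178 / 213 ≈ 0.8357 = 83.57 %   (e-mail pattern, 213 instances)
    281 / 388 ≈ 0.7242 = 72.42 %   (login pattern, 388 instances)
    297 / 383 ≈ 0.7755 = 77.55 %   (resource pattern, 383 instances)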
According to the results, the e-mail pattern, resource pattern and login pattern profiles are developed; they provide effective user profiles for the department while avoiding outliers and overfitting (Witten, Frank, Hall & Pal, 2017). The user profiles contain the login pattern, machine usage pattern, e-mail pattern, file access pattern, print usage pattern and program access pattern. The decision table is successfully determined and evaluates the total number of subsets in the user profiles. This result makes it possible to refine the user profiles by adding, to each pattern, a list of words preferred by a specific user. Our goal was to integrate the prototype into an already existing personalisation system, which could limit the effort asked of a new user by getting her to the interesting part quickly while still learning information useful for making good recommendations. We plan to apply information extraction techniques to discover information about a new user from the dialogue she has with the agent. In addition, we are assessing the possibility of using ontologies to capture knowledge of user preferences, so that the profiles refer explicitly to concepts of a standard ontology rather than only to a list of words.
Conclusion
This project successfully developed a profile for each user in a department. The provided data contains login and access data for all 20 users in the department. Based on these data, the users' profiles for the department were successfully developed using data mining security techniques.
References
Bifet, A. (2010). Adaptive stream mining. Amsterdam: IOS Press.
Bouckaert, R. (2004). Bayesian network classifiers in Weka. Hamilton, N.Z.: Dept. of Computer Science, University of Waikato.
Kaluza, B. (2013). Instant Weka How-to. Birmingham: Packt Publishing.
Stahlbock, R., Crone, S., & Lessmann, S. (2010). Data mining. New York: Springer.
Witten, I., Frank, E., Hall, M., & Pal, C. (2017). Data mining. Amsterdam: Morgan Kaufmann.