A biotech company specializing in the production of vaccines needs a system upgrade. As the appointed Senior Software Engineer, I am assigned to implement the organization’s new enterprise search system. The upgrade aims to manage the rising number of internal documents.
The system requirements include search effectiveness, usability, budget, and security with role-based identity.
In this report, three possible solutions are analyzed:
Solution A: Creation of an in-house search engine implementing Boolean retrieval.
Solution B: Creation of an in-house search engine based on the Vector Space Model.
Solution C: A website that allows the crawlers of commercial web search engines to capture and index the documents.
We live in a digital era in which retrieving information from a collection of resources has become critical to all aspects of life (Azad and Deepak, 2019). Information retrieval (IR) is a discipline concerned with structuring, analyzing, organizing, storing, searching, and retrieving information.
Information retrieval is a systematic process initiated by a query that expresses some information need (Bahri, 2021). Highlighted below are the steps of the process.
An IR model can be either a best-match model (the Vector Space Model), in which documents are retrieved based on relevance (DeepAI, 2019), or an exact-match model (the Boolean Model), in which documents are either retrieved or not (Borisov et al., 2018).
IR is evident in information filtering, recommender systems, search engines, and digital libraries.
The Boolean Retrieval Model is among the oldest known retrieval models. According to Azad and Deepak (2019), it is a model for information retrieval that accepts any query expressed as a Boolean expression, in which terms are combined with the operators AND, OR, and NOT. The model applies exact matching (George, 2018): it answers a query by locating all documents that satisfy the Boolean expression over the terms specified in the query.
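The operator semantics described above can be sketched with set operations over a small inverted index. This is a minimal illustration, not the system's design: the three documents, their IDs, and the helper names (`build_index`, `postings`) are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy collection; IDs and texts are illustrative only.
DOCS = {
    1: "vaccine trial protocol for influenza",
    2: "influenza vaccine storage guidelines",
    3: "laboratory safety protocol",
}

def build_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def postings(index, word):
    """Return a copy of the posting set for a term (empty if unseen)."""
    return set(index.get(word.lower(), set()))

INDEX = build_index(DOCS)

# AND is set intersection, OR is set union, NOT is set difference.
both = postings(INDEX, "vaccine") & postings(INDEX, "protocol")
either = postings(INDEX, "vaccine") | postings(INDEX, "protocol")
only_protocol = postings(INDEX, "protocol") - postings(INDEX, "vaccine")

# Results are unranked; here they are simply ordered by document ID.
print(sorted(both), sorted(either), sorted(only_protocol))
```

Each document is either in the result set or not, which is why the output carries no relevance score.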
The purpose of the exact-match model is to classify a group of documents into two categories: those that match the criteria of the query and those that fail to match it (Günther et al., 2019). The retrieved documents are not ranked; rather, they are ordered by a specific criterion such as document ID number. Exact-match retrieval is an easy and effective way of retrieving information. However, the model has scalability limitations as the size of the collection increases.
Gysel et al. (2018) define the Vector Space Model, also called the Term Vector Model, as an algebraic model for representing text documents, or any other objects, as vectors of identifiers such as index terms. Leung, Lee, and Song (2019) give a more detailed definition of the vector space model based on its process. The vector space model represents everything (queries and documents) as vectors of weights. The purpose of the model is to match document vectors to queries based on their similarity.
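As a rough sketch of this idea, the snippet below turns documents and a query into weighted term vectors and ranks the documents by cosine similarity. The TF-IDF weighting with a `+ 1.0` smoothing term is one common choice assumed here; the report does not specify a weighting scheme, and the toy documents are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy collection; IDs and texts are illustrative only.
DOCS = {
    1: "vaccine trial protocol for influenza",
    2: "influenza vaccine storage guidelines",
    3: "laboratory safety protocol",
}

def tokenize(text):
    return text.lower().split()

def idf_table(docs):
    """Inverse document frequency per term (smoothed; an assumed variant)."""
    n = len(docs)
    df = Counter()
    for text in docs.values():
        df.update(set(tokenize(text)))
    return {term: math.log(n / count) + 1.0 for term, count in df.items()}

def weight_vector(text, idf):
    """Sparse vector of term weights: term frequency times IDF."""
    tf = Counter(tokenize(text))
    return {term: count * idf[term] for term, count in tf.items() if term in idf}

def cosine(u, v):
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank(query, docs):
    """Return (doc_id, score) pairs, most similar first."""
    idf = idf_table(docs)
    q = weight_vector(query, idf)
    scores = {doc_id: cosine(q, weight_vector(text, idf)) for doc_id, text in docs.items()}
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)

ranking = rank("influenza vaccine storage", DOCS)
for doc_id, score in ranking:
    print(doc_id, round(score, 3))
```

Unlike exact matching, every document receives a score, so a partially matching document still appears in the list, below the better matches.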
There are several approaches to measuring similarity, including cosine similarity, inner product, Dice similarity, and Jaccard similarity (Bahri, 2021).
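The four measures named above can be written in a few lines each. In this sketch, inner product and cosine operate on dense weight vectors, while Dice and Jaccard operate on term sets; the example vectors and term sets are hypothetical.

```python
import math

def inner_product(u, v):
    """Sum of pairwise products of two equal-length weight vectors."""
    return sum(a * b for a, b in zip(u, v))

def cosine_similarity(u, v):
    """Inner product normalized by the two vector lengths."""
    norm = math.sqrt(inner_product(u, u)) * math.sqrt(inner_product(v, v))
    return inner_product(u, v) / norm if norm else 0.0

def dice_similarity(a, b):
    """Twice the shared terms divided by the total number of terms."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0

def jaccard_similarity(a, b):
    """Shared terms divided by all distinct terms."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical query and document, as vectors and as term sets.
query_vec, doc_vec = [1.0, 1.0, 0.0], [1.0, 0.0, 1.0]
query_terms, doc_terms = {"influenza", "vaccine"}, {"influenza", "storage"}

print(inner_product(query_vec, doc_vec))
print(cosine_similarity(query_vec, doc_vec))
print(dice_similarity(query_terms, doc_terms))
print(jaccard_similarity(query_terms, doc_terms))
```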
According to Azzopardi et al. (2018), the aim of a general-purpose search engine is to index a sizeable share of the Web, independent of topic and domain.
A search engine has three main components that work together to satisfy a user’s information need: the crawler, the indexer, and the query engine.
The crawler is also known as a robot or spider. It helps in the collection of documents by repeatedly following links from start pages (Azad and Deepak, 2019). The pages retrieved and their components are compressed and stored in a page repository. The URLs and their respective links are transferred to the crawler control module.
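The link-following loop described above can be sketched as a breadth-first traversal. To keep the example self-contained, it walks a hypothetical in-memory link graph instead of fetching real pages; the URLs, the `crawl` function, and the `max_pages` limit are all assumptions for illustration.

```python
from collections import deque

# Hypothetical link graph standing in for fetched pages: URL -> outgoing links.
LINK_GRAPH = {
    "intranet/home": ["intranet/sops", "intranet/trials"],
    "intranet/sops": ["intranet/home", "intranet/safety"],
    "intranet/trials": ["intranet/safety"],
    "intranet/safety": [],
}

def crawl(start_pages, link_graph, max_pages=100):
    """Visit pages breadth-first from the start pages, following each link once."""
    frontier = deque(start_pages)   # URLs waiting to be visited
    seen = set(start_pages)         # URLs already queued, to avoid revisits
    repository = []                 # stands in for the page repository
    while frontier and len(repository) < max_pages:
        url = frontier.popleft()
        repository.append(url)
        for link in link_graph.get(url, []):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return repository

visited = crawl(["intranet/home"], LINK_GRAPH)
print(visited)
```

A production crawler would also need politeness delays, error handling, and content storage, none of which are modeled here.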
The pages previously collected by the crawler are then processed by the indexer (Lin and Ma, 2021).
The query engine processes user queries and retrieves ranked matching answers (documents).
The documents retrieved by the query engine are then ranked in order of importance using three approaches collectively: anchor, link, and content (similarity).
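One simple way to use the three signals collectively is a weighted sum, sketched below. The report does not say how the signals are combined, so the linear combination, the weights, and the candidate scores are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    url: str
    anchor_score: float   # match between query and anchor text of inbound links
    link_score: float     # importance derived from link structure
    content_score: float  # query-to-content similarity

# Hypothetical weights; real engines tune or learn these.
WEIGHTS = {"anchor": 0.2, "link": 0.3, "content": 0.5}

def combined_score(c):
    """Weighted sum of the three signals (an assumed combination rule)."""
    return (WEIGHTS["anchor"] * c.anchor_score
            + WEIGHTS["link"] * c.link_score
            + WEIGHTS["content"] * c.content_score)

def rank(candidates):
    """Order candidates from highest to lowest combined score."""
    return sorted(candidates, key=combined_score, reverse=True)

# Hypothetical candidates with signal values in [0, 1].
candidates = [
    Candidate("page-a", 0.9, 0.2, 0.4),
    Candidate("page-b", 0.1, 0.8, 0.7),
    Candidate("page-c", 0.5, 0.5, 0.1),
]
ordered = [c.url for c in rank(candidates)]
print(ordered)
```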
Pros
Cons
Pros
Cons
Pros
Cons
The Boolean retrieval technique follows the exact-match model, which makes it a precise method of retrieval (Azad and Deepak, 2019): a document either matches the query or it does not. However, there is a risk of the system returning either too many or too few results depending on the operators used, which can make a query ineffective.
The vector space model uses weights to rank documents according to their similarity (relevance) to the query (Abu-Salih, 2018). This approach addresses the Boolean retrieval model’s problem of retrieving too many or too few documents by using a ranking system to return the documents most appropriate to the user’s query.
There are several problems with the search effectiveness of web search engines. They include:
Even if employees upload high-quality, specific documents to the Web, there is no guarantee that when an employee needs to extract a document;
Boolean Retrieval Model
Because Boolean querying is structured around operators, this method requires users to have some level of skill in formulating queries, so that they know specifically which document(s) they are looking for (Azzopardi et al., 2018). However, it is not certain that the intended users (employees at the biotech company) can effectively formulate Boolean queries that return the documents they need.
Vector Space Model
Users write queries in natural language. This approach to querying is more user-friendly than formulating Boolean queries and caters to all employees.
Web Search Engine
It is user friendly and intuitive, given that all employees are familiar with how common search engines like Google work.
Boolean Retrieval Model
Creating and maintaining such a system would attract high development costs.
Vector Space Model
High development costs would be incurred in creating a VSM IR system (Priyadarshini et al., 2010).
Web Search Engine
It is the most cost-efficient option.
Boolean Retrieval Model
An in-house search engine removes the security risks of information retrieval through the web. Systems such as Dialog and Westlaw feature role-based identity controls such as user authentication, which secure the document collection against unauthorized users (Rehma, Awan, and Butt, 2018). All of these must be implemented during system development.
Vector Space Model
An in-house search engine removes the security risks of information retrieval through the web (Zamani et al., 2020). Similarly, systems such as Elasticsearch feature role-based identity controls such as user authentication, which secure the document collection against unauthorized users. All of these have to be implemented during development.
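As a minimal sketch of what role-based identity could mean for an in-house engine, the snippet below filters a ranked result list so that a user only sees documents shared with at least one of their roles. The users, roles, document IDs, and the `authorized_results` function are hypothetical, and this is not how Elasticsearch or any named product implements access control.

```python
# Hypothetical access lists: document ID -> roles allowed to read it.
DOCUMENT_ROLES = {
    "sop-001": {"lab", "qa"},
    "payroll-2021": {"hr"},
    "trial-17": {"lab", "clinical"},
}

# Hypothetical role assignments for authenticated users.
USER_ROLES = {
    "alice": {"lab"},
    "bob": {"hr"},
}

def authorized_results(user, ranked_doc_ids):
    """Keep only documents that share at least one role with the user.

    Unknown users and documents without an access list get no access,
    so the filter fails closed rather than open.
    """
    roles = USER_ROLES.get(user, set())
    return [doc_id for doc_id in ranked_doc_ids
            if roles & DOCUMENT_ROLES.get(doc_id, set())]

ranked = ["payroll-2021", "sop-001", "trial-17"]
print(authorized_results("alice", ranked))
print(authorized_results("bob", ranked))
```

Filtering after ranking preserves the original result order while hiding documents the user may not read.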
Web Search Engine
Given today’s reliance on retrieving information through the web, it is no surprise that there are security concerns. Popular search engines such as Google return multitudes of URLs from a single query, and some of these compromise your security, for example phishing sites and Trojan sites (Priyadarshini et al., 2010). While there are a few ways to mitigate the danger of attack, uploading private information to a website for the purpose of retrieval through the web leaves you prone to many security breaches.
Scoring scale: 1 is the lowest possible score and 5 is the highest possible score.
| Criteria | Weighting | Solution A Score | Solution A Total | Solution B Score | Solution B Total | Solution C Score | Solution C Total |
|---|---|---|---|---|---|---|---|
| Search Effectiveness | 5 | 3 | 15 | 5 | 25 | 4 | 20 |
| Usability | 5 | 2 | 10 | 4 | 20 | 5 | 25 |
| Budget | 4 | 3 | 12 | 4 | 16 | 4 | 16 |
| Security & Role-based identity | 5 | 5 | 25 | 5 | 25 | 3 | 15 |
| Total | | | 62 | | 86 | | 76 |
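The totals in the matrix can be reproduced by multiplying each score by its criterion weighting and summing per solution. The weightings and scores below are taken from the matrix; only the variable and function names are introduced for this sketch.

```python
# Criterion weightings from the weighted decision matrix.
WEIGHTS = {
    "Search Effectiveness": 5,
    "Usability": 5,
    "Budget": 4,
    "Security & Role-based identity": 5,
}

# Scores (1 to 5) per solution, in the same criteria as WEIGHTS.
SCORES = {
    "Solution A": {"Search Effectiveness": 3, "Usability": 2, "Budget": 3,
                   "Security & Role-based identity": 5},
    "Solution B": {"Search Effectiveness": 5, "Usability": 4, "Budget": 4,
                   "Security & Role-based identity": 5},
    "Solution C": {"Search Effectiveness": 4, "Usability": 5, "Budget": 4,
                   "Security & Role-based identity": 3},
}

def weighted_total(scores, weights):
    """Sum of weighting times score over all criteria."""
    return sum(weights[criterion] * scores[criterion] for criterion in weights)

totals = {name: weighted_total(scores, WEIGHTS) for name, scores in SCORES.items()}
best = max(totals, key=totals.get)

print(totals)
print(best)
```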
Based on the outcome of the Weighted Decision Matrix above, Solution B (the Vector Space Model) acquired the greatest score. It met all the set requirements, earning perfect scores in Search Effectiveness and in Security and Role-based identity, and high scores in Usability and Budget. For the following reasons, it is ideal to develop an in-house search engine based on the Vector Space Model for the biotech company:
Conclusion
The aim of this report was to evaluate three possible information retrieval solutions for upgrading the system of a biotech company facing an increase in internal documents: Solution A, an in-house search engine based on the Boolean Retrieval Model; Solution B, an in-house search engine based on the Vector Space Model; and Solution C, uploading internal documents to a website and using a commercial web search engine for information retrieval.
The system solution had to satisfy four criteria: search effectiveness, usability, budget, and security with role-based identity. The report presented a practical background of information retrieval and of each solution, highlighting the key attributes. It then analysed each solution against the specified system requirements and ranked the solutions using the weighted decision matrix in order to determine the best-suited one.
In conclusion, developing an in-house search engine based on the vector space model of information retrieval was most appropriate considering the system needs of the biotech company.
References
Abu-Salih, B., 2018. Applying Vector Space Model (VSM) Techniques in Information Retrieval for Arabic Language.
Azad, H.K. and Deepak, A., 2019. Query expansion techniques for information retrieval: a survey. Information Processing & Management, 56(5), pp.1698-1735.
Azzopardi, L., Thomas, P. and Craswell, N., 2018, June. Measuring the utility of search engine result pages: an information foraging based measure. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval (pp. 605-614).
Bahri, S., 2021. Aplikasi Pencarian Bahan Pustaka Di Perpustakaan Menggunakan Metode Vector Space Model. JIMP-Jurnal Informatika Merdeka Pasuruan, 5(2).
Billal, S., 2018. Development of search engine based on vector space model. A Dissertation in Fulfillment for the Requirements of the Degree of Master in Computer Science.
Borisov, A., Wardenaar, M., Markov, I. and de Rijke, M., 2018, June. A click sequence model for web search. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval (pp. 45-54).
DeepAI, 2019. Cosine Similarity. [online] Available at: https://deepai.org/machine-learningglossary-and-terms/cosine-similarity [Accessed 4 Nov. 2021].
George, M., 2018, September. Unsupervised Topic Detection based on 2D Vector Space model using Apriori Algorithm and NLP. In 2018 Thirteenth International Conference on Digital Information Management (ICDIM) (pp. 279-283). IEEE.
Günther, F., Rinaldi, L. and Marelli, M., 2019. Vector-space models of semantic representation from a cognitive perspective: A discussion of common misconceptions. Perspectives on Psychological Science, 14(6), pp.1006-1033.
Gysel, C.V., De Rijke, M. and Kanoulas, E., 2018. Neural vector spaces for unsupervised information retrieval. ACM Transactions on Information Systems (TOIS), 36(4), pp.1-25.
Leung, C., Lee, W. and Song, J.J., 2019. Information technology-based patent retrieval models. In Springer Handbook of Science and Technology Indicators (pp. 859-874). Springer, Cham.
Lin, J. and Ma, X., 2021. A few brief notes on deepimpact, coil, and a conceptual framework for information retrieval techniques. arXiv preprint arXiv:2106.14807.
Priyadarshini, S. Aishwarya and A. Ajaaz Ahmed, 2010. Search engine vulnerabilities and threats - a survey and proposed solution for a secured censored search platform. In 2010 International Conference on Communication and Computational Intelligence (INCOCCI) (pp. 535-539).
Rehma, A.A., Awan, M.J. and Butt, I., 2018. Comparison and evaluation of information retrieval models.
Zamani, H., Dumais, S., Craswell, N., Bennett, P. and Lueck, G., 2020, April. Generating clarifying questions for information retrieval. In Proceedings of The Web Conference 2020 (pp. 418-428).