XML has become the standard way of representing and transforming data over the World Wide Web. The problem with XML documents is that they have a very high ratio of redundancy, which makes them demand large storage capacity and high network bandwidth for transmission.
Because they are so widely used, XML documents may be searched with vague queries from naive users who have little background in writing a good query.
This report tries to cope with these two problems by designing a system with two stages.
The first stage is the design of a new compression technique called XQPoint. This technique separates the XML document into containers and compresses each container using a back-end compressor suited to the type of data in that container. The second stage is the design of a vague query processor, which separates each query into sub-queries and retrieves the relevant information from the compressed XML documents accordingly.
Only the most relevant information will be decompressed and returned to the user. This research expects that XQPoint will achieve a better compression ratio and that the query processor will be the first to retrieve information from compressed XML documents according to vague queries.
The Extensible Markup Language (XML) is a W3C standard that has been adopted and sustained by both industry and the research community. In recent years, we have witnessed an increasing volume of XML digital information, either created directly as XML documents or converted from another type of data representation.
The importance of XML comes from several factors: its ability to represent different data types in one document, its role in solving the problem of long-term accessibility, and its use as a solution to the interoperability problem (Al-Hamadani et al., 2009).
Because the XML schema is replicated in each record, an XML document is considered a self-describing data file, which means that such files carry a lot of redundancy in both their tags and attributes (Ray, 2001). For all the above reasons, the need to compress XML documents is becoming increasingly pressing. Moreover, a strong demand has evolved to retrieve information directly from the compressed documents and then decompress only the retrieved information (Ferragina et al., 2006).
Because XML documents are used so widely and by different kinds of users, it has become important to deal with all kinds of queries. Some of these queries may have imprecise constraints that cannot be processed directly because of the grammar limitations of the query languages. However, such queries, known as vague queries, are common when the users of the XML documents have little knowledge of the document structure or lack the skills to write a precise and meaningful query.
The remedy for these two problems is to design a compression technique that can retrieve information from the compressed version according to vague queries. Two types of XML compressors have been used. The first type is the non-queriable compressors, which are used to compress XML documents for archival purposes. The second type is the queriable compressors, which allow the compressed XML documents to be queried. None of the compressors of the second type solves the problem of vague queries.
This report proposes a new XML compression technique called “XQPoint”, which consists of two stages. In the first stage, it separates the data part of the XML document from the structure part, then compresses the data part using suitable compressors depending on the type of the data, while the structure part is compressed using a fixed-point dictionary-based compressor. The second stage processes vague queries by decomposing them into multiple sub-queries, retrieving information from the compressed XML document according to each sub-query, combining the retrieved results into groups, and finally returning only the most relevant groups.
This section describes the background of the research. It consists of four parts: XML compression techniques, a brief definition of XPath and NEXI, query types, and answering vague queries.
The first sub-section describes the differences between XML compressors, gives a brief description of some of these compressors, and compares them. Since our system will use the NEXI query language, a brief description of the structure of this type of query is given. We then discuss the types of queries and concentrate on the vague query type.
Recently, a large number of XML compression techniques have been proposed, each with different characteristics. This section discusses the differences between these compressors and their main features.
XML compressors can be classified into two classes: XML-blind and XML-conscious compressors. XML-blind, or general-purpose, compressors treat the XML document as a traditional text document, ignoring its structure, and apply general-purpose text compression techniques to it. These techniques fall into two main classes: arithmetic compressors and dictionary compressors (Augeri et al., 2007; Augeri, 2008). Arithmetic compressors encode the input according to the estimated probability of each character, so that frequent characters cost fewer bits than rare ones. PPM, CACM3, and PAQ are examples of this kind of compressor (Moffat, 1990; Cleary and Witten, 1984; Alistair et al., 1998). Dictionary compression techniques, on the other hand, replace each string in the input with a reference to it in a dictionary maintained by the encoder. WinZip, GZIP, and BZIP2 are examples of this compression class (BZip2, 1996; GZip, 1992; WinZip, 1990).
XML-conscious compressors, by contrast, try to exploit the structure of XML documents in order to achieve a better compression ratio in less time than the XML-blind type, and to produce compressed XML documents that are usable without decompression. XML-conscious compressors can be classified, according to their ability to query the compressed documents, into two main sub-classes: queriable and non-queriable compressors.
Non-queriable compressors show good compression performance, but the resulting document cannot be queried without decompressing it. The main purpose of these compressors is to achieve the highest compression ratio for archival purposes. Examples of this type are:
XMill (Liefke and Suciu, 2000): This technique compresses the structure (i.e., tags and attributes) of the XML document separately from its data, encoding the structure in a dictionary-based manner and then passing it to a back-end compressor. Every element and attribute name is assigned an integer that serves as its key in the dictionary. The data part is grouped into containers depending on the type of the data and its path from the root. Each container is compressed separately using a compression technique suited to the data type in that container.
Millau (Girardot and Sundaresan, 2000): To compress the structure of XML documents, Millau takes advantage of the document schema if it is available. It is an extension of the WBXML (Wireless Application Protocol Binary XML) format, which is designed to reduce the size of XML documents for transmission purposes.
XWRT (Skibinski et al., 2007): This technique is based on ideas similar to XMill's, with a slight difference. The tag and attribute names in an XML document, which normally occur with high frequency within the same document, are encoded using a semi-dynamic dictionary. The XML document is first scanned to find the frequent tags and attributes and put them in the dictionary. A second scan of the document then replaces all occurrences of the dictionary words with their dictionary index.
RNGzip (League and Eng, 2007): This technique compresses the XML document with respect to a schema written in the Relax NG schema language [13]. The schema must first be agreed on by the sender and the receiver; it acts as a key in the encoding and decoding process. Using this schema, RNGzip builds a tree automaton for the specific XML document. Only a small amount of information has to be transmitted, and the receiver then reconstructs the complete XML document.
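The separation of structure and data that XMill is described as performing can be illustrated with a minimal Python sketch. It is not XMill's implementation: the function names are hypothetical, containers are keyed by root-to-node path only (not by data type), attributes and tail text are ignored, and `zlib` stands in for the type-specific back-end compressors.

```python
import zlib
import xml.etree.ElementTree as ET
from collections import defaultdict

def build_containers(xml_text):
    """Split a document into a tag dictionary, a structure stream,
    and one data container per root-to-node path (illustrative only)."""
    root = ET.fromstring(xml_text)
    tag_ids = {}                    # tag name -> integer key
    structure = []                  # integer keys; -1 marks an end tag
    containers = defaultdict(list)  # path -> list of text values

    def visit(node, path):
        key = tag_ids.setdefault(node.tag, len(tag_ids))
        structure.append(key)
        here = path + "/" + node.tag
        text = (node.text or "").strip()
        if text:
            containers[here].append(text)
        for child in node:
            visit(child, here)
        structure.append(-1)

    visit(root, "")
    return tag_ids, structure, dict(containers)

def compress_containers(containers):
    # Assumption: zlib replaces the per-type back-end compressors.
    return {path: zlib.compress("\n".join(values).encode("utf-8"))
            for path, values in containers.items()}

doc = ("<lib><book><title>XML</title><year>2001</year></book>"
       "<book><title>XPath</title><year>2004</year></book></lib>")
tag_ids, structure, containers = build_containers(doc)
packed = compress_containers(containers)
```

Grouping values that share a path puts similar data (all years, all titles) next to each other, which is the property the container idea relies on.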
The main goal of queriable compressors is to allow the compressed version of the XML document to be queried without decompressing it. These techniques compress considerably less well than the XML-blind or the non-queriable techniques. However, they are important when dealing with resource-limited applications and mobile devices. A brief description of some of these techniques follows.
XGrind (Tolani and Haritsa, 2000): In 2000, Tolani et al. introduced the first queriable XML compressor, which can query the compressed file without fully decompressing it. It is considered a homomorphic compressor: the compressed XML document can be viewed like the original XML document, except that its tags, element names, and attribute names are replaced with their corresponding dictionary-based encodings. The data part of the document is encoded using Huffman encoding. To query the compressed document, XGrind's query processor follows each simple path to check whether it satisfies the path in the given query. The main drawback of XGrind is that it can handle only exact-match and prefix-match queries on the compressed documents.
Xpress (Min et al., 2003): This technique uses the reverse arithmetic encoding method to encode the labels and paths of the XML document. Instead of representing each tag as a unique identifier, as XGrind does, Xpress encodes a label path as a distinct interval between 0.0 and 1.0. To encode the data part of the XML document, Xpress uses different compression techniques depending on the type of the data, without the need for human intervention.
XQZip (Cheng and Ng, 2004): Unlike XQueC, XQZip groups the XML data into blocks and then applies the gzip compressor to them. To process a query, it decompresses a specific block in order to retrieve its contents. This technique removes the duplicate structures that occur in an XML document in order to improve query performance. Although XQZip processes different types of XPath queries, it is slower than other compressors because of its partial decompression.
XSeq (Lin et al., 2005): This technique adapts Sequitur, a grammar-based text compression algorithm, to compress the containers. Sequitur is a linear-time algorithm that builds a context-free grammar for the input string. XSeq uses this grammar to process the data values that match the given query and to avoid scanning irrelevant data. Moreover, the context-free grammar allows XSeq to process queries without even partial decompression.
XQueC (Arion et al., 2007): This technique separates the data from the structure of XML documents. The data is stored in containers according to its path location within the document. Each container item is compressed individually. This benefits retrieval, since a complete container can be returned in response to a query. With this idea, XQueC can process more types of queries on the compressed version without the partial decompression used by some earlier compressors.
QXT (Skibinski and Swacha, 2007): This is an extension of XWRT that adds query-friendly concepts in order to process queries by partial decompression. The technique scans the XML file twice. In the first pass, a dynamic dictionary is created with the frequencies of its items. This dictionary is stored within the compressed file. In the second pass, QXT encodes the data and places it into containers. When the size of a container exceeds a given threshold, the container is compressed using a general-purpose compressor and written to disk. To process a query, the QXT query processor first searches the dictionary to determine which container should be decompressed. After decompressing that container, only the relevant data is decoded to XML format.
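The two-pass dictionary scheme described for XWRT and QXT (a first scan to collect frequent words, a second scan to replace them with dictionary indexes) can be sketched as follows. This is a simplified illustration, not the actual XWRT or QXT encoder: the tokenizer, the `min_count` cut-off, and the list-of-tokens output format are all assumptions made for the example.

```python
import re
from collections import Counter

def tokenize(text):
    # Alternating runs of word and non-word characters;
    # joining the tokens restores the original text exactly.
    return re.findall(r"\w+|\W+", text)

def build_dictionary(text, min_count=2):
    """First pass: collect words that occur at least min_count times,
    most frequent first (hypothetical cut-off)."""
    counts = Counter(t for t in tokenize(text) if re.match(r"\w", t))
    return [word for word, n in counts.most_common() if n >= min_count]

def encode(text, dictionary):
    """Second pass: replace every dictionary word with its integer index."""
    index = {word: i for i, word in enumerate(dictionary)}
    return [index.get(t, t) for t in tokenize(text)]

def decode(tokens, dictionary):
    return "".join(dictionary[t] if isinstance(t, int) else t for t in tokens)

doc = "<a><b>x</b><b>y</b></a>"
dictionary = build_dictionary(doc)
encoded = encode(doc, dictionary)
```

Because the dictionary is stored alongside the encoded stream, the decoder needs no second scan; a real encoder would also serialize the integer indexes compactly rather than keep a Python list.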
XPath is an expression language, not a programming or query language per se (Kay, 2008). Its main purpose is to return a node or several nodes from an XML document according to a specific expression. XPath's three data model categories and three operation categories are its main building blocks.
According to Kay (2004), a typical path expression in XPath consists of a sequence of steps, separated by the «/» operator. Each step works by following a relationship between nodes in the document (Holman, 2002; Andrew Watt, 2002). Furthermore, several symbols are used in XPath expressions, such as:
Path expressions therefore provide a very powerful mechanism for selecting nodes within an XML document, and this power lies at the heart of the XPath language (Kay, 2004; Sigurbjörnsson and Trotman, 2003).
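As a small illustration of path expressions, the following sketch evaluates a few of them with Python's standard `xml.etree.ElementTree` module, which supports only a limited subset of XPath. The sample document and its element names are invented for the example.

```python
import xml.etree.ElementTree as ET

doc = """<library>
  <book id="b1"><title>XML Basics</title><year>2001</year></book>
  <book id="b2"><title>XPath 2.0</title><year>2004</year></book>
</library>"""
root = ET.fromstring(doc)

# "/" separates steps: titles that are children of book children of the root.
child_titles = [t.text for t in root.findall("./book/title")]

# "//" selects descendants at any depth.
all_years = [y.text for y in root.findall(".//year")]

# "[@id='b2']" is a predicate on an attribute value.
second_title = root.find("./book[@id='b2']/title").text

# "*" matches any element name.
book_children = [c.tag for c in root.findall("./book/*")]
```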
Narrowed Extended XPath I (NEXI) is an XML query language that follows in the footsteps of XPath with some modifications. First, a NEXI retrieval engine is designed to infer the semantics from the query, in contrast to XPath, which has predefined semantics. Furthermore, NEXI replaces the contains() function, which XPath uses to request an element that contains specific content, with the about() function, which requests an element that is about the content. This change allows NEXI to process fuzzy queries. The language has extensions for question answering, multimedia searching, and searching heterogeneous document collections (Trotman and Sigurbjörnsson, 2005).
Such a query requires that a certain context (i.e., path) be relevant to a specific content description (i.e., cont) (Trotman and Sigurbjörnsson, 2005).
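To make the path and content parts of an about() clause concrete, here is a minimal sketch that pulls them out of a simple NEXI-style query with regular expressions. It handles only queries of the shape `//tag[about(path, keywords)]` chained together, not the full NEXI grammar, and the function name `parse_nexi` is hypothetical.

```python
import re

# One location step with an optional filter in square brackets.
STEP = re.compile(r"//(?P<tag>[\w*]+)(?:\[(?P<filter>[^\]]*)\])?")
# One about(path, cont) clause inside a filter.
ABOUT = re.compile(r"about\(\s*(?P<path>[^,]+?)\s*,\s*(?P<cont>[^)]*?)\s*\)")

def parse_nexi(query):
    """Return a list of (tag, [(relative_path, keywords), ...]) per step."""
    steps = []
    for step in STEP.finditer(query):
        abouts = [(a.group("path"), a.group("cont").split())
                  for a in ABOUT.finditer(step.group("filter") or "")]
        steps.append((step.group("tag"), abouts))
    return steps

query = "//article[about(., XML compression)]//sec[about(., vague queries)]"
parsed = parse_nexi(query)
```

Each parsed step keeps its structural part (the tag) apart from its content part (the keywords), which is the split a retrieval engine needs before it can score elements for relevance.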
There are different types of queries, and most of them have been processed by the earlier compression techniques in order to retrieve information from compressed XML documents (Lin et al., 2005). Table 2 shows the main types of queries with a brief description of each.
Since vague queries are the central issue in our research, this section gives a brief description of such queries and how they arise in the information retrieval domain.
Much imprecise and uncertain data exists in the real world. Since it is important to answer any user's query with exact or approximate answers even if the query has vague conditions, the need to process vague queries is increasing rapidly (Zhao and Ma, 2009).
Vague logic is a generalization of fuzzy logic (Kumar and Biswas, 2009). According to the vague set theory of Gau and Buehrer (1993), vague search is a combination of the following search techniques:
A vague set (VS) $A$ is characterized by two membership functions: (1) the 'evidence for', or truth membership $t_A(x)$, of the element $x$ in the vague set $A$, and (2) the 'evidence against', or false membership $f_A(x)$, of the element $x$ in the vague set $A$, such that $t_A(x) + f_A(x) \le 1$.
Furthermore, the grade of membership $\mu_A(u)$ of each element $u$ in a vague set $A$ is bounded by the subinterval $[t_A(u),\, 1 - f_A(u)]$ of $[0, 1]$, so that $0 \le \mu_A(u) \le 1$ (Liu et al., 2008).
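A short worked example, with numbers chosen purely for illustration, shows how the two membership values bound the grade of membership. It assumes the standard vague-set condition that the evidence for and against an element sums to at most one.

```latex
% Illustrative values (assumed, not taken from the source):
\[
t_A(x) = 0.6, \qquad f_A(x) = 0.3, \qquad t_A(x) + f_A(x) = 0.9 \le 1 .
\]
% The grade of membership is then confined to
\[
\mu_A(x) \in [\,t_A(x),\; 1 - f_A(x)\,] = [0.6,\; 0.7],
\]
% and the width of this interval is the part about which nothing is known:
\[
1 - t_A(x) - f_A(x) = 0.1 .
\]
```

When the interval collapses to a single point, that is when $t_A(x) + f_A(x) = 1$, the vague set behaves like an ordinary fuzzy set.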
Several kinds of queries are considered vague. Some of these kinds are:
Although vague queries have been processed before, using different approaches, all of these approaches dealt only with the original XML documents. Some of them depend on the tree pattern of the XML document (Sihem Amer-Yahia et al., 2002; P. Mark Pettovello and Fotouhi, 2006), while others depend on decomposing the vague query into two sub-queries and retrieving information based on the nested tags that distinguish XML documents from other text documents (Vojkan Mihajlović et al., 2006; Pehcevski, 2006; Andrew Trotman and Mounia Lalmas, 2006).
Neither the tree structure nor the nested tags still exist in the compressed XML documents. This makes it impossible for the existing techniques to retrieve information from the compressed documents.
Many systems nowadays convert plain XML documents to compressed ones before they answer the user's query, but none of them handles vague queries. Table 3 shows some examples of well-known XML compressors and the query types they process. Figure 1 illustrates the experiment that was carried out on all the queriable XML compressors.
For this reason, our research focuses on how to handle a vague query when retrieving information from a compressed XML document.
Based on the literature review, the main aim of this research is to develop a system that solves the problem of retrieving information from compressed files according to vague queries. The objectives of this research are:
The proposed system consists of two main stages. The first is the design of a new XML compression technique named XQPoint, which converts normal XML documents to a compressed version. The second is the design of a retrieval technique that processes vague queries written in NEXI in order to retrieve the relevant information from the compressed document accordingly. The next two sub-sections describe the structure of these two stages.
The following section first describes how XQPoint treats each part of the XML document, then explains its architecture and the data set that will be used to test the compressor part of the system.
The XQPoint compressor treats the structure part of an XML document differently from the data part. Figure 2 shows that the XML document is first analyzed in order to separate its components into different containers. Each element or attribute name is associated with a unique pair of numbers [IDpre, IDpost]. These pairs are called structural identifiers and have been used in several querying techniques (Al-Khalifa et al., 2002; Halverson et al., 2003; Grust, 2002; Paparizos et al., 2003). IDpre is the position of the node in a preorder traversal of the tree, while IDpost is its position in a postorder traversal. In this way, the position of each node within the complete XML tree becomes recognizable. Figure 3 shows a sample document with the nodes' structural identifiers. The list of all identifier pairs is then encoded in binary with $2\log_2 N$ bits per node, where $N$ is the number of elements in the document. The total number of bits needed to store all the elements is therefore $2N\log_2 N$.
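The structural identifiers described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than XQPoint's implementation: numbering starts at 0 (the source does not say whether it starts at 0 or 1), only elements are numbered, and the bit count rounds $\log_2 N$ up to a whole number of bits. The function names are hypothetical.

```python
import math
import xml.etree.ElementTree as ET

def structural_ids(root):
    """Return (tag, id_pre, id_post) for every element, in document order."""
    out = []
    post = 0

    def visit(node):
        nonlocal post
        pre = len(out)
        out.append(None)            # reserve this node's slot in preorder
        for child in node:
            visit(child)
        out[pre] = (node.tag, pre, post)
        post += 1                   # postorder rank is assigned on the way back up

    visit(root)
    return out

def is_ancestor(a, d):
    # a is an ancestor of d iff it starts earlier in preorder
    # and finishes later in postorder.
    return a[1] < d[1] and a[2] > d[2]

def structure_bits(n):
    # Assumption: each identifier takes ceil(log2(n)) bits, two per node.
    return n * 2 * max(1, math.ceil(math.log2(n)))

ids = structural_ids(ET.fromstring("<a><b><c/></b><d/></a>"))
```

The ancestor test is what makes the pair useful for querying: containment between two nodes can be decided from their two numbers alone, without walking the tree.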
To compress the data part of the XML document, XQPoint separates the data into different containers according to its path position (an encoded path) from the root and its type. Each of these containers is compressed using a different encoding technique, as follows:
Is used to
Where $n$ is the length of a data word, ASC is the ASCII code of a letter in the word, and $i$ is the letter's position in the word.
To test the XQPoint compression technique, we should choose a set of XML documents of different types. These documents should differ in size, number of tags, number of nodes, depth of the longest path, and data ratio (DR), which is:
Where $DR_d$ is the data ratio of the XML document $d$, $D$ is the data, and $S_i$ is the size of the XML document.
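One plausible reading of the data ratio, offered here as an assumption because the defining formula is not reproduced in the text, is the size of the data divided by the size of the whole document. The sketch below computes that quantity in bytes; counting attribute values as data is a further assumption, and `data_ratio` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

def data_ratio(xml_text):
    """Bytes of character data (text, tail text, attribute values)
    divided by the total bytes of the document. Assumed definition."""
    root = ET.fromstring(xml_text)
    data_bytes = 0
    for node in root.iter():
        data_bytes += len((node.text or "").strip().encode("utf-8"))
        data_bytes += len((node.tail or "").strip().encode("utf-8"))
        data_bytes += sum(len(v.encode("utf-8")) for v in node.attrib.values())
    return data_bytes / len(xml_text.encode("utf-8"))

sample = '<a x="12"><b>hello</b><c>hi</c></a>'
ratio = data_ratio(sample)   # 9 data bytes out of 35 in total
```

A ratio close to 1 indicates a text-heavy document, while a low ratio indicates that markup dominates, which is where structure compression matters most.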
According to their main characteristics, XML documents can be categorized into three types (Maneth et al., 2008; Sakr, 2009):
The second part of the proposed system is the vague query processor. As shown in Figure 2, the query is handled by the query processor part of the XQPoint architecture. The structure of this part is adopted from the query decomposition technique proposed in (Al-Hamadani et al., 2009), which decomposes the vague query into two parts: QCO, which refers to Content-Only retrieval, and QCAS, which refers to Content-And-Structure retrieval.
As shown in Figure 4, the query processor handles each query in several steps. The first is the query decomposition step, which separates each query into sub-queries. Figure 5 depicts an example of a NEXI vague query that passes through this step and is decomposed into four sub-queries.
The second step is sub-query relaxation, where each sub-query is handled separately with respect to a specific XML document. Relaxation can be performed by changing the node sequence, adding nodes, deleting nodes, or changing attribute values. A threshold is attached to each sub-query in order to determine the level of relaxation applied to it. A lower threshold means a low level of relaxation (i.e., fewer changes), while a higher threshold means a high level of relaxation.
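As an illustration of one of the relaxation operations, the sketch below generates relaxed variants of a path by deleting nodes, with the threshold limiting how many nodes may be dropped. It covers node deletion only and is a hypothetical example rather than the proposed processor's algorithm: the function name and the choice to always keep the final (target) step are assumptions.

```python
from itertools import combinations

def relax_by_deletion(steps, threshold):
    """Return (relaxed_steps, level) for every variant that drops
    at most `threshold` of the inner steps; the target step is kept."""
    target = steps[-1]
    inner = steps[:-1]
    variants = []
    for level in range(min(threshold, len(inner)) + 1):
        for dropped in combinations(range(len(inner)), level):
            kept = [s for i, s in enumerate(inner) if i not in dropped]
            variants.append((kept + [target], level))
    return variants

variants = relax_by_deletion(["article", "sec", "p"], threshold=1)
paths = ["//" + "//".join(steps) for steps, _ in variants]
```

With a threshold of 0 only the original path is returned, and each increase admits variants that are one deletion further from what the user wrote.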
The third step of the query processor is retrieval from the compressed XML document according to each relaxed sub-query. The retrieved documents are ranked according to the attached sub-query threshold. The final step is to group the retrieved documents according to the main NEXI query. The top-K ranked documents are decompressed and returned to the user.
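The grouping and top-K selection can be sketched as follows. The scoring rule is a placeholder chosen for the example, on the assumption that results obtained with less relaxation should rank higher; the source does not specify a ranking formula, and all names here are hypothetical.

```python
import heapq
from collections import defaultdict

def top_k_groups(hits, k):
    """hits: iterable of (group_key, relaxation_level, fragment_id).
    Returns the k best group keys, best first."""
    groups = defaultdict(list)
    for group_key, level, fragment in hits:
        groups[group_key].append((level, fragment))
    # Placeholder score: each hit contributes 1 / (1 + relaxation level).
    scored = [(sum(1.0 / (1 + level) for level, _ in members), key)
              for key, members in groups.items()]
    return [key for _, key in heapq.nlargest(k, scored)]

hits = [("doc1", 0, "f1"), ("doc1", 1, "f2"),
        ("doc2", 2, "f3"), ("doc3", 0, "f4")]
best = top_k_groups(hits, k=2)
```

Only the groups returned here would need to be decompressed, which is the point of ranking before decompression.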
As XML becomes increasingly important for storing and transferring data over the World Wide Web, there is a growing need to reduce the size of XML documents and to work with these documents in their compressed form. As XML documents spread, their users range from experts who write precise queries to naive users who write vague ones. For these reasons, there is a growing need for a system that can both compress XML documents and retrieve the most relevant information according to vague queries. This report proposes such a system. It develops a new compression technique, XQPoint, which separates the data from the structure of XML documents and then compresses each part as appropriate. The vague query processor then decomposes the vague query, processes each sub-query separately, and combines the retrieved results.
We expect this technique to achieve a high compression ratio and to retrieve information efficiently from the compressed version.