Bridging the Performance Gap Between Relational and RDF



ABSTRACT
Getting the most relevant results from the hyperlinked web of documents is hard. The pages are not machine-comprehensible, and search engines, even with advanced options, help users find pages rather than query their content, so users fall back on browsing and navigation. The integrated, syntactic web of data follows the closed-world assumption and holds a very large amount of linked, unstructured data. This paper outlines the two kinds of web of information, the syntactic web and the semantic web, and discusses the need for and feasibility of describing data semantically. It gives a short introduction to RDF, reviews recent description logic research, and examines recursive (drill-up or drill-down) queries in RDF-native query languages over semantic models. The aim is to address the drawbacks of the syntactic web of structured and semi-structured data and to highlight important aspects of the RDF model and RDF notation.

Keywords: RDF, Relational, Semantic Web, Syntactic Web, Jena, Virtuoso, URI, SPARQL, XML, unstructured, linked data.

1. INTRODUCTION
The objective of this research is to take the same data set(s), transform them into both RDF and relational (SQL) form, run equivalent queries in SPARQL and SQL, and compare throughput and response time as the data size grows, in order to measure the performance gap. The paper outlines the two kinds of web of information, the syntactic web and the semantic web, discusses the need for and feasibility of describing data semantically, and gives a short introduction to what RDF is and what it is good for.
The paper also introduces the basic RDF concepts of resources, properties, values, triples (statements), URIs and URI references, and the serializations of an RDF graph, and it addresses the performance gap between relational and RDF data management. It shows how linked data connects two resources or real-world objects, what an ontology means as a Semantic Web vocabulary, and how the semantic and syntactic webs of data map onto alternative storage stacks. Of the RDF stores (Jena, Sesame, RDF DB, Redland, Kowari, FORTH RDF Suite, YARS, Virtuoso), only Jena and Virtuoso are configured and used here for SPARQL queries; of the SQL relational databases (Oracle, SQLite, MySQL), only MySQL is configured for query checking. The Methodology section takes a single data set of about 100M triples, converts it for use with both SPARQL and SQL, loads it, and measures throughput on both sides. Response time is then measured against data size, which requires running the same queries over the corresponding data in each store. Indexes are added both to the RDF data queried with SPARQL (Jena, Virtuoso) and to the relational data queried with SQL (MySQL).

2. LITERATURE REVIEW
2.1 Semantic Web Concept
Beyond the conventional web of documents (pages), the World Wide Web Consortium (W3C) is building the technology for a web of data. Tim Berners-Lee called this web of data the Semantic Web, which the W3C describes as a vision of linked data.
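As a minimal sketch of what linking things rather than pages means, the Python snippet below (an illustration, not part of the experiments) models two independently published RDF graphs as sets of triples and merges them. All `example.org` URIs and the `livesIn`/`partOf` predicates are hypothetical; only the FOAF namespace URI is real.

```python
# Minimal sketch: RDF graphs modelled as Python sets of (subject, predicate, object)
# tuples. Two graphs published separately describe the same resource by the same
# URI, so merging them (set union) links their facts. All example.org URIs and the
# livesIn/partOf predicates are hypothetical; the FOAF URIs are the real namespace.

EX = "http://example.org/"
FOAF = "http://xmlns.com/foaf/0.1/"

# Graph published by one source: facts about people.
people_graph = {
    (EX + "jason", FOAF + "name", "Jason Muxlow"),
    (EX + "jason", FOAF + "knows", EX + "peter"),
}

# Graph published by another source: facts about places.
places_graph = {
    (EX + "jason", EX + "livesIn", EX + "chicago"),
    (EX + "chicago", EX + "partOf", EX + "usa"),
}

# No blank nodes are involved, so a plain set union is a valid RDF merge.
merged = people_graph | places_graph


def objects(graph, subject, predicate):
    """Return every object o such that (subject, predicate, o) is in graph."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}


# Follow links that only exist once the two graphs are merged.
for place in objects(merged, EX + "jason", EX + "livesIn"):
    print(place, "is part of", objects(merged, place, EX + "partOf"))
```

Set union is a correct merge here only because neither graph contains blank nodes; graphs with blank nodes must have them renamed apart before they are combined.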
Semantics is concerned with the meaning of words, and statements are built according to syntax rules. On the Semantic Web, links are made between data, things and resources rather than between pages: the Semantic Web describes relationships between things, such as "C has part B" or "X has part Z", and properties of things, such as size and weight.
2.2 RDF Concepts with Distinct Perceptions
RDF was first standardized in 1999, originally as an XML-based way of encoding metadata, that is, data about data. Since the revised RDF specification of 2004, its scope has broadened considerably. The most interesting uses of RDF today are not just encoding information but expressing relations between things: between web resources, real-world objects, concepts, places, and so on.
2.3 Key Concepts of RDF
The key concepts of RDF are the graph data model, a URI-based vocabulary, data types, literals, an XML serialization syntax, the expression of simple facts, and entailment.
2.3.1 Graph Data Model
An RDF graph is a set of triples, each consisting of a subject, a predicate and an object.
Such a graph can be drawn as a diagram of nodes connected by directed arcs: each triple is an arc, labelled with the predicate, that points from the subject node to the object node. The meaning of an RDF graph is the conjunction (logical AND) of the statements expressed by all of its triples.
2.3.2 URI-Based Vocabulary
RDF uses URIs (Uniform Resource Identifiers) to identify things on the web. Conceptually, RDF is a model of triples and their notations rather than a syntax. A URL (Uniform Resource Locator) such as http://www.dbpedia.org is one kind of URI, but not every URI is a URL; the question is how a system, acting through a web client agent, identifies things by URI.
2.3.3 Data Types
A data type governs how values such as floating-point numbers, integers and dates are represented. It consists of a value space, a lexical space, and a lexical-to-value mapping.
2.3.5 Expression of Simple Facts
An RDF triple expresses a relationship between two things; a new blank node may also be introduced, together with a statement of its type.
Figure 1. Facts of RDF Expression.
2.3.6 Entailment
Entailment is a formal relation between expressions: an expression A is said to entail an expression B if every possible arrangement of things in the domain that makes A true also makes B true. So if A is assumed, the truth of B can be inferred. In Figure 1, for example, entailment adds further triples to the RDF graph.
2.4 OWL: Your Web Thesaurus
On the Semantic Web, OWL provides a richer vocabulary for describing classes and properties: relations between classes (such as disjointness), cardinality (such as "exactly one"), equality, characteristics of properties (such as symmetry), enumerated classes, and richer typing of properties.
2.5 Comparing RDF and SQL Data
We first compare the structure of SQL and SPARQL queries and look at their differences, but before that we clarify the terminology.
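To make the terminology concrete (a row corresponds to a node and each field to a triple, as Table 2 lists), the following Python sketch maps one relational row to RDF triples. It loosely follows the W3C Direct Mapping [5]; the base URI, the example row, writing `rdf:type` as a prefixed name, and the omission of foreign-key reference triples are simplifying assumptions.

```python
# Minimal sketch of mapping one relational row to RDF triples, loosely following
# the W3C Direct Mapping [5]: the row becomes a subject node and each non-NULL
# column becomes one triple. The base URI is hypothetical, "rdf:type" is written
# as a prefixed name for brevity, and foreign-key (reference) triples are omitted.

BASE = "http://example.org/db/"


def row_to_triples(table, pk_column, row):
    """Map a row (dict of column -> value) to a set of (s, p, o) triples."""
    subject = f"{BASE}{table}/{pk_column}={row[pk_column]}"
    triples = {(subject, "rdf:type", f"{BASE}{table}")}
    for column, value in row.items():
        if value is None:
            # SQL stores NULL; RDF simply asserts nothing for missing data.
            continue
        triples.add((subject, f"{BASE}{table}#{column}", value))
    return triples


# Hypothetical row from the USERS table used in Table 1.
row = {"ID": 7, "father_name": "Peter", "address": None}
for triple in sorted(row_to_triples("USERS", "ID", row), key=str):
    print(triple)
```

The skipped NULL column anticipates the LEFT OUTER JOIN and OPTIONAL difference in Section 2.6.1: where SQL stores NULL, RDF has no triple at all.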
Both languages let users create, combine and consume structured data. SQL does this over relational databases, while RDF does it over a network of linked data, queried with SPARQL; that linked data may come from disparate sources that are merged. Unlike the semantic web of data, relational data is made up of rows grouped into tables, which RDBMS terminology calls relations. Rows must conform to a set of data types and constraints defined by the table schema, which is declared with the data definition language (DDL) subset of SQL. Table 1 shows how the two query languages compare.

2.6 Structure of SPARQL and SQL Queries

Table 1. Structure of SPARQL and SQL Queries.

Simple SELECT of an attribute list

SQL:
  SELECT u.father_name, a.city
  FROM USERS AS u, Address AS a
  WHERE u.address = a.ID AND a.state = 'CHICAGO';

SPARQL:
  SELECT ?name ?city
  WHERE {
    ?who  <USERS#father_name> ?name ;
          <USERS#address>     ?addr .
    ?addr <Address#city>      ?city ;
          <Address#state>     "CHICAGO" .
  }

LEFT OUTER JOIN / OPTIONAL

SQL:
  SELECT u.father_name, a.city
  FROM USERS AS u
  LEFT OUTER JOIN Address AS a
    ON (u.address = a.ID AND a.state = 'Chicago');

SPARQL:
  SELECT ?name ?city
  WHERE {
    ?who <USERS#father_name> ?name .
    OPTIONAL {
      ?who  <USERS#address> ?addr .
      ?addr <Address#city>  ?city ;
            <Address#state> "Chicago" .
    }
  }

Example results (SQL):
  father_name  | state   | city
  Jason Muxlow | CHICAGO | USA
  Peter        | Chicago | NULL

Example results (SPARQL):
  ?father_name | ?state  | ?city
  Jason Muxlow | CHICAGO | USA
  Peter        | Chicago | (unbound)

Table 1 shows that the SQL query has the same SELECT structure as the SPARQL query. Conceptually, SQL selects a list of attributes from the tables; in the WHERE clause, the join constraint u.address = a.ID captures the relationship between the tables, and the selection criterion a.state = 'CHICAGO' restricts the rows. SQL ends the whole query with a terminator (;), whereas SPARQL ends each triple pattern with a dot and uses ; to repeat a subject. SQL names columns with dot-qualified identifiers such as u.father_name, whereas SPARQL variables begin with a question mark, and SQL has no IRIs in angle brackets like those SPARQL uses for predicates. For better or worse, SPARQL reuses several SQL keywords, such as SELECT, FROM, WHERE, GROUP BY, HAVING and UNION, as well as aggregate function names.

2.6.1 LEFT OUTER JOIN and OPTIONAL, NULL
SQL uses NULL to indicate that data is not applicable or not available. An INNER JOIN returns only rows that match in both tables, so rows without a match are not retrieved, whereas a LEFT OUTER JOIN keeps every row of the left table and fills the columns from the right table with NULL where there is no match. SPARQL uses the keyword OPTIONAL in place of SQL's LEFT OUTER JOIN; instead of producing NULL, it leaves the variables for missing data unbound. A condition on the right-hand table therefore belongs in the ON clause (or, in SPARQL, inside the OPTIONAL block): placed in the WHERE clause, it would discard the NULL-extended rows and turn the outer join back into an inner join.

2.7 SQL-SPARQL Mapping Using SPASQL
SQL is a language for querying relational data. SPARQL is not designed to query relational data but to query data through a graph-based data model. RDF has links built into it: SQL queries must state primary-key/foreign-key joins explicitly, whereas in SPARQL these joins are implicit, expressed by shared variables. Both SQL and SPARQL queries can be tested with SPASQL, which serves as a third tool for checking the structure of queries. Table 2 lists SQL constructs, their SPASQL counterparts, and their implementation status.

Table 2. SQL, SPASQL, Status.

SQL | SPASQL | Status
Fields/attributes | RDF triple |
Row/tuple | Node |
Foreign key / primary key | data encoding detail by query |
Indexes | late-binding field name |
SELECT | SELECT | implemented
SELECT COUNT(*) > 0 | ASK | not implemented
serialize RDF graph/triple patterns | CONSTRUCT | not implemented
serialize RDF graph | CONSTRUCT | not implemented
tuple with attribute corresponding to p | s p o data model | implemented
WHERE | FILTER | implemented
LEFT OUTER JOIN | OPTIONAL pattern | implemented
UNION | UNION | partial (see UNION limitations)
named databases and federated query | named graphs | not implemented
return tuple identifier | DESCRIBE |
Result modifiers:
DISTINCT | DISTINCT | implemented
ORDER BY | ORDER BY / groups | implemented
LIMIT | LIMIT | implemented
OFFSET | OFFSET | implemented
Operators:
same | || && + - * / < <= = > >= | implemented
IS NOT NULL | BOUND | implemented
 | isIRI | N/A
 | isBlank | N/A
 | isLiteral | N/A
 | str | N/A
 | lang | N/A
 | datatype | not a dynamic question
 | langMatches | N/A
regex | regex | not implemented

3. METHODOLOGY
We initially took some open-source data sets in Excel and XML formats and converted the data with the BSBM data generator [20], open-source Java software that generates data in N-Triples (-s nt), XML (-s xml) and (My)SQL dump (-s sql) formats. However, the collected data reached only about 25M, and benchmarking required more, so we searched for and found free, open-source data sets of about 100M [20]. We then took the 10 queries of the Berlin SPARQL Benchmark [19].
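The methodology records the average query execution time for each data set size, and BSBM reports throughput as query mixes per hour (one mix is one pass over the query set). As a hedged sketch of such a measurement loop, the Python code below uses the standard-library `sqlite3` module as a stand-in for MySQL. The `product`/`offer` tables, their columns and the two queries are hypothetical, loosely echoing the product(producer) and offer(product) indexes the methodology mentions; they are not the BSBM queries.

```python
import sqlite3
import statistics
import time


def run_query_mix(conn, queries, runs=5):
    """Run every query in `queries`, `runs` times over.

    Returns (average seconds per query execution, total seconds for all runs,
    query mixes per hour), where one mix is one pass over all queries.
    """
    per_query = []
    start = time.perf_counter()
    for _ in range(runs):
        for sql in queries:
            t0 = time.perf_counter()
            conn.execute(sql).fetchall()  # fetch all rows so the full result is produced
            per_query.append(time.perf_counter() - t0)
    total = time.perf_counter() - start
    mixes_per_hour = runs * 3600 / total if total > 0 else float("inf")
    return statistics.mean(per_query), total, mixes_per_hour


# Hypothetical stand-in data set, held in memory.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (nr INTEGER PRIMARY KEY, label TEXT, producer INTEGER);
    CREATE TABLE offer (nr INTEGER PRIMARY KEY, product INTEGER, vendor INTEGER, price REAL);
    CREATE INDEX idx_product_producer ON product(producer);
    CREATE INDEX idx_offer_product ON offer(product);
""")
conn.executemany("INSERT INTO product VALUES (?, ?, ?)",
                 [(i, f"product {i}", i % 10) for i in range(1000)])
conn.executemany("INSERT INTO offer VALUES (?, ?, ?, ?)",
                 [(i, i % 1000, i % 50, float(i % 97)) for i in range(5000)])

mix = [
    "SELECT label FROM product WHERE producer = 3",
    "SELECT p.label, o.price FROM product AS p JOIN offer AS o ON o.product = p.nr WHERE p.nr = 42",
]
avg, total, qmph = run_query_mix(conn, mix, runs=10)
print(f"avg {avg * 1000:.3f} ms/query, total {total:.3f} s, {qmph:.0f} mixes/hour")
```

Fetching every row matters: `execute()` alone does not necessarily produce the whole result. A real run would also warm up each store before timing and repeat the mix at every data set size.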
These queries are written for the RDF triple store, but we also needed the same queries in SQL to check the MySQL results, so we converted all 10 queries into SQL. We then configured the MySQL setup through the PHP settings used by phpMyAdmin, manually setting upload_max_filesize to 700M, post_max_size to 800M, max_execution_time to 700 s, max_input_time to 600 and memory_limit to 200M. We also configured Jena by adding its bin/ directory to the Windows PATH environment variable and ran it in command-line (CLI) mode. We then ran the SPARQL queries over the RDF data sets in both Jena and Virtuoso.

We first ran small data sets of 50K, 250K, 1M, 5M and 25M. As the data size grew, Jena took much longer, and at 100M Jena could not respond at all, so we ran the 100M data set on Virtuoso, which handled large data much better than Jena. Starting with the 50K data set, we measured the average query execution time and compared the same queries on the same data set in Jena and MySQL, then repeated this for 250K, 1M, 5M and 25M. At 25M we hit a problem: in phpMyAdmin on the local MySQL host, the query failed with an execution-time-exceeded error. The same query in the SQLyog client ran for a very long time without responding and appeared to time out. Running it directly in the MySQL console worked well, so we re-ran all the MySQL queries through the console, because the graphical interfaces were much slower. Finally, we loaded the 100M data set into MySQL to check throughput. The data was so large that the import ran for a long time and failed with an execution-time-exceeded error, so we split it into sections, imported them separately, and created indexes on the product(producer), offer(product), offer(vendor), review(product) and review(person) columns before checking the query results.

4. SCHEMA NORMALIZED/DENORMALIZED OF JENA

5. MAP CONVENTIONAL XHTML WITH RDF
To see how RDF data can be embedded in XHTML (Extensible Hypertext Markup Language), consider the FOAF (Friend of a Friend) vocabulary, whose concepts a human understands easily (Figure 4).

Figure 4. RDF simulated with XHTML.

The following markup lets the browser, or any other agent, extract the same statements from XHTML:

  <body xmlns:foaf="http://xmlns.com/foaf/0.1/">
    <span about="#jason" typeof="foaf:Person" property="foaf:name">Jason Muxlow</span>
    <span about="#peter" typeof="foaf:Person" property="foaf:name">Peter Hernandez</span>
    <span about="#jason" rel="foaf:knows" resource="#peter">knows</span>
  </body>

5.1 Map Conventional HTML vs. RDF
RDF is a means of describing data, whereas HTML is made up of links between pages or documents. RDF data is intended to standardize a web of data in which data is linked to data. HTML documents are standardized through predefined tags that specify how a document should be displayed but do not let machines understand its data, unlike an RDF web of data.

5.2 Map Conventional XML vs. RDF
RDF data follows a graph data model that uses URIs, whereas XML was made for data about data, follows a tree data model, and does not depend on URIs.

6. RESULTS

Table 3. MySQL dump data set size.
Data set | 100M | 25M | 5M | 1M | 250K | 50K
Dump size | 3.2 GB | 1.06 GB | 212.4 MB | 41.4 MB | 10.3 MB | 2.0 MB

Table 4. Load time, MySQL.
Data set | 100M | 25M | 5M | 1M | 250K | 50K
Load time | 1129 | 213 | 49 | 17 | 7 | 0.9

Table 5. N-Triples data set size.
Data set | 100M | 25M | 5M | 1M | 250K | 50K
File size | 5.1 GB | 1.2 GB | 249.8 MB | 49.8 MB | 12.4 MB | 2.4 MB

Table 6. Query execution time of SPARQL.
Data set | 100M | 25M | 5M | 1M | 250K | 50K
Value | 5.1 GB | 1.2 GB | 249.8 MB | 49.8 MB | 12.4 MB | 2.4 MB

We ran the query mixes against the different stores and recorded the overall time (in seconds) for each. The best results among the stores are highlighted in bold.

Table 7. Overall results.
Data set size | MySQL (s) | Jena (s) | Virtuoso (s)
50K | 66.590 | 23.540 | 162.040
250K | 153.550 | 72.968 | 162.807
1M | 484.534 | 268.004 | 201.3100
5M | 2188.176 | 1406.690 | 476.8010
25M | not applicable | 7623.962 | 2089.122
100M | not applicable | not applicable | 906.683*

7. DISCUSSION
In the relational database (MySQL), storing big data sometimes led to execution-time-exceeded errors or runs that were not applicable, so we sliced the data into small chunks and imported them for the throughput test. When we queried the big data sets, queries with joins could not retrieve data, sometimes returned errors, or were not applicable. Although we indexed the primary-key columns, MySQL gave no better than average results up to 25M.

Compared with MySQL, Jena performed better on small data: at 50K and 250K it had a faster response time than either MySQL or Virtuoso. As the data grew to 25M, however, Jena slowed down considerably, and at 100M it exceeded the time limit and returned no results, so it was not applicable. Virtuoso, by contrast, retrieved results quickly at both 25M and 100M.

Virtuoso thus gave the fastest results on the big data sets, whereas MySQL returned no results from 25M upward and Jena returned none at 100M.
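Table 7 can be summarized mechanically. The Python sketch below copies its values (in seconds), treats "not applicable" as missing, and reports the fastest store for each data set size together with how many times slower MySQL was. The asterisk on Virtuoso's 100M value is dropped because the table does not say what it marks.

```python
# Overall query-mix times in seconds, copied from Table 7.
# None stands for "not applicable" (the store returned no result).
TABLE_7 = {
    "50K": {"MySQL": 66.590, "Jena": 23.540, "Virtuoso": 162.040},
    "250K": {"MySQL": 153.550, "Jena": 72.968, "Virtuoso": 162.807},
    "1M": {"MySQL": 484.534, "Jena": 268.004, "Virtuoso": 201.3100},
    "5M": {"MySQL": 2188.176, "Jena": 1406.690, "Virtuoso": 476.8010},
    "25M": {"MySQL": None, "Jena": 7623.962, "Virtuoso": 2089.122},
    "100M": {"MySQL": None, "Jena": None, "Virtuoso": 906.683},
}


def fastest(row):
    """Return (store, seconds) for the lowest time in a row, skipping None."""
    valid = {store: t for store, t in row.items() if t is not None}
    best = min(valid, key=valid.get)
    return best, valid[best]


def slowdown(row, store, reference):
    """Return how many times slower `store` is than `reference`, or None if either is n/a."""
    if row[store] is None or row[reference] is None:
        return None
    return row[store] / row[reference]


for size, row in TABLE_7.items():
    best, seconds = fastest(row)
    mysql = slowdown(row, "MySQL", best)
    note = "n/a" if mysql is None else f"{mysql:.2f}x"
    print(f"{size:>4}: fastest {best:<8} {seconds:9.3f} s; MySQL vs fastest: {note}")
```

For the values as given, this selects Jena at 50K and 250K and Virtuoso from 1M upward, which matches the discussion.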
MySQL showed poorer overall performance on both large and small data sets, while Jena performed well on small data sets compared with Virtuoso and MySQL but slowed down on big data.

8. CONCLUSION
This article compares RDF data and relational data, using the Semantic Web technologies Jena and Virtuoso and the relational database MySQL to benchmark query performance and workloads over RDF and relational versions of the same data sets. It also describes how to convert data for both the semantic and the syntactic web of information, and measures throughput and the performance gap across different query structures. Among the RDF stores, Virtuoso retrieved data faster on larger data sets, while Jena performed well on small data sets. Comparing SPARQL with SQL (in MySQL), MySQL showed poor performance on both larger and smaller data sets. This indicates that there is still room for improving the rewriting algorithms. The overall performance of the data stores was compared up to 100M triples.

REFERENCES
[1] Zeng, Kai, et al. “A distributed graph engine for web scale RDF data.” Proceedings of the 39th International Conference on Very Large Data Bases. VLDB Endowment, 2013.
[2] Fang, Ming, and R. Sunderraman. “A hybrid approach to constraint reasoning in bio-ontologies.” Digital Information Management (ICDIM), 2012.
[3] Farouk, M., and M. Ishizuka. “Mapping DB to RDF with additional discovered relations.” Stevens Point, Wisconsin, USA, 2012.
[4] Sequeda, J., F. Priyatna, and B. Villazón-Terrazas. “Relational database to RDF mapping patterns.” Universidad Politécnica de Madrid, 2012.
[5] Arenas, M., A. Bertails, E. Prud’hommeaux, and J. Sequeda. “Direct mapping of relational data to RDF.” W3C Working Draft, 29 May 2012. http://www.w3.org/TR/2012/WD-rdb-direct-mapping-20120529/
[6] Sequeda, Juan F., Marcelo Arenas, and Daniel P. Miranker. “On directly mapping relational databases to RDF and OWL.” Proceedings of the 21st International Conference on World Wide Web. ACM, 2012.
[7] Vicknair, Chad, et al. “A comparison of a graph database and a relational database: a data provenance perspective.” Proceedings of the 48th Annual Southeast Regional Conference. ACM, 2010.
[8] W3C OWL Working Group. “OWL 2 Web Ontology Language document overview.” W3C Recommendation, 27 October 2009. http://www.w3.org/TR/owl2-overview/
[9] Ramanujam, S., V. Khadilkar, L. Khan, S. Seida, M. Kantarcioglu, and B. Thuraisingham. “Bi-directional translation of relational data into virtual RDF stores.” Semantic Computing, 2010.
[10] Ramanujam, S., A. Gupta, L. Khan, S. Seida, and B. Thuraisingham. “R2D: A bridge between the Semantic Web and relational visualization tools.” Semantic Computing, 2009.
[11] An, Yuan, A. Borgida, R. J. Miller, and J. Mylopoulos. “A semantic approach to discovering schema mapping expressions.” Data Engineering, 2008.
[13] Chen, Huajun, et al. “RDF/RDFS-based relational database integration.” Proceedings of the 22nd International Conference on Data Engineering (ICDE’06). IEEE, 2006.
[14] Broekstra, Jeen, Arjohn Kampman, and Frank van Harmelen. “Sesame: A generic architecture for storing and querying RDF and RDF Schema.” The Semantic Web - ISWC 2002. Springer Berlin Heidelberg, 2002.
[15] Ramanujam, Sunitha, et al. “A relational wrapper for RDF reification.” Trust Management III. Springer Berlin Heidelberg, 2009.
[16] Neumann, T., and G. Moerkotte. “Characteristic sets: Accurate cardinality estimation for RDF queries with multiple joins.” 2011 IEEE 27th International Conference on Data Engineering (ICDE), April 2011.
[17] Szekely, A., A. Hejja, and R. A. Buchmann. “Mapping a relational database into a RDF repository.” IEEE Computer Society, Washington, DC, USA, 2011.
[18] Bizer, Christian, and Andreas Schultz. “The Berlin SPARQL Benchmark.” Buchmann, USA, 2011.
[19] Bizer, Chris, and Andreas Schultz. “Berlin SPARQL Benchmark.” July 2008. http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/V1/results/index.html
[20] Bizer, Chris, and Andreas Schultz. “Berlin SPARQL Benchmark.” July 2008. http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/V1/results/index.html
