relpipe-data/examples-rdf-sparql.xml
       
     1 <stránka
       
     2 	xmlns="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/strana"
       
     3 	xmlns:m="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/makro">
       
     4 	
       
     5 	<nadpis>Querying an RDF triplestore using SPARQL</nadpis>
       
     6 	<perex>use SQL-DK with the Jena JDBC driver or a custom script to gather linked data</perex>
       
     7 	<m:pořadí-příkladu>04300</m:pořadí-příkladu>
       
     8 
       
     9 	<text xmlns="http://www.w3.org/1999/xhtml">
       
    10 		
       
    11 		<p>
       
    12 			In the Resource Description Framework (<a href="https://www.w3.org/RDF/">RDF</a>) world, there are no relations.
       
    13 			The data model is quite different.
       
    14 			It is built on top of triples: subject – predicate – object.
       
    15 			Although there are no tables (unlike in relational databases), RDF is not schema-less clutter – 
       
    16 			actually RDF has a schema (ontology, vocabulary), just differently shaped.
       
    17 			Subjects and predicates are identified by <a href="https://en.wikipedia.org/wiki/Internationalized_Resource_Identifier">IRI</a>s
       
    18 			(or formerly <a href="https://en.wikipedia.org/wiki/Uniform_Resource_Identifier">URI</a>s)
       
    19 			that are globally unique (compared to primary keys in relational databases that are almost never globally unique).
       
    20 			Objects are also identified by IRIs (and yes, one can be both subject and object) or they can be primitive values like a text string or a number.
       
    21 		</p>
       
    22 		
       
    23 		<m:diagram orientace="vodorovně">
       
    24 			node [fontname = "Latin Modern Sans, sans-serif"];
       
    25 			edge [fontname = "Latin Modern Sans, sans-serif"];
       
    26 			subject	->	object [ label = "predicate"];
       
    27 		</m:diagram>
       
    28 		
       
    29 		<p>
       
    30 			This <em>triple</em> is also called a <em>statement</em>.
       
    31 			In the following statement:
       
    32 		</p>
       
    33 		
       
    34 		<blockquote>
       
    35 			<m:name/> tools are released under the GNU GPL license.
       
    36 		</blockquote>
       
    37 		
       
    38 		<p>we recognize:</p>
       
    39 		
       
    40 		<ul>
       
    41 			<li>
       
    42 				Subject: <i>					
       
    43 					<m:name/> tools</i>
       
    44 			</li>
       
    45 			<li>Predicate: <i>is released under license</i></li>
       
    46 			<li>Object: <i>GNU GPL</i></li>
       
    47 		</ul>
       
    48 		
       
    49 		<p>
       
    50 			This data model is seemingly simple: just a graph, two kinds of nodes and edges connecting them together.
       
    51 			Or a flat list of statements (triples).
       
    52 			But it can also be very complicated, depending on how we use it and how rich the ontologies we design are.
       
    53 			RDF can be studied for years and is a great topic for diploma theses and dissertations,
       
    54 			but in this example, we will keep it as simple as possible.
       
    55 		</p>
       
    56 		
       
    57 		<p>
       
    58 			Collections of statements are stored in special databases called triplestores.
       
    59 			The data inside can be queried using the 
       
    60 			<a href="https://www.w3.org/TR/sparql11-overview/">SPARQL</a> language through the endpoint provided by the triplestore.
       
    61 			Popular implementations are 
       
    62 			<a href="https://jena.apache.org/">Jena</a>,
       
    63 			<a href="http://vos.openlinksw.com/owiki/wiki/VOS">Virtuoso</a> and
       
    64 			<a href="https://rdf4j.org/about/">RDF4J</a>
       
    65 			(all free software).
       
    66 		</p>
       
    67 		
       
    68 		<p>
       
    69 			A relational model can be easily mapped to RDF.
       
    70 			We can simply add a prefix to the primary keys to make them globally unique IRIs.
       
    71 			The attributes will become predicates (also prefixed).
       
    72 			And the values will become objects (either primitive values or IRIs in case of foreign keys).
       
    73 			Of course, more complex transformations are possible – this is just the most straightforward way.
       
    74 		</p>
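       		
       		<p>
       			A minimal sketch of this straightforward mapping, assuming a CSV input with a header row,
       			the primary key in the first column and no commas or quotes inside the values.
       			The function name <code>table_to_triples</code> and the <code>http://example.org/</code> prefixes are made up for this illustration.
       			Every value is emitted as a plain string literal; a foreign key would become a prefixed IRI instead.
       		</p>
       		
       		<m:pre jazyk="bash"><![CDATA[# Hypothetical helper (not part of any existing tool): CSV rows in, N-Triples-like statements out.
       # $1 = prefix for subjects (primary keys), $2 = prefix for predicates (attribute names)
       table_to_triples() {
       	awk -F, -v subject_prefix="$1" -v predicate_prefix="$2" '
       		# the header row gives us the attribute names
       		NR == 1 { for (i = 1; i <= NF; i++) attribute[i] = $i; next }
       		# each remaining cell becomes one statement: key – attribute – value
       		{
       			for (i = 2; i <= NF; i++)
       				printf "<%s%s> <%s%s> \"%s\" .\n", subject_prefix, $1, predicate_prefix, attribute[i], $i
       		}'
       }
       
       printf 'id,name,license\n1,relpipe,GPL\n' \
       	| table_to_triples "http://example.org/tool/" "http://example.org/attribute/"
       # prints:
       # <http://example.org/tool/1> <http://example.org/attribute/name> "relpipe" .
       # <http://example.org/tool/1> <http://example.org/attribute/license> "GPL" .]]></m:pre>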
       
    75 		
       
    76 		<p>
       
    77 			Mapping RDF data to a relational model is a bit more difficult.
       
    78 			Sometimes easy, sometimes very cumbersome.
       
    79 			We can always design some kind of EAV (entity – attribute – value) model in the relational database
       
    80 			or we can create a relation for each predicate…
       
    81 			If we do some universal automatic mapping and retain the flexibility of RDF and richness of the original ontology,
       
    82 			we usually lose the performance and simplicity of our relational queries.
       
    83 			A good mapping that feels natural and idiomatic in the relational world and performs well usually requires some hard work.
       
    84 		</p>
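       		
       		<p>
       			The trade-off can be seen in a small sketch that pivots statements into one record per subject.
       			It assumes simplified comma-separated <i>subject,predicate,object</i> lines (no commas inside the values)
       			and a hypothetical function name <code>triples_to_records</code>.
       			The predicates that should become attributes must be listed in advance,
       			and when a subject has several values for the same predicate, only the last one survives, so some of the flexibility of RDF is lost.
       		</p>
       		
       		<m:pre jazyk="bash"><![CDATA[# Hypothetical helper: pivot subject,predicate,object lines into one CSV record per subject.
       # $@ = predicates that should become attributes (columns)
       triples_to_records() {
       	awk -F, -v columns="$*" '
       		BEGIN {
       			n = split(columns, column, " ")
       			printf "subject"
       			for (i = 1; i <= n; i++) printf ",%s", column[i]
       			print ""
       		}
       		# remember subjects in the order of their first appearance
       		!($1 in seen) { seen[$1] = 1; subjects[++count] = $1 }
       		# a later value of the same subject and predicate overwrites the earlier one
       		{ value[$1, $2] = $3 }
       		END {
       			for (s = 1; s <= count; s++) {
       				printf "%s", subjects[s]
       				for (i = 1; i <= n; i++) printf ",%s", value[subjects[s], column[i]]
       				print ""
       			}
       		}'
       }
       
       printf '%s\n' 'you,have,nice-day' 'you,read-about,relational-pipes' \
       	| triples_to_records have read-about
       # prints:
       # subject,have,read-about
       # you,nice-day,relational-pipes]]></m:pre>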
       
    85 		
       
    86 		<p>
       
    87 			But mapping mere results of a SPARQL query obtained from an RDF endpoint is a different story.
       
    88 			These results can be seen as records and processed using our relational tools,
       
    89 			stored, transformed or converted to other formats, displayed in GUI windows or safely passed to shell scripts.
       
    90 			This example shows how we can bridge the RDF and relational worlds.
       
    91 		</p>
       
    92 		
       
    93 		
       
    94 		<h2>Several ways of connecting to an RDF triplestore</h2>
       
    95 		
       
    96 		<p>
       
    97 			Currently there is no official <code>relpipe-in-rdf</code> or <code>relpipe-in-sparql</code> tool.
       
    98 			It will probably be part of some future release of <m:name/>.
       
    99 			But until then, despite this lack, we still have several ways to join the RDF world
       
   100 			and let the data from an RDF triplestore flow through our relational pipelines:
       
   101 		</p>
       
   102 		
       
   103 		<ul>
       
   104 			<li>SQL-DK + Jena JDBC driver + <code>relpipe-in-xml</code></li>
       
   105 			<li>ODBC-JDBC bridge + Jena JDBC driver + <code>relpipe-in-sql</code></li>
       
   106 			<li>A native SPARQL ODBC driver + <code>relpipe-in-sql</code></li>
       
   107 			<li>A shell script + <code>relpipe-in-csv</code> or <code>relpipe-in-xml</code></li>
       
   108 		</ul>
       
   109 		
       
   110 		<p>In this example, we will look at the first and the last option.</p>
       
   111 		
       
   112 		<h2>SQL-DK + Jena JDBC driver</h2>
       
   113 		
       
   114 		
       
   115 		<p>
       
   116 			Apache Jena is not only a triplestore,
       
   117 			it is a framework consisting of several parts
       
   118 			and also provides a special JDBC driver that is ready to use
       
   119 			(despite this <a href="https://issues.apache.org/jira/browse/JENA-1939">small bug</a>).
       
   120 			Thanks to this driver, we can use existing Java tools and run SPARQL queries instead of SQL ones.
       
   121 		</p>
       
   122 		
       
   123 		<p>
       
   124 			One such tool that uses this standard API (JDBC)
       
   125 			is <a href="https://sql-dk.globalcode.info/">SQL-DK</a>.
       
   126 			This tool integrates well with <m:name/> because it can output results in the XML format (or alternatively the Recfile format)
       
   127 			that can be directly consumed by <code>relpipe-in-xml</code> (or alternatively <code>relpipe-in-recfile</code>).
       
   128 		</p>
       
   129 		
       
   130 		<p>First we download the Jena source code:</p>
       
   131 		
       
   132 		<m:pre jazyk="bash"><![CDATA[mkdir -p ~/src; cd ~/src
       
   133 git clone https://gitbox.apache.org/repos/asf/jena.git]]></m:pre>
       
   134 		
       
   135 		<p>
       
   136 			and apply the <a href="https://git-zaloha.frantovo.cz/gitbox.apache.org/repos/asf/jena.git/commit/?h=JENA-1939_updateCount&amp;id=bdb5439d22b80b2909258449d82fb7b5003fd64c">patch</a>
       
   137 			for the above-mentioned bug (if it is not already merged upstream).
       
   138 		</p>
       
   139 		
       
   140 		<p>N.B. As always with such experiments, it is wise to run this under a separate user account or in a virtual machine.</p>
       
   141 		
       
   142 		<p>Then we will compile the JDBC driver:</p>
       
   143 		
       
   144 		<m:pre jazyk="bash"><![CDATA[cd ~/src/jena/jena-jdbc/
       
   145 mvn clean install]]></m:pre>
       
   146 
       
   147 		<p>
       
   148 			Now we will install SQL-DK (either from sources or from <code>.deb</code> or <code>.rpm</code> package)
       
   149 			and run it for the first time (which creates the configuration directory and files):
       
   150 		</p>
       
   151 		
       
   152 		<pre>sql-dk --list-databases</pre>
       
   153 		
       
   154 		<p>Then we will register the previously compiled Jena JDBC driver in the <code>~/.sql-dk/environment.sh</code> file:</p>
       
   155 		
       
   156 		<m:pre jazyk="bash"><![CDATA[CUSTOM_JDBC=(
       
   157 	~/src/jena/jena-jdbc/jena-jdbc-driver-bundle/target/jena-jdbc-driver-bundle-*.jar
       
   158 );]]></m:pre>
       
   159 
       
   160 		<p>And we should see it among other drivers:</p>
       
   161 		
       
   162 		<pre><![CDATA[$ sql-dk --list-jdbc-drivers 
       
   163  ╭──────────────────────────────────────────────────┬───────────────────┬─────────────────┬─────────────────┬──────────────────────────╮
       
   164  │ class                                  (VARCHAR) │ version (VARCHAR) │ major (INTEGER) │ minor (INTEGER) │ jdbc_compliant (BOOLEAN) │
       
   165  ├──────────────────────────────────────────────────┼───────────────────┼─────────────────┼─────────────────┼──────────────────────────┤
       
   166  │ org.postgresql.Driver                            │ 9.4               │               9 │               4 │                    false │
       
   167  │ com.mysql.jdbc.Driver                            │ 5.1               │               5 │               1 │                    false │
       
   168  │ org.sqlite.JDBC                                  │ 3.25              │               3 │              25 │                    false │
       
   169  │ org.apache.jena.jdbc.mem.MemDriver               │ 1.0               │               1 │               0 │                    false │
       
   170  │ org.apache.jena.jdbc.remote.RemoteEndpointDriver │ 1.0               │               1 │               0 │                    false │
       
   171  │ org.apache.jena.jdbc.tdb.TDBDriver               │ 1.0               │               1 │               0 │                    false │
       
   172  ╰──────────────────────────────────────────────────┴───────────────────┴─────────────────┴─────────────────┴──────────────────────────╯
       
   173 Record count: 6]]></pre>
       
   174 
       
   175 		<p>The driver seems to be present, so we can configure the connection in the <code>~/.sql-dk/config.xml</code> file:</p>
       
   176 		
       
   177 		<m:pre jazyk="xml"><![CDATA[<database>
       
   178 	<name>rdf-dbpedia</name>
       
   179 	<url>jdbc:jena:remote:query=http://dbpedia.org/sparql</url>
       
   180 	<userName></userName>
       
   181 	<password></password>
       
   182 </database>]]></m:pre>
       
   183 
       
   184 		<p>
       
   185 			This will connect us to the DBpedia endpoint (more data sources are mentioned in the section below).
       
   186 			We can test the connection:
       
   187 		</p>
       
   188 
       
   189 		<pre><![CDATA[$ sql-dk --test-connection rdf-dbpedia 
       
   190  ╭─────────────────────────┬──────────────────────┬─────────────────────┬────────────────────────┬───────────────────────────╮
       
   191  │ database_name (VARCHAR) │ configured (BOOLEAN) │ connected (BOOLEAN) │ product_name (VARCHAR) │ product_version (VARCHAR) │
       
   192  ├─────────────────────────┼──────────────────────┼─────────────────────┼────────────────────────┼───────────────────────────┤
       
   193  │ rdf-dbpedia             │                 true │                true │                        │                           │
       
   194  ╰─────────────────────────┴──────────────────────┴─────────────────────┴────────────────────────┴───────────────────────────╯
       
   195 Record count: 1]]></pre>
       
   196 
       
   197 		<p>and run our first SPARQL query:</p>
       
   198 
       
   199 		<pre><![CDATA[$ sql-dk --db rdf-dbpedia --formatter tabular-prefetching --sql "SELECT * WHERE { ?subject ?predicate ?object . } LIMIT 8"
       
   200  ╭──────────────────────────────────────────────────────────────────────────────┬─────────────────────────────────────────────────┬─────────────────────────────────────────────────────────╮
       
   201  │ subject                                         (org.apache.jena.graph.Node) │ predicate          (org.apache.jena.graph.Node) │ object                     (org.apache.jena.graph.Node) │
       
   202  ├──────────────────────────────────────────────────────────────────────────────┼─────────────────────────────────────────────────┼─────────────────────────────────────────────────────────┤
       
   203  │ http://www.openlinksw.com/virtrdf-data-formats#default-iid                   │ http://www.w3.org/1999/02/22-rdf-syntax-ns#type │ http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat │
       
   204  │ http://www.openlinksw.com/virtrdf-data-formats#default-iid-nullable          │ http://www.w3.org/1999/02/22-rdf-syntax-ns#type │ http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat │
       
   205  │ http://www.openlinksw.com/virtrdf-data-formats#default-iid-nonblank          │ http://www.w3.org/1999/02/22-rdf-syntax-ns#type │ http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat │
       
   206  │ http://www.openlinksw.com/virtrdf-data-formats#default-iid-nonblank-nullable │ http://www.w3.org/1999/02/22-rdf-syntax-ns#type │ http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat │
       
   207  │ http://www.openlinksw.com/virtrdf-data-formats#default                       │ http://www.w3.org/1999/02/22-rdf-syntax-ns#type │ http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat │
       
   208  │ http://www.openlinksw.com/virtrdf-data-formats#default-nullable              │ http://www.w3.org/1999/02/22-rdf-syntax-ns#type │ http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat │
       
   209  │ http://www.openlinksw.com/virtrdf-data-formats#sql-varchar                   │ http://www.w3.org/1999/02/22-rdf-syntax-ns#type │ http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat │
       
   210  │ http://www.openlinksw.com/virtrdf-data-formats#sql-varchar-nullable          │ http://www.w3.org/1999/02/22-rdf-syntax-ns#type │ http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat │
       
   211  ╰──────────────────────────────────────────────────────────────────────────────┴─────────────────────────────────────────────────┴─────────────────────────────────────────────────────────╯
       
   212 Record count: 8]]></pre>
       
   213 
       
   214 		<p>
       
   215 			Not much fun yet, but it proves that the connection is working and we are getting some results from the endpoint.
       
   216 			We will run some more interesting queries later.
       
   217 		</p>
       
   218 
       
   219 		<p>
       
   220 			When we switch to the <code>--formatter xml</code> we can pipe the stream from SQL-DK
       
   221 			to <code>relpipe-in-xml</code> and then process it using relational tools.
       
   222 			We can also use the <code>--sql-in</code> option of SQL-DK which reads the query from STDIN (instead of from a command-line argument)
       
   223 			and then wrap it as a reusable script that reads SPARQL and outputs relational data:
       
   224 		</p>
       
   225 		
       
   226 		<m:pre jazyk="bash">sql-dk --db "rdf-dbpedia" --formatter "xml" --sql-in | relpipe-in-xml</m:pre>
       
   227 		
       
   228 		<p>
       
   229 			For accessing a remote SPARQL endpoint, this is a bit of an overkill with a lot of dependencies (so we will use a different approach in the next chapter).
       
   230 			But the Jena JDBC driver is not only for accessing remote endpoints – we can use it as an embedded database,
       
   231 			either an in-memory one or a regular DB backed by persistent files.
       
   232 		</p>
       
   233 		
       
   234 		<p>
       
   235 			The in-memory database loads some initial data and then operates on it.
       
   236 			So we configure such a connection:
       
   237 		</p>
       
   238 		
       
   239 		<m:pre jazyk="xml"><![CDATA[<database>
       
   240 	<name>rdf-in-memory</name>
       
   241 	<url>jdbc:jena:mem:dataset=/tmp/rdf-initial-data.ttl</url>
       
   242 	<userName></userName>
       
   243 	<password></password>
       
   244 </database>]]></m:pre>
       
   245 
       
   246 		<p>It runs fine, but <a href="https://en.wikipedia.org/wiki/Turtle_(syntax)">turtles</a> are not at home:</p>
       
   247 		
       
   248 		<pre><![CDATA[$ echo > /tmp/rdf-initial-data.ttl
       
   249 $ echo "SELECT * WHERE { ?subject ?predicate ?object . }" | sql-dk --db rdf-in-memory --formatter tabular-prefetching --sql-in 
       
   250  ╭──────────────────────────────────────┬────────────────────────────────────────┬─────────────────────────────────────╮
       
   251  │ subject (org.apache.jena.graph.Node) │ predicate (org.apache.jena.graph.Node) │ object (org.apache.jena.graph.Node) │
       
   252  ├──────────────────────────────────────┼────────────────────────────────────────┼─────────────────────────────────────┤
       
   253  ╰──────────────────────────────────────┴────────────────────────────────────────┴─────────────────────────────────────╯
       
   254 Record count: 0]]></pre>
       
   255 
       
   256 		<p>
       
   257 			If we are in desperate need of turtles and have installed any <a href="https://lv2plug.in/">LV2</a> plugins,
       
   258 			we can find some and put them in our initial data file or reconfigure the database connection:
       
   259 		</p>
       
   260 		
       
   261 		<pre><![CDATA[$ find /usr/lib -name '*.ttl' | head
       
   262 /usr/lib/lv2/fil4.lv2/manifest.ttl
       
   263 /usr/lib/lv2/fil4.lv2/fil4.ttl
       
   264 /usr/lib/ardour5/LV2/a-fluidsynth.lv2/manifest.ttl
       
   265 /usr/lib/ardour5/LV2/a-fluidsynth.lv2/a-fluidsynth.ttl
       
   266 /usr/lib/ardour5/LV2/reasonablesynth.lv2/manifest.ttl
       
   267 /usr/lib/ardour5/LV2/reasonablesynth.lv2/reasonablesynth.ttl
       
   268 /usr/lib/ardour5/LV2/a-delay.lv2/manifest.ttl
       
   269 /usr/lib/ardour5/LV2/a-delay.lv2/presets.ttl
       
   270 /usr/lib/ardour5/LV2/a-delay.lv2/a-delay.ttl
       
   271 /usr/lib/ardour5/LV2/a-eq.lv2/manifest.ttl
       
   272 
       
   273 $ cat /usr/lib/lv2/fil4.lv2/manifest.ttl > /tmp/rdf-initial-data.ttl
       
   274 $ sed s@/tmp/rdf-initial-data.ttl@/usr/lib/lv2/fil4.lv2/manifest.ttl@g -i ~/.sql-dk/config.xml]]></pre>
       
   275 
       
   276 		<p>and look, through Jena/RDF/SPARQL, at what is inside:</p>
       
   277 			
       
   278 		<pre><![CDATA[$ echo "SELECT * WHERE { ?subject ?predicate ?object . }" | sql-dk --db rdf-in-memory --formatter xml --sql-in | relpipe-in-xml | relpipe-out-tabular 
       
   279 r1:
       
   280  ╭───────────────────────────────────────┬─────────────────────────────────────────────────┬───────────────────────────────────────────╮
       
   281  │ subject                      (string) │ predicate                              (string) │ object                           (string) │
       
   282  ├───────────────────────────────────────┼─────────────────────────────────────────────────┼───────────────────────────────────────────┤
       
   283  │ http://gareus.org/oss/lv2/fil4#ui_gl  │ http://www.w3.org/2000/01/rdf-schema#seeAlso    │ file:///usr/lib/lv2/fil4.lv2/fil4.ttl     │
       
   284  │ http://gareus.org/oss/lv2/fil4#ui_gl  │ http://lv2plug.in/ns/extensions/ui#binary       │ file:///usr/lib/lv2/fil4.lv2/fil4UI_gl.so │
       
   285  │ http://gareus.org/oss/lv2/fil4#ui_gl  │ http://www.w3.org/1999/02/22-rdf-syntax-ns#type │ http://lv2plug.in/ns/extensions/ui#X11UI  │
       
   286  │ http://gareus.org/oss/lv2/fil4#mono   │ http://www.w3.org/2000/01/rdf-schema#seeAlso    │ file:///usr/lib/lv2/fil4.lv2/fil4.ttl     │
       
   287  │ http://gareus.org/oss/lv2/fil4#mono   │ http://lv2plug.in/ns/lv2core#binary             │ file:///usr/lib/lv2/fil4.lv2/fil4.so      │
       
   288  │ http://gareus.org/oss/lv2/fil4#mono   │ http://www.w3.org/1999/02/22-rdf-syntax-ns#type │ http://lv2plug.in/ns/lv2core#Plugin       │
       
   289  │ http://gareus.org/oss/lv2/fil4#stereo │ http://www.w3.org/2000/01/rdf-schema#seeAlso    │ file:///usr/lib/lv2/fil4.lv2/fil4.ttl     │
       
   290  │ http://gareus.org/oss/lv2/fil4#stereo │ http://lv2plug.in/ns/lv2core#binary             │ file:///usr/lib/lv2/fil4.lv2/fil4.so      │
       
   291  │ http://gareus.org/oss/lv2/fil4#stereo │ http://www.w3.org/1999/02/22-rdf-syntax-ns#type │ http://lv2plug.in/ns/lv2core#Plugin       │
       
   292  ╰───────────────────────────────────────┴─────────────────────────────────────────────────┴───────────────────────────────────────────╯
       
   293 Record count: 9]]></pre>
       
   294 
       
   295 		<p>
       
   296 			Now we can be sure that LV2 uses the Turtle format for plugin configurations,
       
   297 			which is quite ingenious and inspirational – 
       
   298 			such configuration is well structured and its options (predicates in general) have globally unique identifiers (IRIs).
       
   299 			Plugins are also identified by IRIs, which is great, because it avoids name collisions.
       
   300 		</p>
       
   301 		
       
   302 		<p>
       
   303 			Let us make some turtles of our own.
       
   304 			Switch the database connection back:
       
   305 		</p>
       
   306 
       
   307 		<pre>sed s@/usr/lib/lv2/fil4.lv2/manifest.ttl@/tmp/rdf-initial-data.ttl@g -i ~/.sql-dk/config.xml</pre>
       
   308 		
       
   309 		<p>and fill the <code>/tmp/rdf-initial-data.ttl</code> with some new data:</p>
       
   310 		
       
   311 		<m:pre jazyk="turtle"><![CDATA[<http://example.org/person/you>
       
   312 	<http://example.org/predicate/have>
       
   313 	<http://example.org/thing/nice-day> .]]></m:pre>
       
   314 	
       
   315 		<p>
       
   316 			Turtle is a simple format that contains statements.
       
   317 			Subjects, predicates and objects are separated by spaces (tabs and line-ends are here just to make it more readable for us).
       
   318 			And statements end with a <i>full stop</i> like ordinary sentences.
       
   319 		</p>
       
   320 		
       
   321 		<p>
       
   322 			To avoid repeating common parts of IRIs we can declare namespace prefixes:
       
   323 		</p>
       
   324 		
       
   325 		<m:pre jazyk="turtle"><![CDATA[@prefix person:     <http://example.org/person/> .
       
   326 @prefix predicate:  <http://example.org/predicate/> .
       
   327 @prefix thing:      <http://example.org/thing/> .
       
   328 
       
   329 person:you
       
   330 	predicate:have
       
   331 		thing:nice-day .]]></m:pre>
       
   332 		
       
   333 		<p>
       
   334 			This format is very concise.
       
   335 			If we describe the same subject, we use a <i>semicolon</i> to avoid repeating it.
       
   336 			And if even the predicate is the same (multiple values), we use a <i>comma</i>:
       
   337 		</p>
       
   338 		
       
   339 		<m:pre jazyk="turtle"><![CDATA[@prefix person:     <http://example.org/person/> .
       
   340 @prefix predicate:  <http://example.org/predicate/> .
       
   341 @prefix thing:      <http://example.org/thing/> .
       
   342 
       
   343 person:you
       
   344 	predicate:have
       
   345 		thing:nice-day, thing:much-fun;
       
   346 	predicate:read-about
       
   347 		thing:relational-pipes .]]></m:pre>
       
   348 
       
   349 		<p>
       
   350 			Jena will parse our file and respond to our basic query with these data:
       
   351 		</p>
       
   352 
       
   353 		<pre><![CDATA[$ echo "SELECT * WHERE { ?subject ?predicate ?object . }" | sql-dk --db rdf-in-memory --formatter xml --sql-in --relation rdf_results | relpipe-in-xml | relpipe-out-tabular 
       
   354 rdf_results:
       
   355  ╭───────────────────────────────┬─────────────────────────────────────────┬───────────────────────────────────────────╮
       
   356  │ subject              (string) │ predicate                      (string) │ object                           (string) │
       
   357  ├───────────────────────────────┼─────────────────────────────────────────┼───────────────────────────────────────────┤
       
   358  │ http://example.org/person/you │ http://example.org/predicate/read-about │ http://example.org/thing/relational-pipes │
       
   359  │ http://example.org/person/you │ http://example.org/predicate/have       │ http://example.org/thing/much-fun         │
       
   360  │ http://example.org/person/you │ http://example.org/predicate/have       │ http://example.org/thing/nice-day         │
       
   361  ╰───────────────────────────────┴─────────────────────────────────────────┴───────────────────────────────────────────╯
       
   362 Record count: 3]]></pre>
       
   363 
       
   364 		<p>Or if we prefer more vertical formats like Recfile:</p>
       
   365 		
       
   366 		<pre><![CDATA[$ echo "SELECT * WHERE { ?subject ?predicate ?object . }" | sql-dk --db rdf-in-memory --formatter xml --sql-in --relation rdf_results | relpipe-in-xml | relpipe-out-recfile 
       
   367 %rec: rdf_results
       
   368 
       
   369 subject: http://example.org/person/you
       
   370 predicate: http://example.org/predicate/read-about
       
   371 object: http://example.org/thing/relational-pipes
       
   372 
       
   373 subject: http://example.org/person/you
       
   374 predicate: http://example.org/predicate/have
       
   375 object: http://example.org/thing/much-fun
       
   376 
       
   377 subject: http://example.org/person/you
       
   378 predicate: http://example.org/predicate/have
       
   379 object: http://example.org/thing/nice-day]]></pre>
       
   380 
       
   381 		<p>Let us create some more data:</p>
       
   382 		
       
   383 		<m:pre jazyk="turtle" src="examples/rdf-heathers.ttl"/>
       
   384 		
       
   385 		<p>list them as statements:</p>
       
   386 		
       
   387 		<m:pre jazyk="text" src="examples/rdf-heathers.txt"/>
       
   388 		
       
   389 		<p>and run some more SPARQL queries…</p>
       
   390 		
       
   391 		<p>
       
   392 			Note:
       
   393 			<em>
       
   394 				we use <a href="https://tools.ietf.org/html/rfc4151">The tag: URI scheme</a> for our IRIs.
       
   395 				It makes URIs (IRIs) globally unique not only in space but also in time (domain owners change over time).
       
   396 				Which is great.
       
   397 				In the semantic web and linked data world, it is not common and locators (URLs) are used rather than pure identifiers (URIs, IRIs).
       
   398 				But here we want to emphasise that we work strictly with our local data
       
   399 				and make it clear that we do not depend on any on-line resources and nothing will be downloaded from remote servers.
       
   400 				And in a real project, we should use existing ontologies / vocabularies as much as possible instead of inventing new ones.
       
   401 				But we keep this example rather isolated from the complexity of the outer world and a bit synthetic.
       
   402 			</em>
       
   403 		</p>
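       		
       		<p>
       			According to RFC 4151, such an identifier consists of an authority (a domain name or an e-mail address),
       			a date on which that authority belonged to whoever mints the identifier, and a specific part.
       			A minimal sketch of composing one follows; the function name <code>tag_iri</code> and the sample values are hypothetical
       			and are not taken from the example data.
       		</p>
       		
       		<m:pre jazyk="bash"><![CDATA[# Hypothetical helper: compose a tag: IRI in the form tag:authority,date:specific
       # $1 = authority (domain name or e-mail address)
       # $2 = date (YYYY, YYYY-MM or YYYY-MM-DD)
       # $3 = specific part
       tag_iri() {
       	printf 'tag:%s,%s:%s\n' "$1" "$2" "$3"
       }
       
       tag_iri "example.org" "2020" "person/you"
       # prints: tag:example.org,2020:person/you]]></m:pre>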
       
   404 		
       
   405 		<p>Find all quotes and names of their authors:</p>
       
   406 		<m:sparql-example name="examples/rdf-heathers-quotes"/>
       
   407 		
       
   408 		<p>List groups and counts of their members:</p>
       
   409 		<m:sparql-example name="examples/rdf-heathers-members"/>
       
   410 		
       
   411 		<p>Filter by a regular expression and list actor names rather than characters:</p>
       
   412 		<m:sparql-example name="examples/rdf-heathers-much"/>
       
   413 		
       
   414 		<p>Now imagine a semantic model of Twin Peaks… How very!</p>
       
   415 		
       
   416 		<h2>Improvised relpipe-in-sparql tool</h2>
       
   417 		
       
   418 		<p>
       
   419 			Starting the JVM and always creating a new database from scratch on each query is quite… <i>heavy</i>.
       
   420 			We can keep Jena running in the background and connect to its SPARQL endpoint – or connect to any other endpoint on the internet.
       
   421 			So we will hack together a light script and name it <code>relpipe-in-sparql</code> (some future release will include such an official tool).
       
   422 		</p>
       
   423 		
       
   424 		<p>
       
   425 			Because SPARQL endpoints accept plain HTTP requests and support CSV besides XML, and we already have <code>relpipe-in-csv</code>,
       
   426 			the script can be very simple:
       
   427 		</p>
       
   428 		
       
   429 		<m:pre jazyk="bash"><![CDATA[curl \
       
   430 	--header "Accept: text/csv" \
       
   431 	--data-urlencode query="SELECT * WHERE { ?subject ?predicate ?object . } LIMIT 3" \
       
   432 	https://dbpedia.org/sparql | relpipe-in-csv | relpipe-out-tabular]]></m:pre>
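       	
       		<p>
       			The same request can be wrapped in a small function that reads the query from STDIN, much like the <code>--sql-in</code> option of SQL-DK does.
       			This is only a sketch: the function name <code>sparql_select</code> and its parameters are hypothetical,
       			and it assumes an endpoint that accepts URL-encoded POST requests and honours the <code>Accept</code> header.
       		</p>
       		
       		<m:pre jazyk="bash"><![CDATA[# Hypothetical helper: send the SPARQL query from STDIN to an endpoint.
       # $1 = endpoint URL
       # $2 = requested result media type (optional, defaults to CSV)
       sparql_select() {
       	curl --silent --fail \
       		--header "Accept: ${2:-text/csv}" \
       		--data-urlencode "query=$(cat)" \
       		"$1"
       }
       
       # Usage (needs network access and the relpipe tools):
       # echo 'SELECT * WHERE { ?subject ?predicate ?object . } LIMIT 3' \
       #	| sparql_select https://dbpedia.org/sparql | relpipe-in-csv | relpipe-out-tabular]]></m:pre>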
       
   433 	
       
   434 		<p>
       
   435 			It becomes a bit longer if we add some documentation, argument parsing and configuration:
       
   436 		</p>
       
   437 
       
   438 		
       
   439 		<m:pre jazyk="bash" src="examples/relpipe-in-sparql.sh" odkaz="ano"/>
       
   440 		
       
   441 		<p>
       
   442 			Here we even have two implementations that can be switched using the <code>RELPIPE_IN_SPARQL_IMPLEMENTATION</code> environment variable.
       
   443 			The XML one is more powerful and can be customized (e.g. to specifically handle localized strings or add some new attributes to the relational output).
       
   444 			On the other hand, the CSV one has fewer dependencies and supports streaming of long result sets (XSLT needs to load the whole document first).
       
   445 		</p>
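       		
       		<p>
       			The switching idea can be illustrated with a small sketch.
       			It is not an excerpt from the script above: the function name <code>sparql_accept_header</code>
       			and the default value (<code>csv</code>) are assumptions made for this illustration.
       			The media types are the ones registered for SPARQL 1.1 query results.
       		</p>
       		
       		<m:pre jazyk="bash"><![CDATA[# Hypothetical helper: choose the requested result format according to the environment variable.
       # Assumption: CSV is used when the variable is not set.
       sparql_accept_header() {
       	case "${RELPIPE_IN_SPARQL_IMPLEMENTATION:-csv}" in
       		csv) echo "text/csv" ;;
       		xml) echo "application/sparql-results+xml" ;;
       		*)
       			echo "Unsupported implementation: $RELPIPE_IN_SPARQL_IMPLEMENTATION" >&2
       			return 1
       			;;
       	esac
       }
       
       RELPIPE_IN_SPARQL_IMPLEMENTATION=xml sparql_accept_header
       # prints: application/sparql-results+xml]]></m:pre>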
       
   446 		
       
   447 		<p>Both implementations should work:</p>
       
   448 		
       
   449 		<m:pre jazyk="bash"><![CDATA[export RELPIPE_IN_SPARQL_IMPLEMENTATION=xml   # choose one of the two implementations
       
   450 export RELPIPE_IN_SPARQL_IMPLEMENTATION=csv   # if both lines are run, this later assignment wins
       
   451 echo 'SELECT * WHERE { ?subject ?predicate "Laura Dern"@en . } LIMIT 3' \
       
   452 	| relpipe-in-sparql \
       
   453 		--relation "jurassic" \
       
   454 		--endpoint "https://dbpedia.org/sparql" \
       
   455 	| relpipe-out-tabular]]></m:pre>
       
   456 
       
   457 		<p>and produce the same output:</p>
       
   458 
       
   459 		<pre><![CDATA[jurassic:
       
   460  ╭────────────────────────────────────────┬────────────────────────────────────────────╮
       
   461  │ subject                       (string) │ predicate                         (string) │
       
   462  ├────────────────────────────────────────┼────────────────────────────────────────────┤
       
   463  │ http://dbpedia.org/resource/Laura_Dern │ http://www.w3.org/2000/01/rdf-schema#label │
       
   464  │ http://www.wikidata.org/entity/Q220901 │ http://www.w3.org/2000/01/rdf-schema#label │
       
   465  │ http://dbpedia.org/resource/Laura_Dern │ http://xmlns.com/foaf/0.1/name             │
       
   466  ╰────────────────────────────────────────┴────────────────────────────────────────────╯
       
   467 Record count: 3]]></pre>
       
   468 
       
   469 		<p>And maybe somewhere nearby in the graph we will find:</p>
       
   470 
       
   471 		<blockquote>It's a Unix System… I know this!</blockquote>
       
   472 		
       
   473 		<h2>Sources of RDF data</h2>
       
   474 		
       
   475 		<p></p>
       
   476 		
       
   477 		<p>
       
   478 			The bad news is that we are not querying the real world. 
       
   479 			We are querying an imperfect, incomplete and outdated snapshot of reality stored in someone's database.
       
   480 			The good news is that we can improve the content of certain databases like we improve articles in Wikipedia.
       
   481 		</p>
       
   482 		
       
   483 		<p>
       
   484 			Some addresses have already <i>leaked</i> in the <code>relpipe-in-sparql --help</code> above.
       
   485 			Here is a brief description of some publicly available sources of RDF data
       
   486 			that we can play with.
       
   487 		</p>
       
   488 		
       
   489 		
       
   490 		<h3>Wikidata</h3>
       
   491 		
       
   492 		<p>
       
   493 			A free and open knowledge base, a sister project of Wikipedia.
       
   494 			Anyone can use and even edit its content.
       
   495 		</p>
       
   496 
       
   497 		<m:sparql-endpoint url="https://query.wikidata.org/sparql" website-url="https://www.wikidata.org/" website-title="Wikidata"/>
       
   498 		
       
   499 		
       
   500 		<h3>DBpedia</h3>
       
   501 		
       
   502 		<p>
       
   503 			The DBpedia project extracts structured content from the information created in various Wikimedia projects
       
   504 			and publishes this knowledge graph for everyone.
       
   505 		</p>
       
   506 		
       
   507 		<m:sparql-endpoint url="https://dbpedia.org/sparql" website-url="https://wiki.dbpedia.org/" website-title="DBpedia"/>
       
   508 		
       
   509 		<h3>Czech government</h3>
       
   510 		<p>
       
   511 			Ministries and other institutions publish some data as open data and part of it as linked open data (LOD).
       
   512 		</p>
       
   513 		
       
   514 		<m:sparql-endpoint url="https://data.gov.cz/sparql" website-url="https://data.gov.cz/english/" website-title="Open data portal of the Czech Republic"/>
       
   515 		<m:sparql-endpoint url="https://data.cssz.cz/sparql" website-url="https://data.cssz.cz/" website-title="Open data portal of the Czech Social Security Administration"/>
       
   516 		<m:sparql-endpoint url="https://cedropendata.mfcr.cz/c3lod/cedr/sparql" website-url="https://cedropendata.mfcr.cz/" website-title="Open Data CEDR III"/>
       
   517 		
       
   518 		
       
   519 		<h2>Running SPARQL queries as scripts</h2>
       
   520 		
       
   521 		<p>Besides piping SPARQL queries through <code>relpipe-in-sparql</code> like this:</p>
       
   522 		<m:pre jazyk="bash"><![CDATA[cat query.sparql | relpipe-in-sparql | relpipe-out-tabular]]></m:pre>
       
   523 		
       
   524 		<p>we can make them executable and run like a (Bash, Perl, PHP etc.) script:</p>
       
   525 		<m:pre jazyk="bash"><![CDATA[chmod +x query.sparql
       
   526 ./query.sparql | relpipe-out-csv     # output in the CSV format
       
   527 ./query.sparql | relpipe-out-recfile # output in the Recfile format
       
   528 ./query.sparql                       # automatically appends relpipe-out-tabular to the pipeline
       
   529 ]]></m:pre>
       
   530 		
       
   531 		<p>(see the <m:a href="implementation">Implementation</m:a> page for a complete list of available transformations and output filters)</p>
       
   532 
       
   533 		<p>
       
   534 			We need to add a first-line comment (a shebang) that points to the interpreter.
       
   535 			The <code>endpoint</code> and <code>relation</code> parameters
       
   536 			are optional – we can say where this query will be executed and how the output relation will be named:
       
   537 		</p>
       
   538 		
       
   539 		<m:pre jazyk="sparql" src="examples/rdf-sample-triples.sparql" odkaz="ano"/>
       
   540 		
       
   541 		<p>
       
   542 			The environment variables <code>RELPIPE_IN_SPARQL_ENDPOINT</code> and <code>RELPIPE_IN_SPARQL_RELATION</code>
       
   543 			can be set to override the parameters from the file.
       
   544 			All the magic is done by this (a bit hackish) helper script:
       
   545 		</p>
       
   546 		
       
   547 		<m:pre jazyk="bash" src="examples/rdf-sparql-interpreter.sh" odkaz="ano"/>
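
		<p>
			The override order described above (environment first, file second) can be sketched in isolation.
			This is only an illustration under stated assumptions, not an excerpt from the helper script:
			the function name <code>resolve_parameter</code> is hypothetical and the endpoint values are merely sample inputs.
		</p>

```shell
#!/bin/bash
# Hypothetical sketch, not the actual rdf-sparql-interpreter.sh:
# a value from the environment wins over the value found in the query file.
resolve_parameter() {
	local from_environment="$1"
	local from_file="$2"
	# ${a:-b} falls back to b when a is unset or empty
	echo "${from_environment:-$from_file}"
}

# Example: the file asks for DBpedia, the environment overrides it with Wikidata.
RELPIPE_IN_SPARQL_ENDPOINT="https://query.wikidata.org/sparql"
resolve_parameter "$RELPIPE_IN_SPARQL_ENDPOINT" "https://dbpedia.org/sparql"
```

		<p>
			Note that in this sketch an empty environment variable behaves like an unset one, so the value from the file is used.
		</p>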
       
   548 
       
   549 		<p>
       
   550 			This script requires the <code>relpipe-in-sparql</code> we put together earlier.
       
   551 			Both scripts are just examples (not part of any release yet).
       
   552 		</p>
       
   553 		
       
   554 		
       
   555 		
       
   556 		<h2>Samples of SPARQL queries</h2>
       
   557 		
       
   558 		<p>
       
   559 			<i>Hey kid, rock and roll,</i>
       
   560 			let us list the films where both Coreys starred:
       
   561 		</p>
       
   562 		
       
   563 		<m:sparql-example name="examples/rdf-coreys"/>
       
   564 		
       
   565 		<p>
       
   566 			<i>So Mercedes has scratched our Cadillac, but it was still a great night. </i>
       
   567 		</p>
       
   568 		
       
   569 		<p>Now it is time to visit our friends from the club:</p>
       
   570 		
       
   571 		<m:sparql-example name="examples/rdf-breakfast-club"/>
       
   572 		
       
   573 		<p>
       
   574 			Not only <i>pretty in pink</i>, this is true <i>wisdom</i> and we could have much fun traversing this part of the graph.
       
   575 			But let us turn the globe around… there is also a lot to see in the Eastern Bloc.
       
   576 		</p>
       
   577 		
       
   578 		<m:sparql-example name="examples/rdf-blonde-and-brunette"/>
       
   579 		
       
   580 		<p>
       
   581 			<i>
       
   582 				Dad, what is this place?
       
   583 				Where are we?
       
   584 				Is there anyone here?<br/>
       
   585 				No. Just us.
       
   586 			</i>
       
   587 		</p>
       
   588 		
       
   589 		<m:sparql-example name="examples/rdf-return"/>
       
   590 		
       
   591 		<h2>P.S.</h2>
       
   592 		<p>
       
   593 			<i>
       
   594 				If you got an impression that RDF is just a poor relational database with a single table consisting of a mere three columns
       
   595 				and with a freaky SQL dialect, please be assured that this example shows just a small fraction of the wonderful RDF world.
       
   596 			</i>
       
   597 		</p>
       
   598 		
       
   599 		
       
   600 	</text>
       
   601 
       
   602 </stránka>