--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/relpipe-data/examples-rdf-sparql.xml Mon Jul 27 17:51:53 2020 +0200
@@ -0,0 +1,602 @@
+<stránka
+ xmlns="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/strana"
+ xmlns:m="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/makro">
+
+ <nadpis>Querying an RDF triplestore using SPARQL</nadpis>
+ <perex>use SQL-DK with the Jena JDBC driver or a custom script to gather linked data</perex>
+ <m:pořadí-příkladu>04300</m:pořadí-příkladu>
+
+ <text xmlns="http://www.w3.org/1999/xhtml">
+
+ <p>
+ In the Resource Description Framework (<a href="https://www.w3.org/RDF/">RDF</a>) world, there are no relations.
+ The data model is quite different.
+ It is built on top of triples: subject – predicate – object.
+ Although there are no tables (unlike in relational databases), RDF is not a schema-less clutter –
+ actually RDF has a schema (ontology, vocabulary), just differently shaped.
+ Subjects and predicates are identified by <a href="https://en.wikipedia.org/wiki/Internationalized_Resource_Identifier">IRI</a>s
+ (or formerly <a href="https://en.wikipedia.org/wiki/Uniform_Resource_Identifier">URI</a>s)
+ that are globally unique (compared to primary keys in relational databases that are almost never globally unique).
+ Objects are also identified by IRIs (and yes, one can be both subject and object) or they can be primitive values such as a text string or a number.
+ </p>
+
+ <m:diagram orientace="vodorovně">
+ node [fontname = "Latin Modern Sans, sans-serif"];
+ edge [fontname = "Latin Modern Sans, sans-serif"];
+ subject -> object [ label = "predicate"];
+ </m:diagram>
+
+ <p>
+ This <em>triple</em> is also called a <em>statement</em>.
+ In the following statement:
+ </p>
+
+ <blockquote>
+ <m:name/> tools are released under the GNU GPL license.
+ </blockquote>
+
+ <p>we recognize:</p>
+
+ <ul>
+ <li>
+ Subject: <i>
+ <m:name/> tools</i>
+ </li>
+ <li>Predicate: <i>is released under license</i></li>
+ <li>Object: <i>GNU GPL</i></li>
+ </ul>
+
+ <p>
+ This data model is seemingly simple: just a graph, two kinds of nodes and edges connecting them together.
+ Or a flat list of statements (triples).
+ But it can also be very complicated, depending on how we use it and how rich the ontologies we design are.
+ RDF can be studied for years and is a great topic for diploma theses and dissertations,
+ but in this example, we will keep it as simple as possible.
+ </p>
+
+ <p>
+ Collections of statements are stored in special databases called triplestores.
+ The data inside can be queried using the
+ <a href="https://www.w3.org/TR/sparql11-overview/">SPARQL</a> language through the endpoint provided by the triplestore.
+ Popular implementations are
+ <a href="https://jena.apache.org/">Jena</a>,
+ <a href="http://vos.openlinksw.com/owiki/wiki/VOS">Virtuoso</a> and
+ <a href="https://rdf4j.org/about/">RDF4J</a>
+ (all free software).
+ </p>
+
+ <p>
+ The relational model can be easily mapped to RDF.
+ We can simply add a prefix to the primary keys to make them globally unique IRIs.
+ The attributes will become predicates (also prefixed).
+ And the values will become objects (either primitive values or IRIs in case of foreign keys).
+ Of course, more complex transformations can be done – this is the most straightforward way.
+ </p>
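+
+ <p>
+ As a minimal sketch of this mapping (the relation, its attributes and all <code>example.org</code> IRIs are made up for illustration),
+ a record <code>id = 1, name = "Example tool", license = 3</code> of a hypothetical relation <code>tool</code>
+ could be written as the following statements in the Turtle syntax, which is described later in this example:
+ </p>
+
+ <m:pre jazyk="turtle"><![CDATA[# hypothetical prefixes – any globally unique IRIs would do
+@prefix tool: <http://example.org/tool/> .
+@prefix attribute: <http://example.org/tool/attribute/> .
+@prefix license: <http://example.org/license/> .
+
+# the primary key 1 becomes the subject, the attributes become predicates
+tool:1
+ attribute:name "Example tool";
+ attribute:license license:3 . # the foreign key becomes an IRI instead of a primitive value]]></m:pre>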
+
+ <p>
+ Mapping RDF data to the relational model is a bit more difficult.
+ Sometimes easy, sometimes very cumbersome.
+ We can always design some kind of EAV (entity – attribute – value) model in the relational database
+ or we can create a relation for each predicate…
+ If we do some universal automatic mapping and retain the flexibility of RDF and richness of the original ontology,
+ we usually lose the performance and simplicity of our relational queries.
+ A good mapping that feels natural and idiomatic in the relational world and performs well usually requires some hard work.
+ </p>
+
+ <p>
+ But mapping mere results of a SPARQL query obtained from an RDF endpoint is a different story.
+ These results can be seen as records and processed using our relational tools,
+ stored, transformed or converted to other formats, displayed in GUI windows or safely passed to shell scripts.
+ This example shows how we can bridge the RDF and relational worlds.
+ </p>
+
+
+ <h2>Several ways of connecting to an RDF triplestore</h2>
+
+ <p>
+ Currently there is no official <code>relpipe-in-rdf</code> or <code>relpipe-in-sparql</code> tool.
+ It will probably be part of some future release of <m:name/>.
+ But until then, despite this lack, we still have several ways to join the RDF world
+ and let the data from an RDF triplestore flow through our relational pipelines:
+ </p>
+
+ <ul>
+ <li>SQL-DK + Jena JDBC driver + <code>relpipe-in-xml</code></li>
+ <li>ODBC-JDBC bridge + Jena JDBC driver + <code>relpipe-in-sql</code></li>
+ <li>A native SPARQL ODBC driver + <code>relpipe-in-sql</code></li>
+ <li>A shell script + <code>relpipe-in-csv</code> or <code>relpipe-in-xml</code></li>
+ </ul>
+
+ <p>In this example, we will look at the first and the last option.</p>
+
+ <h2>SQL-DK + Jena JDBC driver</h2>
+
+
+ <p>
+ Apache Jena is not only a triplestore,
+ it is a framework consisting of several parts
+ and also provides a special JDBC driver that is ready to use
+ (despite this <a href="https://issues.apache.org/jira/browse/JENA-1939">small bug</a>).
+ Thanks to this driver, we can use existing Java tools and run SPARQL queries instead of SQL ones.
+ </p>
+
+ <p>
+ One tool that uses this standard API (JDBC)
+ is <a href="https://sql-dk.globalcode.info/">SQL-DK</a>.
+ This tool integrates well with <m:name/> because it can output results in the XML format (or alternatively the Recfile format)
+ that can be directly consumed by <code>relpipe-in-xml</code> (or alternatively <code>relpipe-in-recfile</code>).
+ </p>
+
+ <p>First we download the Jena source code:</p>
+
+ <m:pre jazyk="bash"><![CDATA[mkdir -p ~/src; cd ~/src
+git clone https://gitbox.apache.org/repos/asf/jena.git]]></m:pre>
+
+ <p>
+ and apply the <a href="https://git-zaloha.frantovo.cz/gitbox.apache.org/repos/asf/jena.git/commit/?h=JENA-1939_updateCount&amp;id=bdb5439d22b80b2909258449d82fb7b5003fd64c">patch</a>
+ for the above-mentioned bug (if it is not already merged upstream).
+ </p>
+
+ <p>N.B. As always with such experiments, we should probably run this under a separate user account or in a virtual machine.</p>
+
+ <p>Then we will compile the JDBC driver:</p>
+
+ <m:pre jazyk="bash"><![CDATA[cd ~/src/jena/jena-jdbc/
+mvn clean install]]></m:pre>
+
+ <p>
+ Now we will install SQL-DK (either from sources or from <code>.deb</code> or <code>.rpm</code> package)
+ and run it for the first time (which creates the configuration directory and files):
+ </p>
+
+ <pre>sql-dk --list-databases</pre>
+
+ <p>Then we will register the previously compiled Jena JDBC driver in <code>~/.sql-dk/environment.sh</code>:</p>
+
+ <m:pre jazyk="bash"><![CDATA[CUSTOM_JDBC=(
+ ~/src/jena/jena-jdbc/jena-jdbc-driver-bundle/target/jena-jdbc-driver-bundle-*.jar
+);]]></m:pre>
+
+ <p>And we should see it among other drivers:</p>
+
+ <pre><![CDATA[$ sql-dk --list-jdbc-drivers
+ ╭──────────────────────────────────────────────────┬───────────────────┬─────────────────┬─────────────────┬──────────────────────────╮
+ │ class (VARCHAR) │ version (VARCHAR) │ major (INTEGER) │ minor (INTEGER) │ jdbc_compliant (BOOLEAN) │
+ ├──────────────────────────────────────────────────┼───────────────────┼─────────────────┼─────────────────┼──────────────────────────┤
+ │ org.postgresql.Driver │ 9.4 │ 9 │ 4 │ false │
+ │ com.mysql.jdbc.Driver │ 5.1 │ 5 │ 1 │ false │
+ │ org.sqlite.JDBC │ 3.25 │ 3 │ 25 │ false │
+ │ org.apache.jena.jdbc.mem.MemDriver │ 1.0 │ 1 │ 0 │ false │
+ │ org.apache.jena.jdbc.remote.RemoteEndpointDriver │ 1.0 │ 1 │ 0 │ false │
+ │ org.apache.jena.jdbc.tdb.TDBDriver │ 1.0 │ 1 │ 0 │ false │
+ ╰──────────────────────────────────────────────────┴───────────────────┴─────────────────┴─────────────────┴──────────────────────────╯
+Record count: 6]]></pre>
+
+ <p>The driver seems to be present, so we can configure the connection in the <code>~/.sql-dk/config.xml</code> file:</p>
+
+ <m:pre jazyk="xml"><![CDATA[<database>
+ <name>rdf-dbpedia</name>
+ <url>jdbc:jena:remote:query=http://dbpedia.org/sparql</url>
+ <userName></userName>
+ <password></password>
+</database>]]></m:pre>
+
+ <p>
+ This will connect us to the DBpedia endpoint (more data sources are mentioned in the chapter below).
+ We can test the connection:
+ </p>
+
+ <pre><![CDATA[$ sql-dk --test-connection rdf-dbpedia
+ ╭─────────────────────────┬──────────────────────┬─────────────────────┬────────────────────────┬───────────────────────────╮
+ │ database_name (VARCHAR) │ configured (BOOLEAN) │ connected (BOOLEAN) │ product_name (VARCHAR) │ product_version (VARCHAR) │
+ ├─────────────────────────┼──────────────────────┼─────────────────────┼────────────────────────┼───────────────────────────┤
+ │ rdf-dbpedia │ true │ true │ │ │
+ ╰─────────────────────────┴──────────────────────┴─────────────────────┴────────────────────────┴───────────────────────────╯
+Record count: 1]]></pre>
+
+ <p>and run our first SPARQL query:</p>
+
+ <pre><![CDATA[$ sql-dk --db rdf-dbpedia --formatter tabular-prefetching --sql "SELECT * WHERE { ?subject ?predicate ?object . } LIMIT 8"
+ ╭──────────────────────────────────────────────────────────────────────────────┬─────────────────────────────────────────────────┬─────────────────────────────────────────────────────────╮
+ │ subject (org.apache.jena.graph.Node) │ predicate (org.apache.jena.graph.Node) │ object (org.apache.jena.graph.Node) │
+ ├──────────────────────────────────────────────────────────────────────────────┼─────────────────────────────────────────────────┼─────────────────────────────────────────────────────────┤
+ │ http://www.openlinksw.com/virtrdf-data-formats#default-iid │ http://www.w3.org/1999/02/22-rdf-syntax-ns#type │ http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat │
+ │ http://www.openlinksw.com/virtrdf-data-formats#default-iid-nullable │ http://www.w3.org/1999/02/22-rdf-syntax-ns#type │ http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat │
+ │ http://www.openlinksw.com/virtrdf-data-formats#default-iid-nonblank │ http://www.w3.org/1999/02/22-rdf-syntax-ns#type │ http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat │
+ │ http://www.openlinksw.com/virtrdf-data-formats#default-iid-nonblank-nullable │ http://www.w3.org/1999/02/22-rdf-syntax-ns#type │ http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat │
+ │ http://www.openlinksw.com/virtrdf-data-formats#default │ http://www.w3.org/1999/02/22-rdf-syntax-ns#type │ http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat │
+ │ http://www.openlinksw.com/virtrdf-data-formats#default-nullable │ http://www.w3.org/1999/02/22-rdf-syntax-ns#type │ http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat │
+ │ http://www.openlinksw.com/virtrdf-data-formats#sql-varchar │ http://www.w3.org/1999/02/22-rdf-syntax-ns#type │ http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat │
+ │ http://www.openlinksw.com/virtrdf-data-formats#sql-varchar-nullable │ http://www.w3.org/1999/02/22-rdf-syntax-ns#type │ http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat │
+ ╰──────────────────────────────────────────────────────────────────────────────┴─────────────────────────────────────────────────┴─────────────────────────────────────────────────────────╯
+Record count: 8]]></pre>
+
+ <p>
+ Not much fun yet, but it proves that the connection is working and we are getting some results from the endpoint.
+ We will run some more interesting queries later.
+ </p>
+
+ <p>
+ When we switch to the <code>--formatter xml</code>, we can pipe the stream from SQL-DK
+ to <code>relpipe-in-xml</code> and then process it using relational tools.
+ We can also use the <code>--sql-in</code> option of SQL-DK, which reads the query from STDIN (instead of from a command-line argument),
+ and then wrap it as a reusable script that reads SPARQL and outputs relational data:
+ </p>
+
+ <m:pre jazyk="bash">sql-dk --db "rdf-dbpedia" --formatter "xml" --sql-in | relpipe-in-xml</m:pre>
+
+ <p>
+ For accessing a remote SPARQL endpoint, this is a bit of overkill with a lot of dependencies (so we will use a different approach in the next chapter).
+ But the Jena JDBC driver is not only for accessing remote endpoints – we can also use it as an embedded database,
+ either an in-memory one or a regular one backed by persistent files.
+ </p>
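+
+ <p>
+ For the persistent variant, a connection might look like the following untested sketch.
+ It assumes that the TDB driver (listed among the drivers above) takes the dataset directory in a <code>location</code> parameter, as described in the Jena JDBC documentation;
+ the connection name and the directory are made up:
+ </p>
+
+ <m:pre jazyk="xml"><![CDATA[<!-- hypothetical connection: the name and the directory are just examples -->
+<database>
+ <name>rdf-tdb</name>
+ <url>jdbc:jena:tdb:location=/tmp/rdf-tdb-data</url>
+ <userName></userName>
+ <password></password>
+</database>]]></m:pre>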
+
+ <p>
+ The in-memory database loads some initial data and then operates on it.
+ So we configure such a connection:
+ </p>
+
+ <m:pre jazyk="xml"><![CDATA[<database>
+ <name>rdf-in-memory</name>
+ <url>jdbc:jena:mem:dataset=/tmp/rdf-initial-data.ttl</url>
+ <userName></userName>
+ <password></password>
+</database>]]></m:pre>
+
+ <p>It runs fine, but <a href="https://en.wikipedia.org/wiki/Turtle_(syntax)">turtles</a> are not at home:</p>
+
+ <pre><![CDATA[$ echo > /tmp/rdf-initial-data.ttl
+$ echo "SELECT * WHERE { ?subject ?predicate ?object . }" | sql-dk --db rdf-in-memory --formatter tabular-prefetching --sql-in
+ ╭──────────────────────────────────────┬────────────────────────────────────────┬─────────────────────────────────────╮
+ │ subject (org.apache.jena.graph.Node) │ predicate (org.apache.jena.graph.Node) │ object (org.apache.jena.graph.Node) │
+ ├──────────────────────────────────────┼────────────────────────────────────────┼─────────────────────────────────────┤
+ ╰──────────────────────────────────────┴────────────────────────────────────────┴─────────────────────────────────────╯
+Record count: 0]]></pre>
+
+ <p>
+ If we are in desperate need of turtles and have installed any <a href="https://lv2plug.in/">LV2</a> plugins,
+ we can find some and put them in our initial data file or reconfigure the database connection:
+ </p>
+
+ <pre><![CDATA[$ find /usr/lib -name '*.ttl' | head
+/usr/lib/lv2/fil4.lv2/manifest.ttl
+/usr/lib/lv2/fil4.lv2/fil4.ttl
+/usr/lib/ardour5/LV2/a-fluidsynth.lv2/manifest.ttl
+/usr/lib/ardour5/LV2/a-fluidsynth.lv2/a-fluidsynth.ttl
+/usr/lib/ardour5/LV2/reasonablesynth.lv2/manifest.ttl
+/usr/lib/ardour5/LV2/reasonablesynth.lv2/reasonablesynth.ttl
+/usr/lib/ardour5/LV2/a-delay.lv2/manifest.ttl
+/usr/lib/ardour5/LV2/a-delay.lv2/presets.ttl
+/usr/lib/ardour5/LV2/a-delay.lv2/a-delay.ttl
+/usr/lib/ardour5/LV2/a-eq.lv2/manifest.ttl
+
+$ cat /usr/lib/lv2/fil4.lv2/manifest.ttl > /tmp/rdf-initial-data.ttl
+$ sed s@/tmp/rdf-initial-data.ttl@/usr/lib/lv2/fil4.lv2/manifest.ttl@g -i ~/.sql-dk/config.xml]]></pre>
+
+ <p>and look at what is inside through Jena/RDF/SPARQL:</p>
+
+ <pre><![CDATA[$ echo "SELECT * WHERE { ?subject ?predicate ?object . }" | sql-dk --db rdf-in-memory --formatter xml --sql-in | relpipe-in-xml | relpipe-out-tabular
+r1:
+ ╭───────────────────────────────────────┬─────────────────────────────────────────────────┬───────────────────────────────────────────╮
+ │ subject (string) │ predicate (string) │ object (string) │
+ ├───────────────────────────────────────┼─────────────────────────────────────────────────┼───────────────────────────────────────────┤
+ │ http://gareus.org/oss/lv2/fil4#ui_gl │ http://www.w3.org/2000/01/rdf-schema#seeAlso │ file:///usr/lib/lv2/fil4.lv2/fil4.ttl │
+ │ http://gareus.org/oss/lv2/fil4#ui_gl │ http://lv2plug.in/ns/extensions/ui#binary │ file:///usr/lib/lv2/fil4.lv2/fil4UI_gl.so │
+ │ http://gareus.org/oss/lv2/fil4#ui_gl │ http://www.w3.org/1999/02/22-rdf-syntax-ns#type │ http://lv2plug.in/ns/extensions/ui#X11UI │
+ │ http://gareus.org/oss/lv2/fil4#mono │ http://www.w3.org/2000/01/rdf-schema#seeAlso │ file:///usr/lib/lv2/fil4.lv2/fil4.ttl │
+ │ http://gareus.org/oss/lv2/fil4#mono │ http://lv2plug.in/ns/lv2core#binary │ file:///usr/lib/lv2/fil4.lv2/fil4.so │
+ │ http://gareus.org/oss/lv2/fil4#mono │ http://www.w3.org/1999/02/22-rdf-syntax-ns#type │ http://lv2plug.in/ns/lv2core#Plugin │
+ │ http://gareus.org/oss/lv2/fil4#stereo │ http://www.w3.org/2000/01/rdf-schema#seeAlso │ file:///usr/lib/lv2/fil4.lv2/fil4.ttl │
+ │ http://gareus.org/oss/lv2/fil4#stereo │ http://lv2plug.in/ns/lv2core#binary │ file:///usr/lib/lv2/fil4.lv2/fil4.so │
+ │ http://gareus.org/oss/lv2/fil4#stereo │ http://www.w3.org/1999/02/22-rdf-syntax-ns#type │ http://lv2plug.in/ns/lv2core#Plugin │
+ ╰───────────────────────────────────────┴─────────────────────────────────────────────────┴───────────────────────────────────────────╯
+Record count: 9]]></pre>
+
+ <p>
+ Now we can be sure that LV2 uses the Turtle format for plugin configurations,
+ which is quite ingenious and inspirational –
+ such configuration is well structured and its options (predicates in general) have globally unique identifiers (IRIs).
+ Plugins are also identified by IRIs, which is great, because it avoids name collisions.
+ </p>
+
+ <p>
+ Let us make some turtles of our own.
+ First we switch the database connection back:
+ </p>
+
+ <pre>sed s@/usr/lib/lv2/fil4.lv2/manifest.ttl@/tmp/rdf-initial-data.ttl@g -i ~/.sql-dk/config.xml</pre>
+
+ <p>and fill the <code>/tmp/rdf-initial-data.ttl</code> with some new data:</p>
+
+ <m:pre jazyk="turtle"><![CDATA[<http://example.org/person/you>
+ <http://example.org/predicate/have>
+ <http://example.org/thing/nice-day> .]]></m:pre>
+
+ <p>
+ Turtle is a simple format that contains statements.
+ Subjects, predicates and objects are separated by whitespace (the tabs and line breaks here just make it more readable for us).
+ And statements end with a <i>full stop</i> like ordinary sentences.
+ </p>
+
+ <p>
+ To avoid repeating common parts of IRIs we can declare namespace prefixes:
+ </p>
+
+ <m:pre jazyk="turtle"><![CDATA[@prefix person: <http://example.org/person/> .
+@prefix predicate: <http://example.org/predicate/> .
+@prefix thing: <http://example.org/thing/> .
+
+person:you
+ predicate:have
+ thing:nice-day .]]></m:pre>
+
+ <p>
+ This format is very concise.
+ If several statements share the same subject, we use a <i>semicolon</i> to avoid repeating it.
+ And if even the predicate is the same (multiple values), we use a <i>comma</i>:
+ </p>
+
+ <m:pre jazyk="turtle"><![CDATA[@prefix person: <http://example.org/person/> .
+@prefix predicate: <http://example.org/predicate/> .
+@prefix thing: <http://example.org/thing/> .
+
+person:you
+ predicate:have
+ thing:nice-day, thing:much-fun;
+ predicate:read-about
+ thing:relational-pipes .]]></m:pre>
+
+ <p>
+ Jena will parse our file and respond to our basic query with these data:
+ </p>
+
+ <pre><![CDATA[$ echo "SELECT * WHERE { ?subject ?predicate ?object . }" | sql-dk --db rdf-in-memory --formatter xml --sql-in --relation rdf_results | relpipe-in-xml | relpipe-out-tabular
+rdf_results:
+ ╭───────────────────────────────┬─────────────────────────────────────────┬───────────────────────────────────────────╮
+ │ subject (string) │ predicate (string) │ object (string) │
+ ├───────────────────────────────┼─────────────────────────────────────────┼───────────────────────────────────────────┤
+ │ http://example.org/person/you │ http://example.org/predicate/read-about │ http://example.org/thing/relational-pipes │
+ │ http://example.org/person/you │ http://example.org/predicate/have │ http://example.org/thing/much-fun │
+ │ http://example.org/person/you │ http://example.org/predicate/have │ http://example.org/thing/nice-day │
+ ╰───────────────────────────────┴─────────────────────────────────────────┴───────────────────────────────────────────╯
+Record count: 3]]></pre>
+
+ <p>Or if we prefer more vertical formats like Recfile:</p>
+
+ <pre><![CDATA[$ echo "SELECT * WHERE { ?subject ?predicate ?object . }" | sql-dk --db rdf-in-memory --formatter xml --sql-in --relation rdf_results | relpipe-in-xml | relpipe-out-recfile
+%rec: rdf_results
+
+subject: http://example.org/person/you
+predicate: http://example.org/predicate/read-about
+object: http://example.org/thing/relational-pipes
+
+subject: http://example.org/person/you
+predicate: http://example.org/predicate/have
+object: http://example.org/thing/much-fun
+
+subject: http://example.org/person/you
+predicate: http://example.org/predicate/have
+object: http://example.org/thing/nice-day]]></pre>
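+
+ <p>
+ The basic query returns every statement in the dataset.
+ As a small sketch of a narrower query over the same toy data, we could ask only for the things that somebody <i>has</i>
+ (the variable names are arbitrary and the prefix must match the one used in our Turtle file):
+ </p>
+
+ <m:pre jazyk="sparql"><![CDATA[# the prefix is declared with PREFIX (no @ and no trailing full stop, unlike in Turtle)
+PREFIX predicate: <http://example.org/predicate/>
+
+SELECT ?person ?thing
+WHERE {
+ ?person predicate:have ?thing .
+}]]></m:pre>
+
+ <p>Such a query can be passed to SQL-DK through the <code>--sql-in</code> option in the same way as the basic one.</p>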
+
+ <p>Let us create some more data:</p>
+
+ <m:pre jazyk="turtle" src="examples/rdf-heathers.ttl"/>
+
+ <p>list them as statements:</p>
+
+ <m:pre jazyk="text" src="examples/rdf-heathers.txt"/>
+
+ <p>and run some more SPARQL queries…</p>
+
+ <p>
+ Note:
+ <em>
+ we use <a href="https://tools.ietf.org/html/rfc4151">The tag: URI scheme</a> for our IRIs.
+ It makes URIs (IRIs) globally unique not only in space but also in time (domain owners change over time),
+ which is great.
+ In the semantic web and linked data world, this scheme is not common and locators (URLs) are used rather than pure identifiers (URIs, IRIs).
+ But here we want to emphasise that we work strictly with our local data
+ and make it clear that we do not depend on any on-line resources and nothing will be downloaded from remote servers.
+ In a real project, we should use existing ontologies / vocabularies as much as possible instead of inventing new ones,
+ but we keep this example rather isolated from the complexity of the outer world and a bit synthetic.
+ </em>
+ </p>
+
+ <p>Find all quotes and names of their authors:</p>
+ <m:sparql-example name="examples/rdf-heathers-quotes"/>
+
+ <p>List groups and counts of their members:</p>
+ <m:sparql-example name="examples/rdf-heathers-members"/>
+
+ <p>Filter by a regular expression and list actor names rather than characters:</p>
+ <m:sparql-example name="examples/rdf-heathers-much"/>
+
+ <p>Now imagine a semantic model of Twin Peaks… How very!</p>
+
+ <h2>Improvised relpipe-in-sparql tool</h2>
+
+ <p>
+ Starting the JVM and always creating a new database from scratch for each query is quite… <i>heavy</i>.
+ We can keep Jena running in the background and connect to its SPARQL endpoint – or connect to any other endpoint on the internet.
+ So we will hack together a light script and name it <code>relpipe-in-sparql</code> (some future release will contain such an official tool).
+ </p>
+
+ <p>
+ Because SPARQL endpoints accept plain HTTP requests and support CSV besides XML, and because we already have <code>relpipe-in-csv</code>,
+ the script can be very simple:
+ </p>
+
+ <m:pre jazyk="bash"><![CDATA[curl \
+ --header "Accept: text/csv" \
+ --data-urlencode query="SELECT * WHERE { ?subject ?predicate ?object . } LIMIT 3" \
+ https://dbpedia.org/sparql | relpipe-in-csv | relpipe-out-tabular]]></m:pre>
+
+ <p>
+ It becomes a bit longer if we add some documentation, argument parsing and configuration:
+ </p>
+
+
+ <m:pre jazyk="bash" src="examples/relpipe-in-sparql.sh" odkaz="ano"/>
+
+ <p>
+ Here we even have two implementations that can be switched using the <code>RELPIPE_IN_SPARQL_IMPLEMENTATION</code> environment variable.
+ The XML one is more powerful and can be customized (e.g. to specifically handle localized strings or add some new attributes to the relational output).
+ On the other hand, the CSV one has fewer dependencies and supports streaming of long result sets (XSLT needs to load the whole document first).
+ </p>
+
+ <p>Both implementations should work:</p>
+
+ <m:pre jazyk="bash"><![CDATA[export RELPIPE_IN_SPARQL_IMPLEMENTATION=xml # pick one implementation…
+export RELPIPE_IN_SPARQL_IMPLEMENTATION=csv # …or the other (the last export wins)
+echo 'SELECT * WHERE { ?subject ?predicate "Laura Dern"@en . } LIMIT 3' \
+ | relpipe-in-sparql \
+ --relation "jurassic" \
+ --endpoint "https://dbpedia.org/sparql" \
+ | relpipe-out-tabular]]></m:pre>
+
+ <p>and produce the same output:</p>
+
+ <pre><![CDATA[jurassic:
+ ╭────────────────────────────────────────┬────────────────────────────────────────────╮
+ │ subject (string) │ predicate (string) │
+ ├────────────────────────────────────────┼────────────────────────────────────────────┤
+ │ http://dbpedia.org/resource/Laura_Dern │ http://www.w3.org/2000/01/rdf-schema#label │
+ │ http://www.wikidata.org/entity/Q220901 │ http://www.w3.org/2000/01/rdf-schema#label │
+ │ http://dbpedia.org/resource/Laura_Dern │ http://xmlns.com/foaf/0.1/name │
+ ╰────────────────────────────────────────┴────────────────────────────────────────────╯
+Record count: 3]]></pre>
+
+ <p>And maybe somewhere nearby in the graph we will find:</p>
+
+ <blockquote>It's a Unix System… I know this!</blockquote>
+
+ <h2>Sources of RDF data</h2>
+
+ <p></p>
+
+ <p>
+ The bad news is that we are not querying the real world.
+ We are querying an imperfect, incomplete and outdated snapshot of reality stored in someone's database.
+ The good news is that we can improve the content of certain databases like we improve articles in Wikipedia.
+ </p>
+
+ <p>
+ Some addresses have already <i>leaked</i> in the <code>relpipe-in-sparql --help</code> above.
+ Here is a brief description of some publicly available sources of RDF data
+ that we can play with.
+ </p>
+
+
+ <h3>Wikidata</h3>
+
+ <p>
+ A free and open knowledge base, a sister project of Wikipedia.
+ Anyone can use and even edit its content.
+ </p>
+
+ <m:sparql-endpoint url="https://query.wikidata.org/sparql" website-url="https://www.wikidata.org/" website-title="Wikidata"/>
+
+
+ <h3>DBpedia</h3>
+
+ <p>
+ The DBpedia project extracts structured content from the information created in various Wikimedia projects
+ and publishes this knowledge graph for everyone.
+ </p>
+
+ <m:sparql-endpoint url="https://dbpedia.org/sparql" website-url="https://wiki.dbpedia.org/" website-title="DBpedia"/>
+
+ <h3>Czech government</h3>
+ <p>
+ Ministries and other institutions publish some data as open data and part of it as linked open data (LOD).
+ </p>
+
+ <m:sparql-endpoint url="https://data.gov.cz/sparql" website-url="https://data.gov.cz/english/" website-title="Open data portal of the Czech Republic"/>
+ <m:sparql-endpoint url="https://data.cssz.cz/sparql" website-url="https://data.cssz.cz/" website-title="Open data portal of the Czech Social Security Administration"/>
+ <m:sparql-endpoint url="https://cedropendata.mfcr.cz/c3lod/cedr/sparql" website-url="https://cedropendata.mfcr.cz/" website-title="Open Data CEDR III"/>
+
+
+ <h2>Running SPARQL queries as scripts</h2>
+
+ <p>Besides piping SPARQL queries through <code>relpipe-in-sparql</code> like this:</p>
+ <m:pre jazyk="bash"><![CDATA[cat query.sparql | relpipe-in-sparql | relpipe-out-tabular]]></m:pre>
+
+ <p>we can make them executable and run them like a (Bash, Perl, PHP etc.) script:</p>
+ <m:pre jazyk="bash"><![CDATA[chmod +x query.sparql
+./query.sparql | relpipe-out-csv # output in the CSV format
+./query.sparql | relpipe-out-recfile # output in the Recfile format
+./query.sparql # automatically appends relpipe-out-tabular to the pipeline
+]]></m:pre>
+
+ <p>(see the <m:a href="implementation">Implementation</m:a> page for a complete list of available transformations and output filters)</p>
+
+ <p>
+ We need to add a first-line comment that points to the interpreter.
+ The <code>endpoint</code> and <code>relation</code> parameters
+ are optional – they say where the query will be executed and how the output relation will be named:
+ </p>
+
+ <m:pre jazyk="sparql" src="examples/rdf-sample-triples.sparql" odkaz="ano"/>
+
+ <p>
+ The environment variables <code>RELPIPE_IN_SPARQL_ENDPOINT</code> and <code>RELPIPE_IN_SPARQL_RELATION</code>
+ can be set to override the parameters from the file.
+ All the magic is done by this (a bit hackish) helper script:
+ </p>
+
+ <m:pre jazyk="bash" src="examples/rdf-sparql-interpreter.sh" odkaz="ano"/>
+
+ <p>
+ This script requires the <code>relpipe-in-sparql</code> we put together earlier.
+ Both scripts are just examples (not part of any release yet).
+ </p>
+
+
+
+ <h2>Samples of SPARQL queries</h2>
+
+ <p>
+ <i>Hey kid, rock and roll,</i>
+ let us list the films where both Coreys starred:
+ </p>
+
+ <m:sparql-example name="examples/rdf-coreys"/>
+
+ <p>
+ <i>So Mercedes has scratched our Cadillac, but it was still a great night. </i>
+ </p>
+
+ <p>Now it is time to visit our friends from the club:</p>
+
+ <m:sparql-example name="examples/rdf-breakfast-club"/>
+
+ <p>
+ Not only <i>pretty in pink</i>, this is true <i>wisdom</i> and we could have much fun traversing this part of the graph.
+ But let us turn the globe around… there is also a lot to see in the Eastern Bloc.
+ </p>
+
+ <m:sparql-example name="examples/rdf-blonde-and-brunette"/>
+
+ <p>
+ <i>
+ Dad, what is this place?
+ Where are we?
+ Is there anyone here?<br/>
+ No. Just us.
+ </i>
+ </p>
+
+ <m:sparql-example name="examples/rdf-return"/>
+
+ <h2>P.S.</h2>
+ <p>
+ <i>
+ If you got the impression that RDF is just a poor relational database with a single table consisting of a mere three columns
+ and with a freaky SQL dialect, please be assured that this example shows just a small fraction of the wonderful RDF world.
+ </i>
+ </p>
+
+
+ </text>
+
+</stránka>
\ No newline at end of file