relpipe/relpipe-web: comparison relpipe-data/examples-rdf-sparql.xml


-:71a627e72815
+:aeda3cb4528d
+<stránka
+	xmlns="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/strana"
+	xmlns:m="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/makro">
+	<nadpis>Querying an RDF triplestore using SPARQL</nadpis>
+	<perex>use SQL-DK with Jena JDBC driver or a custom script to gather linked data</perex>
+	<m:pořadí-příkladu>04300</m:pořadí-příkladu>
+	<text xmlns="http://www.w3.org/1999/xhtml">
+		<p>
+			In the Resource Description Framework (<a href="https://www.w3.org/RDF/">RDF</a>) world, there are no relations.
+			The data model is quite different.
+			It is built on top of triples: subject – predicate – object.
+			Although there are no tables (unlike in relational databases), RDF is not schema-less clutter –
+			actually RDF has a schema (ontology, vocabulary), just differently shaped.
+			Subjects and predicates are identified by <a href="https://en.wikipedia.org/wiki/Internationalized_Resource_Identifier">IRI</a>s
+			(or formerly <a href="https://en.wikipedia.org/wiki/Uniform_Resource_Identifier">URI</a>s)
+			that are globally unique (compared to primary keys in relational databases that are almost never globally unique).
+			Objects are also identified by IRIs (and yes, one can be both subject and object) or they can be primitive values like a text string or a number.
+		</p>
+		<m:diagram orientace="vodorovně">
+			node [fontname = "Latin Modern Sans, sans-serif"];
+			edge [fontname = "Latin Modern Sans, sans-serif"];
+			subject	->	object [ label = "predicate"];
+		</m:diagram>
+		<p>
+			This <em>triple</em> is also called a <em>statement</em>.
+			In the following statement:
+		</p>
+		<blockquote>
+			<m:name/> tools are released under the GNU GPL license.
+		</blockquote>
+		<p>we recognize:</p>
+		<ul>
+			<li>
+				Subject: <i>
+					<m:name/> tools</i>
+			</li>
+			<li>Predicate: <i>is released under license</i></li>
+			<li>Object: <i>GNU GPL</i></li>
+		</ul>
+		<p>
+			This data model is seemingly simple: just a graph, two kinds of nodes and edges connecting them together.
+			Or a flat list of statements (triples).
+			But it can also be very complicated, depending on how we use it and how rich the ontologies we design are.
+			RDF can be studied for years and is a great topic for diploma theses and dissertations,
+			but in this example, we will keep it as simple as possible.
+		</p>
+		<p>
+			Collections of statements are stored in special databases called triplestores.
+			The data inside can be queried using the
+			<a href="https://www.w3.org/TR/sparql11-overview/">SPARQL</a> language through the endpoint provided by the triplestore.
+			Popular implementations are
+			<a href="https://jena.apache.org/">Jena</a>,
+			<a href="http://vos.openlinksw.com/owiki/wiki/VOS">Virtuoso</a> and
+			<a href="https://rdf4j.org/about/">RDF4J</a>
+			(all free software).
+		</p>
+		<p>
+			The relational model can easily be mapped to RDF.
+			We can simply add a prefix to the primary keys to make them globally unique IRIs.
+			The attributes will become predicates (also prefixed).
+			And the values will become objects (either primitive values or IRIs in case of foreign keys).
+			Of course, more complex transformations are possible – this is just the most straightforward way.
+		</p>
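+		<p>
+			The straightforward mapping described above can be sketched in a few lines of shell and AWK.
+			This is only an illustration under several assumptions:
+			the input is a simple CSV with a header line, the first column is the primary key,
+			and the values contain no commas, double quotes or backslashes.
+			The <code>http://example.org/</code> prefix, the <code>csv_to_ntriples</code> function name and the sample data are made up for this sketch.
+			It prints N-Triples (one statement per line) and emits every value as a plain literal;
+			a foreign key would be emitted as an IRI instead.
+		</p>
+		<m:pre jazyk="bash"><![CDATA[# Sketch only: maps a simple CSV (header line, first column = primary key,
+# no quoting, no commas, quotes or backslashes in values) to N-Triples.
+# The prefix http://example.org/ is a made-up placeholder.
+csv_to_ntriples() {
+	local relation="$1"
+	awk -F, -v base="http://example.org/" -v rel="$relation" '
+		# header line: remember the attribute names
+		NR == 1 { for (i = 1; i <= NF; i++) attr[i] = $i; next }
+		{
+			# prefixed primary key becomes the subject IRI
+			subject = "<" base rel "/" $1 ">"
+			# each remaining attribute becomes a prefixed predicate with a literal object
+			for (i = 2; i <= NF; i++)
+				printf "%s <%s%s#%s> \"%s\" .\n", subject, base, rel, attr[i], $i
+		}'
+}
+
+# usage with made-up sample data:
+printf 'id,name,license\n1,relpipe,GPL\n' | csv_to_ntriples tool]]></m:pre>
+		<p>
+			Because N-Triples is a subset of Turtle, output like this should be readable by any Turtle parser.
+		</p>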
+		<p>
+			Mapping RDF data to the relational model is a bit more difficult.
+			Sometimes easy, sometimes very cumbersome.
+			We can always design some kind of EAV (entity – attribute – value) model in the relational database
+			or we can create a relation for each predicate…
+			If we do some universal automatic mapping and retain the flexibility of RDF and richness of the original ontology,
+			we usually lose the performance and simplicity of our relational queries.
+			A good mapping that feels natural and idiomatic in the relational world and performs well usually requires some hard work.
+		</p>
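+		<p>
+			The idea of one relation per predicate can be sketched as follows.
+			The sketch assumes tab-separated triples on STDIN (subject, predicate, object, with no tabs inside the values);
+			the <code>triples_to_relations</code> function name, the output layout and the shortened sample values are hypothetical.
+			It prints a two-column listing for each predicate, in order of first appearance.
+		</p>
+		<m:pre jazyk="bash"><![CDATA[# Sketch only: reads tab-separated triples (subject, predicate, object)
+# and prints one two-column relation per predicate.
+triples_to_relations() {
+	awk -F'\t' '
+		# remember predicates in the order of their first appearance
+		!($2 in rows) { order[++n] = $2 }
+		# append the (subject, object) pair to the relation of this predicate
+		{ rows[$2] = rows[$2] $1 "\t" $3 "\n" }
+		END {
+			for (i = 1; i <= n; i++)
+				printf "%s:\nsubject\tobject\n%s", order[i], rows[order[i]]
+		}'
+}
+
+# usage with made-up sample data:
+printf 'you\thave\tnice-day\nyou\tread-about\trelational-pipes\nyou\thave\tmuch-fun\n' \
+	| triples_to_relations]]></m:pre>
+		<p>
+			Such a universal mapping keeps the flexibility of RDF, but a query that needs several attributes of one subject has to join several of these relations.
+		</p>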
+		<p>
+			But mapping the mere results of a SPARQL query obtained from an RDF endpoint is a different story.
+			These results can be seen as records and processed using our relational tools,
+			stored, transformed or converted to other formats, displayed in GUI windows or safely passed to shell scripts.
+			This example shows how we can bridge the RDF and relational worlds.
+		</p>
+		<h2>Several ways of connecting to an RDF triplestore</h2>
+		<p>
+			Currently there is no official <code>relpipe-in-rdf</code> or <code>relpipe-in-sparql</code> tool.
+			It will probably be part of some future release of <m:name/>.
+			But until then, despite this lack, we still have several ways to join the RDF world
+			and let the data from an RDF triplestore flow through our relational pipelines:
+		</p>
+		<ul>
+			<li>SQL-DK + Jena JDBC driver + <code>relpipe-in-xml</code></li>
+			<li>ODBC-JDBC bridge + Jena JDBC driver + <code>relpipe-in-sql</code></li>
+			<li>A native SPARQL ODBC driver + <code>relpipe-in-sql</code></li>
+			<li>A shell script + <code>relpipe-in-csv</code> or <code>relpipe-in-xml</code></li>
+		</ul>
+		<p>In this example, we will look at the first and the last option.</p>
+		<h2>SQL-DK + Jena JDBC driver</h2>
+		<p>
+			Apache Jena is not only a triplestore,
+			it is a framework consisting of several parts
+			and also provides a special JDBC driver that is ready to use
+			(despite this <a href="https://issues.apache.org/jira/browse/JENA-1939">small bug</a>).
+			Thanks to this driver, we can use existing Java tools and run SPARQL queries instead of SQL ones.
+		</p>
+		<p>
+			One such tool that uses this standard API (JDBC)
+			is <a href="https://sql-dk.globalcode.info/">SQL-DK</a>.
+			This tool integrates well with <m:name/> because it can output results in the XML format (or alternatively the Recfile format)
+			that can be directly consumed by <code>relpipe-in-xml</code> (or alternatively <code>relpipe-in-recfile</code>).
+		</p>
+		<p>First we download the Jena source code:</p>
+		<m:pre jazyk="bash"><![CDATA[mkdir -p ~/src; cd ~/src
+git clone https://gitbox.apache.org/repos/asf/jena.git]]></m:pre>
+		<p>
+			and apply the <a href="https://git-zaloha.frantovo.cz/gitbox.apache.org/repos/asf/jena.git/commit/?h=JENA-1939_updateCount&amp;id=bdb5439d22b80b2909258449d82fb7b5003fd64c">patch</a>
+			for the abovementioned bug (if it is not already merged upstream).
+		</p>
+		<p>N.B. As always with such experiments, we should probably run this under a separate user account or in a virtual machine.</p>
+		<p>Then we will compile the JDBC driver:</p>
+		<m:pre jazyk="bash"><![CDATA[cd ~/src/jena/jena-jdbc/
+mvn clean install]]></m:pre>
+		<p>
+			Now we will install SQL-DK (either from sources or from a <code>.deb</code> or <code>.rpm</code> package)
+			and run it for the first time (which creates the configuration directory and files):
+		</p>
+		<pre>sql-dk --list-databases</pre>
+		<p>Then we will register the previously compiled Jena JDBC driver in the <code>~/.sql-dk/environment.sh</code> file:</p>
+		<m:pre jazyk="bash"><![CDATA[CUSTOM_JDBC=(
+	~/src/jena/jena-jdbc/jena-jdbc-driver-bundle/target/jena-jdbc-driver-bundle-*.jar
+);]]></m:pre>
+		<p>And we should see it among other drivers:</p>
+		<pre><![CDATA[$ sql-dk --list-jdbc-drivers
+╭──────────────────────────────────────────────────┬───────────────────┬─────────────────┬─────────────────┬──────────────────────────╮
+│ class                                  (VARCHAR) │ version (VARCHAR) │ major (INTEGER) │ minor (INTEGER) │ jdbc_compliant (BOOLEAN) │
+├──────────────────────────────────────────────────┼───────────────────┼─────────────────┼─────────────────┼──────────────────────────┤
+│ org.postgresql.Driver                            │ 9.4               │               9 │               4 │                    false │
+│ com.mysql.jdbc.Driver                            │ 5.1               │               5 │               1 │                    false │
+│ org.sqlite.JDBC                                  │ 3.25              │               3 │              25 │                    false │
+│ org.apache.jena.jdbc.mem.MemDriver               │ 1.0               │               1 │               0 │                    false │
+│ org.apache.jena.jdbc.remote.RemoteEndpointDriver │ 1.0               │               1 │               0 │                    false │
+│ org.apache.jena.jdbc.tdb.TDBDriver               │ 1.0               │               1 │               0 │                    false │
+╰──────────────────────────────────────────────────┴───────────────────┴─────────────────┴─────────────────┴──────────────────────────╯
+Record count: 6]]></pre>
+		<p>The driver seems to be present, so we can configure the connection in the <code>~/.sql-dk/config.xml</code> file:</p>
+		<m:pre jazyk="xml"><![CDATA[<database>
+	<name>rdf-dbpedia</name>
+	<url>jdbc:jena:remote:query=http://dbpedia.org/sparql</url>
+	<userName></userName>
+	<password></password>
+</database>]]></m:pre>
+		<p>
+			This will connect us to the DBpedia endpoint (more datasources are mentioned in the chapter below).
+			We can test the connection:
+		</p>
+		<pre><![CDATA[$ sql-dk --test-connection rdf-dbpedia
+╭─────────────────────────┬──────────────────────┬─────────────────────┬────────────────────────┬───────────────────────────╮
+│ database_name (VARCHAR) │ configured (BOOLEAN) │ connected (BOOLEAN) │ product_name (VARCHAR) │ product_version (VARCHAR) │
+├─────────────────────────┼──────────────────────┼─────────────────────┼────────────────────────┼───────────────────────────┤
+│ rdf-dbpedia             │                 true │                true │                        │                           │
+╰─────────────────────────┴──────────────────────┴─────────────────────┴────────────────────────┴───────────────────────────╯
+Record count: 1]]></pre>
+		<p>and run our first SPARQL query:</p>
+		<pre><![CDATA[$ sql-dk --db rdf-dbpedia --formatter tabular-prefetching --sql "SELECT * WHERE { ?subject ?predicate ?object . } LIMIT 8"
+╭──────────────────────────────────────────────────────────────────────────────┬─────────────────────────────────────────────────┬─────────────────────────────────────────────────────────╮
+│ subject                                         (org.apache.jena.graph.Node) │ predicate          (org.apache.jena.graph.Node) │ object                     (org.apache.jena.graph.Node) │
+├──────────────────────────────────────────────────────────────────────────────┼─────────────────────────────────────────────────┼─────────────────────────────────────────────────────────┤
+│ http://www.openlinksw.com/virtrdf-data-formats#default-iid                   │ http://www.w3.org/1999/02/22-rdf-syntax-ns#type │ http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat │
+│ http://www.openlinksw.com/virtrdf-data-formats#default-iid-nullable          │ http://www.w3.org/1999/02/22-rdf-syntax-ns#type │ http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat │
+│ http://www.openlinksw.com/virtrdf-data-formats#default-iid-nonblank          │ http://www.w3.org/1999/02/22-rdf-syntax-ns#type │ http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat │
+│ http://www.openlinksw.com/virtrdf-data-formats#default-iid-nonblank-nullable │ http://www.w3.org/1999/02/22-rdf-syntax-ns#type │ http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat │
+│ http://www.openlinksw.com/virtrdf-data-formats#default                       │ http://www.w3.org/1999/02/22-rdf-syntax-ns#type │ http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat │
+│ http://www.openlinksw.com/virtrdf-data-formats#default-nullable              │ http://www.w3.org/1999/02/22-rdf-syntax-ns#type │ http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat │
+│ http://www.openlinksw.com/virtrdf-data-formats#sql-varchar                   │ http://www.w3.org/1999/02/22-rdf-syntax-ns#type │ http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat │
+│ http://www.openlinksw.com/virtrdf-data-formats#sql-varchar-nullable          │ http://www.w3.org/1999/02/22-rdf-syntax-ns#type │ http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat │
+╰──────────────────────────────────────────────────────────────────────────────┴─────────────────────────────────────────────────┴─────────────────────────────────────────────────────────╯
+Record count: 8]]></pre>
+		<p>
+			Not much fun yet, but it proves that the connection is working and we are getting some results from the endpoint.
+			We will run some more interesting queries later.
+		</p>
+		<p>
+			When we switch to the <code>--formatter xml</code> we can pipe the stream from SQL-DK
+			to <code>relpipe-in-xml</code> and then process it using relational tools.
+			We can also use the <code>--sql-in</code> option of SQL-DK which reads the query from STDIN (instead of from a command-line argument)
+			and then wrap it as a reusable script that reads SPARQL and outputs relational data:
+		</p>
+		<m:pre jazyk="bash">sql-dk --db "rdf-dbpedia" --formatter "xml" --sql-in | relpipe-in-xml</m:pre>
+		<p>
+			For accessing a remote SPARQL endpoint this is a bit of an overkill with a lot of dependencies (so we will use a different approach in the next chapter).
+			But the Jena JDBC driver is not only for accessing remote endpoints – we can use it as an embedded database,
+			either an in-memory one or a regular DB backed by persistent files.
+		</p>
+		<p>
+			The in-memory database loads some initial data and then operates on it.
+			So we configure such a connection:
+		</p>
+		<m:pre jazyk="xml"><![CDATA[<database>
+	<name>rdf-in-memory</name>
+	<url>jdbc:jena:mem:dataset=/tmp/rdf-initial-data.ttl</url>
+	<userName></userName>
+	<password></password>
+</database>]]></m:pre>
+		<p>It runs fine, but <a href="https://en.wikipedia.org/wiki/Turtle_(syntax)">turtles</a> are not at home:</p>
+		<pre><![CDATA[$ echo > /tmp/rdf-initial-data.ttl
+$ echo "SELECT * WHERE { ?subject ?predicate ?object . }" | sql-dk --db rdf-in-memory --formatter tabular-prefetching --sql-in
+╭──────────────────────────────────────┬────────────────────────────────────────┬─────────────────────────────────────╮
+│ subject (org.apache.jena.graph.Node) │ predicate (org.apache.jena.graph.Node) │ object (org.apache.jena.graph.Node) │
+├──────────────────────────────────────┼────────────────────────────────────────┼─────────────────────────────────────┤
+╰──────────────────────────────────────┴────────────────────────────────────────┴─────────────────────────────────────╯
+Record count: 0]]></pre>
+		<p>
+			If we are in desperate need of turtles and have installed any <a href="https://lv2plug.in/">LV2</a> plugins,
+			we can find some and put them in our initial data file or reconfigure the database connection:
+		</p>
+		<pre><![CDATA[$ find /usr/lib -name '*.ttl' | head
+/usr/lib/lv2/fil4.lv2/manifest.ttl
+/usr/lib/lv2/fil4.lv2/fil4.ttl
+/usr/lib/ardour5/LV2/a-fluidsynth.lv2/manifest.ttl
+/usr/lib/ardour5/LV2/a-fluidsynth.lv2/a-fluidsynth.ttl
+/usr/lib/ardour5/LV2/reasonablesynth.lv2/manifest.ttl
+/usr/lib/ardour5/LV2/reasonablesynth.lv2/reasonablesynth.ttl
+/usr/lib/ardour5/LV2/a-delay.lv2/manifest.ttl
+/usr/lib/ardour5/LV2/a-delay.lv2/presets.ttl
+/usr/lib/ardour5/LV2/a-delay.lv2/a-delay.ttl
+/usr/lib/ardour5/LV2/a-eq.lv2/manifest.ttl
+$ cat /usr/lib/lv2/fil4.lv2/manifest.ttl > /tmp/rdf-initial-data.ttl
+$ sed s@/tmp/rdf-initial-data.ttl@/usr/lib/lv2/fil4.lv2/manifest.ttl@g -i ~/.sql-dk/config.xml]]></pre>
+		<p>and look at what is inside through Jena/RDF/SPARQL:</p>
+		<pre><![CDATA[$ echo "SELECT * WHERE { ?subject ?predicate ?object . }" | sql-dk --db rdf-in-memory --formatter xml --sql-in | relpipe-in-xml | relpipe-out-tabular
+r1:
+╭───────────────────────────────────────┬─────────────────────────────────────────────────┬───────────────────────────────────────────╮
+│ subject                      (string) │ predicate                              (string) │ object                           (string) │
+├───────────────────────────────────────┼─────────────────────────────────────────────────┼───────────────────────────────────────────┤
+│ http://gareus.org/oss/lv2/fil4#ui_gl  │ http://www.w3.org/2000/01/rdf-schema#seeAlso    │ file:///usr/lib/lv2/fil4.lv2/fil4.ttl     │
+│ http://gareus.org/oss/lv2/fil4#ui_gl  │ http://lv2plug.in/ns/extensions/ui#binary       │ file:///usr/lib/lv2/fil4.lv2/fil4UI_gl.so │
+│ http://gareus.org/oss/lv2/fil4#ui_gl  │ http://www.w3.org/1999/02/22-rdf-syntax-ns#type │ http://lv2plug.in/ns/extensions/ui#X11UI  │
+│ http://gareus.org/oss/lv2/fil4#mono   │ http://www.w3.org/2000/01/rdf-schema#seeAlso    │ file:///usr/lib/lv2/fil4.lv2/fil4.ttl     │
+│ http://gareus.org/oss/lv2/fil4#mono   │ http://lv2plug.in/ns/lv2core#binary             │ file:///usr/lib/lv2/fil4.lv2/fil4.so      │
+│ http://gareus.org/oss/lv2/fil4#mono   │ http://www.w3.org/1999/02/22-rdf-syntax-ns#type │ http://lv2plug.in/ns/lv2core#Plugin       │
+│ http://gareus.org/oss/lv2/fil4#stereo │ http://www.w3.org/2000/01/rdf-schema#seeAlso    │ file:///usr/lib/lv2/fil4.lv2/fil4.ttl     │
+│ http://gareus.org/oss/lv2/fil4#stereo │ http://lv2plug.in/ns/lv2core#binary             │ file:///usr/lib/lv2/fil4.lv2/fil4.so      │
+│ http://gareus.org/oss/lv2/fil4#stereo │ http://www.w3.org/1999/02/22-rdf-syntax-ns#type │ http://lv2plug.in/ns/lv2core#Plugin       │
+╰───────────────────────────────────────┴─────────────────────────────────────────────────┴───────────────────────────────────────────╯
+Record count: 9]]></pre>
+		<p>
+			Now we can be sure that LV2 uses the Turtle format for plugin configurations,
+			which is quite ingenious and inspirational –
+			such configuration is well structured and its options (predicates in general) have globally unique identifiers (IRIs).
+			Also plugins are identified by IRIs which is great, because it avoids name collisions.
+		</p>
+		<p>
+			Let us make some turtles of our own.
+			Reconfigure the database connection back:
+		</p>
+		<pre>sed s@/usr/lib/lv2/fil4.lv2/manifest.ttl@/tmp/rdf-initial-data.ttl@g -i ~/.sql-dk/config.xml</pre>
+		<p>and fill the <code>/tmp/rdf-initial-data.ttl</code> with some new data:</p>
+		<m:pre jazyk="turtle"><![CDATA[<http://example.org/person/you>
+	<http://example.org/predicate/have>
+	<http://example.org/thing/nice-day> .]]></m:pre>
+		<p>
+			Turtle is a simple format that contains statements.
+			Subjects, predicates and objects are separated by whitespace (tabs and line-ends are here just to make it more readable for us).
+			And statements end with a <i>full stop</i> like ordinary sentences.
+		</p>
+		<p>
+			To avoid repeating common parts of IRIs we can declare namespace prefixes:
+		</p>
+		<m:pre jazyk="turtle"><![CDATA[@prefix person:     <http://example.org/person/> .
+@prefix predicate:  <http://example.org/predicate/> .
+@prefix thing:      <http://example.org/thing/> .
+person:you
+	predicate:have
+		thing:nice-day .]]></m:pre>
+		<p>
+			This format is very concise.
+			If we describe the same subject, we use a <i>semicolon</i> to avoid repeating it.
+			And if even the predicate is the same (multiple values), we use a <i>comma</i>:
+		</p>
+		<m:pre jazyk="turtle"><![CDATA[@prefix person:     <http://example.org/person/> .
+@prefix predicate:  <http://example.org/predicate/> .
+@prefix thing:      <http://example.org/thing/> .
+person:you
+	predicate:have
+		thing:nice-day, thing:much-fun;
+	predicate:read-about
+		thing:relational-pipes .]]></m:pre>
+		<p>
+			Jena will parse our file and respond to our basic query with these data:
+		</p>
+		<pre><![CDATA[$ echo "SELECT * WHERE { ?subject ?predicate ?object . }" | sql-dk --db rdf-in-memory --formatter xml --sql-in --relation rdf_results | relpipe-in-xml | relpipe-out-tabular
+rdf_results:
+╭───────────────────────────────┬─────────────────────────────────────────┬───────────────────────────────────────────╮
+│ subject              (string) │ predicate                      (string) │ object                           (string) │
+├───────────────────────────────┼─────────────────────────────────────────┼───────────────────────────────────────────┤
+│ http://example.org/person/you │ http://example.org/predicate/read-about │ http://example.org/thing/relational-pipes │
+│ http://example.org/person/you │ http://example.org/predicate/have       │ http://example.org/thing/much-fun         │
+│ http://example.org/person/you │ http://example.org/predicate/have       │ http://example.org/thing/nice-day         │
+╰───────────────────────────────┴─────────────────────────────────────────┴───────────────────────────────────────────╯
+Record count: 3]]></pre>
+		<p>Or if we prefer more vertical formats like Recfile:</p>
+		<pre><![CDATA[$ echo "SELECT * WHERE { ?subject ?predicate ?object . }" | sql-dk --db rdf-in-memory --formatter xml --sql-in --relation rdf_results | relpipe-in-xml | relpipe-out-recfile
+%rec: rdf_results
+subject: http://example.org/person/you
+predicate: http://example.org/predicate/read-about
+object: http://example.org/thing/relational-pipes
+subject: http://example.org/person/you
+predicate: http://example.org/predicate/have
+object: http://example.org/thing/much-fun
+subject: http://example.org/person/you
+predicate: http://example.org/predicate/have
+object: http://example.org/thing/nice-day]]></pre>
+		<p>Let us create some more data:</p>
+		<m:pre jazyk="turtle" src="examples/rdf-heathers.ttl"/>
+		<p>list them as statements:</p>
+		<m:pre jazyk="text" src="examples/rdf-heathers.txt"/>
+		<p>and run some more SPARQL queries…</p>
+		<p>
+			Note:
+			<em>
+				we use <a href="https://tools.ietf.org/html/rfc4151">The tag: URI scheme</a> for our IRIs.
+				It makes URIs (IRIs) globally unique not only in space but also in time (domain owners change over time).
+				Which is great.
+				In the semantic web and linked data world, it is not common and locators (URLs) are used rather than pure identifiers (URIs, IRIs).
+				But here we want to emphasise that we work strictly with our local data
+				and make it clear that we do not depend on any on-line resources and nothing will be downloaded from remote servers.
+				And in a real project, we should use existing ontologies / vocabularies as much as possible instead of inventing new ones.
+				But we keep this example rather isolated from the complexity of the outer world and a bit synthetic.
+			</em>
+		</p>
+		<p>Find all quotes and names of their authors:</p>
+		<m:sparql-example name="examples/rdf-heathers-quotes"/>
+		<p>List groups and counts of their members:</p>
+		<m:sparql-example name="examples/rdf-heathers-members"/>
+		<p>Filter by a regular expression and list actor names rather than characters:</p>
+		<m:sparql-example name="examples/rdf-heathers-much"/>
+		<p>Now imagine semantic model of Twin Peaks… How very!</p>
+		<h2>Improvised relpipe-in-sparql tool</h2>
+		<p>
+			Starting the JVM and always creating a new database from scratch for each query is quite… <i>heavy</i>.
+			We can keep Jena running in the background and connect to its SPARQL endpoint – or connect to any other endpoint on the internet.
+			So we will hack together a light script and name it <code>relpipe-in-sparql</code> (some future release will include such an official tool).
+		</p>
+		<p>
+			Because SPARQL endpoints accept plain HTTP requests and support CSV besides XML, and because we already have <code>relpipe-in-csv</code>,
+			the script can be very simple:
+		</p>
+		<m:pre jazyk="bash"><![CDATA[curl \
+	--header "Accept: text/csv" \
+	--data-urlencode query="SELECT * WHERE { ?subject ?predicate ?object . } LIMIT 3" \
+	https://dbpedia.org/sparql | relpipe-in-csv | relpipe-out-tabular]]></m:pre>
+		<p>
+			It becomes a bit longer if we add some documentation, argument parsing and configuration:
+		</p>
+		<m:pre jazyk="bash" src="examples/relpipe-in-sparql.sh" odkaz="ano"/>
+		<p>
+			Here we even have two implementations that can be switched using the <code>RELPIPE_IN_SPARQL_IMPLEMENTATION</code> environment variable.
+			The XML one is more powerful and can be customized (e.g. to specifically handle localized strings or add some new attributes to the relational output).
+			On the other hand, the CSV one has fewer dependencies and supports streaming of long result sets (XSLT needs to load the whole document first).
+		</p>
+		<p>Both implementations should work:</p>
+		<m:pre jazyk="bash"><![CDATA[export RELPIPE_IN_SPARQL_IMPLEMENTATION=xml   # choose one of these two lines;
+export RELPIPE_IN_SPARQL_IMPLEMENTATION=csv   # if both are run, the later export wins
+echo 'SELECT * WHERE { ?subject ?predicate "Laura Dern"@en . } LIMIT 3' \
+	| relpipe-in-sparql \
+		--relation "jurassic" \
+		--endpoint "https://dbpedia.org/sparql" \
+	| relpipe-out-tabular]]></m:pre>
+		<p>and produce the same output:</p>
+		<pre><![CDATA[jurassic:
+╭────────────────────────────────────────┬────────────────────────────────────────────╮
+│ subject                       (string) │ predicate                         (string) │
+├────────────────────────────────────────┼────────────────────────────────────────────┤
+│ http://dbpedia.org/resource/Laura_Dern │ http://www.w3.org/2000/01/rdf-schema#label │
+│ http://www.wikidata.org/entity/Q220901 │ http://www.w3.org/2000/01/rdf-schema#label │
+│ http://dbpedia.org/resource/Laura_Dern │ http://xmlns.com/foaf/0.1/name             │
+╰────────────────────────────────────────┴────────────────────────────────────────────╯
+Record count: 3]]></pre>
+		<p>And maybe somewhere nearby in the graph we will find:</p>
+		<blockquote>It's a Unix System… I know this!</blockquote>
+		<h2>Sources of RDF data</h2>
+		<p>
+			The bad news is that we are not querying the real world.
+			We are querying an imperfect, incomplete and outdated snapshot of reality stored in someone's database.
+			The good news is that we can improve the content of certain databases like we improve articles in Wikipedia.
+		</p>
+		<p>
+			Some addresses have already <i>leaked</i> in the <code>relpipe-in-sparql --help</code> above.
+			Here is a brief description of some publicly available sources of RDF data
+			that we can play with.
+		</p>
+		<h3>Wikidata</h3>
+		<p>
+			A free and open knowledge base, a sister project of Wikipedia.
+			Anyone can use and even edit its content.
+		</p>
+		<m:sparql-endpoint url="https://query.wikidata.org/sparql" website-url="https://www.wikidata.org/" website-title="Wikidata"/>
+		<h3>DBpedia</h3>
+		<p>
+			The project extracts structured content from the information created in various Wikimedia projects
+			and publishes this knowledge graph for everyone.
+		</p>
+		<m:sparql-endpoint url="https://dbpedia.org/sparql" website-url="https://wiki.dbpedia.org/" website-title="DBpedia"/>
+		<h3>Czech government</h3>
+		<p>
+			Ministries and other institutions publish some data as open data and part of it as linked open data (LOD).
+		</p>
+		<m:sparql-endpoint url="https://data.gov.cz/sparql" website-url="https://data.gov.cz/english/" website-title="Open data portal of the Czech Republic"/>
+		<m:sparql-endpoint url="https://data.cssz.cz/sparql" website-url="https://data.cssz.cz/" website-title="Open data portal of the Czech Social Security Administration"/>
+		<m:sparql-endpoint url="https://cedropendata.mfcr.cz/c3lod/cedr/sparql" website-url="https://cedropendata.mfcr.cz/" website-title="Open Data CEDR III"/>
+		<h2>Running SPARQL queries as scripts</h2>
+		<p>Besides piping SPARQL queries through <code>relpipe-in-sparql</code> like this:</p>
+		<m:pre jazyk="bash"><![CDATA[cat query.sparql | relpipe-in-sparql | relpipe-out-tabular]]></m:pre>
+		<p>we can make them executable and run like a (Bash, Perl, PHP etc.) script:</p>
+		<m:pre jazyk="bash"><![CDATA[chmod +x query.sparql
+./query.sparql | relpipe-out-csv     # output in the CSV format
+./query.sparql | relpipe-out-recfile # output in the Recfile format
+./query.sparql                       # automatically appends relpipe-out-tabular to the pipeline
+]]></m:pre>
+		<p>(see the <m:a href="implementation">Implementation</m:a> page for a complete list of available transformations and output filters)</p>
+		<p>
+			We need to add a first-line comment that points to the interpreter.
+			The <code>endpoint</code> and <code>relation</code> parameters
+			are optional – we can say where this query will be executed and how the output relation will be named:
+		</p>
+		<m:pre jazyk="sparql" src="examples/rdf-sample-triples.sparql" odkaz="ano"/>
+		<p>
+			The environment variables <code>RELPIPE_IN_SPARQL_ENDPOINT</code> and <code>RELPIPE_IN_SPARQL_RELATION</code>
+			can be set to override the parameters from the file.
+			All the magic is done by this (a bit hackish) helper script:
+		</p>
+		<m:pre jazyk="bash" src="examples/rdf-sparql-interpreter.sh" odkaz="ano"/>
+		<p>
+			This script requires the <code>relpipe-in-sparql</code> we put together earlier.
+			Both scripts are just examples (not part of any release yet).
+		</p>
+		<h2>Samples of SPARQL queries</h2>
+		<p>
+			<i>Hey kid, rock and roll,</i>
+			let us list the films where both Coreys starred:
+		</p>
+		<m:sparql-example name="examples/rdf-coreys"/>
+		<p>
+			<i>So Mercedes has scratched our Cadillac, but it was still a great night. </i>
+		</p>
+		<p>Now it is time to visit our friends from the club:</p>
+		<m:sparql-example name="examples/rdf-breakfast-club"/>
+		<p>
+			Not only <i>pretty in pink</i>, this is true <i>wisdom</i> and we could have much fun traversing this part of the graph.
+			But let us turn the globe around… there is also a lot to see in the Eastern Bloc.
+		</p>
+		<m:sparql-example name="examples/rdf-blonde-and-brunette"/>
+		<p>
+			<i>
+				Dad, what is this place?
+				Where are we?
+				Is there anyone here?<br/>
+				No. Just us.
+			</i>
+		</p>
+		<m:sparql-example name="examples/rdf-return"/>
+		<h2>P.S.</h2>
+		<p>
+			<i>
+				If you got the impression that RDF is just a poor relational database with a single table consisting of a mere three columns
+				and with a freaky SQL dialect, please be assured that this example shows just a small fraction of the wonderful RDF world.
+			</i>
+		</p>
+	</text>
+</stránka>

branch	v_0
changeset 310	aeda3cb4528d
child 312	0a65e49a076f