<stránka
	xmlns="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/strana"
	xmlns:m="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/makro">

	<nadpis>Querying an RDF triplestore using SPARQL</nadpis>
	<perex>use SQL-DK with the Jena JDBC driver or a custom script to gather linked data</perex>
	<m:pořadí-příkladu>04300</m:pořadí-příkladu>

	<text xmlns="http://www.w3.org/1999/xhtml">

		<p>
			In the Resource Description Framework (<a href="https://www.w3.org/RDF/">RDF</a>) world, there are no relations.
			The data model is quite different.
			It is built on top of triples: subject – predicate – object.
			Although there are no tables (unlike in relational databases), RDF is not a schema-less clutter:
			it actually has a schema (ontology, vocabulary), just differently shaped.
			Subjects and predicates are identified by <a href="https://en.wikipedia.org/wiki/Internationalized_Resource_Identifier">IRI</a>s
			(or formerly <a href="https://en.wikipedia.org/wiki/Uniform_Resource_Identifier">URI</a>s)
			that are globally unique (unlike primary keys in relational databases, which are almost never globally unique).
			Objects are also identified by IRIs (and yes, the same resource can be both a subject and an object) or they can be primitive values such as a text string or a number.
		</p>

		<m:diagram orientace="vodorovně">
			node [fontname = "Latin Modern Sans, sans-serif"];
			edge [fontname = "Latin Modern Sans, sans-serif"];
			subject -> object [ label = "predicate"];
		</m:diagram>

		<p>
			This <em>triple</em> is also called a <em>statement</em>.
			In the following statement:
		</p>

		<blockquote>
			<m:name/> tools are released under the GNU GPL license.
		</blockquote>

		<p>we recognize:</p>

		<ul>
			<li>Subject: <i><m:name/> tools</i></li>
			<li>Predicate: <i>is released under license</i></li>
			<li>Object: <i>GNU GPL</i></li>
		</ul>
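
		<p>
			As a minimal sketch, the same statement can be written down as a single line in the N-Triples syntax.
			The function name and all three IRIs under <code>example.org</code> are hypothetical placeholders chosen for this illustration;
			they are not identifiers from any real vocabulary:
		</p>

		<m:pre jazyk="bash"><![CDATA[# Print one RDF statement in the N-Triples form: <subject> <predicate> <object> .
# The IRIs used below are hypothetical and serve only as an illustration.
triple() {
	printf '<%s> <%s> <%s> .\n' "$1" "$2" "$3"
}

triple \
	"http://example.org/software/relpipe" \
	"http://example.org/predicate/is-released-under-license" \
	"http://example.org/license/gnu-gpl"]]></m:pre>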

		<p>
			This data model is seemingly simple: just a graph, two kinds of nodes and edges connecting them together.
			Or a flat list of statements (triples).
			But it can also become very complicated, depending on how we use it and how rich the ontologies we design are.
			RDF can be studied for years and is a great topic for diploma theses and dissertations,
			but in this example, we will keep it as simple as possible.
		</p>

		<p>
			Collections of statements are stored in special databases called triplestores.
			The data inside can be queried using the
			<a href="https://www.w3.org/TR/sparql11-overview/">SPARQL</a> language through the endpoint provided by the triplestore.
			Popular implementations are
			<a href="https://jena.apache.org/">Jena</a>,
			<a href="http://vos.openlinksw.com/owiki/wiki/VOS">Virtuoso</a> and
			<a href="https://rdf4j.org/about/">RDF4J</a>
			(all free software).
		</p>

		<p>
			The relational model can be easily mapped to RDF.
			We can simply add a prefix to the primary keys to make them globally unique IRIs.
			The attributes will become predicates (also prefixed).
			And the values will become objects (either primitive values or IRIs in the case of foreign keys).
			Of course, more complex transformations are possible; this is just the most straightforward way.
		</p>
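
		<p>
			The straightforward mapping described above can be sketched with a few lines of AWK.
			This is only an illustration under several assumptions: the input is a hypothetical tab-separated table
			whose first line holds the attribute names and whose first column is the primary key,
			the prefixes under <code>example.org</code> are made up,
			and all values are emitted as plain string literals without any escaping (foreign keys would become IRIs instead):
		</p>

		<m:pre jazyk="bash"><![CDATA[# Convert a tab-separated table (header line + rows) read from STDIN into N-Triples.
# $1 = IRI prefix for subjects (primary keys), $2 = IRI prefix for predicates (attributes)
table_to_triples() {
	awk -F '\t' -v subject_prefix="$1" -v predicate_prefix="$2" '
		# the first line contains the attribute names
		NR == 1 { for (i = 2; i <= NF; i++) attribute[i] = $i; next }
		# each remaining line yields one statement per attribute
		{
			for (i = 2; i <= NF; i++)
				printf "<%s%s> <%s%s> \"%s\" .\n", subject_prefix, $1, predicate_prefix, attribute[i], $i
		}
	'
}

printf 'id\tname\tlicense\n1\trelpipe\tGPL\n' \
	| table_to_triples "http://example.org/tool/" "http://example.org/attribute/"]]></m:pre>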

		<p>
			Mapping RDF data to the relational model is a bit more difficult.
			Sometimes it is easy, sometimes very cumbersome.
			We can always design some kind of EAV (entity – attribute – value) model in the relational database
			or we can create a relation for each predicate…
			If we do some universal automatic mapping and retain the flexibility of RDF and the richness of the original ontology,
			we usually lose the performance and simplicity of our relational queries.
			A good mapping that feels natural and idiomatic in the relational world and performs well usually requires some hard work.
		</p>
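
		<p>
			To illustrate the trade-off, here is a rough sketch that pivots EAV rows into one record per subject.
			It assumes a hypothetical tab-separated input (subject, predicate, object) and a function name invented for this example.
			Note what gets lost: if a subject has several values for one predicate, only the last one survives,
			which is exactly the kind of flexibility that a naive relational mapping gives up:
		</p>

		<m:pre jazyk="bash"><![CDATA[# Pivot EAV rows (subject TAB predicate TAB object) from STDIN into one row per subject.
# The arguments are the predicates that should become columns; missing values stay empty
# and if a subject has several values for one predicate, only the last one is kept.
pivot_triples() {
	awk -F '\t' -v columns="$*" '
		BEGIN { OFS = "\t"; n = split(columns, column, " ") }
		{
			# remember the subjects in the order of their first appearance
			if (!($1 in seen)) { seen[$1] = 1; subject[++count] = $1 }
			value[$1, $2] = $3
		}
		END {
			for (s = 1; s <= count; s++) {
				row = subject[s]
				for (c = 1; c <= n; c++) row = row OFS value[subject[s], column[c]]
				print row
			}
		}
	'
}

printf 'a\tname\tAlice\na\tage\t30\nb\tname\tBob\n' | pivot_triples name age]]></m:pre>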

		<p>
			But mapping the mere results of a SPARQL query obtained from an RDF endpoint is a different story.
			These results can be seen as records and processed using our relational tools,
			stored, transformed or converted to other formats, displayed in GUI windows or safely passed to shell scripts.
			This example shows how we can bridge the RDF and relational worlds.
		</p>


		<h2>Several ways of connecting to an RDF triplestore</h2>

		<p>
			Currently there is no official <code>relpipe-in-rdf</code> or <code>relpipe-in-sparql</code> tool.
			It will probably be part of some future release of <m:name/>.
			But until then, we still have several options for joining the RDF world
			and letting the data from an RDF triplestore flow through our relational pipelines:
		</p>

		<ul>
			<li>SQL-DK + Jena JDBC driver + <code>relpipe-in-xml</code></li>
			<li>ODBC-JDBC bridge + Jena JDBC driver + <code>relpipe-in-sql</code></li>
			<li>A native SPARQL ODBC driver + <code>relpipe-in-sql</code></li>
			<li>A shell script + <code>relpipe-in-csv</code> or <code>relpipe-in-xml</code></li>
		</ul>

		<p>In this example, we will look at the first and the last option.</p>

		<h2>SQL-DK + Jena JDBC driver</h2>


		<p>
			Apache Jena is not only a triplestore;
			it is a framework consisting of several parts
			and it also provides a special JDBC driver that is ready to use
			(despite this <a href="https://issues.apache.org/jira/browse/JENA-1939">small bug</a>).
			Thanks to this driver, we can use existing Java tools and run SPARQL queries instead of SQL ones.
		</p>

		<p>
			One such tool that uses this standard API (JDBC)
			is <a href="https://sql-dk.globalcode.info/">SQL-DK</a>.
			This tool integrates well with <m:name/> because it can output results in the XML format (or alternatively the Recfile format)
			that can be directly consumed by <code>relpipe-in-xml</code> (or alternatively <code>relpipe-in-recfile</code>).
		</p>

		<p>First we download the Jena source code:</p>

		<m:pre jazyk="bash"><![CDATA[mkdir -p ~/src; cd ~/src
git clone https://gitbox.apache.org/repos/asf/jena.git]]></m:pre>

		<p>
			and apply the <a href="https://git-zaloha.frantovo.cz/gitbox.apache.org/repos/asf/jena.git/commit/?h=JENA-1939_updateCount&amp;id=bdb5439d22b80b2909258449d82fb7b5003fd64c">patch</a>
			for the above-mentioned bug (if it is not already merged upstream).
		</p>

		<p>N.B. As always with such experiments, it is advisable to run this under a separate user account or in a virtual machine.</p>

		<p>Then we will compile the JDBC driver:</p>

		<m:pre jazyk="bash"><![CDATA[cd ~/src/jena/jena-jdbc/
mvn clean install]]></m:pre>

		<p>
			Now we will install SQL-DK (either from sources or from a <code>.deb</code> or <code>.rpm</code> package)
			and run it for the first time (which creates the configuration directory and files):
		</p>

		<pre>sql-dk --list-databases</pre>

		<p>Then we will register the previously compiled Jena JDBC driver in the <code>~/.sql-dk/environment.sh</code> file:</p>

		<m:pre jazyk="bash"><![CDATA[CUSTOM_JDBC=(
	~/src/jena/jena-jdbc/jena-jdbc-driver-bundle/target/jena-jdbc-driver-bundle-*.jar
);]]></m:pre>

		<p>And we should see it among the other drivers:</p>

		<pre><![CDATA[$ sql-dk --list-jdbc-drivers
╭──────────────────────────────────────────────────┬───────────────────┬─────────────────┬─────────────────┬──────────────────────────╮
│ class (VARCHAR)                                  │ version (VARCHAR) │ major (INTEGER) │ minor (INTEGER) │ jdbc_compliant (BOOLEAN) │
├──────────────────────────────────────────────────┼───────────────────┼─────────────────┼─────────────────┼──────────────────────────┤
│ org.postgresql.Driver                            │ 9.4               │               9 │               4 │                    false │
│ com.mysql.jdbc.Driver                            │ 5.1               │               5 │               1 │                    false │
│ org.sqlite.JDBC                                  │ 3.25              │               3 │              25 │                    false │
│ org.apache.jena.jdbc.mem.MemDriver               │ 1.0               │               1 │               0 │                    false │
│ org.apache.jena.jdbc.remote.RemoteEndpointDriver │ 1.0               │               1 │               0 │                    false │
│ org.apache.jena.jdbc.tdb.TDBDriver               │ 1.0               │               1 │               0 │                    false │
╰──────────────────────────────────────────────────┴───────────────────┴─────────────────┴─────────────────┴──────────────────────────╯
Record count: 6]]></pre>

		<p>The driver seems to be present, so we can configure the connection in the <code>~/.sql-dk/config.xml</code> file:</p>

		<m:pre jazyk="xml"><![CDATA[<database>
	<name>rdf-dbpedia</name>
	<url>jdbc:jena:remote:query=http://dbpedia.org/sparql</url>
	<userName></userName>
	<password></password>
</database>]]></m:pre>

		<p>
			This will connect us to the DBpedia endpoint (more data sources are mentioned in a section below).
			We can test the connection:
		</p>

		<pre><![CDATA[$ sql-dk --test-connection rdf-dbpedia
╭─────────────────────────┬──────────────────────┬─────────────────────┬────────────────────────┬───────────────────────────╮
│ database_name (VARCHAR) │ configured (BOOLEAN) │ connected (BOOLEAN) │ product_name (VARCHAR) │ product_version (VARCHAR) │
├─────────────────────────┼──────────────────────┼─────────────────────┼────────────────────────┼───────────────────────────┤
│ rdf-dbpedia             │                 true │                true │                        │                           │
╰─────────────────────────┴──────────────────────┴─────────────────────┴────────────────────────┴───────────────────────────╯
Record count: 1]]></pre>

		<p>and run our first SPARQL query:</p>

		<pre><![CDATA[$ sql-dk --db rdf-dbpedia --formatter tabular-prefetching --sql "SELECT * WHERE { ?subject ?predicate ?object . } LIMIT 8"
╭──────────────────────────────────────────────────────────────────────────────┬─────────────────────────────────────────────────┬─────────────────────────────────────────────────────────╮
│ subject (org.apache.jena.graph.Node)                                         │ predicate (org.apache.jena.graph.Node)          │ object (org.apache.jena.graph.Node)                     │
├──────────────────────────────────────────────────────────────────────────────┼─────────────────────────────────────────────────┼─────────────────────────────────────────────────────────┤
│ http://www.openlinksw.com/virtrdf-data-formats#default-iid                   │ http://www.w3.org/1999/02/22-rdf-syntax-ns#type │ http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat │
│ http://www.openlinksw.com/virtrdf-data-formats#default-iid-nullable          │ http://www.w3.org/1999/02/22-rdf-syntax-ns#type │ http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat │
│ http://www.openlinksw.com/virtrdf-data-formats#default-iid-nonblank          │ http://www.w3.org/1999/02/22-rdf-syntax-ns#type │ http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat │
│ http://www.openlinksw.com/virtrdf-data-formats#default-iid-nonblank-nullable │ http://www.w3.org/1999/02/22-rdf-syntax-ns#type │ http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat │
│ http://www.openlinksw.com/virtrdf-data-formats#default                       │ http://www.w3.org/1999/02/22-rdf-syntax-ns#type │ http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat │
│ http://www.openlinksw.com/virtrdf-data-formats#default-nullable              │ http://www.w3.org/1999/02/22-rdf-syntax-ns#type │ http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat │
│ http://www.openlinksw.com/virtrdf-data-formats#sql-varchar                   │ http://www.w3.org/1999/02/22-rdf-syntax-ns#type │ http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat │
│ http://www.openlinksw.com/virtrdf-data-formats#sql-varchar-nullable          │ http://www.w3.org/1999/02/22-rdf-syntax-ns#type │ http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat │
╰──────────────────────────────────────────────────────────────────────────────┴─────────────────────────────────────────────────┴─────────────────────────────────────────────────────────╯
Record count: 8]]></pre>

		<p>
			Not much fun yet, but it proves that the connection is working and we are getting some results from the endpoint.
			We will run some more interesting queries later.
		</p>

		<p>
			When we switch to the <code>--formatter xml</code> we can pipe the stream from SQL-DK
			to <code>relpipe-in-xml</code> and then process it using relational tools.
			We can also use the <code>--sql-in</code> option of SQL-DK, which reads the query from STDIN (instead of from a command-line argument),
			and then wrap it as a reusable script that reads SPARQL and outputs relational data:
		</p>

		<m:pre jazyk="bash">sql-dk --db "rdf-dbpedia" --formatter "xml" --sql-in | relpipe-in-xml</m:pre>

		<p>
			For accessing a remote SPARQL endpoint, this is a bit of an overkill with a lot of dependencies (so we will use a different approach in the next section).
			But the Jena JDBC driver is not only for accessing remote endpoints: we can also use it as an embedded database,
			either an in-memory one or a regular database backed by persistent files.
		</p>

		<p>
			The in-memory database loads some initial data and then operates on it.
			So we configure such a connection:
		</p>

		<m:pre jazyk="xml"><![CDATA[<database>
	<name>rdf-in-memory</name>
	<url>jdbc:jena:mem:dataset=/tmp/rdf-initial-data.ttl</url>
	<userName></userName>
	<password></password>
</database>]]></m:pre>

		<p>It runs fine, but <a href="https://en.wikipedia.org/wiki/Turtle_(syntax)">turtles</a> are not at home:</p>

		<pre><![CDATA[$ echo > /tmp/rdf-initial-data.ttl
$ echo "SELECT * WHERE { ?subject ?predicate ?object . }" | sql-dk --db rdf-in-memory --formatter tabular-prefetching --sql-in
╭──────────────────────────────────────┬────────────────────────────────────────┬─────────────────────────────────────╮
│ subject (org.apache.jena.graph.Node) │ predicate (org.apache.jena.graph.Node) │ object (org.apache.jena.graph.Node) │
├──────────────────────────────────────┼────────────────────────────────────────┼─────────────────────────────────────┤
╰──────────────────────────────────────┴────────────────────────────────────────┴─────────────────────────────────────╯
Record count: 0]]></pre>

		<p>
			If we are in desperate need of turtles and have any <a href="https://lv2plug.in/">LV2</a> plugins installed,
			we can find some and either put them in our initial data file or reconfigure the database connection:
		</p>

		<pre><![CDATA[$ find /usr/lib -name '*.ttl' | head
/usr/lib/lv2/fil4.lv2/manifest.ttl
/usr/lib/lv2/fil4.lv2/fil4.ttl
/usr/lib/ardour5/LV2/a-fluidsynth.lv2/manifest.ttl
/usr/lib/ardour5/LV2/a-fluidsynth.lv2/a-fluidsynth.ttl
/usr/lib/ardour5/LV2/reasonablesynth.lv2/manifest.ttl
/usr/lib/ardour5/LV2/reasonablesynth.lv2/reasonablesynth.ttl
/usr/lib/ardour5/LV2/a-delay.lv2/manifest.ttl
/usr/lib/ardour5/LV2/a-delay.lv2/presets.ttl
/usr/lib/ardour5/LV2/a-delay.lv2/a-delay.ttl
/usr/lib/ardour5/LV2/a-eq.lv2/manifest.ttl

$ cat /usr/lib/lv2/fil4.lv2/manifest.ttl > /tmp/rdf-initial-data.ttl
$ sed s@/tmp/rdf-initial-data.ttl@/usr/lib/lv2/fil4.lv2/manifest.ttl@g -i ~/.sql-dk/config.xml]]></pre>

		<p>and look at what is inside through Jena/RDF/SPARQL:</p>

		<pre><![CDATA[$ echo "SELECT * WHERE { ?subject ?predicate ?object . }" | sql-dk --db rdf-in-memory --formatter xml --sql-in | relpipe-in-xml | relpipe-out-tabular
r1:
╭───────────────────────────────────────┬─────────────────────────────────────────────────┬───────────────────────────────────────────╮
│ subject (string)                      │ predicate (string)                              │ object (string)                           │
├───────────────────────────────────────┼─────────────────────────────────────────────────┼───────────────────────────────────────────┤
│ http://gareus.org/oss/lv2/fil4#ui_gl  │ http://www.w3.org/2000/01/rdf-schema#seeAlso    │ file:///usr/lib/lv2/fil4.lv2/fil4.ttl     │
│ http://gareus.org/oss/lv2/fil4#ui_gl  │ http://lv2plug.in/ns/extensions/ui#binary       │ file:///usr/lib/lv2/fil4.lv2/fil4UI_gl.so │
│ http://gareus.org/oss/lv2/fil4#ui_gl  │ http://www.w3.org/1999/02/22-rdf-syntax-ns#type │ http://lv2plug.in/ns/extensions/ui#X11UI  │
│ http://gareus.org/oss/lv2/fil4#mono   │ http://www.w3.org/2000/01/rdf-schema#seeAlso    │ file:///usr/lib/lv2/fil4.lv2/fil4.ttl     │
│ http://gareus.org/oss/lv2/fil4#mono   │ http://lv2plug.in/ns/lv2core#binary             │ file:///usr/lib/lv2/fil4.lv2/fil4.so      │
│ http://gareus.org/oss/lv2/fil4#mono   │ http://www.w3.org/1999/02/22-rdf-syntax-ns#type │ http://lv2plug.in/ns/lv2core#Plugin       │
│ http://gareus.org/oss/lv2/fil4#stereo │ http://www.w3.org/2000/01/rdf-schema#seeAlso    │ file:///usr/lib/lv2/fil4.lv2/fil4.ttl     │
│ http://gareus.org/oss/lv2/fil4#stereo │ http://lv2plug.in/ns/lv2core#binary             │ file:///usr/lib/lv2/fil4.lv2/fil4.so      │
│ http://gareus.org/oss/lv2/fil4#stereo │ http://www.w3.org/1999/02/22-rdf-syntax-ns#type │ http://lv2plug.in/ns/lv2core#Plugin       │
╰───────────────────────────────────────┴─────────────────────────────────────────────────┴───────────────────────────────────────────╯
Record count: 9]]></pre>

		<p>
			Now we can be sure that LV2 uses the Turtle format for plugin configurations,
			which is quite ingenious and inspirational:
			such a configuration is well structured and its options (predicates in general) have globally unique identifiers (IRIs).
			Plugins are also identified by IRIs, which is great because it avoids name collisions.
		</p>

		<p>
			Let us make some turtles of our own.
			We reconfigure the database connection back:
		</p>

		<pre>sed s@/usr/lib/lv2/fil4.lv2/manifest.ttl@/tmp/rdf-initial-data.ttl@g -i ~/.sql-dk/config.xml</pre>

		<p>and fill the <code>/tmp/rdf-initial-data.ttl</code> file with some new data:</p>

		<m:pre jazyk="turtle"><![CDATA[<http://example.org/person/you>
	<http://example.org/predicate/have>
		<http://example.org/thing/nice-day> .]]></m:pre>

		<p>
			Turtle is a simple format that contains statements.
			Subjects, predicates and objects are separated by whitespace (the tabs and line breaks are here just to make it more readable for us).
			And statements end with a <i>full stop</i> like ordinary sentences.
		</p>

		<p>
			To avoid repeating common parts of IRIs, we can declare namespace prefixes:
		</p>

		<m:pre jazyk="turtle"><![CDATA[@prefix person: <http://example.org/person/> .
@prefix predicate: <http://example.org/predicate/> .
@prefix thing: <http://example.org/thing/> .

person:you
	predicate:have
		thing:nice-day .]]></m:pre>

		<p>
			This format is very concise.
			If we describe the same subject, we use a <i>semicolon</i> to avoid repeating it.
			And if even the predicate is the same (multiple values), we use a <i>comma</i>:
		</p>

		<m:pre jazyk="turtle"><![CDATA[@prefix person: <http://example.org/person/> .
@prefix predicate: <http://example.org/predicate/> .
@prefix thing: <http://example.org/thing/> .

person:you
	predicate:have
		thing:nice-day, thing:much-fun;
	predicate:read-about
		thing:relational-pipes .]]></m:pre>
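
		<p>
			For comparison, the semicolon and comma are just a shorthand for three separate statements.
			The following sketch writes the equivalent long form to a scratch file and counts the statements
			(the path <code>/tmp/rdf-long-form.ttl</code> is an arbitrary choice made for this illustration):
		</p>

		<m:pre jazyk="bash"><![CDATA[# Write the long form of the same data: one complete statement per line.
cat > /tmp/rdf-long-form.ttl <<'EOF'
@prefix person:    <http://example.org/person/> .
@prefix predicate: <http://example.org/predicate/> .
@prefix thing:     <http://example.org/thing/> .

person:you predicate:have       thing:nice-day .
person:you predicate:have       thing:much-fun .
person:you predicate:read-about thing:relational-pipes .
EOF

# Count the statements, i.e. lines that start with the subject and end with a full stop.
grep -c '^person:.* \.$' /tmp/rdf-long-form.ttl]]></m:pre>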

		<p>
			Jena will parse our file and respond to our basic query with this data:
		</p>

		<pre><![CDATA[$ echo "SELECT * WHERE { ?subject ?predicate ?object . }" | sql-dk --db rdf-in-memory --formatter xml --sql-in --relation rdf_results | relpipe-in-xml | relpipe-out-tabular
rdf_results:
╭───────────────────────────────┬─────────────────────────────────────────┬───────────────────────────────────────────╮
│ subject (string)              │ predicate (string)                      │ object (string)                           │
├───────────────────────────────┼─────────────────────────────────────────┼───────────────────────────────────────────┤
│ http://example.org/person/you │ http://example.org/predicate/read-about │ http://example.org/thing/relational-pipes │
│ http://example.org/person/you │ http://example.org/predicate/have       │ http://example.org/thing/much-fun         │
│ http://example.org/person/you │ http://example.org/predicate/have       │ http://example.org/thing/nice-day         │
╰───────────────────────────────┴─────────────────────────────────────────┴───────────────────────────────────────────╯
Record count: 3]]></pre>

		<p>Or if we prefer more vertical formats like Recfile:</p>

		<pre><![CDATA[$ echo "SELECT * WHERE { ?subject ?predicate ?object . }" | sql-dk --db rdf-in-memory --formatter xml --sql-in --relation rdf_results | relpipe-in-xml | relpipe-out-recfile
%rec: rdf_results

subject: http://example.org/person/you
predicate: http://example.org/predicate/read-about
object: http://example.org/thing/relational-pipes

subject: http://example.org/person/you
predicate: http://example.org/predicate/have
object: http://example.org/thing/much-fun

subject: http://example.org/person/you
predicate: http://example.org/predicate/have
object: http://example.org/thing/nice-day]]></pre>

		<p>Let us create some more data:</p>

		<m:pre jazyk="turtle" src="examples/rdf-heathers.ttl"/>

		<p>list them as statements:</p>

		<m:pre jazyk="text" src="examples/rdf-heathers.txt"/>

		<p>and run some more SPARQL queries…</p>

		<p>
			Note:
			<em>
				we use <a href="https://tools.ietf.org/html/rfc4151">the tag: URI scheme</a> for our IRIs.
				It makes URIs (IRIs) globally unique not only in space but also in time (domain owners change over time),
				which is great.
				This scheme is not common in the semantic web and linked data world, where locators (URLs) are usually used rather than pure identifiers (URIs, IRIs).
				But here we want to emphasise that we work strictly with our local data
				and make it clear that we do not depend on any on-line resources and nothing will be downloaded from remote servers.
				In a real project, we should use existing ontologies / vocabularies as much as possible instead of inventing new ones.
				But we keep this example rather isolated from the complexity of the outer world and a bit synthetic.
			</em>
		</p>

		<p>Find all quotes and names of their authors:</p>
		<m:sparql-example name="examples/rdf-heathers-quotes"/>

		<p>List groups and counts of their members:</p>
		<m:sparql-example name="examples/rdf-heathers-members"/>

		<p>Filter by a regular expression and list actor names rather than characters:</p>
		<m:sparql-example name="examples/rdf-heathers-much"/>

		<p>Now imagine a semantic model of Twin Peaks… How very!</p>

		<h2>Improvised relpipe-in-sparql tool</h2>

		<p>
			Starting the JVM and always creating a new database from scratch for each query is quite… <i>heavy</i>.
			We can keep Jena running in the background and connect to its SPARQL endpoint, or connect to any other endpoint on the internet.
			So we will hack together a light script and name it <code>relpipe-in-sparql</code> (some future release will include such an official tool).
		</p>

		<p>
			Because SPARQL endpoints accept plain HTTP requests and support CSV besides XML, and because we already have <code>relpipe-in-csv</code>,
			the script can be very simple:
		</p>

		<m:pre jazyk="bash"><![CDATA[curl \
	--header "Accept: text/csv" \
	--data-urlencode query="SELECT * WHERE { ?subject ?predicate ?object . } LIMIT 3" \
	https://dbpedia.org/sparql | relpipe-in-csv | relpipe-out-tabular]]></m:pre>
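
		<p>
			As a small step towards a reusable tool, the same request can be wrapped in a function that reads the query from STDIN
			and reports HTTP errors instead of passing an error page down the pipeline.
			This is only a sketch: the function name <code>sparql_csv</code> and the <code>SPARQL_HTTP_CLIENT</code> variable
			(which allows substituting another program for <code>curl</code>, e.g. for testing) are hypothetical names invented for this illustration:
		</p>

		<m:pre jazyk="bash"><![CDATA[# Send a SPARQL query from STDIN to an endpoint and print the result as CSV.
# $1 = endpoint URL; SPARQL_HTTP_CLIENT is a hypothetical override (defaults to curl).
sparql_csv() {
	# --fail turns HTTP errors into a non-zero exit code,
	# --silent hides the progress meter and --show-error keeps the error messages
	"${SPARQL_HTTP_CLIENT:-curl}" \
		--fail --silent --show-error \
		--header "Accept: text/csv" \
		--data-urlencode "query=$(cat)" \
		"$1"
}

# Usage:
# echo 'SELECT * WHERE { ?subject ?predicate ?object . } LIMIT 3' \
#	| sparql_csv "https://dbpedia.org/sparql" | relpipe-in-csv | relpipe-out-tabular]]></m:pre>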

		<p>
			It becomes a bit longer if we add some documentation, argument parsing and configuration:
		</p>


		<m:pre jazyk="bash" src="examples/relpipe-in-sparql.sh" odkaz="ano"/>

		<p>
			Here we even have two implementations that can be switched using the <code>RELPIPE_IN_SPARQL_IMPLEMENTATION</code> environment variable.
			The XML one is more powerful and can be customized (e.g. to specifically handle localized strings or add some new attributes to the relational output).
			On the other hand, the CSV one has fewer dependencies and supports streaming of long result sets (XSLT needs to load the whole document first).
		</p>

		<p>Both implementations should work:</p>

		<m:pre jazyk="bash"><![CDATA[# set one of these two values (run the query once with each of them):
export RELPIPE_IN_SPARQL_IMPLEMENTATION=xml
export RELPIPE_IN_SPARQL_IMPLEMENTATION=csv
echo 'SELECT * WHERE { ?subject ?predicate "Laura Dern"@en . } LIMIT 3' \
	| relpipe-in-sparql \
		--relation "jurassic" \
		--endpoint "https://dbpedia.org/sparql" \
	| relpipe-out-tabular]]></m:pre>

		<p>and produce the same output:</p>

		<pre><![CDATA[jurassic:
╭────────────────────────────────────────┬────────────────────────────────────────────╮
│ subject (string)                       │ predicate (string)                         │
├────────────────────────────────────────┼────────────────────────────────────────────┤
│ http://dbpedia.org/resource/Laura_Dern │ http://www.w3.org/2000/01/rdf-schema#label │
│ http://www.wikidata.org/entity/Q220901 │ http://www.w3.org/2000/01/rdf-schema#label │
│ http://dbpedia.org/resource/Laura_Dern │ http://xmlns.com/foaf/0.1/name             │
╰────────────────────────────────────────┴────────────────────────────────────────────╯
Record count: 3]]></pre>

		<p>And maybe somewhere nearby in the graph we will find:</p>

		<blockquote>It's a Unix System… I know this!</blockquote>

		<h2>Sources of RDF data</h2>

		<p>
			The bad news is that we are not querying the real world.
			We are querying an imperfect, incomplete and outdated snapshot of reality stored in someone's database.
			The good news is that we can improve the content of certain databases, just like we improve articles in Wikipedia.
		</p>

		<p>
			Some addresses have already <i>leaked</i> in the <code>relpipe-in-sparql --help</code> above.
			Here is a brief description of some publicly available sources of RDF data
			that we can play with.
		</p>


		<h3>Wikidata</h3>

		<p>
			A free and open knowledge base, a sister project of Wikipedia.
			Anyone can use and even edit its content.
		</p>

		<m:sparql-endpoint url="https://query.wikidata.org/sparql" website-url="https://www.wikidata.org/" website-title="Wikidata"/>


		<h3>DBpedia</h3>

		<p>
			The DBpedia project extracts structured content from the information created in various Wikimedia projects
			and publishes this knowledge graph for everyone.
		</p>

		<m:sparql-endpoint url="https://dbpedia.org/sparql" website-url="https://wiki.dbpedia.org/" website-title="DBpedia"/>

		<h3>Czech government</h3>
		<p>
			Ministries and other institutions publish some data as open data and part of it as linked open data (LOD).
		</p>

		<m:sparql-endpoint url="https://data.gov.cz/sparql" website-url="https://data.gov.cz/english/" website-title="Open data portal of the Czech Republic"/>
		<m:sparql-endpoint url="https://data.cssz.cz/sparql" website-url="https://data.cssz.cz/" website-title="Open data portal of the Czech Social Security Administration"/>
		<m:sparql-endpoint url="https://cedropendata.mfcr.cz/c3lod/cedr/sparql" website-url="https://cedropendata.mfcr.cz/" website-title="Open Data CEDR III"/>


		<h2>Running SPARQL queries as scripts</h2>

		<p>Besides piping SPARQL queries through <code>relpipe-in-sparql</code> like this:</p>
		<m:pre jazyk="bash"><![CDATA[cat query.sparql | relpipe-in-sparql | relpipe-out-tabular]]></m:pre>

		<p>we can make them executable and run them like a (Bash, Perl, PHP etc.) script:</p>
		<m:pre jazyk="bash"><![CDATA[chmod +x query.sparql
./query.sparql | relpipe-out-csv        # output in the CSV format
./query.sparql | relpipe-out-recfile    # output in the Recfile format
./query.sparql                          # automatically appends relpipe-out-tabular to the pipeline
]]></m:pre>

		<p>(see the <m:a href="implementation">Implementation</m:a> page for the complete list of available transformations and output filters)</p>

		<p>
			We need to add a first-line comment that points to the interpreter.
			The <code>endpoint</code> and <code>relation</code> parameters
			are optional: they say where the query will be executed and how the output relation will be named:
		</p>

		<m:pre jazyk="sparql" src="examples/rdf-sample-triples.sparql" odkaz="ano"/>

		<p>
			The environment variables <code>RELPIPE_IN_SPARQL_ENDPOINT</code> and <code>RELPIPE_IN_SPARQL_RELATION</code>
			can be set to override the parameters from the file.
			All the magic is done by this (a bit hackish) helper script:
		</p>

		<m:pre jazyk="bash" src="examples/rdf-sparql-interpreter.sh" odkaz="ano"/>

		<p>
			This script requires the <code>relpipe-in-sparql</code> we put together earlier.
			Both scripts are just examples (not part of any release yet).
		</p>
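
		<p>
			The precedence rule described above (environment variable first, parameter from the file second) can be sketched as follows.
			The header syntax assumed here (comment lines of the form <code># endpoint: URL</code>) and the function name
			are assumptions made purely for this illustration; the actual helper script may parse its parameters differently:
		</p>

		<m:pre jazyk="bash"><![CDATA[# Print the effective value of a parameter for a query file.
# $1 = query file, $2 = parameter name (endpoint or relation),
# $3 = value of the overriding environment variable (may be empty)
query_parameter() {
	if [ -n "$3" ]; then
		# the environment variable wins over the file
		printf '%s\n' "$3"
	else
		# assumed (hypothetical) header syntax: "# name: value" on a line of its own
		sed -n "s/^# *$2: *//p" "$1" | head -n 1
	fi
}

# Usage:
# query_parameter query.sparql endpoint "$RELPIPE_IN_SPARQL_ENDPOINT"
# query_parameter query.sparql relation "$RELPIPE_IN_SPARQL_RELATION"]]></m:pre>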



		<h2>Samples of SPARQL queries</h2>

		<p>
			<i>Hey kid, rock and roll,</i>
			let us list the films where both Coreys starred:
		</p>

		<m:sparql-example name="examples/rdf-coreys"/>

		<p>
			<i>So Mercedes has scratched our Cadillac, but it was still a great night. </i>
		</p>

		<p>Now it is time to visit our friends from the club:</p>

		<m:sparql-example name="examples/rdf-breakfast-club"/>

		<p>
			Not only <i>pretty in pink</i>, this is true <i>wisdom</i> and we could have much fun traversing this part of the graph.
			But let us turn the globe around… there is also a lot to see in the Eastern Bloc.
		</p>

		<m:sparql-example name="examples/rdf-blonde-and-brunette"/>

		<p>
			<i>
				Dad, what is this place?
				Where are we?
				Is there anyone here?<br/>
				No. Just us.
			</i>
		</p>

		<m:sparql-example name="examples/rdf-return"/>

		<h2>P.S.</h2>
		<p>
			<i>
				If you got the impression that RDF is just a poor relational database with a single table consisting of a mere three columns
				and with a freaky SQL dialect, please be assured that this example shows just a small fraction of the wonderful RDF world.
			</i>
		</p>


	</text>

</stránka>