relpipe-data/examples-xpath-filtering-transforming.xml
author František Kučera <franta-hg@frantovo.cz>
Mon, 21 Feb 2022 01:21:22 +0100
branchv_0
changeset 330 70e7eb578cfa
parent 329 5bc2bb8b7946
permissions -rw-r--r--
Added tag relpipe-v0.18 for changeset 5bc2bb8b7946
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
329
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
     1
<stránka
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
     2
	xmlns="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/strana"
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
     3
	xmlns:m="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/makro">
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
     4
	
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
     5
	<nadpis>Filtering and transforming relational data with XPath</nadpis>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
     6
	<perex>do simple restrictions and projections using a well-established query language</perex>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
     7
	<m:pořadí-příkladu>04700</m:pořadí-příkladu>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
     8
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
     9
	<text xmlns="http://www.w3.org/1999/xhtml">
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    10
		
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    11
		<p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    12
			In <m:name/> <m:a href="release-v0.18">v0.18</m:a> we got a new powerful language for filtering and transformations: XPath.
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    13
			It is now part of the toolset consisting of SQL, AWK, Scheme and others.
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    14
			However XPath is originally a language designed for XML, in <m:name/> we can use it for relational data coming from various sources, not only XML,
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    15
			and also for data that violates the rules of normal forms.
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    16
			We can process quite complex tree structures entangled in records but we can also write simple and intuitive expressions like <code>x = "a" or y = 123</code>.
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    17
		</p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    18
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    19
		
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    20
		<h2>Basic filtering</h2>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    21
		
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    22
		<p>Let us have some CSV data:</p>		
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    23
		<m:pre jazyk="text" src="examples/film-1.csv"/>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    24
		
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    25
		<p>That look like this formatted as a table:</p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    26
		<m:pre jazyk="text" src="examples/film-1.tabular"/>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    27
		
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    28
		
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    29
		<p>Attributes of particular relations are available in XPath under their names, so we can directly reference them in our queries:</p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    30
		<m:pre jazyk="bash"><![CDATA[cat relpipe-data/examples/film-1.csv               \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    31
	| relpipe-in-csv --relation "film"             \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    32
	| relpipe-tr-xpath                             \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    33
		--relation '.*'                            \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    34
			--where 'year >= 1980 and year < 1990' \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    35
	| relpipe-out-tabular]]></m:pre>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    36
	
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    37
		<p>filtered result:</p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    38
		<m:pre jazyk="text" src="examples/film-1.filtered-1.tabular"/>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    39
		
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    40
		<p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    41
			n.b. If there were any characters that are not valid XML name, they would be escaped in the same way as <code>relpipe-in-*table</code> commands do it
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    42
			i.e. by adding underscores and unicode codepoints of given characters – e.g. the <code>weird:field</code> attribute will be available as <code>weird_3a_field</code> in XPath.
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    43
		</p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    44
		
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    45
		
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    46
		<h2>Filtering records with tree structures</h2>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    47
		
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    48
		<p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    49
			The CSV above is not a best example of data modeling.
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    50
			Actually, it is quite terrible.
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    51
			But in the real world, we often have to deal with such data – either work with them directly or give them some better shape before we start doing our job.
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    52
		</p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    53
		
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    54
		<p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    55
			Usually the best way is to normalize the model – follow the rules of <a href="https://en.wikipedia.org/wiki/Database_normalization#Normal_forms">Normal forms</a>.
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    56
			In this case, we would break this denormalized CSV table into several relations:
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    57
			<code>film</code>, <code>director</code>, <code>screenwriter</code>…
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    58
			or rather <code>film</code>, <code>role</code>, <code>person</code>, <code>film_person_role</code>…
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    59
		</p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    60
		
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    61
		<p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    62
			But for now, we will keep the data denormalized and just give them a better and machine-readable structure instead of limited and ambiguous notation of <code>screenwriter = name1 + name2</code>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    63
			(that makes trouble when the value contains certain characters and requires writing a parser for <em>never-specified syntax</em>).
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    64
			So, we will keep some data in classic relational attributes and some in nested XML structure.
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    65
			This approach allows us to combine rigid attributes with free-form rich tree structures.
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    66
		</p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    67
		
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    68
		<m:pre jazyk="text" src="examples/film-2.tabular"/>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    69
		
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    70
		<p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    71
			The <code>relpipe-tr-xpath</code> seamlessly integrates the schema-backed (<code>year</code>) and schema-free (<code>metadata/film</code>) parts of our data model.
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    72
			We use the same language syntax and principles for both kinds of attributes:
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    73
		</p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    74
		
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    75
		
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    76
		<m:pre jazyk="bash"><![CDATA[cat relpipe-data/examples/film-2.csv                                            \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    77
	| relpipe-in-csv --relation "film"                                          \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    78
	| relpipe-tr-xpath                                                          \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    79
		--relation '.*'                                                         \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    80
			--xml-attribute 'metadata'                                          \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    81
			--where 'year = 1986 or metadata/film/screenwriter = "John Hughes"' \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    82
	| relpipe-out-tabular]]></m:pre>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    83
	
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    84
		<p>Filtered result:</p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    85
	
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    86
		<m:pre jazyk="text" src="examples/film-2.filtered-1.tabular"/>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    87
		
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    88
		<p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    89
			n.b. In current version, we have to mark the attributes containing XML: <code>--xml-attribute 'metadata'</code>.
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    90
			In later versions, there will be a dedicated data type for XML, so these hints will not be necessary.
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    91
		</p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    92
		
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    93
		<p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    94
			This way, we can work with free-form attributes containing multiple values or run various functions on them.
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    95
			We can e.g. list films that have more than one screenwriter:
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    96
		</p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    97
		
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    98
		<m:pre jazyk="bash">--where 'count(metadata/film/screenwriter) &gt; 1'</m:pre>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    99
		
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   100
		<p>Well, well… here we are:</p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   101
		
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   102
		<m:pre jazyk="text" src="examples/film-2.filtered-2.tabular"/>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   103
		
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   104
		<p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   105
			We can also run XPath from SQL queries (<code>relpipe-tr-sql</code>) e.g. in PostgreSQL.
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   106
		</p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   107
		
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   108
		<!--
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   109
		cat relpipe-data/examples/film-2.csv \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   110
			| relpipe-in-csv -\-relation 'film' \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   111
			| relpipe-tr-sql \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   112
				-\-data-source-name myPostgreSQL \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   113
				-\-relation film_1 "SELECT * FROM film WHERE (xpath('count(/film/screenwriter)', metadata::xml))[1]::text::integer > 1" \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   114
				-\-relation film_2 "SELECT * FROM film WHERE (xpath('count(/film/screenwriter) > 1', metadata::xml))[1]::text::boolean" \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   115
			| relpipe-out-tabular
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   116
		-->
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   117
		
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   118
		
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   119
		<h2>Adding new attributes and transforming data</h2>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   120
		
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   121
		<p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   122
			The <code>relpipe-tr-xpath</code> does not only restriction but also projection.
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   123
			It can add, remove or modify the attributes while converting the input to the result set.
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   124
		</p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   125
		
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   126
		<m:pre jazyk="bash"><![CDATA[cat relpipe-data/examples/film-2.csv                                                            \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   127
	| relpipe-in-csv --relation "film"                                                          \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   128
	| relpipe-tr-xpath                                                                          \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   129
		--relation '.*'                                                                         \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   130
			--xml-attribute 'metadata'                                                          \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   131
			--output-attribute 'title'              string  'title'                             \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   132
			--output-attribute 'director'           string  'metadata/film/director'            \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   133
			--output-attribute 'screenwriter_count' integer 'count(metadata/film/screenwriter)' \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   134
	| relpipe-out-tabular]]></m:pre>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   135
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   136
		<p>We removed some attributes and created new ones:</p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   137
		<m:pre jazyk="text" src="examples/film-2.filtered-3.tabular"/>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   138
		
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   139
		
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   140
		<p>Or we may concatenate the values:</p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   141
		
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   142
		<m:pre jazyk="bash"><![CDATA[cat relpipe-data/examples/film-2.csv       \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   143
	| relpipe-in-csv                       \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   144
	| relpipe-tr-xpath                     \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   145
		--relation '.*'                    \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   146
			--xml-attribute 'metadata'     \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   147
			--output-attribute 'sentence' string 'concat("The film ", title, " was directed by ", metadata/film/director, " in year ", year, ".")' \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   148
	| relpipe-out-nullbyte | tr \\0 \\n]]></m:pre>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   149
		<!-- alias relpipe-out-lines='relpipe-out-nullbyte | tr \\0 \\n' -->
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   150
		
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   151
		<p>and build some sentences:</p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   152
		<m:pre jazyk="text" src="examples/film-2.filtered-4.txt"/>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   153
		
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   154
		<h2>Exctracting values from multiple XML files</h2>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   155
		
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   156
		<p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   157
			Input data may come not only from some kind of database or some carefully designed data set,
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   158
			they may be e.g. scattered on our filesystem in some already defined file format never intended for use as a database…
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   159
			despite this fact, we can still collect and query such data in a relational way.
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   160
		</p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   161
		
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   162
		<p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   163
			For example, Maven (a build system for Java) describe its modules in XML format in <code>pom.xml</code> files.
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   164
			Using the <code>find</code> and <code>relpipe-in-filesystem</code> we collect them and create a relation containing names and contents of such files:
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   165
		</p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   166
		
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   167
		<m:pre jazyk="bash"><![CDATA[find -type f -name 'pom.xml' -print0                                                    \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   168
	| relpipe-in-filesystem                                                             \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   169
		--relation 'module'                                                             \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   170
		--file path                                                                     \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   171
		--file content                                                                  \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   172
	| relpipe-tr-xpath                                                                  \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   173
		--namespace 'm' 'http://maven.apache.org/POM/4.0.0'                             \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   174
		--relation '.*'                                                                 \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   175
			--xml-attribute 'content'                                                   \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   176
			--output-attribute 'path'        string 'path'                              \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   177
			--output-attribute 'group_id'    string 'content/m:project/m:groupId'       \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   178
			--output-attribute 'artifact_id' string 'content/m:project/m:artifactId'    \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   179
			--output-attribute 'version'     string 'content/m:project/m:version'       \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   180
	| relpipe-out-tabular]]></m:pre>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   181
		<!-- see also relpipe-in-filesystem -\-streamlet xpath -->
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   182
	
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   183
		<p>Then we extract desired values using <code>relpipe-tr-xpath</code> and get:</p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   184
		<m:pre jazyk="text" src="examples/xpath-maven-1.tabular"/>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   185
	
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   186
		<p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   187
			This way we can harvest useful values from XML files – and not only XML files, also from various alternative formats, after we convert them (on-the-fly) to XML.
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   188
			Such conversions are already available for formats like <m:a href="examples-reading-querying-uniform-way">INI, ASN.1, MIME, HTML JSON, YAML etc.</m:a>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   189
		</p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   190
		
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   191
		
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   192
		<h2>Post scriptum</h2>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   193
		
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   194
		<p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   195
			The abovementioned combination of classic relational attributes with free-form XML structures is definitely not a design of first choice.
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   196
			But sometimes it makes sense and sometimes we have to work with data not designed by us and need some tools to deal with them.
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   197
			When we are designing the data model ourselves, we should always pursue the normalized form …and break the rules only if we have really good reason to do so.
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   198
		</p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   199
		
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   200
	</text>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   201
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   202
</stránka>