relpipe-data/release-v0.16.xml
author František Kučera <franta-hg@frantovo.cz>
Mon, 21 Feb 2022 00:43:11 +0100
branch v_0
changeset 329 5bc2bb8b7946
parent 299 dd7aeff5ef0c
permissions -rw-r--r--
Release v0.18

<stránka
	xmlns="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/strana"
	xmlns:m="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/makro">
	
	<nadpis>Release v0.16</nadpis>
	<perex>new public release of Relational pipes</perex>
	<m:release>v0.16</m:release>

	<text xmlns="http://www.w3.org/1999/xhtml">
		<p>
			We are pleased to introduce the new development version of <m:name/>.
			This release brings an abstraction layer (ODBC) to the SQL transformation and several smaller improvements.
		</p>
		
		<ul>
			<li>
				<strong>ODBC in the <code>relpipe-tr-sql</code> module</strong>: see details below</li>
			<li>
				<strong>new input <code>relpipe-in-jack</code> module</strong>: see details below</li>
			<li>
				<strong>keyboard shortcuts in the <code>relpipe-out-gui</code> module</strong>: use Ctrl+PgUp and Ctrl+PgDown to switch panels (relations) and Ctrl+Q to quit</li>
			<li>
				<strong>record count in the <code>relpipe-out-xhtml</code> command</strong>: the number of records is printed under the table (this command is part of <code>relpipe-out-xml</code>, not a standalone module)</li>
		</ul>
		
		<p>
			See the <m:a href="examples">examples</m:a> and <m:a href="screenshots">screenshots</m:a> pages for details.
		</p>
		
		<p>
			Please note that this is still a development release and thus the API (libraries, CLI arguments, formats) might and will change.
			Any suggestions, ideas and bug reports are welcome in our <m:a href="contact">mail box</m:a>.
		</p>
		
		<h2>ODBC in the SQL transformation module</h2>
		
		<p>
			Former versions of <code>relpipe-tr-sql</code> were tied to <a href="https://sqlite.org/">SQLite</a>
			and the user had no option to change the <i>SQL engine</i>.
			However great SQLite is (and we are very thankful for it), having one particular DBMS (database management system) hard-coded in our program is too constraining.
			So we added an abstraction layer (ODBC) and got rid of the direct dependency on SQLite.
			Now any DBMS that has an ODBC driver can be used with <m:name/>.
		</p>
		
		<p>
			ODBC (Open Database Connectivity) is an industry standard that provides an API for accessing a DBMS.
			In the late 1980s, several vendors (mostly from the Unix and database communities) established the SQL Access Group (SAG)
			and then specified the Call Level Interface (CLI). ODBC, which is based on CLI, was published in the early 1990s.
			ODBC is available on many operating systems and there are at least two free software implementations:
			<a href="http://www.unixodbc.org/">unixODBC</a> and <a href="http://www.iodbc.org/">iODBC</a>.
			We use unixODBC for development and testing.
			Future releases of <m:name/> should also be tested with other implementations and various database drivers.
		</p>
		
		<p>
			SQLite remains the default option
			(in the C++ implementation, while Java or other implementations may have a different default and may use a different abstraction layer such as JDBC).
			We count on SQLite for future releases. It is the simplest way to get full SQL power in your relational pipeline.
			However, <code>relpipe-tr-sql</code> does not depend on SQLite and can be installed without it (and then used e.g. with the PostgreSQL driver).
			Using a different DBMS makes sense for two main reasons:
		</p>
		
		<ul>
			<li>
				We need specific features provided by the DBMS.
				It might be e.g. some functions for XML processing or some advanced SQL language constructs.
				Or maybe we have some business logic already implemented as SQL functions in e.g. PostgreSQL
				– now we can access this logic from our pipelines or seamlessly integrate it in our shell:
				<m:pre jazyk="bash"><![CDATA[cat source-data.csv | relpipe-in-csv \
	| relpipe-tr-sql \
		--data-source-name "MyDatabaseServer" \
		--relation "transformed_data" "
			SELECT
				some_csv_field AS id,
				our_special_function(some_other, third_one) AS result
			FROM csv" \
	| relpipe-out-xml \
	| xsltproc template.xsl - > some-fancy-report.xhtml
	# or just: | relpipe-out-xhtml > some-generic-report.xhtml]]></m:pre>
			</li>
			<li>
				We need access to data in an existing database.
				The <code>relpipe-tr-sql</code> and <code>relpipe-in-sql</code> commands can be used as generic database clients
				and are able to load relational data to and from any DBMS.
				We can also write a pipeline to transfer data between two different DBMS, do some ETL (extract, transform, load) tasks
				or just cache some result sets from a remote database in our local SQLite file.
				We can cache e.g. some codelist tables or other data for offline use:
				<m:pre jazyk="bash"><![CDATA[relpipe-in-sql \
	--data-source-name "MyCompanyDatabase" \
	--relation "country"       "SELECT * FROM country" \
	--relation "currency"      "SELECT * FROM currency" \
	--relation "exchange_rate" "SELECT * FROM exchange_rate WHERE …" \
	--relation "phonebook"     "SELECT * FROM phonebook" \
	| relpipe-tr-sql \
		--data-source-string 'Driver=SQLite3;Database=file:MyCachedCompanyData.sqlite']]></m:pre>
				In previous versions, we needed <a href="https://sql-dk.globalcode.info/">SQL-DK</a> for this scenario;
				now it is possible solely in <m:name/> without any other tools.
				But SQL-DK is still useful – especially if we have a JDBC driver but do not have an ODBC one
				(JDBC drivers and Java are also much more portable).
			</li>
		</ul>
		
		<p>
			N.B. Although it still looks like executing a local command, we should be aware that when using a remote data source,
			our data travel to the given remote server, which affects both performance and our privacy.
			Never use an untrustworthy remote server for processing sensitive data (even if using just a temporary schema or tables).
			If SQLite is “too small”, then PostgreSQL installed on <i>localhost</i> is usually a good option.
		</p>
		
		<p>
			ODBC drivers exist for most database systems.
			We can also write a custom driver for any other resource and just plug it into <m:name/>
			without recompiling (a driver is a shared library – simply an <code>.so</code> file).
		</p>
		
		<p>
			This release also comes with better diagnostics. This feature is not specific to ODBC, but was implemented during the rewrite of the database layer.
			So if we make a mistake in our query or try to create a table whose name already exists in the database, we will get a useful message with a detailed description of the problem
			(instead of a pointless failure notice in the previous version).
		</p>
		
		<p>
			The new implementation of <code>relpipe-tr-sql</code> is still a bit <i>raw</i> and will be tuned in the upcoming versions,
			but it seems to work quite well (with SQLite, PostgreSQL and MySQL on GNU/Linux).
			As always, testers are welcome.
		</p>
		
		<p>
			More details in the example: <m:a href="examples-tr-sql-odbc">Accessing SQLite, PostgreSQL and MySQL through ODBC</m:a>.
		</p>
		
		<h2>JACK (MIDI) input module</h2>
		
		<p>
			A powerful audio system called <a href="https://jackaudio.org/">JACK</a> allows us to
			build pipelines consisting of audio interfaces, players, recorders, filters and effects…
			and route sound streams (both PCM and MIDI) through them.
			MIDI messages can come from keyboards or other hardware MIDI controllers or from MIDI players and other software.
			Sometimes it is useful to check what is happening under the hood and examine particular MIDI messages
			instead of just playing them on a sound module or synthesizer.
			Now we can bridge two seemingly unrelated worlds: real-time audio and relational pipes.
		</p>
		
		<m:img src="img/jack-connections-1.png"/>
		
		<p>
			We can join the JACK graph with the <code>relpipe-in-jack</code> command.
			It does not consume STDIN; it gets events from JACK instead, so no other input data are needed.
		</p>
		
		<p>
			More details in the example: <m:a href="examples-jack-midi-monitoring">Monitoring MIDI messages using JACK</m:a>.
		</p>
		
		<h2>Feature overview</h2>
		
		<h3>Data types</h3>
		<ul>
			<li m:since="v0.8">boolean</li>
			<li m:since="v0.15">variable-length signed integer (SLEB128)</li>
			<li m:since="v0.8">string in UTF-8</li>
		</ul>
		<h3>Inputs</h3>
		<ul>
			<li m:since="v0.11">Recfile</li>
			<li m:since="v0.9">XML</li>
			<li m:since="v0.13">XMLTable</li>
			<li m:since="v0.9">CSV</li>
			<li m:since="v0.9">file system</li>
			<li m:since="v0.8">CLI</li>
			<li m:since="v0.8">fstab</li>
			<li m:since="v0.14">SQL script</li>
			<li m:since="v0.16">JACK</li>
		</ul>
		<h3>Transformations</h3>
		<ul>
			<li m:since="v0.13">sql: filtering and transformations using the SQL language</li>
			<li m:since="v0.12">awk: filtering and transformations using the classic AWK tool and language</li>
			<li m:since="v0.10">guile: filtering and transformations defined in the Scheme language using GNU Guile</li>
			<li m:since="v0.8">grep: regular expression filter, removes unwanted records from the relation</li>
			<li m:since="v0.8">cut: regular expression attribute cutter (removes or duplicates attributes and can also DROP a whole relation)</li>
			<li m:since="v0.8">sed: regular expression replacer</li>
			<li m:since="v0.8">validator: just a pass-through filter that crashes on invalid data</li>
			<li m:since="v0.8">python: highly experimental</li>
		</ul>
		<h3>Streamlets</h3>
		<ul>
			<li m:since="v0.15">xpath (example, unstable)</li>
			<li m:since="v0.15">hash (example, unstable)</li>
			<li m:since="v0.15">jar_info (example, unstable)</li>
			<li m:since="v0.15">mime_type (example, unstable)</li>
			<li m:since="v0.15">exiftool (example, unstable)</li>
			<li m:since="v0.15">pid (example, unstable)</li>
			<li m:since="v0.15">cloc (example, unstable)</li>
			<li m:since="v0.15">exiv2 (example, unstable)</li>
			<li m:since="v0.15">inode (example, unstable)</li>
			<li m:since="v0.15">lines_count (example, unstable)</li>
			<li m:since="v0.15">pdftotext (example, unstable)</li>
			<li m:since="v0.15">pdfinfo (example, unstable)</li>
			<li m:since="v0.15">tesseract (example, unstable)</li>
		</ul>
		<h3>Outputs</h3>
		<ul>
			<li m:since="v0.11">ASN.1 BER</li>
			<li m:since="v0.11">Recfile</li>
			<li m:since="v0.9">CSV</li>
			<li m:since="v0.8">tabular</li>
			<li m:since="v0.8">XML</li>
			<li m:since="v0.8">nullbyte</li>
			<li m:since="v0.8">GUI in Qt</li>
			<li m:since="v0.8">ODS (LibreOffice)</li>
		</ul>
		
		<h2>New examples</h2>
		<ul>
			<li><m:a href="examples-tr-sql-odbc">Accessing SQLite, PostgreSQL and MySQL through ODBC</m:a></li>
			<li><m:a href="examples-jack-midi-monitoring">Monitoring MIDI messages using JACK</m:a></li>
		</ul>
		
		<h2>Backward incompatible changes</h2>
		
		<p>
			The options <code>--file</code> and <code>--file-keep</code> in <code>relpipe-tr-sql</code> (and <code>relpipe-in-sql</code>, which is an alias for the same binary)
			have been dropped.
			These options were specific to SQLite and make no sense now that we do not depend on a particular DBMS and can use any <i>engine</i> for SQL processing
			(even a remote one somewhere on the network that could not reach our local files).
			However, SQLite is still the default option, and the command:
		</p>
		
		<m:pre jazyk="bash">relpipe-tr-sql --file 'myDatabase.sqlite'</m:pre>
		
		<p>can be simply replaced by:</p>
		
		<m:pre jazyk="bash">relpipe-tr-sql --data-source-string 'Driver=SQLite3;Database=file:myDatabase.sqlite'</m:pre>
		
		<p>
			Bash-completion works and will suggest even the <code>Driver=SQLite3;Database=file:</code> part, so it is not necessary to memorize the connection string.
			Frequently used databases can be configured in the <code>~/.odbc.ini</code> file and then referenced just by their names using <code>--data-source-name</code>
			(the data source names – DSN – are also suggested by Bash-completion).
		</p>
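		
		<p>
			As an illustration, a DSN entry in <code>~/.odbc.ini</code> could look like the following minimal sketch.
			The DSN name <code>MyDatabase</code> is just an example and the keys mirror the connection string shown above;
			we assume that the SQLite ODBC driver is registered under the name <code>SQLite3</code> (other drivers use their own names and keys).
		</p>
		
		<m:pre jazyk="ini"><![CDATA[[MyDatabase]
Driver = SQLite3
Database = file:myDatabase.sqlite]]></m:pre>
		
		<p>
			Such a data source can then be referenced as <code>relpipe-tr-sql --data-source-name 'MyDatabase'</code>.
		</p>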
		
		<p>
			There is no built-in replacement for the <code>--file-keep</code> option.
			But if the user wants to create a temporary file and delete it at the end of the transformation,
			they can simply add <code>rm -f myDatabase.sqlite</code> to their script.
		</p>
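		
		<p>
			The temporary-file workflow can be wrapped in a small shell function. This is only a sketch:
			the function names <code>sqlite_connection_string</code> and <code>tr_sql_with_temporary_file</code> are hypothetical helpers (not part of <m:name/>)
			and <code>mktemp --suffix</code> assumes GNU coreutils.
		</p>
		
		<m:pre jazyk="bash"><![CDATA[# Sketch only: both helper functions are hypothetical, not part of Relational pipes.

# Builds the SQLite connection string in the form used in this release.
sqlite_connection_string() {
	printf 'Driver=SQLite3;Database=file:%s' "$1"
}

# Runs relpipe-tr-sql on a temporary SQLite file and deletes the file afterwards.
# Relational data come from STDIN; all arguments are passed to relpipe-tr-sql.
tr_sql_with_temporary_file() {
	local db status
	db="$(mktemp --suffix=.sqlite)" || return 1
	relpipe-tr-sql --data-source-string "$(sqlite_connection_string "$db")" "$@"
	status=$?
	rm -f "$db"   # replaces the cleanup formerly done without --file-keep
	return "$status"
}]]></m:pre>
		
		<p>
			The function can then be used in a pipeline in place of <code>relpipe-tr-sql</code>,
			e.g. <code>relpipe-in-csv &lt; source-data.csv | tr_sql_with_temporary_file --relation "r" "SELECT * FROM csv" | relpipe-out-xhtml</code>.
		</p>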
		
		<h2>Installation</h2>
		
		<p>
			Installation was tested on Debian GNU/Linux 10.2.
			The process should be similar on other distributions.
		</p>
		
		<m:pre src="examples/release-v0.16.sh" jazyk="bash" odkaz="ano"/>
		
		<p>
			<m:name/> are modular, so you can download and install only the parts you need (the libraries are always needed).
			The tools <code>out-gui.qt</code> and <code>tr-python</code> require additional libraries and are not built by default.
		</p>
		
	</text>

</stránka>