relpipe-data/examples-tr-sql-odbc.xml
branch v_0
changeset 297 192b0059a6c4
child 300 b9bd0f06b4a1
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/relpipe-data/examples-tr-sql-odbc.xml	Sat Jun 06 01:57:24 2020 +0200
@@ -0,0 +1,286 @@
+<stránka
+	xmlns="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/strana"
+	xmlns:m="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/makro">
+	
+	<nadpis>Accessing SQLite, PostgreSQL and MySQL through ODBC</nadpis>
+	<perex>use various DBMS for SQL transformations or data access</perex>
+	<m:pořadí-příkladu>04200</m:pořadí-příkladu>
+
+	<text xmlns="http://www.w3.org/1999/xhtml">
+		
+		<p>
+			Since <m:a href="release-v0.16">v0.16</m:a> the <code>relpipe-tr-sql</code> module
+			uses the ODBC abstraction layer and thus we can access data in any DBMS (database management system).
+			Our program depends only on the generic API, and the driver for a particular DBMS is loaded dynamically depending on the configuration.
+		</p>
+		
+		<blockquote>
+			<p>
+				ODBC (Open Database Connectivity) is an industry standard that provides an API for accessing a DBMS.
+				In the late 1980s several vendors (mostly from the Unix and database communities) established the SQL Access Group (SAG)
+				and then specified the Call Level Interface (CLI). ODBC, which is based on CLI, was published in the early 1990s.
+				ODBC is available on many operating systems and there are at least two free software implementations:
+				<a href="http://www.unixodbc.org/">unixODBC</a> and <a href="http://www.iodbc.org/">iODBC</a>.
+			</p>
+		</blockquote>
+		
+		<p>For more information see the <m:a href="release-v0.16">v0.16 release notes</m:a>.</p>
+		
+		<h2>General concepts and configuration</h2>
+		
+		<p>
+			<strong>ODBC</strong>:
+			the API consisting of C functions; see the files <code>sql.h</code> and <code>sqlext.h</code> e.g. in unixODBC.
+		</p>
+		<p>
+			<strong>Database driver</strong>:
+			a shared library (an <code>.so</code> file) 
+			that implements the API and connects to a particular DBMS (SQLite, PostgreSQL, MySQL, MariaDB, Firebird etc.);
+			it is usually provided by the authors of the given DBMS, sometimes written by a third party
+		</p>
+		<p>
+			<strong>Client</strong>:
+			a program that calls the API in order to access a database; our <code>relpipe-tr-sql</code> is a client
+		</p>
+		<p>
+			<strong>Data Source Name (DSN)</strong>:
+			the name of a preconfigured data source – when connecting, we need to know only the DSN – all parameters
+			(like server name, user name, password etc.) can then be looked up in the configuration
+		</p>
+		<p>
+			<strong>Connection string</strong>:
+			a text string consisting of serialized parameters needed for connecting
+			– we can specify all parameters ad-hoc in the connection string without creating any permanent configuration;
+			a connection string can also refer to a DSN and add or override some parameters
+		</p>
+
+		<p>
+			There is some global configuration in the <code>/etc</code> directory.
+			In <code>/etc/odbcinst.ini</code> we can find a list of ODBC drivers.
+			Thanks to it, we can refer to a driver by its name (e.g. <code>SQLite3</code>)
+			instead of the path to the shared library (e.g. <code>/usr/lib/x86_64-linux-gnu/odbc/libsqlite3odbc.so</code>).
+			In <code>/etc/odbc.ini</code> we can find a list of global (for the given computer) data sources.
+			It is uncommon to put complete configurations in this file, because anyone would be able to read the passwords,
+			but we can provide here just a <i>template</i> with public parameters like server name, port etc.
+			and the user will supply their own user name and password in the connection string or in their personal configuration file.
+		</p>
+		
+		<p>
+			The <code>~/.odbc.ini</code> file contains the personal configuration of the given user.
+			It usually holds data sources including their passwords.
+			Thus this file must be readable only by that user (<code>chmod 600 ~/.odbc.ini</code>).
+			Providing passwords in connection strings passed as CLI arguments is not a good practice for security reasons:
+			by default the command line is stored in the shell history and it is also visible to other users of the same machine in the list of running processes.
+		</p>
+		
+		<p>
+			In both <code>odbc.ini</code> files, the section name – in the <code>[]</code> brackets – is the DSN.
+			The parameters then follow in the form <code>key=value</code>, one per line.
+		</p>
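+		
+		<p>
+			As a rough sketch of how the global files described above might look:
+			the first section registers a driver under a name, the second one is a public <i>template</i> without any credentials.
+			The DSN <code>relpipe-template</code>, the server name <code>db.example.com</code> and the <code>Description</code> text are made up for illustration
+			and the exact driver entry depends on the distribution.
+		</p>
+		
+		<m:pre jazyk="ini"><![CDATA[# /etc/odbcinst.ini: maps the driver name to the shared library
+[SQLite3]
+Description=SQLite3 ODBC Driver
+Driver=/usr/lib/x86_64-linux-gnu/odbc/libsqlite3odbc.so
+
+# /etc/odbc.ini: world-readable template (hypothetical DSN and server), no user name or password
+[relpipe-template]
+Driver=PostgreSQL Unicode
+Servername=db.example.com
+Port=5432
+Database=postgres]]></m:pre>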
+		
+		
+		<h2>CLI options</h2>
+		
+		<p>
+			The <code>relpipe-tr-sql</code> and <code>relpipe-in-sql</code> support these relevant CLI options:
+		</p>
+		
+		<ul>
+			<li>
+				<code>--list-data-sources</code>:
+				lists available (configured) data sources in relational format (so we pipe the output to some output filter e.g. to <code>relpipe-out-tabular</code>)
+			</li>
+			<li>
+				<code>--data-source-name</code>:
+				specifies the DSN of a configured data source
+			</li>
+			<li>
+				<code>--data-source-string</code>:
+				specifies the connection string for an ad-hoc connection without the need for any configuration
+			</li>
+		</ul>
+		
+		<pre><![CDATA[$ relpipe-tr-sql --list-data-sources | relpipe-out-tabular 
+data_source:
+ ╭───────────────┬──────────────────────╮
+ │ name (string) │ description (string) │
+ ├───────────────┼──────────────────────┤
+ │ sqlite-memory │ SQLite3              │
+ │ relpipe       │ PostgreSQL Unicode   │
+ ╰───────────────┴──────────────────────╯
+Record count: 2]]></pre>
+
+		<p>
+			Because the output of this command is relational, we can further process it in our relational pipelines.
+			This output is also used by the Bash-completion for suggesting the DSN.
+		</p>
+		
+		<p>
+			If neither <code>--data-source-name</code> nor <code>--data-source-string</code> option is provided,
+			a temporary in-memory SQLite database is used as the default.
+		</p>
+		
+		<h2>SQLite</h2>
+		
+		<p>In Debian GNU/Linux and similar distributions we can install the <a href="https://sqlite.org/">SQLite</a> ODBC driver with this command:</p>
+		
+		<pre>apt install libsqliteodbc</pre>
+		
+		<p>This also installs the SQLite library, which is all we need (because SQLite is a <i>serverless and self-contained</i> database).</p>
+		
+		<p>
+			Then we can use the default in-memory temporary database or specify the connection string ad-hoc, 
+			<m:a href="examples-in-sql-selecting-existing-database">access existing SQLite databases</m:a>
+			or <m:a href="examples-in-filesystem-tr-sql-indexing">create new ones</m:a> – e.g. this command:
+		</p>
+		
+		<pre>… | relpipe-tr-sql --data-source-string 'Driver=SQLite3;Database=file:MyDatabase.sqlite'</pre>
+		
+		<p>will create the <code>MyDatabase.sqlite</code> file and fill it with relations that came from STDIN.</p>
+		
+		<p>For frequently used databases it is convenient to configure a data source in <code>~/.odbc.ini</code>:</p>
+		
+		<m:pre jazyk="ini"><![CDATA[[MyDatabase]
+Driver=SQLite3
+Database=file:/home/hacker/MyDatabase.sqlite]]></m:pre>
+
+		<p>
+			and then connect to it simply using <code>--data-source-name MyDatabase</code>
+			(both the option and the name will be suggested by Bash-completion).
+		</p>
+		
+		<p>
+			The <a href="http://www.ch-werner.de/sqliteodbc/html/index.html">SQLite ODBC driver</a> supports several parameters that are described in its documentation.
+			One of them is <code>LoadExt</code> that loads SQLite extensions:
+		</p>
+		
+		<m:pre jazyk="ini"><![CDATA[LoadExt=/home/hacker/libdemo.so]]></m:pre>
+		
+		<p>
+			So we can write our own SQLite extension with custom functions or other features 
+			(<a href="https://blog.frantovo.cz/c/383/Komplexita%20softwaru%3A%20%C5%98e%C5%A1en%C3%AD%20a%C2%A0prevence#toc_sqlite">example</a>)
+			or choose an existing one and load it into SQLite connected through ODBC.
+		</p>
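+		
+		<p>
+			Combined with the data source configured earlier, a complete entry in <code>~/.odbc.ini</code> might look like the following sketch.
+			The DSN <code>MyDatabaseWithExtension</code> is a made-up name and <code>libdemo.so</code> stands for whatever extension we want to load.
+		</p>
+		
+		<m:pre jazyk="ini"><![CDATA[# hypothetical DSN: the SQLite database from above plus an extension
+[MyDatabaseWithExtension]
+Driver=SQLite3
+Database=file:/home/hacker/MyDatabase.sqlite
+LoadExt=/home/hacker/libdemo.so]]></m:pre>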
+
+		
+		<h2>PostgreSQL</h2>
+		
+		<p>In Debian GNU/Linux and similar distributions we can install the <a href="https://www.postgresql.org/">PostgreSQL</a> ODBC driver with this command:</p>
+		
+		<pre>apt install odbc-postgresql</pre>
+		
+		<p>
+			PostgreSQL is a very powerful DBMS (probably the most advanced free software relational database system)
+			and utilizes the client-server architecture.
+			This means that we also need a server (which can also be installed through <code>apt</code>, like the driver).
+		</p>
+		
+		<p>
+			Once we have a server – remote or local – we need to create a user (role).
+			For SQL transformations we configure a dedicated role that has no persistent schema and uses the temporary one as default,
+			which means that all relations we create are lost at the end of the session (when the <code>relpipe-tr-sql</code> command finishes),
+			so it behaves very similarly to the SQLite in-memory database.
+		</p>
+		
+		<m:pre jazyk="sql"><![CDATA[CREATE USER relpipe WITH PASSWORD 'someSecretPassword';
+ALTER ROLE relpipe SET search_path TO 'pg_temp';]]></m:pre>
+
+		<p>
+			And then we <a href="https://odbc.postgresql.org/docs/config.html">configure</a> the ODBC data source:
+		</p>
+
+		<m:pre jazyk="ini"><![CDATA[[postgresql-temp]
+Driver=PostgreSQL Unicode
+Database=postgres
+Servername=localhost
+Port=5432
+Username=relpipe
+Password=someSecretPassword]]></m:pre>
+
+		<p>
+			Now we can use advanced PostgreSQL features for transforming data in our pipelines.
+			We can also configure a DSN for another database that contains some useful data and other database objects, 
+			call existing business functions installed in such database, load data to or from this DB etc.
+		</p>
+		
+		
+		<h2>MySQL</h2>
+		
+		<p>
+			If the <code>libmyodbc</code> package is missing in our distribution,
+			the ODBC driver for <a href="https://dev.mysql.com/downloads/connector/odbc/">MySQL</a> can be downloaded from the MySQL website.
+			We can get a binary package (<code>.deb</code>, <code>.rpm</code> etc.) or source code.
+			If we are compiling from sources, we do something like this:
+		</p>
+		
+		<m:pre jazyk="bash"><![CDATA[cd mysql-connector-odbc-*-src/
+mkdir build
+cd build
+cmake ../ -DWITH_UNIXODBC=1
+make]]></m:pre>
+
+		<p>
+			We should use the driver in the same or similar version as the MySQL client library installed on our system.
+			For example, an 8.x driver will not work with a 5.x library.
+			Successful compilation results in <code>libmyodbc*.so</code> files.
+		</p>
+		
+		<p>
+			Like PostgreSQL, MySQL uses the client-server architecture,
+			so we need a server where we create a database and some user account.
+			As root through the <code>mysql mysql</code> command we execute:
+		</p>
+		
+		<m:pre jazyk="sql"><![CDATA[CREATE DATABASE relpipe CHARACTER SET = utf8;
+CREATE USER 'relpipe'@'localhost' IDENTIFIED BY 'someSecretPassword';
+GRANT ALL PRIVILEGES ON relpipe.* TO 'relpipe'@'localhost';
+FLUSH PRIVILEGES;]]></m:pre>
+
+		<p>As a normal user we add a new data source to our <code>~/.odbc.ini</code> file:</p>
+
+		<m:pre jazyk="ini"><![CDATA[[mysql-relpipe-localhost]
+Driver=/home/hacker/src/mysql/build/lib/libmyodbc5w.so
+Server=localhost
+Port=3306
+Socket=/var/run/mysqld/mysqld.sock
+User=relpipe
+Password=someSecretPassword
+Database=relpipe
+InitStmt=SET SQL_MODE=ANSI_QUOTES;
+Charset=utf8]]></m:pre>
+
+		<p>
+			Note that we have compiled the ODBC driver in our home directory
+			and, even without installing it anywhere or registering it in the <code>/etc/odbcinst.ini</code> file,
+			we can simply refer to the <code>.so</code> file from our <code>~/.odbc.ini</code>.
+		</p>
+
+		<p>
+			If we set <code>Server=localhost</code>, the client-server communication does not go through TCP/IP
+			but rather through the Unix domain socket specified in the <code>Socket</code> field.
+			If we set <code>Server=127.0.0.1</code> or some remote IP address or domain name, the communication goes through TCP/IP on the given port.
+		</p>
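+		
+		<p>
+			For comparison, a TCP/IP variant of the data source above could look like the following sketch.
+			The DSN <code>mysql-relpipe-tcp</code> is a made-up name; the <code>Socket</code> line is left out because the connection goes through TCP/IP,
+			and we assume that the server accepts the <code>relpipe</code> account for connections coming from <code>127.0.0.1</code>.
+		</p>
+		
+		<m:pre jazyk="ini"><![CDATA[# hypothetical DSN: the same database reached through TCP/IP instead of the socket
+[mysql-relpipe-tcp]
+Driver=/home/hacker/src/mysql/build/lib/libmyodbc5w.so
+Server=127.0.0.1
+Port=3306
+User=relpipe
+Password=someSecretPassword
+Database=relpipe
+InitStmt=SET SQL_MODE=ANSI_QUOTES;
+Charset=utf8]]></m:pre>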
+		
+		<p>
+			The <code>SET SQL_MODE=ANSI_QUOTES;</code> init statement is important,
+			because it tells the MySQL server that it should support standard SQL "quoted" identifiers
+			instead of that `weird` MySQL style.
+			We use standard SQL when creating the tables.
+		</p>
+		
+		<p>
+			There are many other parameters, quite well 
+			<a href="https://dev.mysql.com/doc/connector-odbc/en/connector-odbc-configuration-connection-parameters.html">documented</a>.
+		</p>
+		
+		<p>
+			Now we can use MySQL as the <i>SQL engine</i> for transformations in our pipelines
+			and we can also access existing MySQL databases,
+			load data to and from them
+			or call functions and procedures installed on the server. 
+		</p>
+		
+		
+	</text>
+
+</stránka>