relpipe-data/examples-tr-sql-odbc.xml
author František Kučera <franta-hg@frantovo.cz>
Sun, 09 May 2021 19:43:32 +0200
branch v_0
changeset 324 3cbce8bb28c3
parent 300 b9bd0f06b4a1
permissions -rw-r--r--
relpipe-tr-sql and relpipe-in-sql --list-data-sources option has boolean parameter since v0.18

<stránka
	xmlns="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/strana"
	xmlns:m="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/makro">
	
	<nadpis>Accessing SQLite, PostgreSQL and MySQL through ODBC</nadpis>
	<perex>use various DBMS for SQL transformations or data access</perex>
	<m:pořadí-příkladu>04200</m:pořadí-příkladu>

	<text xmlns="http://www.w3.org/1999/xhtml">
		
		<p>
			Since <m:a href="release-v0.16">v0.16</m:a> the <code>relpipe-tr-sql</code> module
			uses the ODBC abstraction layer and thus we can access data in any DBMS (database management system).
			Our program depends only on the generic API and the driver for particular DBMS is loaded dynamically depending on the configuration.
		</p>
		
		<blockquote>
			<p>
				ODBC (Open Database Connectivity) is an industry standard that provides an API for accessing a DBMS.
				In the late 1980s several vendors (mostly from the Unix and database communities) established the SQL Access Group (SAG)
				and then specified the Call Level Interface (CLI). ODBC, which is based on CLI, was published in the early 1990s.
				ODBC is available on many operating systems and there are at least two free software implementations:
				<a href="http://www.unixodbc.org/">unixODBC</a> and <a href="http://www.iodbc.org/">iODBC</a>.
			</p>
		</blockquote>
		
		<p>For more information see the <m:a href="release-v0.16">v0.16 release notes</m:a>.</p>
		
		<h2>General concepts and configuration</h2>
		
		<p>
			<strong>ODBC</strong>:
			the API consisting of C functions; see the files <code>sql.h</code> and <code>sqlext.h</code> e.g. in unixODBC.
		</p>
		<p>
			<strong>Database driver</strong>:
			a shared library (an <code>.so</code> file)
			that implements the API and connects to a particular DBMS (SQLite, PostgreSQL, MySQL, MariaDB, Firebird etc.);
			it is usually provided by the authors of the given DBMS, sometimes written by a third party
		</p>
		<p>
			<strong>Client</strong>:
			a program that calls the API in order to access a database; our <code>relpipe-tr-sql</code> is a client
		</p>
		<p>
			<strong>Data Source Name (DSN)</strong>:
			the name of a preconfigured data source – when connecting, we need to know only the DSN – all parameters
			(like server name, user name, password etc.) can then be looked up in the configuration
		</p>
		<p>
			<strong>Connection string</strong>:
			a text string consisting of serialized parameters needed for connecting
			– we can specify all parameters ad-hoc in the connection string without creating any permanent configuration;
			a connection string can also refer to a DSN and add or override some parameters
		</p>
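
		<p>
			A minimal sketch of a connection string that refers to a DSN and overrides one parameter.
			The DSN <code>relpipe</code> and the database <code>other_db</code> are only illustrative names;
			the <code>DSN</code> keyword is standard ODBC, while <code>Database</code> is a driver-specific parameter.
			The query assumes a PostgreSQL data source, and the use of <code>--relation</code> with a name and a query follows other relpipe examples:
		</p>

		<m:pre jazyk="bash"><![CDATA[# refer to a configured data source and override a single parameter
dsn='relpipe'            # illustrative DSN, must exist in the ODBC configuration
database='other_db'      # illustrative database name
connection_string="DSN=${dsn};Database=${database}"

# ask the server which database we are actually connected to (PostgreSQL function)
relpipe-in-sql \
	--data-source-string "$connection_string" \
	--relation "db" "SELECT current_database() AS name" \
	| relpipe-out-tabular]]></m:pre>

		<p>
			The password is intentionally not part of the connection string, so it stays in the configured data source.
		</p>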

		<p>
			There is some global configuration in the <code>/etc</code> directory.
			In <code>/etc/odbcinst.ini</code> we can find a list of ODBC drivers.
			Thanks to it, we can refer to a driver by its name (e.g. <code>SQLite3</code>)
			instead of the path to the shared library (e.g. <code>/usr/lib/x86_64-linux-gnu/odbc/libsqlite3odbc.so</code>).
			In <code>/etc/odbc.ini</code> we can find a list of global (machine-wide) data sources.
			It is uncommon to put complete configurations in this file, because anyone would be able to read the passwords,
			but we can provide just a <i>template</i> here with public parameters like server name, port etc.
			and each user then supplies their own user name and password in the connection string or in their personal configuration file.
		</p>
		
		<p>
			The <code>~/.odbc.ini</code> file contains the personal configuration of the given user.
			It usually holds data sources including their passwords,
			so this file must be readable only by that user (<code>chmod 600 ~/.odbc.ini</code>).
			Providing passwords in connection strings passed as CLI arguments is not a good practice for security reasons:
			by default they are stored in the shell history and they are also visible to other users of the same machine in the list of running processes.
		</p>
		
		<p>
			In these configuration files, the section name – in the <code>[]</code> brackets – is the DSN.
			It is followed by the parameters in the form of <code>key=value</code>, one per line.
		</p>
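
		<p>
			A minimal sketch of setting up such a personal configuration from the shell.
			The DSN <code>example-sqlite</code> and the database path are hypothetical;
			the <code>SQLite3</code> driver name is assumed to be registered in <code>/etc/odbcinst.ini</code>:
		</p>

		<m:pre jazyk="bash"><![CDATA[# make sure the file exists and is readable and writable only by its owner
touch ~/.odbc.ini
chmod 600 ~/.odbc.ini

# append one data source: the section name is the DSN (hypothetical),
# followed by key=value parameters, one per line
cat >> ~/.odbc.ini <<'EOF'

[example-sqlite]
Driver=SQLite3
Database=file:/tmp/example.sqlite
EOF]]></m:pre>

		<p>
			Appending (<code>&gt;&gt;</code>) instead of overwriting keeps any data sources that are already configured in the file.
		</p>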
		
		
		<h2>CLI options</h2>
		
		<p>
			The <code>relpipe-tr-sql</code> and <code>relpipe-in-sql</code> support these relevant CLI options:
		</p>
		
		<ul>
			<li>
				<code>--list-data-sources</code>:
				whether to list the available (configured) data sources in relational format
				(so we pipe the output to some output filter, e.g. to <code>relpipe-out-tabular</code>)
			</li>
			<li>
				<code>--data-source-name</code>:
				specifies the DSN of a configured data source
			</li>
			<li>
				<code>--data-source-string</code>:
				specifies the connection string for an ad-hoc connection without the need for any configuration
			</li>
		</ul>
		
		<pre><![CDATA[$ relpipe-tr-sql --list-data-sources true | relpipe-out-tabular 
data_source:
 ╭───────────────┬──────────────────────╮
 │ name (string) │ description (string) │
 ├───────────────┼──────────────────────┤
 │ sqlite-memory │ SQLite3              │
 │ relpipe       │ PostgreSQL Unicode   │
 ╰───────────────┴──────────────────────╯
Record count: 2]]></pre>

		<p>
			Because the output of this command is relational, we can further process it in our relational pipelines.
			This output is also used for the Bash-completion for suggesting the DSN.
		</p>
		
		<p>
			If neither <code>--data-source-name</code> nor <code>--data-source-string</code> option is provided,
			a temporary in-memory SQLite database is used as default.
		</p>
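
		<p>
			As a sketch of this default mode, the following pipeline runs a query over the <code>fstab</code> relation without any ODBC configuration.
			The output relation name <code>fs_types</code> is arbitrary and the query assumes that <code>relpipe-in-fstab</code>
			produces a relation called <code>fstab</code> with a <code>type</code> attribute:
		</p>

		<m:pre jazyk="bash"><![CDATA[# no --data-source-name or --data-source-string: a temporary in-memory SQLite database is used
relpipe-in-fstab \
	| relpipe-tr-sql --relation "fs_types" "SELECT type, count(*) AS n FROM fstab GROUP BY type" \
	| relpipe-out-tabular]]></m:pre>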
		
		<h2>SQLite</h2>
		
		<p>In Debian GNU/Linux and similar distributions we can install the <a href="https://sqlite.org/">SQLite</a> ODBC driver with this command:</p>
		
		<pre>apt install libsqliteodbc</pre>
		
		<p>This also installs the SQLite library, which is all we need (because SQLite is a <i>serverless and self-contained</i> database).</p>
		
		<p>
			Then we can use the default in-memory temporary database or specify the connection string ad-hoc, 
			<m:a href="examples-in-sql-selecting-existing-database">access existing SQLite databases</m:a>
			or <m:a href="examples-in-filesystem-tr-sql-indexing">create new ones</m:a> – e.g. this command:
		</p>
		
		<pre>… | relpipe-tr-sql --data-source-string 'Driver=SQLite3;Database=file:MyDatabase.sqlite'</pre>
		
		<p>will create the <code>MyDatabase.sqlite</code> file and fill it with relations that came from STDIN.</p>
		
		<p>For frequently used databases it is convenient to configure a data source in <code>~/.odbc.ini</code>:</p>
		
		<m:pre jazyk="ini"><![CDATA[[MyDatabase]
Driver=SQLite3
Database=file:/home/hacker/MyDatabase.sqlite]]></m:pre>

		<p>
			and then connect to it simply using <code>--data-source-name MyDatabase</code>
			(both the option and the name will be suggested by Bash-completion).
		</p>
		
		<p>
			The <a href="http://www.ch-werner.de/sqliteodbc/html/index.html">SQLite ODBC driver</a> supports several parameters that are described in its documentation.
			One of them is <code>LoadExt</code> that loads SQLite extensions:
		</p>
		
		<m:pre jazyk="ini"><![CDATA[LoadExt=/home/hacker/libdemo.so]]></m:pre>
		
		<p>
			So we can write our own SQLite extension with custom functions or other features 
			(<a href="https://blog.frantovo.cz/c/383/Komplexita%20softwaru%3A%20%C5%98e%C5%A1en%C3%AD%20a%C2%A0prevence#toc_sqlite">example</a>)
			or choose an existing one and load it into SQLite connected through ODBC.
		</p>

		
		<h2>PostgreSQL</h2>
		
		<p>In Debian GNU/Linux and similar distributions we can install the <a href="https://www.postgresql.org/">PostgreSQL</a> ODBC driver with this command:</p>
		
		<pre>apt install odbc-postgresql</pre>
		
		<p>
			PostgreSQL is a very powerful DBMS (probably the most advanced free software relational database system)
			and utilizes the client-server architecture.
			This means that we also need a server (it can be installed through <code>apt</code> like the driver).
		</p>
		
		<p>
			Once we have a server – remote or local – we need to create a user (role).
			For SQL transformations we configure a dedicated role that has no persistent schema and uses the temporary one as default,
			which means that all relations we create are lost at the end of the session (when the <code>relpipe-tr-sql</code> command finishes),
			thus it behaves very similarly to the SQLite in-memory database.
		</p>
		
		<m:pre jazyk="sql"><![CDATA[CREATE USER relpipe WITH PASSWORD 'someSecretPassword';
ALTER ROLE relpipe SET search_path TO 'pg_temp';]]></m:pre>

		<p>
			And then we <a href="https://odbc.postgresql.org/docs/config.html">configure</a> the ODBC data source:
		</p>

		<m:pre jazyk="ini"><![CDATA[[postgresql-temp]
Driver=PostgreSQL Unicode
Database=postgres
Servername=localhost
Port=5432
Username=relpipe
Password=someSecretPassword]]></m:pre>

		<p>
			Now we can use advanced PostgreSQL features for transforming data in our pipelines.
			We can also configure a DSN for another database that contains some useful data and other database objects, 
			call existing business functions installed in such database, load data to or from this DB etc.
		</p>
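
		<p>
			A sketch of such a transformation, assuming the <code>postgresql-temp</code> data source configured above
			and the <code>fstab</code> relation with its <code>type</code> and <code>mount_point</code> attributes as the input;
			the output relation name <code>mounts_by_type</code> is arbitrary.
			The <code>string_agg()</code> aggregate function is a PostgreSQL feature:
		</p>

		<m:pre jazyk="bash"><![CDATA[# the input relation is created in the temporary schema of the relpipe role
# and disappears when the session ends
relpipe-in-fstab \
	| relpipe-tr-sql \
		--data-source-name postgresql-temp \
		--relation "mounts_by_type" \
			"SELECT type, string_agg(mount_point, ', ' ORDER BY mount_point) AS mount_points FROM fstab GROUP BY type" \
	| relpipe-out-tabular]]></m:pre>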
		
		
		<h2>MySQL</h2>
		
		<p>
			If the <code>libmyodbc</code> package is missing in our distribution,
			the ODBC driver for <a href="https://dev.mysql.com/downloads/connector/odbc/">MySQL</a> can be downloaded from the MySQL website.
			We can get a binary package (<code>.deb</code>, <code>.rpm</code> etc.) or source code.
			If we are compiling from sources, we do something like this:
		</p>
		
		<m:pre jazyk="bash"><![CDATA[cd mysql-connector-odbc-*-src/
mkdir build
cd build
cmake ../ -DWITH_UNIXODBC=1
make]]></m:pre>

		<p>
			We should use a driver of the same or a similar version as the MySQL client library installed on our system.
			For example, an 8.x driver will not work with a 5.x library.
			Successful compilation results in <code>libmyodbc*.so</code> files.
		</p>
		
		<p>
			Like PostgreSQL, MySQL is a client-server system,
			so we need a server where we create a database and a user account.
			As root, through the <code>mysql mysql</code> command, we execute:
		</p>
		
		<m:pre jazyk="sql"><![CDATA[CREATE DATABASE relpipe CHARACTER SET = utf8;
CREATE USER 'relpipe'@'localhost' IDENTIFIED BY 'someSecretPassword';
GRANT ALL PRIVILEGES ON relpipe.* TO 'relpipe'@'localhost';
FLUSH PRIVILEGES;]]></m:pre>

		<p>As a normal user we add a new data source to our <code>~/.odbc.ini</code> file:</p>

		<m:pre jazyk="ini"><![CDATA[[mysql-relpipe-localhost]
Driver=/home/hacker/src/mysql/build/lib/libmyodbc5a.so
Server=localhost
Port=3306
Socket=/var/run/mysqld/mysqld.sock
User=relpipe
Password=someSecretPassword
Database=relpipe
InitStmt=SET SQL_MODE=ANSI_QUOTES;
Charset=utf8]]></m:pre>

		<p>
			Note that we have compiled the ODBC driver in our home directory
			and, even without installing it anywhere or registering it in the <code>/etc/odbcinst.ini</code> file,
			we can simply refer to the <code>.so</code> file from our <code>~/.odbc.ini</code>.
		</p>

		<p>
			If we set <code>Server=localhost</code>, the client-server communication does not go through TCP/IP
			but rather through the unix domain socket specified in the <code>Socket</code> field.
			If we set <code>Server=127.0.0.1</code> or some remote IP address or domain name, the communication goes through TCP/IP on the given port.
		</p>
		
		<p>
			The <code>SET SQL_MODE=ANSI_QUOTES;</code> init statement is important,
			because it tells the MySQL server that it should support standard SQL "quoted" identifiers
			instead of that `weird` MySQL style.
			We use standard SQL when creating the tables.
		</p>
		
		<p>
			There are many other parameters, quite well 
			<a href="https://dev.mysql.com/doc/connector-odbc/en/connector-odbc-configuration-connection-parameters.html">documented</a>.
		</p>
		
		<p>
			Now we can use MySQL as the <i>SQL engine</i> for transformations in our pipelines
			and we can also access existing MySQL databases,
			load data to and from them
			or call functions and procedures installed on the server. 
		</p>
		
		
	</text>

</stránka>