--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/relpipe-data/examples-tr-sql-odbc.xml Sat Jun 06 01:57:24 2020 +0200
@@ -0,0 +1,286 @@
+<stránka
+ xmlns="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/strana"
+ xmlns:m="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/makro">
+
+ <nadpis>Accessing SQLite, PostgreSQL and MySQL through ODBC</nadpis>
+ <perex>use various DBMS for SQL transformations or data access</perex>
+ <m:pořadí-příkladu>04200</m:pořadí-příkladu>
+
+ <text xmlns="http://www.w3.org/1999/xhtml">
+
+ <p>
+ Since <m:a href="release-v0.16">v0.16</m:a> the <code>relpipe-tr-sql</code> module
+ uses the ODBC abstraction layer and thus we can access data in any DBMS (database management system).
+ Our program depends only on the generic API; the driver for a particular DBMS is loaded dynamically according to the configuration.
+ </p>
+
+ <blockquote>
+ <p>
+ ODBC (Open Database Connectivity) is an industry standard that provides an API for accessing a DBMS.
+ In the late 1980s several vendors (mostly from the Unix and database communities) established the SQL Access Group (SAG)
+ and then specified the Call Level Interface (CLI). ODBC, which is based on CLI, was published in the early 1990s.
+ ODBC is available on many operating systems and there are at least two free software implementations:
+ <a href="http://www.unixodbc.org/">unixODBC</a> and <a href="http://www.iodbc.org/">iODBC</a>.
+ </p>
+ </blockquote>
+
+ <p>For more information see the <m:a href="release-v0.16">v0.16 release notes</m:a>.</p>
+
+ <h2>General concepts and configuration</h2>
+
+ <p>
+ <strong>ODBC</strong>:
+ the API consisting of C functions; see the files <code>sql.h</code> and <code>sqlext.h</code> e.g. in unixODBC.
+ </p>
+ <p>
+ <strong>Database driver</strong>:
+ a shared library (an <code>.so</code> file)
+ that implements the API and connects to a particular DBMS (SQLite, PostgreSQL, MySQL, MariaDB, Firebird etc.);
+ it is usually provided by the authors of the given DBMS, sometimes written by a third party
+ </p>
+ <p>
+ <strong>Client</strong>:
+ a program that calls the API in order to access a database; our <code>relpipe-tr-sql</code> is a client
+ </p>
+ <p>
+ <strong>Data Source Name (DSN)</strong>:
+ the name of a preconfigured data source – when connecting, we need to know only the DSN – all parameters
+ (like server name, user name, password etc.) can then be looked up in the configuration
+ </p>
+ <p>
+ <strong>Connection string</strong>:
+ a text string consisting of serialized parameters needed for connecting
+ – we can specify all parameters ad-hoc in the connection string without creating any permanent configuration;
+ a connection string can also refer to a DSN and add or override some parameters
+ </p>
+
+ <p>
+ There is some global configuration in the <code>/etc</code> directory.
+ In <code>/etc/odbcinst.ini</code> we can find a list of ODBC drivers.
+ Thanks to it, we can refer to a driver by its name (e.g. <code>SQLite3</code>)
+ instead of the path to the shared library (e.g. <code>/usr/lib/x86_64-linux-gnu/odbc/libsqlite3odbc.so</code>).
+ In <code>/etc/odbc.ini</code> we can find a list of global (system-wide) data sources.
+ It is uncommon to put complete configurations in this file, because anyone would be able to read the passwords,
+ but we can provide just a <i>template</i> here with public parameters like server name, port etc.,
+ and each user then supplies their own user name and password in the connection string or in a personal configuration file.
+ </p>
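+
+ <p>
+ As a minimal sketch, a driver entry in <code>/etc/odbcinst.ini</code> might look like this.
+ The driver name and path are taken from the example above; the <code>Description</code> text is only illustrative
+ and packaged drivers may register additional keys:
+ </p>
+
+ <m:pre jazyk="ini"><![CDATA[[SQLite3]
+Description=SQLite3 ODBC driver
+Driver=/usr/lib/x86_64-linux-gnu/odbc/libsqlite3odbc.so]]></m:pre>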
+
+ <p>
+ The <code>~/.odbc.ini</code> file contains the personal configuration of the given user.
+ It usually holds data sources including their passwords,
+ so this file must be readable only by that user (<code>chmod 600 ~/.odbc.ini</code>).
+ Providing passwords in connection strings passed as CLI arguments is not good practice for security reasons:
+ by default the command line is stored in the shell history and it is also visible to other users of the same machine in the list of running processes.
+ </p>
+
+ <p>
+ In the <code>odbc.ini</code> files, the section name – in the <code>[]</code> brackets – is the DSN.
+ Below it, each line holds one parameter in the form <code>key=value</code>.
+ </p>
+
+
+ <h2>CLI options</h2>
+
+ <p>
+ The <code>relpipe-tr-sql</code> and <code>relpipe-in-sql</code> support these relevant CLI options:
+ </p>
+
+ <ul>
+ <li>
+ <code>--list-data-sources</code>:
+ lists available (configured) data sources in relational format (so we pipe the output to some output filter e.g. to <code>relpipe-out-tabular</code>)
+ </li>
+ <li>
+ <code>--data-source-name</code>:
+ specifies the DSN of a configured data source
+ </li>
+ <li>
+ <code>--data-source-string</code>:
+ specifies the connection string for an ad-hoc connection without the need for any configuration
+ </li>
+ </ul>
+
+ <pre><![CDATA[$ relpipe-tr-sql --list-data-sources | relpipe-out-tabular
+data_source:
+ ╭───────────────┬──────────────────────╮
+ │ name (string) │ description (string) │
+ ├───────────────┼──────────────────────┤
+ │ sqlite-memory │ SQLite3 │
+ │ relpipe │ PostgreSQL Unicode │
+ ╰───────────────┴──────────────────────╯
+Record count: 2]]></pre>
+
+ <p>
+ Because the output of this command is relational, we can further process it in our relational pipelines.
+ This output is also used by the Bash completion to suggest DSNs.
+ </p>
+
+ <p>
+ If neither <code>--data-source-name</code> nor <code>--data-source-string</code> option is provided,
+ a temporary in-memory SQLite database is used by default.
+ </p>
+
+ <h2>SQLite</h2>
+
+ <p>In Debian GNU/Linux and similar distributions we can install the <a href="https://sqlite.org/">SQLite</a> ODBC driver with this command:</p>
+
+ <pre>apt install libsqliteodbc</pre>
+
+ <p>This also installs the SQLite library, which is all we need (because SQLite is a <i>serverless and self-contained</i> database).</p>
+
+ <p>
+ Then we can use the default in-memory temporary database or specify the connection string ad-hoc,
+ <m:a href="examples-in-sql-selecting-existing-database">access existing SQLite databases</m:a>
+ or <m:a href="examples-in-filesystem-tr-sql-indexing">create new ones</m:a> – e.g. this command:
+ </p>
+
+ <pre>… | relpipe-tr-sql --data-source-string 'Driver=SQLite3;Database=file:MyDatabase.sqlite'</pre>
+
+ <p>will create the <code>MyDatabase.sqlite</code> file and fill it with relations that came from STDIN.</p>
+
+ <p>For frequently used databases it is convenient to configure a data source in <code>~/.odbc.ini</code>:</p>
+
+ <m:pre jazyk="ini"><![CDATA[[MyDatabase]
+Driver=SQLite3
+Database=file:/home/hacker/MyDatabase.sqlite]]></m:pre>
+
+ <p>
+ and then connect to it simply using <code>--data-source-name MyDatabase</code>
+ (both the option and the name will be suggested by Bash-completion).
+ </p>
+
+ <p>
+ The <a href="http://www.ch-werner.de/sqliteodbc/html/index.html">SQLite ODBC driver</a> supports several parameters that are described in its documentation.
+ One of them is <code>LoadExt</code> that loads SQLite extensions:
+ </p>
+
+ <m:pre jazyk="ini"><![CDATA[LoadExt=/home/hacker/libdemo.so]]></m:pre>
+
+ <p>
+ So we can write our own SQLite extension with custom functions or other features
+ (<a href="https://blog.frantovo.cz/c/383/Komplexita%20softwaru%3A%20%C5%98e%C5%A1en%C3%AD%20a%C2%A0prevence#toc_sqlite">example</a>)
+ or choose an existing one and load it into the SQLite connected through ODBC.
+ </p>
+
+
+ <h2>PostgreSQL</h2>
+
+ <p>In Debian GNU/Linux and similar distributions we can install the <a href="https://www.postgresql.org/">PostgreSQL</a> ODBC driver with this command:</p>
+
+ <pre>apt install odbc-postgresql</pre>
+
+ <p>
+ PostgreSQL is a very powerful DBMS (probably the most advanced free software relational database system)
+ and utilizes the client-server architecture.
+ This means that we also need a server (which can also be installed through <code>apt</code>, like the driver).
+ </p>
+
+ <p>
+ Once we have a server – remote or local – we need to create a user (role).
+ For SQL transformations we configure a dedicated role that has no persistent schema and uses the temporary one as default,
+ which means that all relations we create are lost at the end of the session (when the <code>relpipe-tr-sql</code> command finishes),
+ thus it behaves very similarly to the SQLite in-memory database.
+ </p>
+
+ <m:pre jazyk="sql"><![CDATA[CREATE USER relpipe WITH PASSWORD 'someSecretPassword';
+ALTER ROLE relpipe SET search_path TO 'pg_temp';]]></m:pre>
+
+ <p>
+ And then we <a href="https://odbc.postgresql.org/docs/config.html">configure</a> the ODBC data source:
+ </p>
+
+ <m:pre jazyk="ini"><![CDATA[[postgresql-temp]
+Driver=PostgreSQL Unicode
+Database=postgres
+Servername=localhost
+Port=5432
+Username=relpipe
+Password=someSecretPassword]]></m:pre>
+
+ <p>
+ Now we can use advanced PostgreSQL features for transforming data in our pipelines.
+ We can also configure a DSN for another database that contains some useful data and other database objects,
+ call existing business functions installed in such database, load data to or from this DB etc.
+ </p>
+
+
+ <h2>MySQL</h2>
+
+ <p>
+ If the <code>libmyodbc</code> package is missing in our distribution,
+ the ODBC driver for <a href="https://dev.mysql.com/downloads/connector/odbc/">MySQL</a> can be downloaded from the MySQL website.
+ We can get a binary package (<code>.deb</code>, <code>.rpm</code> etc.) or source code.
+ If we are compiling from sources, we do something like this:
+ </p>
+
+ <m:pre jazyk="bash"><![CDATA[cd mysql-connector-odbc-*-src/
+mkdir build
+cd build
+cmake ../ -DWITH_UNIXODBC=1
+make]]></m:pre>
+
+ <p>
+ We should use a driver of the same or a similar version as the MySQL client library installed on our system.
+ For example, an 8.x driver will not work with a 5.x library.
+ A successful compilation results in <code>libmyodbc*.so</code> files.
+ </p>
+
+ <p>
+ Like PostgreSQL, MySQL uses the client-server architecture,
+ so we need a server where we create a database and some user account.
+ As root through the <code>mysql mysql</code> command we execute:
+ </p>
+
+ <m:pre jazyk="sql"><![CDATA[CREATE DATABASE relpipe CHARACTER SET = utf8;
+CREATE USER 'relpipe'@'localhost' IDENTIFIED BY 'someSecretPassword';
+GRANT ALL PRIVILEGES ON relpipe.* TO 'relpipe'@'localhost';
+FLUSH PRIVILEGES;]]></m:pre>
+
+ <p>As a normal user we add a new data source to our <code>~/.odbc.ini</code> file:</p>
+
+ <m:pre jazyk="ini"><![CDATA[[mysql-relpipe-localhost]
+Driver=/home/hacker/src/mysql/build/lib/libmyodbc5w.so
+Server=localhost
+Port=3306
+Socket=/var/run/mysqld/mysqld.sock
+User=relpipe
+Password=someSecretPassword
+Database=relpipe
+InitStmt=SET SQL_MODE=ANSI_QUOTES;
+Charset=utf8]]></m:pre>
+
+ <p>
+ Note that we have compiled the ODBC driver in our home directory
+ and, even without installing it anywhere or registering it in the <code>/etc/odbcinst.ini</code> file,
+ we can simply refer to the <code>.so</code> file from our <code>~/.odbc.ini</code>.
+ </p>
+
+ <p>
+ If we set <code>Server=localhost</code>, the client-server communication does not go through TCP/IP
+ but rather through the unix domain socket specified in the <code>Socket</code> field.
+ If we set <code>Server=127.0.0.1</code> or some remote IP address or domain name, the communication goes through TCP/IP on the given port.
+ </p>
+
+ <p>
+ The <code>SET SQL_MODE=ANSI_QUOTES;</code> init statement is important,
+ because it tells the MySQL server that it should support standard SQL "quoted" identifiers
+ instead of the `weird` MySQL style.
+ We use standard SQL when creating the tables.
+ </p>
+
+ <p>
+ There are many other parameters, quite well
+ <a href="https://dev.mysql.com/doc/connector-odbc/en/connector-odbc-configuration-connection-parameters.html">documented</a>.
+ </p>
+
+ <p>
+ Now we can use MySQL as the <i>SQL engine</i> for transformations in our pipelines
+ and we can also access existing MySQL databases,
+ load data to and from them
+ or call functions and procedures installed on the server.
+ </p>
+
+
+ </text>
+
+</stránka>