diff -r 418e11eb6fea -r 192b0059a6c4 relpipe-data/examples-tr-sql-odbc.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/relpipe-data/examples-tr-sql-odbc.xml Sat Jun 06 01:57:24 2020 +0200 @@ -0,0 +1,286 @@ + + + Accessing SQLite, PostgreSQL and MySQL through ODBC + use various DBMS for SQL transformations or data access + 04200 + + + +

+ Since v0.16 the relpipe-tr-sql module + uses the ODBC abstraction layer and thus we can access data in any DBMS (database management system). + Our program depends only on the generic API and the driver for particular DBMS is loaded dynamically depending on the configuration. +

+ +

+
+ ODBC (Open Database Connectivity) is an industry standard and provides API for accessing a DBMS. + In late 80s several vendors (mostly from the Unix and database communities) established the SQL Access Group (SAG) + and then specified the Call Level Interface (CLI). ODBC, which is based on CLI, was published in early 90s. + ODBC is available on many operating systems and there are at least two free software implementations: + unixODBC and iODBC. +
+

+ +

For more information see the v0.16 release notes.

+ +

General concepts and configuration

+ +

+ ODBC: + the API consisting of C functions; see the files sql.h and sqlext.h e.g. in unixODBC. +

+ Database driver: + a shared library (an .so file) + that implements the API and connects to particular DBMS (SQLite, PostgreSQL, MySQL, MariaDB, Firebird etc.); + is usually provided by the authors of given DBMS, sometimes writen by a third-party +

+ Client: + a program that calls the API in order to access a database; our relpipe-tr-sql is a client +

+ Data Source Name (DSN): + the name of a preconfigured data source – when connecting, we need to know only the DSN – all parameters + (like server name, user name, password etc.) can be then looked-up in the configuration +

+ Connection string: + a text string consisting of serialized parameters needed for connecting + – we can specify all parameters ad-hoc in the connection string without creating any permanent configuration; + a connection string can also refer to a DSN and add or override some parameters +

+ +

+ There is some global configuration in the /etc directory. + In /etc/odbcinst.ini we can a find list of ODBC drivers. + Thanks to it, we can refer to a driver by its name (e.g. SQLite3) + instead of the path to the shared library (e.g. /usr/lib/x86_64-linux-gnu/odbc/libsqlite3odbc.so). + In /etc/odbc.ini we can find a list of global (for given computer) data sources. + It is uncommon to put complete configurations in this file, because anyone would be able to read the passwords, + but we can provide here just a template with public parameters like server name, port etc. + and user will supply his own user name and password in the connection string or in his personal configuration file. +

+ +

+ The ~/.odbc.ini contains personal configuration of given user. + There are usually data sources including the passwords. + Thus this file must be readable only by given user (chmod 600 ~/.odbc.ini). + Providing passwords in connection strings passed as CLI arguments is not a good practice due to security reasons: + by default it is stored in the shell history and it is also visible to other users of the same machine in the list of running processes. +

+ +

+ The section name – in the [] brackets – is the DSN. + Then there are parameters in form of key=value on each line. +

+ + +

CLI options

+ +

+ The relpipe-tr-sql and relpipe-in-sql support these relevant CLI options: +

+ +

+ --list-data-sources: + lists available (configured) data sources in relational format (so we pipe the output to some output filter e.g. to relpipe-out-tabular) +
+ --data-source-name: + specifies the DSN of a configured data source +
+ --data-source-string: + specifies the connections string for ad-hoc connection without need of any configuration +

+ +

+ Because output of this command is relational, we can further process it in our relational pipelines. + This output is also used for the Bash-completion for suggesting the DSN. +

+ +

+ If neither --data-source-name nor --data-source-string option is provided, + a temporary in-memory SQLite database is used as default. +

+ +

SQLite

+ +

In Debian GNU/Linux and similar distributions we can install SQLite ODBC driver by this command:

+ +

apt install libsqliteodbc

+ +

Which also installs the SQLite library that is all we need (because SQLite is a serverless and self-contained database).

+ +

+ Then we can use the default in-memory temporary database or specify the connection string ad-hoc, + access existing SQLite databases + or create new ones – e.g. this command: +

+ +

… | relpipe-tr-sql --data-source-string 'Driver=SQLite3;Database=file:MyDatabase.sqlite'

+ +

will create the MyDatabase.sqlite file and fill it with relations that came from STDIN.

+ +

For frequently used databases it is convenient to configure a data source in ~/.odbc.ini:

+ + + +

+ and then connect to it simply using --data-source-name MyDatabase + (both the option and the name will be suggested by Bash-completion). +

+ +

+ The SQLite ODBC driver supports several parameters that are described in its documentation. + One of them is LoadExt that loads SQLite extensions: +

+ + + +

+ So we can write our own SQLite extension with custom functions or other features + (example) + or chose some existing one and load it into the SQLite connected through ODBC. +

+ + +

PostgreSQL

+ +

In Debian GNU/Linux and similar distributions we can install PostgreSQL ODBC driver by this command:

+ +

apt install odbc-postgresql

+ +

+ PostgreSQL is very powerful DBMS (probably most advanced free software relational database system) + and utilizes the client-server architecture. + This means that we also needs a server (can be also installed through apt like the driver). +

+ +

+ Once we have a server – remote or local – we need to create a user (role). + For SQL transformations we configure a dedicated role that has no persistent schema and uses the temporary one as default, + which means that all relations we create are lost at the end of the session (when the relpipe-tr-sql command finishes), + thus it behaves very similar to the SQLite in-memory database. +

+ + + +

+ And then we configure the ODBC data source: +

+ + + +

+ Now we can use advanced PostgreSQL features for transforming data in our pipelines. + We can also configure a DSN for another database that contains some useful data and other database objects, + call existing business functions installed in such database, load data to or from this DB etc. +

+ + +

MySQL

+ +

+ If the libmyodbc package is missing in our distribution, + the ODBC driver for MySQL can be downloaded from their website. + We can get a binary package (.deb, .rpm etc.) or source code. + If we are compiling from sources, we do something like this: +

+ + + +

+ We should use the driver in the same or similar version as the MySQL client library installed on our system. + For example 8.x driver will not work with 5.x library. + Successful compilation results in libmyodbc*.so files. +

+ +

+ Like PostgreSQL, also MySQL is a client-server, + so we need a server where we create a database and some user account. + As root through the mysql mysql command we execute: +

+ + + +

As a normal user we add new data source to our ~/.odbc.ini file:

+ + + +

+ See that we have compiled the ODBC driver in our home directory + and even without installing it anywhere and registering it in the /etc/odbcinst.ini file, + we can simply refer to the .so file from our ~/.odbc.ini. +

+ +

+ If we set Server=localhost, the client-server communication does not go through TCP/IP + but rather through the unix domain socket specified in the Socket field. + If we set Server=127.0.0.1 or some remote IP address or domain name, the communication goes through TCP/IP on given port. +

+ +

+ The SET SQL_MODE=ANSI_QUOTES; init statement is important, + because it tells MySQL server that it should support standard SQL "quoted" identifiers + instead of that `weird` MySQL style. + We use the standard SQL while creating the tables. +

+ +

+ There are many other parameters, quite well + documented. +

+ +

+ Now we can use MySQL as the SQL engine for transformations in our pipelines + and we can also access existing MySQL databases, + load data to and from them + or call functions and procedures installed on the server. +

+ + + + +