# HG changeset patch
# User František Kučera
This index allows us to do fast searches and various analysis.
@@ -34,7 +34,7 @@
Released versions
+
And then we can make a „popularity contest“ and find 20 most often used libraries:
+ n.b. if we use a database frequently it is convenient to configure it as a data source in the ~/.odbc.ini
file
+ – and then connect to it using the --data-source-name
option and its name.
+
Both the relpipe-tr-sql
and relpipe-in-sql
- support the option --file
and can store data in a file instead of just in-memory.
- Primary purpose of this option is to offload to HDD and be able to process more data whithout consuming too much RAM.
- So such file is usually a temporary one and is deleted immediately after the transformation.
- But we can also direct it to an existing file and append new relations to it.
- Or we can create a new file and do not delete it – using the --file-keep true
option.
+ support options --data-source-name
and --data-source-string
+ that can be used for connecting to a different database than the in-memory SQLite one.
+ This is useful if we want to offload to HDD and be able to process more data whithout consuming too much RAM.
+ Or if we want to access an existing database instead of just transforming records from the STDIN to STDOUT.
+ These options allow connecting to any DBMS.
Thus we can use the relpipe-in-sql
as a database client to access existing SQLite files and SELECT from them (and then convert the results to any supported format).
- An we can use the relpipe-tr-sql
as an output filter which converts relational data to a SQLite file that can be queried later.
+ And we can use the relpipe-tr-sql
as an output filter which converts relational data to a SQLite file that can be queried later.
@@ -35,11 +35,11 @@
- We can then query it using sqlite.fossil
using
We are connected to the JACK daemon now, but no data are routed to us.
diff -r 418e11eb6fea -r 192b0059a6c4 relpipe-data/examples-tr-sql-odbc.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/relpipe-data/examples-tr-sql-odbc.xml Sat Jun 06 01:57:24 2020 +0200
@@ -0,0 +1,286 @@
+
+ Since
+ ODBC (Open Database Connectivity) is an industry standard and provides API for accessing a DBMS.
+ In late 80s several vendors (mostly from the Unix and database communities) established the SQL Access Group (SAG)
+ and then specified the Call Level Interface (CLI). ODBC, which is based on CLI, was published in early 90s.
+ ODBC is available on many operating systems and there are at least two free software implementations:
+ unixODBC and iODBC.
+ For more information see the
+ ODBC:
+ the API consisting of C functions; see the files
+ Database driver:
+ a shared library (an
+ Client:
+ a program that calls the API in order to access a database; our
+ Data Source Name (DSN):
+ the name of a preconfigured data source – when connecting, we need to know only the DSN – all parameters
+ (like server name, user name, password etc.) can be then looked-up in the configuration
+
+ Connection string:
+ a text string consisting of serialized parameters needed for connecting
+ – we can specify all parameters ad-hoc in the connection string without creating any permanent configuration;
+ a connection string can also refer to a DSN and add or override some parameters
+
+ There is some global configuration in the
+ The
+ The section name – in the
+ The
+ Because output of this command is relational, we can further process it in our relational pipelines.
+ This output is also used for the Bash-completion for suggesting the DSN.
+
+ If neither In Debian GNU/Linux and similar distributions we can install SQLite ODBC driver by this command: Which also installs the SQLite library that is all we need (because SQLite is a serverless and self-contained database).
+ Then we can use the default in-memory temporary database or specify the connection string ad-hoc,
+ will create the For frequently used databases it is convenient to configure a data source in
+ and then connect to it simply using
+ The SQLite ODBC driver supports several parameters that are described in its documentation.
+ One of them is
+ So we can write our own SQLite extension with custom functions or other features
+ (example)
+ or chose some existing one and load it into the SQLite connected through ODBC.
+ In Debian GNU/Linux and similar distributions we can install PostgreSQL ODBC driver by this command:
+ PostgreSQL is very powerful DBMS (probably most advanced free software relational database system)
+ and utilizes the client-server architecture.
+ This means that we also needs a server (can be also installed through
+ Once we have a server – remote or local – we need to create a user (role).
+ For SQL transformations we configure a dedicated role that has no persistent schema and uses the temporary one as default,
+ which means that all relations we create are lost at the end of the session (when the
+ And then we configure the ODBC data source:
+
+ Now we can use advanced PostgreSQL features for transforming data in our pipelines.
+ We can also configure a DSN for another database that contains some useful data and other database objects,
+ call existing business functions installed in such database, load data to or from this DB etc.
+
+ If the
+ We should use the driver in the same or similar version as the MySQL client library installed on our system.
+ For example 8.x driver will not work with 5.x library.
+ Successful compilation results in
+ Like PostgreSQL, also MySQL is a client-server,
+ so we need a server where we create a database and some user account.
+ As root through the As a normal user we add new data source to our
+ See that we have compiled the ODBC driver in our home directory
+ and even without installing it anywhere and registering it in the
+ If we set
+ The
+ There are many other parameters, quite well
+ documented.
+
+ Now we can use MySQL as the SQL engine for transformations in our pipelines
+ and we can also access existing MySQL databases,
+ load data to and from them
+ or call functions and procedures installed on the server.
+ relpipe-tr-sql
module
+ uses the ODBC abstraction layer and thus we can access data in any DBMS (database management system).
+ Our program depends only on the generic API and the driver for particular DBMS is loaded dynamically depending on the configuration.
+
+
+
+ General concepts and configuration
+
+ sql.h
and sqlext.h
e.g. in unixODBC.
+ .so
file)
+ that implements the API and connects to particular DBMS (SQLite, PostgreSQL, MySQL, MariaDB, Firebird etc.);
+ is usually provided by the authors of given DBMS, sometimes writen by a third-party
+ relpipe-tr-sql
is a client
+ /etc
directory.
+ In /etc/odbcinst.ini
we can a find list of ODBC drivers.
+ Thanks to it, we can refer to a driver by its name (e.g. SQLite3
)
+ instead of the path to the shared library (e.g. /usr/lib/x86_64-linux-gnu/odbc/libsqlite3odbc.so
).
+ In /etc/odbc.ini
we can find a list of global (for given computer) data sources.
+ It is uncommon to put complete configurations in this file, because anyone would be able to read the passwords,
+ but we can provide here just a template with public parameters like server name, port etc.
+ and user will supply his own user name and password in the connection string or in his personal configuration file.
+ ~/.odbc.ini
contains personal configuration of given user.
+ There are usually data sources including the passwords.
+ Thus this file must be readable only by given user (chmod 600 ~/.odbc.ini
).
+ Providing passwords in connection strings passed as CLI arguments is not a good practice due to security reasons:
+ by default it is stored in the shell history and it is also visible to other users of the same machine in the list of running processes.
+ []
brackets – is the DSN.
+ Then there are parameters in form of key=value
on each line.
+ CLI options
+
+ relpipe-tr-sql
and relpipe-in-sql
support these relevant CLI options:
+
+
+
+
+
+ --list-data-sources
:
+ lists available (configured) data sources in relational format (so we pipe the output to some output filter e.g. to relpipe-out-tabular
)
+ --data-source-name
:
+ specifies the DSN of a configured data source
+ --data-source-string
:
+ specifies the connections string for ad-hoc connection without need of any configuration
+ --data-source-name
nor --data-source-string
option is provided,
+ a temporary in-memory SQLite database is used as default.
+ SQLite
+
+ apt install libsqliteodbc
+
+ … | relpipe-tr-sql --data-source-string 'Driver=SQLite3;Database=file:MyDatabase.sqlite'
+
+ MyDatabase.sqlite
file and fill it with relations that came from STDIN.~/.odbc.ini
:--data-source-name MyDatabase
+ (both the option and the name will be suggested by Bash-completion).
+ LoadExt
that loads SQLite extensions:
+ PostgreSQL
+
+ apt install odbc-postgresql
+
+ apt
like the driver).
+ relpipe-tr-sql
command finishes),
+ thus it behaves very similar to the SQLite in-memory database.
+ MySQL
+
+ libmyodbc
package is missing in our distribution,
+ the ODBC driver for MySQL can be downloaded from their website.
+ We can get a binary package (.deb
, .rpm
etc.) or source code.
+ If we are compiling from sources, we do something like this:
+ libmyodbc*.so
files.
+ mysql mysql
command we execute:
+ ~/.odbc.ini
file:/etc/odbcinst.ini
file,
+ we can simply refer to the .so
file from our ~/.odbc.ini
.
+ Server=localhost
, the client-server communication does not go through TCP/IP
+ but rather through the unix domain socket specified in the Socket
field.
+ If we set Server=127.0.0.1
or some remote IP address or domain name, the communication goes through TCP/IP on given port.
+ SET SQL_MODE=ANSI_QUOTES;
init statement is important,
+ because it tells MySQL server that it should support standard SQL "quoted" identifiers
+ instead of that `weird` MySQL style.
+ We use the standard SQL while creating the tables.
+
+
+ n.b. This method is obsolete since .so
file) in the INI file or ad-hoc in the connection string.
+
+
By default, relpipe-tr-sql
links to the SQLite library available in our distribution (e.g. 3.22).
As we can check:
+ n.b. the --file
and --file-keep
have been dropped in
See the
+ n.b. the --file
and --file-keep
have been dropped in
See the
+ We are pleased to introduce you the new development version of
relpipe-tr-sql
module: see details belowrelpipe-in-jack
module: see details belowrelpipe-out-gui
module: use Ctrl+PgUp and Ctrl+PgDown to switch panels (relations) and Ctrl+Q to quitrelpipe-out-xhtml
command: number of records is printed under the table (this command part of relpipe-out-xml
, not a standalone module)
+ See the
+ Please note that this is still a development relasease and thus the API (libraries, CLI arguments, formats) might and will change.
+ Any suggestions, ideas and bug reports are welcome in our
+ Former versions of relpipe-tr-sql
were tied to SQLite
+ and user had no option to change the SQL engine.
+ However great SQLite is (and we are very thankful for it), having some particular DBMS (database management system) hard-coded in our program is too constraining.
+ So we added an abstraction layer (ODBC) and get rid of the direct dependency on SQLite.
+ Now any DBMS can be used with
+ ODBC (Open Database Connectivity) is an industry standard and provides API for accessing a DBMS.
+ In late 80s several vendors (mostly from the Unix and database communities) established the SQL Access Group (SAG)
+ and then specified the Call Level Interface (CLI). ODBC, which is based on CLI, was published in early 90s.
+ ODBC is available on many operating systems and there are at least two free software implementations:
+ unixODBC and iODBC.
+ We use unixODBC for development and testing.
+ Future releases of
+ SQLite remains the default option
+ (in the C++ implementation, while Java or other implementations may have different default and may use different abstraction layer like JDBC).
+ We count on SQLite for future releases. It is the simplest way to get full SQL power in your relational pipeline.
+ However, relpipe-tr-sql
do not depend on SQLite and can be installed without it (and then used e.g. with PostgreSQL driver).
+ Using different DBMS makes sense for two main reasons:
+
relpipe-tr-sql
and relpipe-in-sql
can be used as a generic database clients
+ and are able to load relational data to and from any DBMS.
+ We can also write a pipeline to transfer data between two different DBMS, do some ETL (extract, transform, load) tasks
+ or just cache some result sets from a remote database in our local SQLite file.
+ We can cache e.g. some codelist tables or other data for offline use:
+ + n.b. However it still looks like executing a local command, we should be aware that while using a remote data source, + our data travel to given remote server – this impacts performance and our privacy. + Never use untrustworthy remote server for processing sensitive data (even if using just a temporary schema or tables). + If SQLite is „too small“ then PostgreSQL installed on localhost is usually a good option. +
+ +
+ There are ODBC drivers for any conceivable database system.
+ We can also write a custom driver for any other resource and just plug it in .so
file).
+
+ This release also comes with better diagnostics. This feature is not specific to ODBC, but was implemented during the rewrite of the database layer. + So if we make a mistake in our query or try to create a table with the same name as already exists in the DB, we will get a useful message with detailed description of the problem + (instead of a pointless failure notice in the previous version). +
+ +
+ The new implementation of relpipe-tr-sql
is still a bit raw and will be tuned in the upcoming versions,
+ but it seems working quite well (with SQLite, PostgreSQL and MySQL on GNU/Linux).
+ As always, testers are welcomed.
+
+ More details in the example:
+ A powerful audio system called JACK allows us to + build pipelines consisting of audio interfaces, players, recorders, filters and effects… + and route sound streams (both PCM and MIDI) through them. + MIDI messages can come from keyboards or other hardware MIDI controllers or from MIDI players and other software. + Sometimes it is useful to check what is happening under the hood and examine particular MIDI messages + instead of just playing them on a sound module or synthesizer. + Now we can bridge two seemingly unrelated worlds: real-time audio and relational pipes. +
+ +
+ We can join the JACK graph with relpipe-in-jack
command.
+ It does not consume STDIN, it gets events from JACK instead, so no other input data are needed.
+
+ More details in the example:
+ The options --file
and --file-keep
in relpipe-tr-sql
(and relpipe-in-sql
, which is an alias for the same binary)
+ have been dropped.
+ These options were specific to SQLite and make no sense now, when we do not depend on particular DBMS and can use any engine for SQL processing
+ (even a remote one somewhere on the network that could not reach our local files).
+ However SQLite is still the default option and the:
+
can be simply replaced by:
+ +
+ Bash-completion works and will suggest even the Driver=SQLite3;Database=file:
part, so it is not necessary to memorize the connection string.
+ Frequently used databases can be configured in the ~/.odbc.ini
file and then referenced just by their names using --data-source-name
+ (the data source names – DSN – are also suggested by Bash-completion).
+
+ There is no built-in replacement for the --file-keep
option.
+ But if the user wants to create a temporary file and delete it at the end of the transformation,
+ he can simply add rm -f myDatabase.sqlite
to his script.
+
+ Instalation was tested on Debian GNU/Linux 10.2. + The process should be similar on other distributions. +
+ +
+ out-gui.qt
and tr-python
require additional libraries and are not built by default.
+
Releases for discussion and verification of the format and API design. @@ -195,6 +195,7 @@
Other libraries and tools: