# HG changeset patch # User František Kučera # Date 1559071100 -7200 # Node ID 2868d772c27e807e492ef5ec540af6bcda6e4f74 # Parent a390662645097860e62e46acbb50864eb47e0993 Release v0.12 – AWK diff -r a39066264509 -r 2868d772c27e relpipe-data/download.xml --- a/relpipe-data/download.xml Tue Apr 09 22:53:40 2019 +0200 +++ b/relpipe-data/download.xml Tue May 28 21:18:20 2019 +0200 @@ -33,6 +33,7 @@ hg clone https://hg.globalcode.info/relpipe/relpipe-out-recfile.cpp; hg clone https://hg.globalcode.info/relpipe/relpipe-out-tabular.cpp; hg clone https://hg.globalcode.info/relpipe/relpipe-out-xml.cpp; +hg clone https://hg.globalcode.info/relpipe/relpipe-tr-awk.cpp; hg clone https://hg.globalcode.info/relpipe/relpipe-tr-cut.cpp; hg clone https://hg.globalcode.info/relpipe/relpipe-tr-grep.cpp; hg clone https://hg.globalcode.info/relpipe/relpipe-tr-guile.cpp; @@ -58,6 +59,7 @@
  • 2019-01-18: v0.9
  • 2019-02-20: v0.10
  • 2019-04-08: v0.11
  • +
  • 2019-05-28: v0.12
  • diff -r a39066264509 -r 2868d772c27e relpipe-data/examples-awk-aggregations.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/relpipe-data/examples-awk-aggregations.xml Tue May 28 21:18:20 2019 +0200 @@ -0,0 +1,90 @@ + + + Aggregating data with AWK + counting records and computing sum or appending new records + 02600 + + + +

    + We have filtered records, modified attribute values, added and removed attributes, dropped a relation… + and there is one more operation that we can do with AWK: INSERT, i.e. appending or prepending additional records to the relation + – and we can also completely replace the record set by skipping the original records. +

    + +

    Adding records

    + +

    + Using the options --before-records and --after-records we can pass additional AWK code that will be executed once for the given relation. + The record() function will then generate an additional record (it can be called multiple times to generate more records): +

    + + + +

    Which will INSERT one new record:

    + +
    + +

    Counting and summarizing values

    + +

    We can also compute some statistics like COUNT() and SUM():

    + + + +

    and get result:

    + +
    + +

    where total_size is the same value that du computes:

    + +
    find . -type f -print0 | du -b -c --files0-from=-
    + +

    Analogously we can compute minimum, maximum etc. using AWK transformation.
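The same kind of aggregation can be sketched in plain AWK, outside of relpipe-tr-awk. This is only an illustration under stated assumptions: the function name summarize_sizes and the two-column input (name, size) are hypothetical, and the sketch uses the ordinary AWK END block instead of the relpipe-tr-awk options.

```shell
# Hypothetical helper (not part of Relational pipes): reads "name size"
# lines on STDIN and prints count, sum, minimum and maximum of the sizes.
summarize_sizes() {
    awk '
        {
            size = $2 + 0                          # force numeric interpretation
            if (NR == 1 || size < min) min = size  # first record initializes min
            if (NR == 1 || size > max) max = size  # first record initializes max
            count++
            sum += size
        }
        END { printf "count=%d sum=%d min=%d max=%d\n", count, sum, min, max }
    '
}

printf 'a.txt 10\nb.txt 250\nc.txt 40\n' | summarize_sizes
```

For the three sample lines this prints count=3 sum=300 min=10 max=250.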

    + +
    + +
    diff -r a39066264509 -r 2868d772c27e relpipe-data/examples-awk-boolean-logic.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/relpipe-data/examples-awk-boolean-logic.xml Tue May 28 21:18:20 2019 +0200 @@ -0,0 +1,61 @@ + + + Using boolean logic with AWK + boolean data type and logical AND, OR, XOR operators + 02500 + + + +

    Suppose we have a relation with four combinations of logical values:

    + +
    + +

    + we can use logical operators and functions: +

    + + + + +

    and append their results to the relation as additional attributes:

    + +
    + +
    + +
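The logical operators can be tried in plain AWK as well. The sketch below assumes that boolean values arrive as 1 and 0 in two whitespace-separated columns; how relpipe-tr-awk actually represents booleans is not shown here. The function name truth_table is hypothetical. AWK has no logical XOR operator, so the sketch compares the negated operands instead.

```shell
# Hypothetical helper: reads two 0/1 columns and appends AND, OR and XOR.
truth_table() {
    awk '{
        a = $1 + 0
        b = $2 + 0
        # output columns: a, b, a AND b, a OR b, a XOR b
        print a, b, (a && b), (a || b), ((!a) != (!b))
    }'
}

printf '0 0\n0 1\n1 0\n1 1\n' | truth_table
```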
    diff -r a39066264509 -r 2868d772c27e relpipe-data/examples-awk-changing-structure.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/relpipe-data/examples-awk-changing-structure.xml Tue May 28 21:18:20 2019 +0200 @@ -0,0 +1,112 @@ + + + Changing structures with AWK + adding or removing attributes or dropping a relation + 02400 + + + +

    + The AWK transformations can also change the structure of transformed relation. + It means adding or removing attributes or dropping the whole relation. +

    + +

    Adding attributes with AWK

    + +

    + Using --output-attribute we can specify the output attributes. + If we do not want to explicitly specify all of them and just want to add some new ones, we will use --input-attributes-append (or --input-attributes-prepend), + which also preserves the input attributes: +

    + + + +

    This adds one new attribute with ordinal numbers:

    + +
    + + +

    Removing attributes with AWK

    + +

    Or we can omit all attributes except the explicitly specified ones:

    + + + +

    which effectively removes unlisted attributes:

    + +
    + + +

    AWK is a powerful language, so we can use conditions, for loops etc. and write much more complex transformations.
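As a rough plain-AWK analogy of such structural changes, the following sketch uses a for loop to keep only selected columns and prepends an ordinal number to each line. It works on whitespace-separated text, not on relational data; the function name keep_columns and its parameter are hypothetical.

```shell
# Hypothetical helper: $1 is a space-separated list of column numbers to keep.
# Each output line starts with the ordinal number of the input line.
keep_columns() {
    awk -v keep="$1" '
        BEGIN { n = split(keep, cols, " ") }
        {
            line = NR                                  # new leading column
            for (i = 1; i <= n; i++) line = line " " $(cols[i])
            print line
        }
    '
}

printf '/dev/sda1 / ext4 defaults\n/dev/sda2 none swap sw\n' | keep_columns '2 3'
```

With the sample input, the output lines are "1 / ext4" and "2 none swap".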

    + +

    Dropping a relation

    + +

    + A relation can be „dropped“, which means that the transformation will run but no relational output will be generated for it + (even the header will be omitted, so it differs from just eliminating all records by a condition). + Using AWK for such a simple operation as DROP seems weird, but sometimes it makes sense because of intentional side effects. +

    + +

    + Because the AWK code is executed for each record, we can e.g. write some output to a file or to STDERR: +

    + + "/dev/stderr" }' \ + --drop]]> + +

    Which prints text:

    + +
    + +

    + Then relpipe-tr-awk works much like an output filter (converts relational data to another format). + However, if there are more relations and some of them are not matched by --relation, they will be passed through and delivered to STDOUT in the relational format. + STDERR might occasionally be polluted by warning messages, so using a dedicated file for such output is safer. +
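The side-effect idea itself is ordinary AWK output redirection. A minimal sketch in plain AWK, assuming whitespace-separated input with the mount point in the second column (the function name report_mount_points is hypothetical):

```shell
# Hypothetical helper: writes one message per input line to STDERR
# and produces nothing on STDOUT.
report_mount_points() {
    awk '{ print "mount point: " $2 > "/dev/stderr" }'
}

printf '/dev/sda1 / ext4\n' | report_mount_points
```

Replacing "/dev/stderr" with the name of a regular file would send the messages to that file instead.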

    + + +
    + +
    diff -r a39066264509 -r 2868d772c27e relpipe-data/examples-awk-changing-values.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/relpipe-data/examples-awk-changing-values.xml Tue May 28 21:18:20 2019 +0200 @@ -0,0 +1,45 @@ + + + Changing values with AWK + regular expression text replacement + 02300 + + + +

    + Besides filtering, we can use an AWK transformation to modify attribute values. + This simply means rewriting the value of the given variable in AWK and calling the record() function at the end. +

    + +

    For example we can move all volumes mounted under /mnt/ to another directory using regular expressions:

    + + + +

    which will result in:

    + +
    + +

    + We can modify multiple attributes in a single transformation + and we can also use other AWK functions like toupper(), tolower() etc. +
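The AWK functions involved can be tried without relpipe-tr-awk. The sketch below is plain AWK on whitespace-separated text; the target directory /media/, the column layout and the function name relocate_mounts are assumptions made for illustration. It uses print because it runs outside of relpipe-tr-awk, where record() would be called instead.

```shell
# Hypothetical helper: rewrites a leading /mnt/ in column 2
# and converts column 3 to upper case.
relocate_mounts() {
    awk '{
        sub(/^\/mnt\//, "/media/", $2)   # replace only the leading /mnt/
        $3 = toupper($3)
        print
    }'
}

printf '/dev/sdb1 /mnt/backup ext4\n/dev/sda1 / ext4\n' | relocate_mounts
```

The first sample line becomes "/dev/sdb1 /media/backup EXT4"; in the second one only the type is changed.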

    + +
    + +
    diff -r a39066264509 -r 2868d772c27e relpipe-data/examples-awk-debugging.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/relpipe-data/examples-awk-debugging.xml Tue May 28 21:18:20 2019 +0200 @@ -0,0 +1,129 @@ + + + Debugging AWK transformations + discovering variable mappings and transformation internals + 02200 + + + +

    In most cases, AWK transformations should be quite straightforward, but sometimes we need to look inside the box.

    + +

    Mapping attributes to variables

    + +

    + Relations have named attributes, but in a language like AWK we work with named variables. + In most cases the names match 1:1, but not always. + The mapping is needed because not all valid attribute names are also valid variable names in a particular language, so some escaping or prefixing is sometimes necessary. + The --debug-variable-mapping option prints the mappings between attributes and variables. +

    + + + +

    This option prepends an additional relation with this metadata to the stream:

    + +
    + +

    If we are interested only in the mappings, we should use it in combination with --drop option:

    + + + +

    which skips the actual data:

    + +
    + +

    Because there were no collisions, the variables have the same names as the attributes. But in this case:

    + + + +

    mapping rules come into play:

    + +
    + +

    in order to make variable names valid in AWK.

    + + +

    Inspecting the internals of an AWK transformation

    + +

    + The relpipe-tr-awk tool calls AWK as a child process and passes the data of the given relation to it for actual processing. + Because it executes the awk program found on $PATH, we can easily switch between AWK implementations. + In the source code repository, there is scripts/awk – a wrapper script. + We can modify $PATH so that this wrapper is called by relpipe-tr-awk. + This script captures the CLI arguments, STDIN, STDOUT, STDERR and the exit code and saves them to files in the temp directory. + Using GNU Screen and inotifywait we can build a kind of IDE and watch what happens inside during the transformation: +

    + + + +

    + So we can inspect the generated AWK code and the inputs and outputs of the AWK process. + Recommended usage is described in the scripts/awk script. +
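To illustrate the capturing idea only, here is a minimal sketch of such a wrapper written as a shell function. It is not the scripts/awk script from the repository: the function name awk_capture, the AWK_CAPTURE_DIR variable and the file names are all hypothetical. Unlike a streaming wrapper, it replays STDOUT and STDERR only after AWK has finished.

```shell
# Hypothetical sketch: runs awk with the given arguments and stores the
# arguments, STDIN, STDOUT, STDERR and exit code in a directory.
awk_capture() {
    local dir status
    dir="${AWK_CAPTURE_DIR:-$(mktemp -d)}" || return 1
    printf '%s\n' "$@" > "$dir/args"                 # one argument per line
    tee "$dir/stdin" | awk "$@" > "$dir/stdout" 2> "$dir/stderr"
    status=$?                                        # exit code of awk
    printf '%s\n' "$status" > "$dir/exit-code"
    cat "$dir/stdout"                                # replay captured output
    cat "$dir/stderr" >&2
    return "$status"
}

printf 'x 1\n' | awk_capture '{ print $2 }'
```

Setting AWK_CAPTURE_DIR to a known directory beforehand makes the captured files easy to find and watch.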

    + +
    + +
    diff -r a39066264509 -r 2868d772c27e relpipe-data/examples-awk-filtering.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/relpipe-data/examples-awk-filtering.xml Tue May 28 21:18:20 2019 +0200 @@ -0,0 +1,108 @@ + + + Complex filtering with AWK + filtering records with AND, OR and functions + 02100 + + + +

    + If we need more complex filtering than relpipe-tr-grep can offer, we can write an AWK transformation. + Then we can use AND and OR operators and functions like regular expression matching or numerical formulas. +

    + +

    + The tool relpipe-tr-awk calls the real AWK program (usually GNU AWK) installed on our system and passes the data of the given relation to it. + Thus we can use any AWK feature in our pipeline while processing relational data. + Relational attributes are mapped to AWK variables, so we can reference them by their names instead of mere field numbers. +

    + +

    + The --for-each option is used for both filtering (instead of --where) + and arbitrary code execution (for data modifications, adding records, computations or intentional side effects). + In AWK, filtering conditions are surrounded by (…) and actions by {…}. + Both can be combined and multiple expressions can be separated by a semicolon (;). + The record() function should be called instead of the AWK print statement (which should never be used directly). + Calling record() is not necessary when only filtering is done (and there are no data modifications). +
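The difference between a bare condition and a condition with an action can be seen in plain AWK. This sketch runs outside of relpipe-tr-awk, so it works with field numbers and print rather than with attribute names and record(); both function names are hypothetical.

```shell
# Condition only: matching lines pass through unchanged.
big_files() {
    awk '($2 > 2000)'
}

# Condition and action: matching lines are modified before output.
big_files_in_kb() {
    awk '($2 > 2000) { $2 = int($2 / 1000) "k"; print }'
}

printf 'a 100\nb 3000\n' | big_files
printf 'a 100\nb 3000\n' | big_files_in_kb
```

The first call prints "b 3000", the second one "b 3k".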

    + +

    Filtering numbers

    + +

    With AWK we can filter records using standard numeric operators like ==, <, >, >= etc.

    + + 2000)' \ + | relpipe-out-tabular]]> + +

    and e.g. list files with certain sizes:

    + +
    + + +

    Filtering strings

    + +

    String values can be matched against a regular expression:

    + + + +

    e.g. fstab records having cdrom in the mount_point:

    + +
    + +

    Case-insensitive search can be switched on by adding:

    + +
    --define IGNORECASE integer 1
    + +
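IGNORECASE is a GNU AWK feature. With other AWK implementations, a portable alternative is to convert the value to lower case before matching. A minimal plain-AWK sketch, assuming the mount point in the second column; the function name and its parameter are hypothetical:

```shell
# Hypothetical helper: $1 is a lower-case regular expression that is
# matched against the lower-cased second column.
grep_mount_point_ci() {
    awk -v pattern="$1" 'tolower($2) ~ pattern'
}

printf '/dev/sr0 /media/CDROM iso9660\n/dev/sda1 / ext4\n' | grep_mount_point_ci 'cdrom'
```

Only the first sample line is printed, although its mount point is written in upper case.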

    AND and OR

    + +

    We can combine multiple conditions using || and && logical operators:

    + + + +

    and build arbitrarily complex filters

    + +
    + +

    Nested (…) work as expected.
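A small plain-AWK sketch of a nested condition, using fstab-like columns (device, mount point, type, options, dump, pass). The sample data and the function name filter_fstab are made up for illustration:

```shell
# Hypothetical helper: keeps ext4 records that either have pass == 1
# or are mounted under /mnt/.
filter_fstab() {
    awk '($3 == "ext4" && ($6 == 1 || $2 ~ /^\/mnt\//))'
}

printf '%s\n' \
    '/dev/sda1 / ext4 defaults 0 1' \
    '/dev/sda2 none swap sw 0 0' \
    '/dev/sdb1 /mnt/data ext4 defaults 0 2' \
    | filter_fstab
```

The swap record is filtered out; the other two records pass.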

    + +

    + And AWK can do much more – it offers plenty of functions and language constructs that we can use in our transformations. + Comprehensive documentation can be found here: Gawk: Effective AWK Programming. +

    + +
    + +
    diff -r a39066264509 -r 2868d772c27e relpipe-data/examples-guile-filtering.xml --- a/relpipe-data/examples-guile-filtering.xml Tue Apr 09 22:53:40 2019 +0200 +++ b/relpipe-data/examples-guile-filtering.xml Tue May 28 21:18:20 2019 +0200 @@ -27,7 +27,7 @@ We are looking for „satanistic“ icons in our filesystem – those that have size = 666 bytes.

    - diff -r a39066264509 -r 2868d772c27e relpipe-data/examples/release-v0.12.sh --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/relpipe-data/examples/release-v0.12.sh Tue May 28 21:18:20 2019 +0200 @@ -0,0 +1,44 @@ +# Install dependencies as root: +su -c "apt install g++ make cmake mercurial pkg-config" +su -c "apt install libxerces-c-dev" # needed only for relpipe-in-xml module +su -c "apt install guile-2.2-dev" # needed only for relpipe-tr-guile module; guile-2.0-dev also works but requires a patch (see below) +su -c "apt install gawk" # needed only for relpipe-tr-awk module + +# Run rest of installation as a non-root user: +export RELPIPE_VERSION="v0.12" +export RELPIPE_SRC=~/src +export RELPIPE_BUILD=~/build +export RELPIPE_INSTALL=~/install +export PKG_CONFIG_PATH="$RELPIPE_INSTALL/lib/pkgconfig/:$PKG_CONFIG_PATH" +export PATH="$RELPIPE_INSTALL/bin:$PATH" + +rm -rf "$RELPIPE_BUILD"/relpipe-* +mkdir -p "$RELPIPE_SRC" "$RELPIPE_BUILD" "$RELPIPE_INSTALL" + +# Helper functions: +relpipe_download() { for m in "$@"; do cd "$RELPIPE_SRC" && ([[ -d "relpipe-$m.cpp" ]] && hg pull -R "relpipe-$m.cpp" && hg update -R "relpipe-$m.cpp" "$RELPIPE_VERSION" || hg clone -u "$RELPIPE_VERSION" https://hg.globalcode.info/relpipe/relpipe-$m.cpp) || break; done; } +relpipe_install() { for m in "$@"; do cd "$RELPIPE_BUILD" && mkdir -p relpipe-$m.cpp && cd relpipe-$m.cpp && cmake -DCMAKE_INSTALL_PREFIX:PATH="$RELPIPE_INSTALL" "$RELPIPE_SRC/relpipe-$m.cpp" && make && make install || break; done; } + +# Download all sources: +relpipe_download lib-protocol lib-reader lib-writer lib-cli lib-xmlwriter in-cli in-fstab in-xml in-csv in-filesystem in-recfile out-gui.qt out-nullbyte out-ods out-tabular out-xml out-csv out-asn1 out-recfile tr-cut tr-grep tr-python tr-sed tr-validator tr-guile tr-awk + +# Optional: At this point, we have all dependencies and sources downloaded, so we can disconnect this computer from the internet in order to verify that our build process is sane, 
deterministic and does not depend on any external resources. + +# Build and install libraries: +relpipe_install lib-protocol lib-reader lib-writer lib-cli lib-xmlwriter + +# Build and install tools: +relpipe_install in-fstab in-cli in-fstab in-xml in-csv in-recfile tr-cut tr-grep tr-sed tr-guile tr-awk out-nullbyte out-ods out-tabular out-xml out-csv out-asn1 out-recfile + +# relpipe_install in-filesystem # requires GCC 8 or patching (see below) + +# Clean-up: +unset -f relpipe_install +unset -f relpipe_download +unset -v RELPIPE_VERSION +unset -v RELPIPE_SRC +unset -v RELPIPE_BUILD +unset -v RELPIPE_INSTALL + +# Filter your fstab using AWK and view it like on an 80s green screen terminal! +relpipe-in-fstab | relpipe-tr-awk --relation 'fstab' --for-each '(pass == 1 || type == "swap")' | relpipe-out-tabular diff -r a39066264509 -r 2868d772c27e relpipe-data/img/awk-wrapper-debug-1.png Binary file relpipe-data/img/awk-wrapper-debug-1.png has changed diff -r a39066264509 -r 2868d772c27e relpipe-data/implementation.xml --- a/relpipe-data/implementation.xml Tue Apr 09 22:53:40 2019 +0200 +++ b/relpipe-data/implementation.xml Tue May 28 21:18:20 2019 +0200 @@ -33,6 +33,7 @@ relpipe-out-recfile.cpp executable output c++ GNU GPLv3+ relpipe-out-tabular.cpp executable output c++ GNU GPLv3+ relpipe-out-xml.cpp executable output c++ GNU GPLv3+ + relpipe-tr-awk.cpp executable transformation c++ GNU GPLv3+ relpipe-tr-cut.cpp executable transformation c++ GNU GPLv3+ relpipe-tr-grep.cpp executable transformation c++ GNU GPLv3+ relpipe-tr-guile.cpp executable transformation c++ GNU GPLv3+ diff -r a39066264509 -r 2868d772c27e relpipe-data/release-v0.12.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/relpipe-data/release-v0.12.xml Tue May 28 21:18:20 2019 +0200 @@ -0,0 +1,111 @@ + + + Release v0.12 + fifth public release of Relational pipes + v0.12 + + +

    + We are pleased to introduce the new development version of Relational pipes. + This release brings AWK support and some smaller changes: +

    + +
      +
    • + AWK transformations: + now it is possible to write transformations using the classic AWK tool and its language. + Relational data can be filtered and modified (including additions of new records) or any AWK code can be executed for the given relation or records. + Structural changes are also possible (adding or removing attributes or dropping relations). + The command line syntax is mostly the same as for the Guile transformation. + The AWK and Guile transformations are now the most powerful ones in the Relpipe world. +
    • + +
    • + AWK and Guile transformations: + option --debug-variable-mapping was added, so it is possible to print mappings (in relational format, of course) + between relational attributes and AWK or Guile variables. + The mapping is needed because not all valid attribute names are also valid variable names in a particular language, + so some escaping or prefixing is sometimes necessary. +
    • + +
    + +

    + See the examples and screenshots pages for details. +

    + +

    + Please note that this is still a development release and thus the API (libraries, CLI arguments, formats) might and will change. + Any suggestions, ideas and bug reports are welcome on our mailing list. +

    + +

    Data types

    +
      +
    • boolean
    • +
    • variable unsigned integer (prototype)
    • +
    • string in UTF-8
    • +
    +

    Inputs

    +
      +
    • Recfile
    • +
    • XML
    • +
    • CSV
    • +
    • file system
    • +
    • CLI
    • +
    • fstab
    • +
    +

    Transformations

    +
      +
    • awk: filtering and transformations using the classic AWK tool and language
    • +
    • guile: filtering and transformations defined in the Scheme language using GNU Guile
    • +
    • grep: regular expression filter, removes unwanted records from the relation
    • +
    • cut: regular expression attribute cutter (removes or duplicates attributes and can also DROP whole relation)
    • +
    • sed: regular expression replacer
    • +
    • validator: just a pass-through filter that crashes on invalid data
    • +
    • python: highly experimental
    • +
    +

    Outputs

    +
      +
    • ASN.1 BER
    • +
    • Recfile
    • +
    • CSV
    • +
    • tabular
    • +
    • XML
    • +
    • nullbyte
    • +
    • GUI in Qt
    • +
    • ODS (LibreOffice)
    • +
    + +

    + Installation was tested on Debian GNU/Linux 9.6. + The process should be similar on other distributions. +

    + + + +

    + Relational pipes are modular, so you can download and install only the parts you need (the libraries are always needed). + Tools out-gui.qt and tr-python require additional libraries and are not built by default. +

    + +

    + The module relpipe-in-filesystem uses the C++ filesystem API, which has been supported since GCC 8. + This module can be compiled and seems usable even with GCC 6, but requires some patching (switch to the experimental API): +

    + + sed 's@#include <filesystem>@#include <experimental/filesystem>@g' -i "$RELPIPE_SRC"/relpipe-in-filesystem.cpp/src/FileAttributeFinder.h "$RELPIPE_SRC"/relpipe-in-filesystem.cpp/src/XattrAttributeFinder.h "$RELPIPE_SRC"/relpipe-in-filesystem.cpp/src/FilesystemCommand.h "$RELPIPE_SRC"/relpipe-in-filesystem.cpp/src/AttributeFinder.h +sed 's@std::filesystem@std::experimental::filesystem@g' -i "$RELPIPE_SRC"/relpipe-in-filesystem.cpp/src/FileAttributeFinder.h "$RELPIPE_SRC"/relpipe-in-filesystem.cpp/src/XattrAttributeFinder.h "$RELPIPE_SRC"/relpipe-in-filesystem.cpp/src/FilesystemCommand.h "$RELPIPE_SRC"/relpipe-in-filesystem.cpp/src/AttributeFinder.h +sed 's/.*PROPERTY CXX_STANDARD.*/#\0/g' -i "$RELPIPE_SRC"/relpipe-in-filesystem.cpp/src/CMakeLists.txt]]> + +

    + The module relpipe-tr-guile uses GNU Guile 2.2 but can also work with 2.0. + In such case, it requires this patch: +

    + + + +
    + +
    diff -r a39066264509 -r 2868d772c27e relpipe-data/roadmap.xml --- a/relpipe-data/roadmap.xml Tue Apr 09 22:53:40 2019 +0200 +++ b/relpipe-data/roadmap.xml Tue May 28 21:18:20 2019 +0200 @@ -16,7 +16,7 @@ Released versions are described on the download page.

    -

    v0.12, v0.13, v0.14 etc.

    +

    v0.13, v0.14, v0.15 etc.

    Releases for discussion and verification of the format and API design. @@ -48,7 +48,6 @@

    Transformations

    Outputs