# HG changeset patch # User František Kučera # Date 1578700611 -3600 # Node ID 6f15f18d2abfecc22f1c05248e22b741e7f2bb3b # Parent 9172bd97ae99edf5be8b8e7b30915b0e21d663e0 field group --exec, replaces --script and --hash, starts reusable sub-program that returns set of attributes for all records during its runtime (no fork/exec for each record like with --script) diff -r 9172bd97ae99 -r 6f15f18d2abf bash-completion.sh --- a/bash-completion.sh Mon Nov 11 14:42:13 2019 +0100 +++ b/bash-completion.sh Sat Jan 11 00:56:51 2020 +0100 @@ -54,29 +54,19 @@ "dublincore.rights" ) - HASH_FIELDS=( - "md5" - "sha1" - "sha256" - "sha512" - ) - - if [[ "$w1" == "--relation" && "x$w0" == "x" ]]; then COMPREPLY=("''") elif [[ "$w1" == "--as" && "x$w0" == "x" ]]; then COMPREPLY=("''") elif [[ "$w1" == "--option" && "x$w0" == "x" ]]; then COMPREPLY=("''") elif [[ "$w2" == "--option" && "x$w0" == "x" ]]; then COMPREPLY=("''") elif [[ "$w1" == "--file" ]]; then COMPREPLY=($(compgen -W "${FILE_FIELDS[*]}" -- "$w0")) elif [[ "$w1" == "--xattr" ]]; then COMPREPLY=($(compgen -W "${XATTR_FIELDS[*]}" -- "$w0")) - elif [[ "$w1" == "--hash" ]]; then COMPREPLY=($(compgen -W "${HASH_FIELDS[*]}" -- "$w0")) - elif [[ "$w1" == "--script" ]]; then COMPREPLY=($(compgen -W "$(_relpipe_in_filesystem_scripts)" -- "$w0")) + elif [[ "$w1" == "--exec" ]]; then COMPREPLY=($(compgen -W "$(_relpipe_in_filesystem_scripts)" -- "$w0")) else OPTIONS=( "--relation" "--file" "--xattr" - "--hash" - "--script" + "--exec" "--as" "--option" ) diff -r 9172bd97ae99 -r 6f15f18d2abf nbproject/configurations.xml --- a/nbproject/configurations.xml Mon Nov 11 14:42:13 2019 +0100 +++ b/nbproject/configurations.xml Sat Jan 11 00:56:51 2020 +0100 @@ -46,10 +46,9 @@ CLIParser.h Configuration.h FileAttributeFinder.h - HashAttributeFinder.h RequestedField.h - ScriptAttributeFinder.h - SystemProcess.h + SubProcess.cpp + SubProcess.h XattrAttributeFinder.h relpipe-in-filesystem.cpp @@ -76,7 +75,7 @@ false - + @@ -100,25 +99,11 @@ true - 
- - - - - - - - + + + - - - - - - - - - + @@ -165,13 +150,11 @@ - - - + - + diff -r 9172bd97ae99 -r 6f15f18d2abf script-examples/__relpipe_in_filesystem_script_inode --- a/script-examples/__relpipe_in_filesystem_script_inode Mon Nov 11 14:42:13 2019 +0100 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,29 +0,0 @@ -#!/bin/bash - -# Relational pipes -# Copyright © 2019 František Kučera (Frantovo.cz, GlobalCode.info) -# -# This program is free software: you can redistribute it and/or modify -# it under the terms of the GNU General Public License as published by -# the Free Software Foundation, version 3 of the License. -# -# This program is distributed in the hope that it will be useful, -# but WITHOUT ANY WARRANTY; without even the implied warranty of -# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -# GNU General Public License for more details. -# -# You should have received a copy of the GNU General Public License -# along with this program. If not, see . - - -# returns the inode number of given file -# not very useful – just a demo returning an integer attribute - -if [[ $# == 0 ]]; then - echo "1"; - echo "integer"; -elif [[ -f "$1" || -d "$1" ]]; then - ls -d -i "$1" | cut -d' ' -f1 | tr -d '\n'; -else - exit 40; -fi diff -r 9172bd97ae99 -r 6f15f18d2abf script-examples/__relpipe_in_filesystem_script_mime-type --- a/script-examples/__relpipe_in_filesystem_script_mime-type Mon Nov 11 14:42:13 2019 +0100 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,28 +0,0 @@ -#!/bin/bash - -# Relational pipes -# Copyright © 2019 František Kučera (Frantovo.cz, GlobalCode.info) -# -# This program is free software: you can redistribute it and/or modify -# it under the terms of the GNU General Public License as published by -# the Free Software Foundation, version 3 of the License. 
-# -# This program is distributed in the hope that it will be useful, -# but WITHOUT ANY WARRANTY; without even the implied warranty of -# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -# GNU General Public License for more details. -# -# You should have received a copy of the GNU General Public License -# along with this program. If not, see . - - -# returns the MIME type of given file - -if [[ $# == 0 ]]; then - echo "1"; - echo "string"; -elif [[ -f "$1" || -d "$1" ]]; then - file --preserve-date --brief --mime-type --dereference "$1" | tr -d '\n'; -else - exit 40; -fi diff -r 9172bd97ae99 -r 6f15f18d2abf script-examples/__relpipe_in_filesystem_script_pdf --- a/script-examples/__relpipe_in_filesystem_script_pdf Mon Nov 11 14:42:13 2019 +0100 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,46 +0,0 @@ -#!/bin/bash - -# Relational pipes -# Copyright © 2019 František Kučera (Frantovo.cz, GlobalCode.info) -# -# This program is free software: you can redistribute it and/or modify -# it under the terms of the GNU General Public License as published by -# the Free Software Foundation, version 3 of the License. -# -# This program is distributed in the hope that it will be useful, -# but WITHOUT ANY WARRANTY; without even the implied warranty of -# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -# GNU General Public License for more details. -# -# You should have received a copy of the GNU General Public License -# along with this program. If not, see . 
- - -# Quite dirty hack to get some information about given PDF file -# TODO: better field names, more stable API -# TODO: call a PDF library rather than parse output of a commandline tool with human readable output - -if [[ $# == 0 ]]; then - echo "1"; - if [[ "x$field" == "xPages" ]]; then echo "integer"; - elif [[ -z "${field+x}" ]]; then echo "boolean"; - else echo "string"; - fi -elif [[ -f "$1" || -d "$1" ]]; then - info="`pdfinfo -isodates "$1"`"; - valid=$?; - if [[ "x$field" == "xPages" ]]; then - if [[ $valid == 0 ]]; then - echo "$info" | grep "^$field:" | sed -E 's/[^:]+:\s+(.*)/\1/g' | tr -d '\n'; - else - printf 0; - # exit 40; # TODO: null - fi - elif [[ -z "${field+x}" ]]; then - if [[ $valid == 0 ]]; then printf "true"; else printf "false"; fi - else - echo "$info" | grep "^$field:" | sed -E 's/[^:]+:\s+(.*)/\1/g' | tr -d '\n'; - fi -else - exit 40; -fi diff -r 9172bd97ae99 -r 6f15f18d2abf script-examples/__relpipe_in_filesystem_script_xpath --- a/script-examples/__relpipe_in_filesystem_script_xpath Mon Nov 11 14:42:13 2019 +0100 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,51 +0,0 @@ -#!/usr/bin/perl - -# Relational pipes -# Copyright © 2019 František Kučera (Frantovo.cz, GlobalCode.info) -# -# This program is free software: you can redistribute it and/or modify -# it under the terms of the GNU General Public License as published by -# the Free Software Foundation, version 3 of the License. -# -# This program is distributed in the hope that it will be useful, -# but WITHOUT ANY WARRANTY; without even the implied warranty of -# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -# GNU General Public License for more details. -# -# You should have received a copy of the GNU General Public License -# along with this program. If not, see . 
- -use strict; -use warnings; - -use XML::LibXML; # documentation: https://metacpan.org/pod/XML::LibXML - -if (@ARGV == 0) { - print "1\n"; - if ($ENV{type}) { print "$ENV{type}\n"; } else { print "string\n"; } -} else { - my $dom = XML::LibXML->new->parse_file($ARGV[0]); - my $xpath = XML::LibXML::XPathContext->new($dom); - - # You can add your favorite XML namespaces here: - # $xpath->registerNs('relpipe', 'tag:globalcode.info,2018:relpipe'); - # $xpath->registerNs('xhtml', 'http://www.w3.org/1999/xhtml'); - # $xpath->registerNs('svg', 'http://www.w3.org/2000/svg'); - # $xpath->registerNs('atom', 'http://www.w3.org/2005/Atom'); - # $xpath->registerNs('maven', 'http://maven.apache.org/POM/4.0.0'); - # - # Or set environmental variables: - # export xmlns_r='tag:globalcode.info,2018:relpipe' - - # Load XML namespaces from options: - # usage: --option 'env:xmlns_r' 'tag:globalcode.info,2018:relpipe' → r="tag:globalcode.info,2018:relpipe" - for my $name (keys %ENV) { - if ($name =~ /xmlns_(.*)/) { $xpath->registerNs($1, $ENV{$name}); } - } - - # Execute XPath and concatenate results (usually should be only one): - # usage: --option env:xpath '//r:name' - for my $value ($xpath->find($ENV{xpath})) { - print $value; - } -} diff -r 9172bd97ae99 -r 6f15f18d2abf src/AttributeFinder.h --- a/src/AttributeFinder.h Mon Nov 11 14:42:13 2019 +0100 +++ b/src/AttributeFinder.h Sat Jan 11 00:56:51 2020 +0100 @@ -53,7 +53,7 @@ */ virtual void writeEmptyField(RelationalWriter* writer, const RequestedField& field) { // TODO: better handling of null values (when null values are supported by the format specification) - for (AttributeMetadata m : toMetadata(field)) { + for (AttributeMetadata m : toMetadata(writer, field)) { switch (m.typeId) { case TypeId::BOOLEAN: writer->writeAttribute(L"false"); @@ -75,10 +75,11 @@ /** * Single requested fields might generate multiple attributes in the relation. * But usually it is 1:1. 
+ * @param writer can be used for TypeId conversion from string_t * @param field requested field from the user (usually from CLI arguments) * @return attribute metadata to be used in the RelationalWriter.startRelation() */ - virtual vector toMetadata(const RequestedField& field) = 0; + virtual vector toMetadata(RelationalWriter* writer, const RequestedField& field) = 0; /** * Writing of the record for current file is starting. diff -r 9172bd97ae99 -r 6f15f18d2abf src/CLIParser.h --- a/src/CLIParser.h Mon Nov 11 14:42:13 2019 +0100 +++ b/src/CLIParser.h Sat Jan 11 00:56:51 2020 +0100 @@ -51,8 +51,7 @@ static const string_t OPTION_FILE; static const string_t OPTION_XATTR; - static const string_t OPTION_HASH; - static const string_t OPTION_SCRIPT; + static const string_t OPTION_EXEC; static const string_t OPTION_AS; static const string_t OPTION_OPTION; static const string_t OPTION_RELATION; @@ -69,7 +68,7 @@ for (int i = 0; i < arguments.size();) { string_t option = readNext(arguments, i); - if (option == CLIParser::OPTION_FILE || option == CLIParser::OPTION_XATTR || option == CLIParser::OPTION_HASH || option == CLIParser::OPTION_SCRIPT) { + if (option == CLIParser::OPTION_FILE || option == CLIParser::OPTION_XATTR || option == CLIParser::OPTION_EXEC) { addField(c, currentGroup, currentName, currentAliases, currentOptions); // previous field currentGroup = option.substr(2); // cut off -- currentName = readNext(arguments, i); @@ -105,6 +104,8 @@ // c.fields.push_back(RequestedField(RequestedField::GROUP_XATTR, L"user.xdg.origin.url")); } + for (int i = 0; i < c.fields.size(); i++) c.fields[i].id = i; + return c; } @@ -114,8 +115,7 @@ const string_t CLIParser::OPTION_FILE = L"--" + RequestedField::GROUP_FILE; const string_t CLIParser::OPTION_XATTR = L"--" + RequestedField::GROUP_XATTR; -const string_t CLIParser::OPTION_HASH = L"--" + RequestedField::GROUP_HASH; -const string_t CLIParser::OPTION_SCRIPT = L"--" + RequestedField::GROUP_SCRIPT; +const string_t 
CLIParser::OPTION_EXEC = L"--" + RequestedField::GROUP_EXEC; const string_t CLIParser::OPTION_AS = L"--as"; const string_t CLIParser::OPTION_OPTION = L"--option"; const string_t CLIParser::OPTION_RELATION = L"--relation"; diff -r 9172bd97ae99 -r 6f15f18d2abf src/CMakeLists.txt --- a/src/CMakeLists.txt Mon Nov 11 14:42:13 2019 +0100 +++ b/src/CMakeLists.txt Sat Jan 11 00:56:51 2020 +0100 @@ -29,6 +29,7 @@ # Executable output: add_executable( ${EXECUTABLE_FILE} + SubProcess.cpp relpipe-in-filesystem.cpp ) diff -r 9172bd97ae99 -r 6f15f18d2abf src/ExecAttributeFinder.h --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/src/ExecAttributeFinder.h Sat Jan 11 00:56:51 2020 +0100 @@ -0,0 +1,139 @@ +/** + * Relational pipes + * Copyright © 2019 František Kučera (Frantovo.cz, GlobalCode.info) + * + * This program is free software: you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation, version 3 of the License. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see . + */ +#pragma once + +#include +#include +#include +#include + +#include +#include +#include + +#include "RequestedField.h" +#include "SubProcess.h" +#include "AttributeFinder.h" +#include "ExecMsg.h" + +namespace relpipe { +namespace in { +namespace filesystem { + +namespace fs = std::filesystem; +using namespace relpipe::writer; + +class ExecAttributeFinder : public AttributeFinder { +private: + std::wstring_convert> convertor; // TODO: support also other encodings. 
+ std::map> subProcesses; + std::map> cachedMetadata; + + string_t getExecCommand(const RequestedField& field) { + // TODO: move to another directory, exec, not script + use custom $PATH with no prefix + return SCRIPT_PREFIX + field.name; + } + +protected: + + virtual void writeFieldOfExistingFile(RelationalWriter* writer, const RequestedField& field) override { + // TODO: parallelize also over records → fork multiple processes and distribute records across them; then collect results (with a lock) + if (field.group == RequestedField::GROUP_EXEC) { + + subProcesses[field.id]->write({ExecMsg::INPUT_ATTRIBUTE, L"0", convertor.from_bytes(currentFileRaw), L"false"}); // index, value, isNull + subProcesses[field.id]->write({ExecMsg::WAITING_FOR_OUTPUT_ATTRIBUTES}); + + for (auto metadata : cachedMetadata[field.id]) { + SubProcess::Message m = subProcesses[field.id]->read(); + if (m.code == ExecMsg::OUTPUT_ATTRIBUTE) writer->writeAttribute(m.parameters[0]); + else throw RelpipeWriterException(L"Protocol violation from exec sub-process while reading: „" + metadata.attributeName + L"“. Expected OUTPUT_ATTRIBUTE but got: " + m.toString()); + } + + SubProcess::Message m = subProcesses[field.id]->read(); + if (m.code != ExecMsg::WAITING_FOR_INPUT_ATTRIBUTES) throw RelpipeWriterException(L"Protocol violation from exec sub-process. 
Expected WAITING_FOR_INPUT_ATTRIBUTES but got: " + m.toString()); + // TODO: generic protocol violation error messages / method for checking responses + } + } + +public: + + static const string_t SCRIPT_PREFIX; + + virtual vector toMetadata(RelationalWriter* writer, const RequestedField& field) override { + if (field.group == RequestedField::GROUP_EXEC) { + + if (cachedMetadata.count(field.id)) { + return cachedMetadata[field.id]; + } else { + + std::vector commandLine = {getExecCommand(field)}; + std::map environment; + + for (auto mn : ExecMsg::getMessageNames()) { + environment[L"EXEC_MSG_" + mn.second] = std::to_wstring(mn.first); + environment[L"EXEC_MSG_" + std::to_wstring(mn.first)] = mn.second; + } + + shared_ptr subProcess(SubProcess::create(commandLine, environment, false)); + subProcesses[field.id] = subProcess; + + string_t version = L"1"; + subProcess->write({ExecMsg::VERSION_SUPPORTED, version}); + subProcess->write({ExecMsg::WAITING_FOR_VERSION}); + SubProcess::Message versionMessage = subProcess->read(); + if (versionMessage.code == ExecMsg::VERSION_ACCEPTED && versionMessage.parameters[0] == version) { + subProcess->write({ExecMsg::RELATION_START}); + subProcess->write({ExecMsg::INPUT_ATTRIBUTE_METADATA, L"path", L"string"}); + for (string_t alias : field.getAliases()) subProcess->write({ExecMsg::OUTPUT_ATTRIBUTE_ALIAS, alias}); + for (int i = 0; i < field.options.size();) subProcess->write({ExecMsg::OPTION, field.options[i++], field.options[i++]}); + subProcess->write({ExecMsg::WAITING_FOR_OUTPUT_ATTRIBUTES_METADATA}); + + vector metadata; + while (true) { + SubProcess::Message m = subProcess->read(); + if (m.code == ExecMsg::OUTPUT_ATTRIBUTE_METADATA) metadata.push_back({m.parameters[0], writer->toTypeId(m.parameters[1])}); + else if (m.code == ExecMsg::WAITING_FOR_INPUT_ATTRIBUTES) break; + } + + cachedMetadata[field.id] = metadata; + return metadata; + } else { + throw RelpipeWriterException(L"Incompatible exec sub-process version or message: " 
+ versionMessage.toString()); + } + } + } else { + return {}; + } + } + + virtual ~ExecAttributeFinder() override { + for (auto s : subProcesses) { + try { + s.second->write({ExecMsg::RELATION_END}); + s.second->wait(); + } catch (...) { + std::wcerr << L"Exception caught during closing sub-process #" + std::to_wstring(s.first) + L" and waiting for its end." << std::endl; + } + } + } +}; + +const relpipe::writer::string_t ExecAttributeFinder::SCRIPT_PREFIX = L"__relpipe_in_filesystem_script_"; + +} +} +} diff -r 9172bd97ae99 -r 6f15f18d2abf src/ExecMsg.h --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/src/ExecMsg.h Sat Jan 11 00:56:51 2020 +0100 @@ -0,0 +1,86 @@ +// This file was generated from the specification. + +#include +#include + +namespace relpipe { +namespace in { +namespace filesystem { + +class ExecMsg { +public: + + static const int VERSION_SUPPORTED; + static const int WAITING_FOR_VERSION; + static const int VERSION_ACCEPTED; + static const int RELATION_START; + static const int INPUT_ATTRIBUTE_METADATA; + static const int OUTPUT_ATTRIBUTE_ALIAS; + static const int OPTION; + static const int COMPLETION_REQUEST; + static const int COMPLETION; + static const int COMPLETION_END; + static const int WAITING_FOR_OUTPUT_ATTRIBUTES_METADATA; + static const int OUTPUT_ATTRIBUTE_METADATA; + static const int WAITING_FOR_INPUT_ATTRIBUTES; + static const int INPUT_ATTRIBUTE; + static const int WAITING_FOR_OUTPUT_ATTRIBUTES; + static const int OUTPUT_ATTRIBUTE; + static const int EXECUTOR_ERROR; + static const int PROCESS_ERROR; + static const int PROCESS_WARNING; + static const int RELATION_END; + + static std::map getMessageNames() { + std::map m; + + m[VERSION_SUPPORTED] = L"VERSION_SUPPORTED"; + m[WAITING_FOR_VERSION] = L"WAITING_FOR_VERSION"; + m[VERSION_ACCEPTED] = L"VERSION_ACCEPTED"; + m[RELATION_START] = L"RELATION_START"; + m[INPUT_ATTRIBUTE_METADATA] = L"INPUT_ATTRIBUTE_METADATA"; + m[OUTPUT_ATTRIBUTE_ALIAS] = L"OUTPUT_ATTRIBUTE_ALIAS"; + m[OPTION] = 
L"OPTION"; + m[COMPLETION_REQUEST] = L"COMPLETION_REQUEST"; + m[COMPLETION] = L"COMPLETION"; + m[COMPLETION_END] = L"COMPLETION_END"; + m[WAITING_FOR_OUTPUT_ATTRIBUTES_METADATA] = L"WAITING_FOR_OUTPUT_ATTRIBUTES_METADATA"; + m[OUTPUT_ATTRIBUTE_METADATA] = L"OUTPUT_ATTRIBUTE_METADATA"; + m[WAITING_FOR_INPUT_ATTRIBUTES] = L"WAITING_FOR_INPUT_ATTRIBUTES"; + m[INPUT_ATTRIBUTE] = L"INPUT_ATTRIBUTE"; + m[WAITING_FOR_OUTPUT_ATTRIBUTES] = L"WAITING_FOR_OUTPUT_ATTRIBUTES"; + m[OUTPUT_ATTRIBUTE] = L"OUTPUT_ATTRIBUTE"; + m[EXECUTOR_ERROR] = L"EXECUTOR_ERROR"; + m[PROCESS_ERROR] = L"PROCESS_ERROR"; + m[PROCESS_WARNING] = L"PROCESS_WARNING"; + m[RELATION_END] = L"RELATION_END"; + + return m; + } + +}; + +const int ExecMsg::VERSION_SUPPORTED = 100; +const int ExecMsg::WAITING_FOR_VERSION = 101; +const int ExecMsg::VERSION_ACCEPTED = 102; +const int ExecMsg::RELATION_START = 103; +const int ExecMsg::INPUT_ATTRIBUTE_METADATA = 104; +const int ExecMsg::OUTPUT_ATTRIBUTE_ALIAS = 105; +const int ExecMsg::OPTION = 106; +const int ExecMsg::COMPLETION_REQUEST = 107; +const int ExecMsg::COMPLETION = 108; +const int ExecMsg::COMPLETION_END = 109; +const int ExecMsg::WAITING_FOR_OUTPUT_ATTRIBUTES_METADATA = 110; +const int ExecMsg::OUTPUT_ATTRIBUTE_METADATA = 111; +const int ExecMsg::WAITING_FOR_INPUT_ATTRIBUTES = 112; +const int ExecMsg::INPUT_ATTRIBUTE = 113; +const int ExecMsg::WAITING_FOR_OUTPUT_ATTRIBUTES = 114; +const int ExecMsg::OUTPUT_ATTRIBUTE = 115; +const int ExecMsg::EXECUTOR_ERROR = 116; +const int ExecMsg::PROCESS_ERROR = 117; +const int ExecMsg::PROCESS_WARNING = 118; +const int ExecMsg::RELATION_END = 120; + +} +} +} diff -r 9172bd97ae99 -r 6f15f18d2abf src/FileAttributeFinder.h --- a/src/FileAttributeFinder.h Mon Nov 11 14:42:13 2019 +0100 +++ b/src/FileAttributeFinder.h Sat Jan 11 00:56:51 2020 +0100 @@ -145,7 +145,7 @@ static const string_t FIELD_GROUP; static const string_t FIELD_CONTENT; - virtual vector toMetadata(const RequestedField& field) override { + virtual 
vector toMetadata(RelationalWriter* writer, const RequestedField& field) override { if (field.group == RequestedField::GROUP_FILE) { vector metadata; for (string_t alias : field.getAliases()) { diff -r 9172bd97ae99 -r 6f15f18d2abf src/FilesystemCommand.h --- a/src/FilesystemCommand.h Mon Nov 11 14:42:13 2019 +0100 +++ b/src/FilesystemCommand.h Sat Jan 11 00:56:51 2020 +0100 @@ -37,8 +37,7 @@ #include "AttributeFinder.h" #include "FileAttributeFinder.h" #include "XattrAttributeFinder.h" -#include "HashAttributeFinder.h" -#include "ScriptAttributeFinder.h" +#include "ExecAttributeFinder.h" namespace relpipe { namespace in { @@ -52,14 +51,12 @@ std::wstring_convert> convertor; // TODO: support also other encodings. FileAttributeFinder fileAttributeFinder; - HashAttributeFinder hashAttributeFinder; - ScriptAttributeFinder scriptAttributeFinder; + ExecAttributeFinder execAttributeFinder; XattrAttributeFinder xattrAttributeFinder; std::map attributeFinders{ {RequestedField::GROUP_FILE, &fileAttributeFinder}, - {RequestedField::GROUP_HASH, &hashAttributeFinder}, - {RequestedField::GROUP_SCRIPT, &scriptAttributeFinder}, + {RequestedField::GROUP_EXEC, &execAttributeFinder}, {RequestedField::GROUP_XATTR, &xattrAttributeFinder}}; void reset(std::stringstream& stream) { @@ -83,7 +80,7 @@ std::vector attributesMetadata; for (RequestedField field : configuration.fields) { AttributeFinder* finder = attributeFinders[field.group]; - if (finder) for (AttributeMetadata m : finder->toMetadata(field)) attributesMetadata.push_back(m); + if (finder) for (AttributeMetadata m : finder->toMetadata(writer.get(), field)) attributesMetadata.push_back(m); else throw RelpipeWriterException(L"Unsupported field group: " + field.group); } diff -r 9172bd97ae99 -r 6f15f18d2abf src/HashAttributeFinder.h --- a/src/HashAttributeFinder.h Mon Nov 11 14:42:13 2019 +0100 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,102 +0,0 @@ -/** - * Relational pipes - * Copyright © 2019 František Kučera 
(Frantovo.cz, GlobalCode.info) - * - * This program is free software: you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation, version 3 of the License. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program. If not, see . - */ -#pragma once - -#include -#include - -#include -#include -#include -#include - -#include "RequestedField.h" -#include "SystemProcess.h" -#include "AttributeFinder.h" - -namespace relpipe { -namespace in { -namespace filesystem { - -namespace fs = std::filesystem; -using namespace relpipe::writer; - -class HashAttributeFinder : public AttributeFinder { -private: - std::wstring_convert> convertor; // TODO: support also other encodings. - - std::wregex standardHashPattern = std::wregex(L"^([a-f0-9]+) .*"); - - string_t getStandardHash(const fs::path& file, const std::string& hashCommand) { - try { - SystemProcess process({hashCommand, currentFileRaw}); - string_t output = convertor.from_bytes(process.execute()); - - std::wsmatch match; - if (regex_search(output, match, standardHashPattern)) return match[1]; - else throw RelpipeWriterException(L"Hash command returned wrong output: " + output); - } catch (relpipe::cli::RelpipeCLIException& e) { - // TODO: print warnings? - // TODO: do not fork/exec if the file is not readable - return L""; - } - } -protected: - - virtual void writeFieldOfExistingFile(RelationalWriter* writer, const RequestedField& field) override { - // TODO: paralelization? 
- // TODO: other formats, not only hex, but also base64 or binary - if (field.group == RequestedField::GROUP_HASH) { - for (string_t alias : field.getAliases()) { - if (field.name == FIELD_MD5) writer->writeAttribute(getStandardHash(currentFile, "md5sum")); - else if (field.name == FIELD_SHA1) writer->writeAttribute(getStandardHash(currentFile, "sha1sum")); - else if (field.name == FIELD_SHA256) writer->writeAttribute(getStandardHash(currentFile, "sha256sum")); - else if (field.name == FIELD_SHA512) writer->writeAttribute(getStandardHash(currentFile, "sha512sum")); - else throw RelpipeWriterException(L"Unsupported field name in HashAttributeFinder: " + field.name); - } - } - } - -public: - - static const string_t FIELD_MD5; - static const string_t FIELD_SHA1; - static const string_t FIELD_SHA256; - static const string_t FIELD_SHA512; - - virtual vector toMetadata(const RequestedField& field) override { - if (field.group == RequestedField::GROUP_HASH) { - vector metadata; - for (string_t alias : field.getAliases()) metadata.push_back(AttributeMetadata{alias, TypeId::STRING}); - return metadata; - } else { - return {}; - } - } - - virtual ~HashAttributeFinder() override { - } -}; - -const string_t HashAttributeFinder::FIELD_MD5 = L"md5"; -const string_t HashAttributeFinder::FIELD_SHA1 = L"sha1"; -const string_t HashAttributeFinder::FIELD_SHA256 = L"sha256"; -const string_t HashAttributeFinder::FIELD_SHA512 = L"sha512"; - -} -} -} diff -r 9172bd97ae99 -r 6f15f18d2abf src/RequestedField.h --- a/src/RequestedField.h Mon Nov 11 14:42:13 2019 +0100 +++ b/src/RequestedField.h Sat Jan 11 00:56:51 2020 +0100 @@ -30,8 +30,8 @@ public: static const string_t GROUP_FILE; static const string_t GROUP_XATTR; - static const string_t GROUP_HASH; - static const string_t GROUP_SCRIPT; + static const string_t GROUP_EXEC; + integer_t id; string_t group; string_t name; std::vector aliases; @@ -58,8 +58,7 @@ const string_t RequestedField::GROUP_FILE = L"file"; const string_t 
RequestedField::GROUP_XATTR = L"xattr"; -const string_t RequestedField::GROUP_HASH = L"hash"; -const string_t RequestedField::GROUP_SCRIPT = L"script"; +const string_t RequestedField::GROUP_EXEC = L"exec"; } } diff -r 9172bd97ae99 -r 6f15f18d2abf src/ScriptAttributeFinder.h --- a/src/ScriptAttributeFinder.h Mon Nov 11 14:42:13 2019 +0100 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,123 +0,0 @@ -/** - * Relational pipes - * Copyright © 2019 František Kučera (Frantovo.cz, GlobalCode.info) - * - * This program is free software: you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation, version 3 of the License. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program. If not, see . - */ -#pragma once - -#include -#include - -#include -#include -#include -#include - -#include "RequestedField.h" -#include "SystemProcess.h" -#include "AttributeFinder.h" - -namespace relpipe { -namespace in { -namespace filesystem { - -namespace fs = std::filesystem; -using namespace relpipe::writer; - -class ScriptAttributeFinder : public AttributeFinder { -private: - std::wstring_convert> convertor; // TODO: support also other encodings. 
- - std::string getScriptCommand(const RequestedField& field) { - return SCRIPT_PREFIX + convertor.to_bytes(field.name); - } - - std::vector toEnvironmentalVariables(const std::vector& vector) { - std::vector result; - for (int i = 0; i < vector.size();) { - string_t name = vector[i++]; - string_t value = vector[i++]; - if (name.rfind(L"env:" == 0)) { - result.push_back(convertor.to_bytes(name.substr(4))); - result.push_back(convertor.to_bytes(value)); - } - } - return result; - } - - TypeId getAttributeType(const RequestedField& field, const string_t& alias) { - // TODO: put latest supported version in the environmental variable - // TODO: put alias in the environmental variable - SystemProcess process({getScriptCommand(field)}, toEnvironmentalVariables(field.options)); - std::string output = process.execute(); - std::regex pattern("(.*)\\n(.*)\\n"); - std::smatch match; - std::regex_match(output, match, pattern); - if (match.ready() && match[1] == "1") { - // TODO: move to a common library - if (match[2] == "boolean") return TypeId::BOOLEAN; - if (match[2] == "integer") return TypeId::INTEGER; - if (match[2] == "string") return TypeId::STRING; - throw RelpipeWriterException(L"Unsupported script data type – field: „" + field.name + L"“ type: „" + convertor.from_bytes(match[2]) + L"“"); - } else { - throw RelpipeWriterException(L"Unsupported script version – field: „" + field.name + L"“ output: „" + convertor.from_bytes(output) + L"“"); - } - - } - - string_t getScriptOutput(const fs::path& file, const RequestedField& field, const string_t& alias) { - try { - // TODO: put alias in the environmental variable - SystemProcess process({getScriptCommand(field), currentFileRaw}, toEnvironmentalVariables(field.options)); - return convertor.from_bytes(process.execute()); - } catch (relpipe::cli::RelpipeCLIException& e) { - // TODO: print warnings? 
- // TODO: do not fork/exec if the file is not readable - return L""; - } - } -protected: - - virtual void writeFieldOfExistingFile(RelationalWriter* writer, const RequestedField& field) override { - // TODO: paralelization? - if (field.group == RequestedField::GROUP_SCRIPT) { - for (string_t alias : field.getAliases()) { - writer->writeAttribute(getScriptOutput(currentFile, field, alias)); - } - } - } - -public: - - static const std::string SCRIPT_PREFIX; - - virtual vector toMetadata(const RequestedField& field) override { - if (field.group == RequestedField::GROUP_SCRIPT) { - vector metadata; - for (string_t alias : field.getAliases()) metadata.push_back(AttributeMetadata{alias, getAttributeType(field, alias)}); - return metadata; - } else { - return {}; - } - } - - virtual ~ScriptAttributeFinder() override { - } -}; - -const std::string ScriptAttributeFinder::SCRIPT_PREFIX = "__relpipe_in_filesystem_script_"; - -} -} -} diff -r 9172bd97ae99 -r 6f15f18d2abf src/SubProcess.cpp --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/src/SubProcess.cpp Sat Jan 11 00:56:51 2020 +0100 @@ -0,0 +1,186 @@ +/** + * Relational pipes + * Copyright © 2020 František Kučera (Frantovo.cz, GlobalCode.info) + * + * This program is free software: you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation, version 3 of the License. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see . 
+ */ + +#include + +#include +#include +#include +#include +#include +#include +#include +#include + +#include "SubProcess.h" + +using namespace relpipe::writer; + +/** + * TODO: have a separate side process for forking new processes. + */ +class SubProcessImpl : public SubProcess { +private: + __pid_t subPid; + std::istream subOutputReader; + std::ostream subInputWriter; + __gnu_cxx::stdio_filebuf subOutputReaderBuffer; + __gnu_cxx::stdio_filebuf subInputWriterBuffer; + static const char SEPARATOR = '\0'; + + std::wstring_convert < std::codecvt_utf8> convertor; // TODO: support also other encodings. Or use always UTF-8 for communication with subprocesses. + + int readInt() { + return std::stoi(readString()); + } + + string_t readString() { + std::stringstream s; + for (char ch; subOutputReader.read(&ch, 1).good() && ch != SEPARATOR;) s.put(ch); + return convertor.from_bytes(s.str()); + } + + void write(string_t s) { + subInputWriter << convertor.to_bytes(s).c_str(); + subInputWriter.put(SEPARATOR); + subInputWriter.flush(); + if (subInputWriter.bad()) throw SubProcess::Exception(L"Unable to write to sub-process."); + } + + void write(int i) { + write(std::to_wstring(i)); + } + +public: + + /** + * TODO: move to a common library (copied from the AWK module) + * @param args + */ + static void execp(const std::vector& args) { + const char** a = new const char*[args.size() + 1]; + for (size_t i = 0; i < args.size(); i++) a[i] = args[i].c_str(); + a[args.size()] = nullptr; + + execvp(a[0], (char*const*) a); + + delete[] a; + throw SubProcess::Exception(L"Unable to do execvp()."); + } + + /** + * TODO: move to a common library (copied from the AWK module) + * @param readerFD + * @param writerFD + */ + static void createPipe(int& readerFD, int& writerFD) { + int fds[2]; + int result = pipe(fds); + readerFD = fds[0]; + writerFD = fds[1]; + if (result < 0) throw SubProcess::Exception(L"Unable to create a pipe."); + } + + /** + * TODO: move to a common library (copied from 
the AWK module) + */ + static void redirectFD(int oldfd, int newfd) { + int result = dup2(oldfd, newfd); + if (result < 0) throw SubProcess::Exception(L"Unable redirect FD."); + } + + /** + * TODO: move to a common library (copied from the AWK module) + */ + static void closeOrThrow(int fd) { + int error = close(fd); + if (error) throw SubProcess::Exception(L"Unable to close FD: " + std::to_wstring(fd) + L" from PID: " + std::to_wstring(getpid())); + } + + static SubProcess* createSubProcess(std::vector commandLine, std::map environment, bool dropErrorOutput) { + int subInputReaderFD; + int subInputWriterFD; + int subOutputReaderFD; + int subOutputWriterFD; + + createPipe(subInputReaderFD, subInputWriterFD); + createPipe(subOutputReaderFD, subOutputWriterFD); + + __pid_t subPid = fork(); + + if (subPid < 0) { + throw SubProcess::Exception(L"Unable to fork the hash process."); + } else if (subPid == 0) { + // Child process + redirectFD(subInputReaderFD, STDIN_FILENO); + redirectFD(subOutputWriterFD, STDOUT_FILENO); + closeOrThrow(subInputWriterFD); + closeOrThrow(subOutputReaderFD); + if (dropErrorOutput) redirectFD(open("/dev/null", O_RDWR), STDERR_FILENO); + + std::wstring_convert < std::codecvt_utf8> convertor; // TODO: support also other encodings. Or use always UTF-8 for communication with subprocesses. 
+ for (auto const & entry : environment) setenv(convertor.to_bytes(entry.first).c_str(), convertor.to_bytes(entry.second).c_str(), true); + std::vector commandLineRaw; + for (string_t s : commandLine) commandLineRaw.push_back(convertor.to_bytes(s)); + execp(commandLineRaw); + throw SubProcess::Exception(L"Unexpected exception after execp(commandLineRaw)"); // will never happen, look inside the method above (throws exception) + } else { + // Parent process + closeOrThrow(subInputReaderFD); + closeOrThrow(subOutputWriterFD); + return new SubProcessImpl(subPid, subInputWriterFD, subOutputReaderFD); + } + } + + SubProcessImpl(__pid_t subPid, int subInputWriterFD, int subOutputReaderFD) : + subPid(subPid), + subOutputReaderBuffer(__gnu_cxx::stdio_filebuf(subOutputReaderFD, std::ios::in)), + subInputWriterBuffer(__gnu_cxx::stdio_filebuf(subInputWriterFD, std::ios::out)), + subOutputReader(&subOutputReaderBuffer), + subInputWriter(&subInputWriterBuffer) { + } + + virtual ~SubProcessImpl() { + } + + SubProcess::Message read() { + Message m; + m.code = readInt(); + int count = readInt(); + for (int i = 0; i < count; i++) m.parameters.push_back(readString()); + return m; + } + + void write(Message m) { + write(m.code); + write(m.parameters.size()); + for (auto p : m.parameters) write(p); + } + + int wait() { + closeOrThrow(subInputWriterBuffer.fd()); + closeOrThrow(subOutputReaderBuffer.fd()); + int status = -1; + ::waitpid(subPid, &status, 0); + return status; + } + +}; + +SubProcess* SubProcess::create(std::vector commandLine, std::map environment, bool dropErrorOutput) { + return SubProcessImpl::createSubProcess(commandLine, environment, dropErrorOutput); +} diff -r 9172bd97ae99 -r 6f15f18d2abf src/SubProcess.h --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/src/SubProcess.h Sat Jan 11 00:56:51 2020 +0100 @@ -0,0 +1,84 @@ +/** + * Relational pipes + * Copyright © 2020 František Kučera (Frantovo.cz, GlobalCode.info) + * + * This program is free software: you can 
redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation, version 3 of the License. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see . + */ +#pragma once + +#include +#include +#include +#include + +#include +#include + +/** + * TODO: move to a separate library → can be used later also in relpipe-tr-exec + */ +class SubProcess { +public: + + class Message { + public: + int code; + std::vector parameters; + + Message() { + } + + Message(int code) : code(code) { + } + + Message(int code, std::vector parameters) : code(code), parameters(parameters) { + } + + Message(int code, relpipe::writer::string_t p1) : code(code), parameters({p1}) { + } + + Message(int code, relpipe::writer::string_t p1, relpipe::writer::string_t p2) : code(code), parameters({p1, p2}) { + } + + Message(int code, relpipe::writer::string_t p1, relpipe::writer::string_t p2, relpipe::writer::string_t p3) : code(code), parameters({p1, p2, p3}) { + } + + relpipe::writer::string_t toString() { + std::wstringstream s; + s << L"Message(code: " << code << L", parameters: "; + for (int i = 0; i < parameters.size(); i++) { + if (i < parameters.size() - 1) s << parameters[i] << L","; + else s << parameters[i]; + } + s << L")"; + return s.str(); + } + }; + + class Exception : public relpipe::writer::RelpipeWriterException { + public: + + Exception(std::wstring message) : relpipe::writer::RelpipeWriterException(message) { + } + + }; + + static SubProcess* create(std::vector commandLine, std::map environment, bool dropErrorOutput = true); + + virtual Message read() = 0; + virtual void write(Message message) = 0; + virtual 
int wait() = 0; + + virtual ~SubProcess() = default; + +}; diff -r 9172bd97ae99 -r 6f15f18d2abf src/SystemProcess.h --- a/src/SystemProcess.h Mon Nov 11 14:42:13 2019 +0100 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,145 +0,0 @@ -/** - * Relational pipes - * Copyright © 2019 František Kučera (Frantovo.cz, GlobalCode.info) - * - * This program is free software: you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation, version 3 of the License. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program. If not, see . - */ -#pragma once - -#include -#include -#include -#include -#include -#include - -#include -#include - -namespace relpipe { -namespace in { -namespace filesystem { - -/** - * Simple wrapper for a system process (fork+exec) that captures and returns just the STDOUT. - */ -class SystemProcess { -private: - /** - * the command + its arguments - */ - std::vector commandLine; - std::vector environment; - int nullFile = -1; - - /** - * TODO: move to a common library (copied from the AWK module) - * @param args - */ - void execp(const std::vector& args) { - const char** a = new const char*[args.size() + 1]; - for (size_t i = 0; i < args.size(); i++) a[i] = args[i].c_str(); - a[args.size()] = nullptr; - - execvp(a[0], (char*const*) a); - - delete[] a; - throw relpipe::cli::RelpipeCLIException(L"Unable to do execvp().", relpipe::cli::CLI::EXIT_CODE_UNEXPECTED_ERROR); // TODO: better exception? 
- } - - /** - * TODO: move to a common library (copied from the AWK module) - * @param readerFD - * @param writerFD - */ - void createPipe(int& readerFD, int& writerFD) { - int fds[2]; - int result = pipe(fds); - readerFD = fds[0]; - writerFD = fds[1]; - if (result < 0) throw relpipe::cli::RelpipeCLIException(L"Unable to create a pipe.", relpipe::cli::CLI::EXIT_CODE_UNEXPECTED_ERROR); // TODO: better exception? - } - - /** - * TODO: move to a common library (copied from the AWK module) - */ - void redirectFD(int oldfd, int newfd) { - int result = dup2(oldfd, newfd); - if (result < 0) throw relpipe::cli::RelpipeCLIException(L"Unable redirect FD.", relpipe::cli::CLI::EXIT_CODE_UNEXPECTED_ERROR); // TODO: better exception? - } - - /** - * TODO: move to a common library (copied from the AWK module) - */ - void closeOrThrow(int fd) { - int error = close(fd); - if (error) throw relpipe::cli::RelpipeCLIException(L"Unable to close FD: " + to_wstring(fd) + L" from PID: " + to_wstring(getpid()), relpipe::cli::CLI::EXIT_CODE_UNEXPECTED_ERROR); // TODO: better exception? - } - -public: - - SystemProcess(const std::vector& commandLine, const std::vector& environment = {}) : commandLine(commandLine), environment(environment) { - nullFile = open("/dev/null", O_RDWR); - } - - virtual ~SystemProcess() { - close(nullFile); - } - - std::string execute() { - - std::stringstream result; - - // FIXME: different kinds of exception or return the exit code (now it enters infinite loop if the execp() fails) - // TODO: rename (not specific to hash) - int hashReaderFD; - int hashWriterFD; - createPipe(hashReaderFD, hashWriterFD); - - __pid_t hashPid = fork(); - - if (hashPid < 0) { - throw relpipe::cli::RelpipeCLIException(L"Unable to fork the hash process.", relpipe::cli::CLI::EXIT_CODE_UNEXPECTED_ERROR); // TODO: better exception? 
- } else if (hashPid == 0) { - // Child process - closeOrThrow(hashReaderFD); - redirectFD(nullFile, STDIN_FILENO); - redirectFD(nullFile, STDERR_FILENO); - redirectFD(hashWriterFD, STDOUT_FILENO); - for (int i = 0; i < environment.size();) { - std::string name = environment[i++]; - std::string value = environment[i++]; - setenv(name.c_str(), value.c_str(), true); - } - execp(commandLine); - } else { - // Parent process - closeOrThrow(hashWriterFD); - - __gnu_cxx::stdio_filebuf hashReaderBuffer(hashReaderFD, std::ios::in); - std::istream hashReader(&hashReaderBuffer); - - for (char ch; hashReader.read(&ch, 1).good();) result.put(ch); - - int waitError; - __pid_t waitPID = wait(&waitError); - if (waitError) throw relpipe::cli::RelpipeCLIException(L"The child process returned an error exit code.", relpipe::cli::CLI::EXIT_CODE_UNEXPECTED_ERROR); // TODO: better exception? - } - - return result.str(); - } -}; - -} -} -} diff -r 9172bd97ae99 -r 6f15f18d2abf src/XattrAttributeFinder.h --- a/src/XattrAttributeFinder.h Mon Nov 11 14:42:13 2019 +0100 +++ b/src/XattrAttributeFinder.h Sat Jan 11 00:56:51 2020 +0100 @@ -62,7 +62,7 @@ public: - virtual vector toMetadata(const RequestedField& field) override { + virtual vector toMetadata(RelationalWriter* writer, const RequestedField& field) override { if (field.group == RequestedField::GROUP_XATTR) { vector metadata; for (string_t alias : field.getAliases()) metadata.push_back(AttributeMetadata{alias, TypeId::STRING});