Overview

The CogStack Pipeline application can be run in different ways, either:

  • locally, as a standalone Java application, or
  • inside a Docker container (possibly deployed as a microservice inside an ecosystem).

The latter way of running CogStack is the highly recommended one and is covered extensively by multiple examples in the Examples section.


Note

Please note that to run a sample CogStack Pipeline job it is also required to have a CogStack configuration file available that defines the properties used, the pipeline components, the data processing, etc. Please refer to /wiki/spaces/COGEN/pages/37945560 for a detailed description of the available properties.



Running as a standalone app

Prerequisites

Java Runtime Environment

CogStack Pipeline requires the Java SE Runtime Environment in version >= 11.0 to be present in the system. The most commonly used distributions are the official JDK and OpenJDK; the OpenJDK variant of the Java Runtime Environment should also work here.
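As a quick way to confirm the requirement above, the following sketch checks the installed runtime. The `java_major_version` helper is a hypothetical name introduced here for illustration; it only assumes the usual `java -version` banner, which prints a quoted version string such as "11.0.2" or the legacy "1.8.0_292".

```shell
#!/bin/sh
# Hypothetical helper: print the major version for a Java version string.
# Legacy strings ("1.8.0_292") carry the major version in the second field,
# modern ones ("11.0.2", "17-ea") in the first.
java_major_version() {
    case "$1" in
        1.*) echo "$1" | cut -d. -f2 ;;
        *)   echo "$1" | cut -d. -f1 | cut -d- -f1 ;;
    esac
}

if command -v java >/dev/null 2>&1; then
    # `java -version` writes its banner to standard error
    version=$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')
    major=$(java_major_version "$version")
    if [ "${major:-0}" -ge 11 ] 2>/dev/null; then
        echo "Java $version found: OK"
    else
        echo "Java $version found: version >= 11 is required"
    fi
else
    echo "No Java runtime found on PATH"
fi
```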


Note

Please note that, with the change in Oracle's licensing introduced with version 11 (a commercial license is required for production use and support), we emphasise using the OpenJDK variant. More information about the licensing can be read here.

In our Docker images we are also now using a base image with OpenJDK.

External applications

Selected components of CogStack Pipeline use additional, external applications when processing data. These need to be installed on the system prior to running CogStack:

  • TesseractOCR in version >= 4.0 – for extracting text from images,
  • ImageMagick – for performing conversion between image formats.
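Whether these tools are present can be checked before launching the pipeline. The sketch below is only an illustration: `check_tool` is a hypothetical helper, and it assumes the usual binary names (`tesseract`, and `convert` or `magick` for ImageMagick, depending on the ImageMagick version installed).

```shell
#!/bin/sh
# Hypothetical helper: report whether a command is available on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
        return 1
    fi
}

# Tesseract is needed in version >= 4.0; print its version line when present
if check_tool tesseract; then
    tesseract --version 2>&1 | head -n 1
fi

# ImageMagick ships `convert` (older releases) or `magick` (newer releases)
check_tool convert || check_tool magick || true
```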


Note

    Tesseract version 4.0 introduced significant improvements in the quality of the OCR process, hence CogStack version 1.3.0 was updated to use it. However, please note that on some older distributions of Debian / Ubuntu, Tesseract may need to be installed manually or compiled from scratch. For more information, please refer to the official Tesseract wiki.

    Running locally

    Please see below: Running the pipeline.





    Running as a containerised app

    The CogStack Pipeline application can also be run inside a container, using the official Docker image available from the cogstacksystems Docker Hub. This is the highly recommended method of running CogStack Pipeline. Docker provides lightweight virtualisation of the variety of microservices that CogStack makes use of. Hence, when coupled with Docker Compose for microservice orchestration, all of the components required to use CogStack can be set up with a few simple commands.

    There are two images available to use: cogstacksystems/cogstack-pipeline:latest (stable) and cogstacksystems/cogstack-pipeline:dev-latest (development) – see: Building CogStack for more information.

    The Dockerfile used to build both images is available in the main CogStack Pipeline directory.

    Info

    The base image used by CogStack Pipeline is OpenJDK JRE 11.


    Prerequisites

    The only prerequisite is to have Docker installed on the system in version >= 1.13.
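The Docker version requirement can be verified with a short script. This is a minimal sketch: `version_at_least` is a hypothetical helper that compares only the major and minor components, and the parsing assumes the usual `docker --version` output of the form "Docker version 20.10.7, build f0df350".

```shell
#!/bin/sh
# Hypothetical helper: succeed when version $1 (major.minor[.patch])
# is at least the major.minor version given in $2.
version_at_least() {
    have_major=${1%%.*}
    have_rest=${1#*.}
    have_minor=${have_rest%%.*}
    want_major=${2%%.*}
    want_minor=${2#*.}
    [ "$have_major" -gt "$want_major" ] ||
        { [ "$have_major" -eq "$want_major" ] && [ "$have_minor" -ge "$want_minor" ]; }
}

if command -v docker >/dev/null 2>&1; then
    # Third field of the banner is the version, followed by a comma
    docker_version=$(docker --version | awk '{print $3}' | tr -d ',')
    if version_at_least "$docker_version" 1.13; then
        echo "Docker $docker_version found: OK"
    else
        echo "Docker $docker_version found: version >= 1.13 is required"
    fi
else
    echo "Docker not found on PATH"
fi
```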

    Running

    CogStack Pipeline can be run either as a single container or as part of an ecosystem, communicating with other microservices.

    Using docker run

    To run CogStack Pipeline inside a single container using Docker one can type:

    docker run -it cogstacksystems/cogstack-pipeline:latest /bin/bash

    This will launch the CogStack container and spawn a bash console. From the console, one can launch CogStack Pipeline as explained in Running the pipeline.

    Using docker-compose 

    Running CogStack Pipeline as a container within a configured stack of microservices using Docker Compose is based on the provided microservices configuration file (Docker Compose file, in YAML format). Multiple sample configurations have been covered in the Examples part in the documentation.

    For example, using the docker-compose.yml file from Example 2, CogStack Pipeline service has been defined as:

    cogstack-pipeline:
      image: cogstacksystems/cogstack-pipeline:latest
      volumes:
        - ./cogstack:/usr/src/docker-cogstack/cogstack/cogstack_conf:ro
      environment:
        - SERVICES_USED=cogstack-job-repo:5432,samples-db:5432,elasticsearch-1:9200
        - LOG_LEVEL=info
        - FILE_LOG_LEVEL=off
      depends_on:
        - samples-db
        - cogstack-job-repo
        - elasticsearch-1
      command: /cogstack/run_pipeline.sh /cogstack/cogstack-*.jar /cogstack/job_config

    It uses the latest version of the cogstack-pipeline image from Docker Hub. It also maps the ./cogstack directory on the local machine to the /usr/src/docker-cogstack/cogstack/cogstack_conf directory inside the container (this is where the CogStack Pipeline configuration file(s) usually reside). When deployed, it will launch the CogStack Pipeline application through the run_pipeline.sh script and process the data according to the pipeline configuration file residing in the previously mounted directory.

    The run_pipeline.sh script is just a helper script that waits for the services specified by SERVICES_USED to become available before launching the pipeline component. However, the pipeline can also be run as described in the Running the pipeline part.
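To illustrate the idea of waiting on SERVICES_USED, here is a minimal sketch of how such a wait loop could look. It is not the actual run_pipeline.sh: the function names are hypothetical, and it assumes that `nc` (netcat) with the `-z` option is available in the container.

```shell
#!/bin/sh
# Split a comma-separated SERVICES_USED value into one host:port per line.
list_services() {
    echo "$1" | tr ',' '\n'
}

# Block until the given host:port accepts TCP connections.
# Assumption: `nc -z` is available for probing the port.
wait_for_service() {
    host=${1%%:*}
    port=${1##*:}
    until nc -z "$host" "$port" >/dev/null 2>&1; do
        echo "Waiting for $host:$port ..."
        sleep 2
    done
}

# Wait for every service in the list, one after another.
wait_for_all() {
    for service in $(list_services "$1"); do
        wait_for_service "$service"
    done
}

# Usage sketch: wait first, then start the pipeline
# wait_for_all "$SERVICES_USED" && java -jar cogstack-*.jar <directory>
```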

    To deploy the CogStack Pipeline application according to the specified microservices configuration and running as one of them, one only needs to type in the directory with the YAML file:

    docker-compose up

    For more examples of deploying the services, please see the Examples part.



    Running the pipeline

    CogStack Pipeline is run as a command-line application – to run it, just type:

    java [parameters] -jar cogstack-*.jar <directory>

    where <directory> specifies the directory where the CogStack configuration file(s) are kept and which will be parsed by the CogStack Pipeline application. This is the only mandatory parameter.

    Moreover, CogStack Pipeline provides a number of [optional] parameters:

    • -DLOG_LEVEL=<level> (default: INFO; available: DEBUG | INFO | ERROR) – specifies the verbosity level of the logs displayed on standard output,
    • -DLOG_FILE_NAME=<name> – specifies the name of the file where the application logs will be stored (in HTML format),
    • -DFILE_LOG_LEVEL=<level> (default: INFO; available: DEBUG | INFO | ERROR) – specifies the verbosity level of the logs written to the file.
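The way these parameters combine with the mandatory directory argument can be sketched with a small helper that assembles and prints the command line. The `cogstack_cmd` function, the `./job_config` directory and the `cogstack-log.html` file name are hypothetical values chosen for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the command line for the given settings.
# Usage: cogstack_cmd <directory> [log_level] [log_file_name] [file_log_level]
cogstack_cmd() {
    dir=$1
    level=${2:-}
    file=${3:-}
    file_level=${4:-}
    cmd="java"
    # Add each optional -D parameter only when a value was supplied
    if [ -n "$level" ]; then cmd="$cmd -DLOG_LEVEL=$level"; fi
    if [ -n "$file" ]; then cmd="$cmd -DLOG_FILE_NAME=$file"; fi
    if [ -n "$file_level" ]; then cmd="$cmd -DFILE_LOG_LEVEL=$file_level"; fi
    echo "$cmd -jar cogstack-*.jar $dir"
}

# Verbose console output, errors only in the log file
cogstack_cmd ./job_config DEBUG cogstack-log.html ERROR
# prints: java -DLOG_LEVEL=DEBUG -DLOG_FILE_NAME=cogstack-log.html -DFILE_LOG_LEVEL=ERROR -jar cogstack-*.jar ./job_config
```

Omitting the optional arguments leaves the defaults described above in effect.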


    For a more detailed description of available properties please refer to /wiki/spaces/COGEN/pages/37945560 page. Moreover, there are multiple Examples available with sample job configuration.