Prerequisites
Java Runtime Environment
CogStack Pipeline requires Java SE Runtime Environment in version >= 11.0 to be present on the system. It is usually recommended to use the official JDK. However, the OpenJDK variant of the Java Runtime Environment should also work here.
Note: With the change of licensing from Oracle that came with version 11 (requiring a commercial license for production use), we recommend using the OpenJDK variant. More information about the licensing can be read >here<. Our Docker images now also use a base image with OpenJDK.
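As a quick way to check this prerequisite, the sketch below reads the version reported by `java -version` and compares its major component against a threshold. The function name `java_major_version` and the `required_major` value of 11 are assumptions made for illustration (11 matches the OpenJDK JRE 11 base image used by the CogStack Docker images); adjust the threshold if a different minimum applies.

```shell
# Extract the major version from a Java version string.
# Handles the legacy "1.8.0_292" scheme, the modern "11.0.2" scheme
# and suffixed versions such as "17-ea".
java_major_version() {
  local version="$1"
  local major="${version%%.*}"
  if [ "$major" = "1" ]; then
    # Legacy scheme: 1.<major>.<minor>
    major="${version#1.}"
    major="${major%%.*}"
  fi
  # Drop anything that follows the leading digits.
  echo "${major%%[!0-9]*}"
}

# Assumption: minimum major version taken from this page.
required_major=11

# Only query the runtime when java is actually on the PATH.
if command -v java >/dev/null 2>&1; then
  # `java -version` prints to stderr, with the version in double quotes.
  installed="$(java -version 2>&1 | head -n 1 | cut -d '"' -f 2)"
  if [ "$(java_major_version "$installed")" -ge "$required_major" ]; then
    echo "Java $installed is recent enough"
  else
    echo "Java $installed is too old or could not be parsed" >&2
  fi
fi
```

The parsing is kept in its own function so it can be reused, for example in a deployment script that checks several hosts.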
External applications
Selected components of CogStack Pipeline rely on additional, external applications when processing data. These need to be installed on the system prior to running CogStack:
- Tesseract OCR (version >= 4.0) – for extracting text from images,
- ImageMagick – for performing conversion between image formats.
Note: Tesseract version 4.0 introduced significant improvements in the quality of the OCR process, hence CogStack version 1.3.0 was updated to use it. However, please note that on some older distributions of Debian / Ubuntu, Tesseract may need to be installed manually or compiled from scratch. For more information, please refer to the official Tesseract wiki.
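Whether the external applications are installed can be checked before the first run. The sketch below is only an illustration: the helper names `have_tool` and `check_tools` are hypothetical, and it assumes that Tesseract is installed as `tesseract` and ImageMagick as `convert` (newer ImageMagick releases may provide `magick` instead).

```shell
# Report whether a command is available on the PATH.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

# Print one status line per tool and return non-zero if any is missing.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if have_tool "$tool"; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Assumed binary names, see the note above the block.
check_tools tesseract convert ||
  echo "Install the missing tools before running CogStack Pipeline" >&2
```

When Tesseract is found, `tesseract --version` prints the installed version, which can then be compared against the required 4.0.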
Running locally
Please see below: Running the pipeline.
The CogStack Pipeline application can also be run inside a container, using the official Docker image available from the cogstacksystems Docker Hub. This is the highly recommended method of running CogStack Pipeline. Docker provides lightweight virtualisation of the various microservices that CogStack makes use of. Hence, when coupled with Docker Compose for microservice orchestration, all of the components required to use CogStack can be set up with a few simple commands.
There are two images available to use: cogstacksystems/cogstack-pipeline:latest (stable) and cogstacksystems/cogstack-pipeline:dev-latest (development) – see: Building CogStack for more information.
The Dockerfile used to build both images is available in the main CogStack pipeline directory.
Info: The base image used by CogStack Pipeline is OpenJDK JRE 11.
Prerequisites
The only prerequisite is to have Docker installed on the system in version >= 1.13.
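The Docker version requirement can be verified with a small version comparison. This is a sketch under stated assumptions: the helper `version_ge` is hypothetical, it relies on `sort -V` (version sort, as provided by GNU coreutils), and it reads the client version so that it also works when the Docker daemon is not running.

```shell
# Succeed when version $1 is greater than or equal to version $2.
# Assumption: `sort -V` is available on this system.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Only query Docker when the client is actually on the PATH.
if command -v docker >/dev/null 2>&1; then
  installed="$(docker version --format '{{.Client.Version}}' 2>/dev/null || true)"
  if [ -n "$installed" ] && version_ge "$installed" "1.13"; then
    echo "Docker $installed satisfies the >= 1.13 requirement"
  else
    echo "Docker is older than 1.13 or its version could not be read" >&2
  fi
fi
```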
Running
CogStack Pipeline can be run either as a single container or as part of an ecosystem, communicating with other microservices.
Using docker run
To run CogStack Pipeline inside a single container using Docker one can type:
docker run -it cogstacksystems/cogstack-pipeline:latest /bin/bash
This will launch the CogStack container and spawn a bash console. From the console, one can launch CogStack Pipeline as explained in Running the pipeline.
Using docker-compose
Running CogStack Pipeline as a container within a configured stack of microservices using Docker Compose is based on the provided microservices configuration file (a Docker Compose file, in YAML format). Multiple sample configurations are covered in the Examples part of the documentation.
For example, using the docker-compose.yml file from Example 2, the CogStack Pipeline service is defined as:
cogstack-pipeline:
  image: cogstacksystems/cogstack-pipeline:latest
  volumes:
    - ./cogstack:/usr/src/docker-cogstack/cogstack/cogstack_conf:ro
  environment:
    - SERVICES_USED=cogstack-job-repo:5432,samples-db:5432,elasticsearch-1:9200
    - LOG_LEVEL=info
    - FILE_LOG_LEVEL=off
  depends_on:
    - samples-db
    - cogstack-job-repo
    - elasticsearch-1
  command: /cogstack/run_pipeline.sh /cogstack/cogstack-*.jar /cogstack/job_config
It uses the latest version of the cogstack-pipeline image from Docker Hub. It also maps the ./cogstack directory on the local machine to the /usr/src/docker-cogstack/cogstack/cogstack_conf directory in the container (where the CogStack Pipeline configuration file(s) usually reside). When deployed, it will launch the CogStack Pipeline application through the run_pipeline.sh script and process the data according to the pipeline configuration file residing in the previously mounted /usr/src/docker-cogstack/cogstack/cogstack_conf directory.
The run_pipeline.sh script is a helper script that waits for the services specified by SERVICES_USED to become available and then launches the pipeline component. However, the pipeline can also be run as described in the Running the pipeline part.
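To illustrate the idea of waiting for the services listed in SERVICES_USED, here is a minimal sketch. It is not the actual run_pipeline.sh script: the function names `list_services` and `wait_for_services` are hypothetical, and the sketch assumes that `nc` (netcat) with the `-z` flag is available in the container.

```shell
# Split a comma-separated SERVICES_USED value into one "host port" pair per line.
list_services() {
  echo "$1" | tr ',' '\n' | while IFS=: read -r host port; do
    if [ -n "$host" ]; then
      echo "$host $port"
    fi
  done
}

# Block until every listed service accepts TCP connections.
# Assumption: `nc -z host port` succeeds once the port is open.
wait_for_services() {
  list_services "$1" | while read -r host port; do
    until nc -z "$host" "$port" >/dev/null 2>&1; do
      echo "waiting for $host:$port"
      sleep 2
    done
  done
}

# Hypothetical usage inside a container:
# wait_for_services "$SERVICES_USED" && java -jar cogstack-*.jar /cogstack/job_config
```

Polling the TCP port only confirms that a service is listening, not that it is fully initialised, so a real helper may need additional checks.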
To deploy the CogStack Pipeline application as one of the services in the specified microservices configuration, one only needs to type the following in the directory containing the YAML file:
docker-compose up
For more examples of deploying the services, please see the Examples part.
Running the pipeline
CogStack Pipeline is run as a command-line application – to run it, just type:
java [parameters] -jar cogstack-*.jar <directory>
where <directory> specifies the directory where the CogStack configuration file(s) are kept and which will be parsed by the CogStack Pipeline application. This is the only mandatory parameter.
Moreover, CogStack Pipeline provides a number of optional parameters:
- -DLOG_LEVEL=<level> (default: INFO; available: DEBUG | INFO | ERROR) – specifies the verbosity level of the logs displayed on standard output,
- -DLOG_FILE_NAME=<name> – specifies the name of the file where the application logs will be stored (in HTML format),
- -DFILE_LOG_LEVEL=<level> (default: INFO; available: DEBUG | INFO | ERROR) – specifies the verbosity level of the logs written to the file.
For a more detailed description of the available properties, please refer to the /wiki/spaces/COGEN/pages/37945560 page. Moreover, there are multiple Examples available with sample job configurations.
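The optional parameters described in this section can be combined in a single invocation. The sketch below only prints the resulting command line rather than running it. The function name `cogstack_command`, the `./job_config` directory and the `cogstack_job_log` file name are hypothetical values chosen for illustration.

```shell
# Print the java command line for a given configuration directory.
# Arguments: <directory> [log level] [file log level] [log file name]
# Omitted levels fall back to the documented default (INFO).
cogstack_command() {
  local config_dir="$1"
  local log_level="${2:-INFO}"
  local file_log_level="${3:-INFO}"
  local log_file_name="${4:-}"
  local cmd="java -DLOG_LEVEL=$log_level -DFILE_LOG_LEVEL=$file_log_level"
  if [ -n "$log_file_name" ]; then
    cmd="$cmd -DLOG_FILE_NAME=$log_file_name"
  fi
  # The jar wildcard is printed literally; the shell expands it when the command is run.
  echo "$cmd -jar cogstack-*.jar $config_dir"
}

# Hypothetical example: verbose console output, errors only in the log file.
cogstack_command ./job_config DEBUG ERROR cogstack_job_log
```

Printing the command first makes it easy to review the parameters before copying the line into a terminal or a startup script.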