CLJ Central Logging and Journalling

An Architecture Blueprint for a Central Logging System

Introduction

Logging is mostly treated as a local affair: the concern of an application or solution, a team, or even a single developer. But the increasing complexity of software systems makes it ever harder to draw the right conclusions from a large heap of heterogeneous logging records.

The trend away from large monolithic application building blocks towards smaller, but therefore intensely interconnected, software components continues. For this reason s IT Solutions AT (the IT subsidiary of Erste Bank and Sparkassen in Austria, https://www.s-itsolutions.at/) started a project to build a central logging and journalling data lake.
The architectural pattern described here follows the lines of this "Central Logging & Journalling" solution/application of sIT. It is targeted not at smaller systems but at enterprises and larger IT landscapes.

So, if you feel your path to wisdom by reading logs looks like this, you’re a happy developer already.

[Image: forest]

But if it looks a bit more like this, further reading could maybe help you.

[Image: jungle]

Goals

Logging is not an end in itself; it enables many use-cases, which can be grouped into four categories:

[A]. Support: Find out what the system did at runtime, in order to detect the source of problems or provide information to other stakeholders. Most use-cases here investigate exceptional program behaviour.

[B]. Compliance: The number of regulatory use-cases increases; the run-time behaviour and intermediate data of software must be documented, often for many years.

[C]. Monitoring & Alerting: When a stream of logging data exists, it is natural to also use this stream to determine the system state, detect problems, and report them via multiple channels.

[D]. Analytics & Intelligence: Sophisticated tools allow data mining, BI etc. to find ways to improve the business, be it by exploring customer behaviour, by predicting operations problems, or by something we don’t even dream of yet.

Use-case groups (* use-cases; → resulting requirements)

Support [A]:
* Customer Care
* Issue research
→ Access security
→ Searchable, near-time

Compliance [B]:
* Regulatory queries
→ Long term
→ Safe data store
→ Ideally certified
→ Infrequent queries

Monitoring & Alerting [C]:
* Stream analysis
* Alerting endpoints
→ Needs rules
→ High-performing

Analytics & Intelligence [D]:
* Statistics
* Big Data analysis
* Machine Learning
* Predictive analysis
→ Highly specialized toolset
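
Group [C] in particular lends itself to a small worked example. The sketch below shows one possible stream rule, firing when too many error records arrive within a time window; the function, its thresholds and the sliding-window approach are invented for illustration, only the normalizedStatus red/yellow/green field comes from the CLJ data model described later.

```python
from collections import deque

def make_error_rate_alert(threshold, window_seconds):
    """Return a [C]-style stream rule (sketch, not part of CLJ itself).

    The returned check(record, now) fires (returns True) when more than
    `threshold` records with normalizedStatus == "red" have been seen
    within the last `window_seconds` seconds.
    """
    seen = deque()  # timestamps of recent red records

    def check(record, now):
        if record.get("normalizedStatus") == "red":
            seen.append(now)
        # drop timestamps that have slid out of the window
        while seen and now - seen[0] > window_seconds:
            seen.popleft()
        return len(seen) > threshold

    return check
```

A monitoring component would feed every record from the messaging brick through such rules and push firing alerts to the configured endpoints.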

Architecture

A possible architecture could make use of the following building blocks:

Logical architecture of CLJ

The functional as well as the non-functional (quality) requirements of the aforementioned use-case groups differ widely. Therefore it makes sense to use different software products to fulfill them.

Messaging Brick

This building block provides a reliable (truly 24/7) component to which applications can upload their logging records. It is high-performing, lean, and stable, and therefore capable of swallowing even extreme load peaks.

This building block also serves use-case group [C]. The type of product is queue-like; our implementation uses Apache Kafka (https://kafka.apache.org/). Another example would be to use Amazon’s Kinesis/Firehose in an AWS-based environment.
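
As an illustration, producing a record into such a brick could look as follows. The topic name, the helper itself, and keying by solutionCode are invented assumptions, not part of CLJ; any Kafka-style client object with a send(topic, key=..., value=...) method fits, e.g. kafka-python's KafkaProducer.

```python
import json

def publish_log_record(producer, record, topic="clj-log-records"):
    """Publish one CLJ log record to the messaging brick (sketch).

    `producer` is any object with a Kafka-style send(topic, key=..., value=...)
    method. Topic name and partitioning key are illustrative assumptions.
    """
    # Keying by solution keeps one solution's records ordered within a partition.
    key = record.get("solutionCode", "unknown").encode("utf-8")
    value = json.dumps(record).encode("utf-8")
    producer.send(topic, key=key, value=value)
```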

Online Research Store

This record store is responsible for fast, structured searches over log records and for discovering connections between them.

Our implementation uses Elasticsearch (https://www.elastic.co/de/) together with a self-written REST service and an Angular-based front-end.

Compliance Store

Selected log records (defined by the solution) are persisted in this record store. It is very reliable, needs a backup, is fast at writing, and stores each record before it can be tampered with. On the other hand, it does not need a sophisticated query facility.

In our implementation we decided on Apache Cassandra (http://cassandra.apache.org/). Other possibilities would be to store the selected records in flat files and archive them, or to use an RDBMS.
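
One simple way to make such a store tamper-evident (a sketch of a possible add-on, not something CLJ mandates) is to chain a hash through the records at write time; re-verifying the chain later reveals any modification or removal of a stored record.

```python
import hashlib
import json

GENESIS = "0" * 64  # chain value stored before the very first record

def chain_hash(previous_hash, record):
    """Hash a record together with its predecessor's chain hash (sketch).

    Stored next to each record, this links all records into a chain:
    changing or removing any earlier record breaks every later hash.
    """
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(previous_hash.encode("ascii") + payload).hexdigest()
```

At write time the store keeps the latest chain value and saves it alongside each new record; an auditor recomputes the chain from GENESIS to validate the whole history.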

The Client Side

Applications can deliver their log records either by sending them directly into the messaging brick or by having them harvested from the filesystem or another data store. Both methods have pros and cons.

Harvesting methods:

Direct transfer:
+ Fastest
+ Possibly eliminates one component (the file system)
- Technically a tight coupling

Logfile harvesting:
+ Non-intrusive to existing applications
- Needs another process (resources + monitoring)

Direct transfer

Possibilities for applications to integrate are:

  • The messaging brick’s own client libraries

  • APIs for creating messages that fit the CLJ data model

  • Appenders for existing logging frameworks (e.g. log4j2 in Java, or log4net for C#)

Generally it is a good idea to offer integration libraries that handle situations where the messaging brick suffers a failure. In those cases the calling application must not be brought down by logging; this mitigates the tight-coupling issue.

Logfile harvesting

There are a lot of tools for that use-case, ranging from lightweight native agents integrated into the operating system up to full-scale ETL tools (https://en.wikipedia.org/wiki/Extract,_transform,_load).

A few examples:

In certain architectures, some of these products could serve as the messaging brick itself.

Central Logging Datamodel

Partitioning of the log record space

Each record has its own id value, making it unique across all of the data stores.

For managing the stores (especially the online research store for [A]), though, it is necessary to organize the records along a number of dimensions. This separation then supports the determination of

  • Access rights/permissions

  • Retention times

  • Backup strategy

With that, the integrated applications can gain a lot of control and flexibility for their data.

The suggested dimensions are a combination, fitted to the actual need, of these fields:

  • tenant (in case of a real multi-tenant system with separated accounts)

  • environment (if environments are not separated physically or logically on the server side)

  • solution, which determines the organizational owner of the log records within a tenant

  • recordType, to distinguish between different needs of building blocks and types of logging and journal data.
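
For the online research store this could, for instance, translate into one Elasticsearch index per dimension combination; separate indices then let access rights, retention times and backups be managed per slice. The naming scheme below is an invented illustration, not part of CLJ.

```python
def index_name(record, use_tenant=True, use_environment=True):
    """Derive the data-store partition (here: an Elasticsearch index name)
    from the suggested dimension fields (sketch).

    Which dimensions are combined depends on the actual need; the "clj-"
    prefix and the fallback values are illustrative assumptions.
    """
    parts = ["clj"]
    if use_tenant:
        parts.append(record.get("tenant", "shared"))
    if use_environment:
        parts.append(record.get("environment", "prod"))
    parts.append(record["solutionCode"])
    parts.append(record["recordType"])
    return "-".join(p.lower() for p in parts)
```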

Fields

This is a comprehensive list of common values a log system could care for. Different applications in different contexts will use one or another subset of this enumeration, hardly ever setting them all. But, and that is the main reason for this list, values of similar semantics in a log record store should be named identically, to make traversing logs of different applications easier.
Mandatory fields are printed bold.

NDM fields

  • id (String): Technical id of the log record. This can be set by the client (if it is trusted to care for uniqueness) or be omitted and set by the server. The server allows the id to be reused (= update) for semantics like records of timespans (e.g. sessions). The proposed algorithm is UUID.

Header fields, meta data of each record:

  • recordType (String): Type of the record. This is an unbounded enumeration; the solution is free to choose a value, but it is recommended to use a known value (see subpage) to make the semantics of the record easier to recognize. Record types can be shared between solutions; e.g. session, activity and techInfo are record types used by many applications. The record type is used for partitioning the CLJ data stores for the permission system, as well as a key in defining retention periods and the archiving strategy.

  • recordSubType (String): Additional field to identify the event. Can be used as the type of the source log record. For example, if the recordType is serverLog, the recordSubType could be "tomcat" or "weblogic".

  • tenant (String): Institute number. If needed, for organizations serving multiple jurisdictional tenants, this is the tenant code.

  • environment (String): Environment identifier. If needed, when the development, test, staging, production etc. environments are not separated by dedicated data store instances but merged into one, this identifier determines from which environment a log record originates.

  • recordTimestamp (Timestamp): When the log record has been created. If the client does not provide this, or the given value cannot be parsed on the server side, the processing engine will create a timestamp as the next best guess.

  • sequence (Long): Determines order. Often the record timestamp is not sufficient to discriminate and order a set of log records; e.g. Elasticsearch does not care for finer granularity than milliseconds. In this case the sequence field can store micro- or nanoseconds. Another possible use of this field is for a client to maintain a gapless sequence, to be sure that no records are lost during transmission, processing and retrieval. A logging front-end can use this field as the sole default order attribute, or as a secondary order attribute after recordTimestamp.

  • logLevel (String): Level of importance, as provided by many low-level logging systems. This field is optional and not normalized: whatever the client solution provides here will be taken as-is. A lot of logging libraries have their own mind on this topic.

User info, information about the person or technical system connected to the log record:

  • user (String): Unique user id in its userType domain. This identifies the user or system uniquely within the domain given in userType. This value gains importance in the context of current data protection laws.

  • userType (String): Domain this user account belongs to. Needed if different user domains should be distinguished, like internet users (customers) and intranet users (employees), or when the user domains of subsidiaries are not clearly separated by the user ids.

Source info, which component wrote this log record:

  • solutionCode (String): Unique identifier of a solution. Identifies the solution as a unit in the IT landscape.

  • solutionFunctionCode (String): Id of a functional building block. If needed, a more fine-grained organizational partitioning.

  • sourceApplication (String): Building block. A more technical/architectural partitioning key.

  • sourceHostname (String): System name of the server initiating the logging call, e.g. the DNS name of the physical or virtual system.

  • sourceIp (String): Client IP, originator of the log. The value might differ depending on the nature of the originator (e.g. a browser-based application, or a batch).

  • userAgent (String): Software that initiated the call. This field is used when the software of the user/client, and its version, is relevant; e.g. in web front-ends it identifies the browser that has been used. The writing solution can provide any information here if it thinks information about its caller makes a difference.

  • agentVersion (String): TODO. Deprecated, might be removed in the future.

  • serverInstanceName (String): Identifies the server instance, e.g. the Docker pod.

Initiating solution:

  • clientId (String): Code from the initiating system. Initiating systems are mostly user front-ends or batch processes.

Harvesting info, where the log record was first persisted; might be different from the source solution:

  • sourceType (String): Syntax of the incoming data (into the messaging brick). generic means using this data model in JSON; this is the default value. If the syntax is not generic, the central logging service might be able to do a proper transformation.

  • loggingHostname (String): Server host name, like sourceHostname.

  • loggingHostIp (String): Server IP address. The system that provided the logging information, e.g. the Apache host for access logs, or any other harvesting service running Logstash, Flume, rsyslog or a similar tool.

  • logFile (String): File name and path from which the log record has been harvested, if applicable. If log records are not sent directly to the messaging building block but harvested from a logfile (by Logstash or similar software), the filename and path in the appropriate format (Windows, Unix, Mainframe, …) can be sent here if needed.

Context:

  • parentId (String): Hierarchical predecessor of this log record, of a functional or sequential order. Here a key of a hierarchically higher-level record can be set, so a tree-like structure of log records can be created.

  • contextId1 (String): Mapping context id field 1. Example: the id of a user session.

  • contextId2 (String): Mapping context id field 2. Example: the (use case) id of a user’s activity.

  • contextId3 (String): Mapping context id field 3. Example: the id of an explicit technical log record.

  • contextId4 (String): Mapping context id field 4.

  • startDate (Timestamp): Start date of the record. For journalling records that have a time span, this field signals the begin timestamp of the event.

  • endDate (Timestamp): End date of the record. For journalling records that have a time span, this field signals the end timestamp of the event.

  • correlationId (String): Correlation id for a synchronous or quasi-synchronous call. A unique id that is created as early as possible (ideally by the initiator) and then passed through the whole call hierarchy to create traces of calls.

Unstructured and semi-structured data:

  • message (String): Log message. All the information that is not part of other fields.

  • additionalInfo (String): Semi-structured data; business or other data. Technically this is a text field; it is recommended, though, to use JSON syntax, because the front-end can interpret it and display a tree structure. A special case of additionalInfo are external links, which can be rendered in the UI as links with the following syntax: additionalInfo.extlink.ref is the URI of the external link, additionalInfo.extlink.name is the display name of the link.

Result section:

  • resultCode (String): Code if the record represents a task of any kind, e.g. an HTTP return code, exception or error.

  • errorMessage (String): Error message. Any standardized code or message the sending solution wants to log.

  • businessError (Boolean): Business error flag. Sometimes business errors are stored as normal messages; it is up to the application to decide which message is a business error. This value should be true for business errors.

  • normalizedStatus (Status): Status field red/yellow/green. This field is for the user, giving a hint whether this log record represents an ok status, a warning or an error: enum Status { red, yellow, green }.

Technical information:

  • thread (String): Name of the server thread.

  • logger (String): Software origin; name of the class and (optionally) method which logs this message.

  • durationMs (Long): Duration of a call in milliseconds.

  • logProcessingError (String): Stack trace of a log processing error. This is not provided by the client solution but used if anything goes wrong in CLJ log record processing.
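
Put together, a small activity record using a subset of these fields could look like this; all values are invented for illustration.

```python
import json
import uuid

# A sample CLJ record (invented values) as it might be sent to the
# messaging brick in the default "generic" JSON syntax.
record = {
    # header fields
    "id": str(uuid.uuid4()),
    "recordType": "activity",
    "recordSubType": "login",
    "tenant": "198",
    "environment": "test",
    "recordTimestamp": "2020-01-31T10:15:30.000Z",
    # user info
    "user": "jdoe",
    "userType": "intranet",
    # source info
    "solutionCode": "CRM",
    "sourceHostname": "app01.internal.example",
    # context
    "correlationId": str(uuid.uuid4()),
    # payload and result
    "message": "User login succeeded",
    "normalizedStatus": "green",
}
payload = json.dumps(record, indent=2)
```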

Resources

About CLJ

CLJ is a proposal to harmonize logging in an environment where multiple software building blocks work together to fulfill shared requirements.

CLJ is a design blueprint, a proposal for how to align a shared logging environment:

  • Which building blocks to position in order to have smooth operations

  • Which fields to care for, having a common naming convention

  • Think about the use-cases that support the organization

  • Grounded in a running system of a not-so-small bank subsidiary

  • Feedback and contribution highly appreciated.

  • Source: CLJ’s AsciiDoc sources are hosted at CLJ sources.

  • Twitter: @mcaviti

Authored by the CLJ team at s IT Solutions AT (https://www.s-itsolutions.at), led by Klemens Dickbauer.
