
Building our Data Platform: Why DeNexus has chosen FluentD


DeNexus has selected FluentD to be a part of the DeNexus Knowledge Center.

Selecting key technology and data partners is a central focus of the data engineering team at DeNexus. Addressing a challenge as complex as cyber risk quantification requires a platform, and a data ecosystem behind it, that can scale. FluentD plays a role in realizing that aim within the DeNexus Knowledge Center for our customers.

DeNexus Requirements

  1. Compatible with many use cases: we want to process syslogs, launch REST endpoints to easily gather data, execute external programs to receive or pull event logs... (and these are only some examples of use cases that we have already identified). 
  2. TLS compatible: some of the customers that send us data require compatibility with TLS authentication to trust the server that is going to receive their data.  
  3. High scalability: We want to be ready for petabytes of data as we are growing rapidly. “Design for Infinity: Big enough will not always be Good enough.”
  4. Portability: The tool should be easy to deploy on other systems, without “It works on my machine” surprises. Furthermore, it should be lightweight.
  5. High availability: We do not want to lose any event/file so the solution must be reliable. 
  6. No vendor lock-in: We want to stick to the multi-cloud paradigm as much as possible. The solution must be decoupled from any specific target or cloud provider.

Considering the described use cases and needs, we analyzed several tools and concluded that FluentD was the correct choice. Here's why... 

The Solution: FluentD

FluentD is an open-source data collector for a unified logging layer. It allows us to unify data collection and consumption for better use and understanding of data.

Although it is written in Ruby, its most performance-sensitive parts (like object serialization and the networking layers) are written in C. FluentD sacrifices some overall performance in exchange for access to the many plugins developed by the Ruby community, which is what allows it to serve as a unified logging layer.


 
FluentD is not only an event collector and aggregator; it also allows us to implement functionality such as log parsing, filtering, data conversion and data processing. That is, depending on the use case, FluentD can be deployed as an all-in-one tool that collects, parses, filters, converts and processes events.

FluentD's memory footprint is small (30-40 MB), but it also has a little brother named Fluent Bit, written entirely in C, whose memory footprint is about ten times smaller still.

[Figure: FluentD / Fluent Bit comparison]

However, although Fluent Bit is lighter and written entirely in C, it has far fewer plugins available than FluentD (80+ compared to FluentD's 600+), so Fluent Bit gains efficiency and performance by paying a price in capabilities (especially parsing).

Why Does FluentD Meet our Needs? 

Proven in many use cases: The list of FluentD plugins has more than 600 entries, covers all our identified use cases, and gives us some peace of mind about new use cases that may arise in the future. One of our current use cases requires S3 compatibility, which the S3 plugin provides. The plugin also helps keep S3 costs under control: events are aggregated into single files based on time or file size before upload, and S3 requests are charged per request, not per byte (in request fees, uploading 1 byte costs the same as uploading 1 GB).
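To make the request-count argument concrete, here is a minimal Python sketch (it is not FluentD code). The function name, the event rate and the flush interval are illustrative assumptions rather than figures from our platform. It models time-based flushing only and leaves pricing out, since prices vary by region and over time.

```python
import math
from typing import Optional

SECONDS_PER_DAY = 24 * 60 * 60


def put_requests_per_day(events_per_second: float,
                         flush_interval_s: Optional[float] = None) -> int:
    """Count S3 upload requests issued in 24 hours.

    flush_interval_s=None models one upload per event (no aggregation).
    Otherwise events are aggregated and one object is uploaded per interval.
    Both parameters are hypothetical inputs chosen for illustration.
    """
    if flush_interval_s is None:
        return round(events_per_second * SECONDS_PER_DAY)
    return math.ceil(SECONDS_PER_DAY / flush_interval_s)


if __name__ == "__main__":
    rate = 100  # assumed events per second
    per_event = put_requests_per_day(rate)
    aggregated = put_requests_per_day(rate, flush_interval_s=300)
    print(f"one upload per event: {per_event} requests/day")
    print(f"one upload every 5 minutes: {aggregated} requests/day")
```

With aggregation, the number of requests depends on the flush interval and no longer on the event rate, which is why the request bill stays flat as volume grows.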

TLS support: This was the main reason we decided to use FluentD instead of Fluent Bit. Although Fluent Bit implements TLS support (by default) in all output plugins, this is not the case for all input plugins, and the input needed for one of our identified use cases (syslog) does not (yet) support TLS: https://github.com/fluent/fluent-bit/issues/2513

In FluentD, TLS is fully supported, and we currently use it in all our use cases (in one of them we even use it to implement mutual TLS authentication).
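For readers unfamiliar with mutual TLS: the server not only presents its own certificate but also demands a valid certificate from the client. The sketch below shows that idea with Python's standard ssl module. It is not how FluentD is configured, and the function name and file-path parameters are hypothetical; the paths are optional only so that the sketch runs without real certificate files.

```python
import ssl
from typing import Optional


def build_mutual_tls_context(server_cert: Optional[str] = None,
                             server_key: Optional[str] = None,
                             client_ca: Optional[str] = None) -> ssl.SSLContext:
    """Build a server-side TLS context that requires client certificates."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # This line is what makes the authentication mutual: the handshake
    # fails unless the client presents a certificate the server trusts.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if server_cert:
        # Certificate (and private key) the server presents to clients.
        ctx.load_cert_chain(certfile=server_cert, keyfile=server_key)
    if client_ca:
        # CA bundle used to verify the certificates clients present.
        ctx.load_verify_locations(cafile=client_ca)
    return ctx


if __name__ == "__main__":
    context = build_mutual_tls_context()
    print(context.verify_mode)
```

A real deployment would pass all three paths; without a server certificate and a client CA the context cannot complete any handshake.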

High scalability: As noted above, FluentD's memory footprint is only 30-40 MB, and a regular PC box can handle 18,000 messages/second with a single process.

Since we want to design for infinity, if our processing needs grow we can scale horizontally or apply solutions that combine FluentD with Fluent Bit:

“The combination of Fluentd and Fluent Bit is becoming extremely popular in Kubernetes deployments because of the way they complement each other — Fluent Bit acting as a lightweight shipper collecting data from the different nodes in the cluster and forwarding the data to Fluentd for aggregation, processing and routing to any of the supported output destinations.”  
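As a rough back-of-the-envelope aid for horizontal scaling, the Python sketch below estimates how many FluentD processes a target message rate would need, taking the 18,000 messages/second single-process figure quoted above at face value. The function name, the headroom parameter and the example target rate are assumptions for illustration; real throughput depends on hardware, plugins and message size.

```python
import math

# Single-process throughput quoted in this article (messages per second).
MESSAGES_PER_SECOND_PER_PROCESS = 18_000


def processes_needed(target_rate: float, headroom: float = 0.0) -> int:
    """Estimate the number of processes for a target message rate.

    headroom is an assumed safety margin, e.g. 0.25 reserves 25% spare
    capacity for bursts. At least one process is always returned.
    """
    required = target_rate * (1 + headroom) / MESSAGES_PER_SECOND_PER_PROCESS
    return max(1, math.ceil(required))


if __name__ == "__main__":
    # Hypothetical target: 100,000 messages/second with 25% headroom.
    print(processes_needed(100_000, headroom=0.25))
```

Linear scaling is itself an assumption here: it holds only while the processes do not contend for a shared bottleneck such as disk or network.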

Portability: Easy installation, available as a service in many formats and systems: 

  • msi (Windows)
  • dmg (macOS)
  • deb (Ubuntu/Debian)
  • Ruby Gem (without dependencies)
  • Docker: we create a container with the desired configuration, and we can deploy it any number of times without any further effort in its configuration. 

High availability: Since FluentD is not a serverless or managed solution, the high-availability configuration depends entirely on us and on our architecture. However, the official documentation offers several recommendations on how to deploy FluentD for high availability, such as forwarder/aggregator topologies, and there are several buffer plugins that help us handle failure scenarios.
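The forwarder/aggregator idea can be illustrated with a small Python simulation. This is a conceptual sketch, not FluentD's implementation or configuration: the class name, the chunk size and the sender callables are all hypothetical. A forwarder buffers events, tries each aggregator in order, and keeps the events in its buffer until one of them accepts the chunk.

```python
from collections import deque
from typing import Callable, Deque, List

# A sender delivers a chunk to one aggregator and raises ConnectionError
# if that aggregator is unreachable (hypothetical interface).
Sender = Callable[[List[str]], None]


class BufferedForwarder:
    def __init__(self, aggregators: List[Sender], chunk_size: int = 3) -> None:
        self.aggregators = aggregators  # ordered: primary first, then standbys
        self.chunk_size = chunk_size
        self.buffer: Deque[str] = deque()

    def emit(self, event: str) -> None:
        """Buffer an event and flush once a full chunk has accumulated."""
        self.buffer.append(event)
        if len(self.buffer) >= self.chunk_size:
            self.flush()

    def flush(self) -> bool:
        """Try each aggregator in turn; keep events buffered if all fail."""
        if not self.buffer:
            return True
        chunk = list(self.buffer)
        for send in self.aggregators:
            try:
                send(chunk)
            except ConnectionError:
                continue  # fail over to the next aggregator
            # Delivered: remove exactly the events that were sent.
            for _ in chunk:
                self.buffer.popleft()
            return True
        return False  # nothing lost: events stay buffered for a later retry


if __name__ == "__main__":
    received: List[str] = []

    def primary_down(chunk: List[str]) -> None:
        raise ConnectionError("primary aggregator unreachable")

    forwarder = BufferedForwarder([primary_down, received.extend])
    for i in range(3):
        forwarder.emit(f"event-{i}")
    print(received)
```

This in-memory buffer would not survive a crash of the forwarder itself; a buffer persisted to disk is the usual answer to that case.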

No vendor lock-in: Although Enterprise services are available, FluentD is a fully open-source solution with no dependency on third parties or cloud providers. Moreover, there are use cases where FluentD is a key tool for achieving a multi-cloud architecture.

Last Thoughts

As 2,000+ data-driven companies rely on FluentD (including technology leaders such as Microsoft and Google), we can be quite confident that open-source technologies like FluentD are, from a technology perspective, superior to their proprietary peers.

[Figure: FluentD in the DeNexus Data Platform]

So, this is why we have chosen FluentD, but how exactly does FluentD fit into the DeNexus Knowledge Center? Stay tuned: in upcoming articles we will cover in detail how we have implemented some of our current use cases with FluentD.

To read more about other partners and components of the DeNexus Knowledge Center, check out my last blog post on Databricks.

 

“The success formula: solve your own problems and freely share the solutions.”

Naval Ravikant 
