Rudder 4.0 is out!

November 11, 2016

Rudder 4.0 changes the way we see IT management for good

Here we are! After months of hard work, nourished by years of feedback from the community and our clients, Rudder 4.0 is out. What makes this version so special that we felt it deserved a major version of its own rather than a more discreet 3.3? This new Rudder not only improves its longstanding components, but also brings disruptive new features that challenge the way IT infrastructure has been managed so far.

“Audit” mode

The main new feature in Rudder 4.0 is the “Audit” policy mode, in addition to the existing “Enforce” one. Both work autonomously and continuously on your managed nodes, but the new Audit mode only checks your policies, without ever making any changes on your servers.

Audit mode can be configured globally, if you intend to use Rudder to check compliance drift and report on it. The global policy mode can also be made overridable. In that case, you can configure the policy mode per node and per directive. This is the perfect way to add new nodes to your Rudder-managed infrastructure and see what would happen to them before actually committing changes, or to do a dry run of a new configuration without threatening your production environment.
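To make the override behaviour concrete, here is a minimal Python sketch of how an effective policy mode could be resolved for one directive on one node. The function name and its precedence rule (Audit wins as soon as the node or the directive asks for it) are assumptions made for illustration; this post does not spell out the exact precedence Rudder applies.

```python
from typing import Optional

AUDIT = "audit"
ENFORCE = "enforce"


def effective_policy_mode(
    global_mode: str,
    overridable: bool,
    node_mode: Optional[str] = None,
    directive_mode: Optional[str] = None,
) -> str:
    """Return the mode applied to one directive on one node (illustrative).

    Assumption: when overrides are allowed, Audit wins as soon as the node
    or the directive requests it, so nothing is changed by accident.
    """
    if not overridable:
        # Overrides are ignored: the global mode applies everywhere.
        return global_mode
    # A missing override falls back to the global mode.
    node = node_mode or global_mode
    directive = directive_mode or global_mode
    return AUDIT if AUDIT in (node, directive) else ENFORCE
```

With this rule, a new node set to Audit stays read-only even if the directives applied to it are in Enforce mode.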

Great care has been taken, and several safeguards put in place, to ensure that if you choose to use “Audit” for a target, nothing will be changed on the node for that target (except Rudder’s own configuration under /var/rudder).

From a technical point of view, the way Rudder assesses configuration and compliance was greatly improved in Rudder 4.0 to accommodate the new Audit policy mode:

The expected configurations calculated for each node are more precise, and as a bonus this should also make the results more resilient.
Nodes know exactly which directives need to be executed in Audit or in Enforce mode, and the “rudder agent” command line has been enhanced to let you see the result at a glance.
In addition to the pre-existing technical reports, new report types have been added for the following statuses: “audit-compliant” (the check was OK), “audit-non-compliant” (the check was done, but the result is not the one expected), “audit-not-applicable” (the check is not applicable for that node, for example because of a limitation on the OS type), and “audit-error” (the check wasn’t able to finish correctly).
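As an illustration of how these four report types could be consumed, the following Python sketch counts reports per status and derives a compliance percentage. The function name is hypothetical, and leaving not-applicable checks out of the percentage is an assumption; Rudder’s own computation may differ.

```python
from collections import Counter

# Audit report types and their meaning, as described in the list above.
AUDIT_STATUSES = {
    "audit-compliant": "the check was OK",
    "audit-non-compliant": "the check was done, but the result is not the one expected",
    "audit-not-applicable": "the check is not applicable for that node",
    "audit-error": "the check wasn't able to finish correctly",
}


def summarize_audit(statuses):
    """Count audit reports per status and compute a compliance percentage.

    Assumption: not-applicable checks are left out of the percentage.
    """
    statuses = list(statuses)
    unknown = set(statuses) - set(AUDIT_STATUSES)
    if unknown:
        raise ValueError(f"unknown audit status: {sorted(unknown)}")
    counts = Counter(statuses)
    applicable = len(statuses) - counts["audit-not-applicable"]
    # With nothing applicable, nothing is out of compliance.
    percent = 100.0 * counts["audit-compliant"] / applicable if applicable else 100.0
    return dict(counts), percent
```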

Future versions of Rudder will take advantage of this feature to enable ramp-up deployment of new configurations: your changes will be deployed gradually in Audit mode and then, based on the compliance reported by that audit and on configured thresholds, either switched automatically to “Enforce” or, for outliers, reported for human review.

Here are the slides of our latest talk at the O’Reilly Security conference that show how this new feature lets us bring the concepts of Continuous Configuration and Continuous Auditing to life:
O’Reilly Security – Continuous Auditing For Effective Compliance with Rudder from Normation

Global redesign of the web interface

Rudder’s interface hadn’t changed much since version 3.0. It was beginning to show its age, even as Rudder kept offering revolutionary features. Version 4.0 was therefore the perfect opportunity to rejuvenate Rudder by giving it a new, more modern and responsive design.

Here is a non-exhaustive list of major improvements:

New menu layout
Integration of the new logo
New theme for trees (directives, groups, nodes, …)
Most forms are now built on Bootstrap

Files from shared folder can be selected using a file browser

Copying and pasting a path from the Rudder server filesystem can be annoying. To improve the user experience, Rudder 4.0 lets users select a file directly from the shared folder and insert its path in the appropriate field when configuring a directive based on the “Download a file from the shared folder” technique. This is done with the new file manager integrated into this field.

Agents can be remotely launched via API

Running agents on remote nodes from the Rudder server lets you apply new rules immediately, or check the compliance of one or several systems instantly.

The command “rudder remote run” lets you run the agent remotely, but it requires being logged in to the Rudder server and cannot be automated easily. This is why Rudder 4.0 adds an API call that runs agents on one or all nodes, so that you can script these runs or integrate them with other tools.

POST to the applyPolicy API method on a node to trigger a remote run

curl -H "X-API-Token:YOUR-TOKEN" -X POST https://server/rudder/api/latest/nodes/NODEID/applyPolicy
Start execution with config [20161108-154845-4c5b132e]

Hostname
M| State     Technique Component              Key Message
E| compliant Common    Update                     Rudder policy, tools and ncf instance are already up to date. No action required.
E| compliant Common    ncf Initialization         The ncf initialization was correct
E| compliant Common    Security parameters        The internal environment security is acceptable
E| compliant Common    Red Button                 Red Button is not in effect, continuing as normal…
E| n/a       Common    Process checking           CFEngine proccesses check is done by the rudder-agent CRON job
E| compliant Common    CRON Daemon                Cron daemon status was correct
E| compliant Common    Log system for reports     Logging system for report centralization is already correctly configured
E| compliant Common    Binaries update            The CFEngine binaries in /var/rudder/cfengine-community/bin are up to date
E| compliant Inventory inventory                  Next inventory scheduled between 00:00 and 06:00
E| compliant MOTD      MOTD Configuration         The MOTD file was correct

## Summary #####################################################################
10 components verified in 3 directives
=> 10 components in Enforce mode
-> 9 compliant
-> 1 not-applicable
execution time: 1.78s
################################################################################

When called on a node, the API returns the agent output, and a summary of the run.
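For scripting, the same call can be made from Python’s standard library. This is a minimal sketch, not an official client: the function names are hypothetical, the `nodes/<id>/applyPolicy` path mirrors the shape of the all-nodes call in this post and should be checked against the API documentation, and certificate handling (the equivalent of curl’s -k) is left out.

```python
import urllib.request


def build_apply_policy_request(base_url, token, node_id=None):
    """Build (but do not send) the POST request that triggers an agent run.

    Without a node_id, the request targets all nodes.
    """
    # Assumption: per-node runs live under nodes/<id>/applyPolicy.
    target = f"nodes/{node_id}" if node_id else "nodes"
    url = f"{base_url.rstrip('/')}/rudder/api/latest/{target}/applyPolicy"
    return urllib.request.Request(url, method="POST", headers={"X-API-Token": token})


def trigger_run(base_url, token, node_id=None):
    """Send the request and return the response body as text."""
    request = build_apply_policy_request(base_url, token, node_id)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling `trigger_run("https://server", "YOUR-TOKEN", "NODEID")` would then return the agent output and run summary as a string.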

If no node ID is given, all nodes are contacted asynchronously, and the only information returned is the list of nodes on which the agent will be executed.

curl -k -H "X-API-Token:YOUR-TOKEN" -X POST https://server/rudder/api/latest/nodes/applyPolicy

{"action": "applyPolicyAllNodes", "result": "success", "data": [
{"id": "b6b9bede-3163-4c6b-bc4e-8d0603de3613", "hostname": "agent2.rudder.local", "result": "Started"},
{"id": "77b60e67-be0b-41dd-a9d0-77a43c9b9e14", "hostname": "agent3.rudder.local", "result": "Started"},
{"id": "ee74ada6-2822-4385-bcf5-080f81cbb44a", "hostname": "relay.rudder.local", "result": "Started"},
{"id": "root", "hostname": "server.rudder.local", "result": "Started"}
]}
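A script that integrates this call with other tools will usually want to know which nodes actually started a run. The sketch below parses a response of the shape shown above; the function name is hypothetical, and it assumes the response is valid JSON with straight quotes.

```python
import json


def started_hostnames(response_text):
    """Return the hostnames whose run was reported as "Started"."""
    payload = json.loads(response_text)
    if payload.get("result") != "success":
        raise RuntimeError(f"API call failed: {payload.get('result')}")
    return [
        node["hostname"]
        for node in payload.get("data", [])
        if node.get("result") == "Started"
    ]
```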

New file copy and node authentication protocol

What are we talking about?

We are talking about the file transfer protocol used both for copying policies and for explicit file copies from the server to a node (including via relay servers).

The authentication mechanism that controls access to the policies and shared files has two levels:

“Allowed Networks” for accessing common policy files (in particular ncf) and shared files (in /opt/rudder/configuration-repository/shared-files).
Host level, for accessing host-specific policy and properties files.

See http://www.rudder-project.org/doc-4.0/_security_considerations.html for details.

Before 4.0

The “classic” file transfer protocol used plain text by default, with the ability to encrypt file transfers through configuration.

ACLs were managed using IP addresses and hostnames. By default, this required working reverse-DNS resolution to improve authentication of nodes. This led to several issues:

Issues with DNS infrastructure (DNS entry added after spawning a node, missing entry in /etc/hosts, etc.)
Problems finding a good hostname: the fully qualified domain name isn’t always known at the host level, and this led to situations where the Rudder inventory did not contain a valid hostname

What is new in 4.0

We changed the protocol to use a TLS session. That means everything is now always encrypted.

We also added new key-based ACLs in the server configuration. These are evaluated first, which means all nodes will use them instead of the old ACLs. The only exception is nodes with agent version 3.1 or 3.2 still running on initial promises: they will switch to key-based authentication as soon as they receive generated policies from a Rudder 4.0 server.

This will eliminate all DNS/hostname issues in node-server communication, and improve the security level once the old ACLs are removed.

How to use it? Is it backwards-compatible?

There is nothing special to do. Almost everything is compatible. The one exception is a 4.0 agent talking to a pre-4.0 server that cannot resolve the agent’s hostname (for example when the node is behind a NAT): the server does not have the key-based ACL, and the new protocol does not support the option to skip hostname validation, so such nodes may not be able to communicate with the server.

To avoid these issues, we recommend always upgrading the server before the nodes. This ensures a smooth migration.

What are the next steps?

We will progressively remove the old protocol and ACL system. Disabling them will likely become an option in the settings soon, and they will be removed completely when compatibility with 3.1/3.2 is dropped (which cannot happen before the next Rudder ESR version is chosen and 3.1 reaches EOL).

Improved performance

On large installations, with several thousand nodes, compliance computation could get quite slow (several minutes to display a list of 7000 nodes with their compliance), and policy generation took a really long time. To improve both of these points, we changed the database schema, the way we handle compliance, and the policy identification in Rudder 4.0.

Expected reports are now stored as JSON in the database, per node, instead of being spread over several tables and grouped per rule. This change greatly simplifies compliance data handling and computation, all the more so on very large installations. All the complex queries to fetch compliance details have been removed, and on our test platforms, compliance computation is twice as fast on a setup with 1000 nodes.

The previous data model created a dependency between all the target nodes of a given configuration rule: a change on one node could trigger a policy generation for all nodes (in the extreme case, accepting a new node caused a complete policy generation for all nodes). In Rudder 4.0, this dependency has been removed thanks to the change described above, together with a new unique policy configuration ID per node. Adding a new node now only triggers a policy generation for the new node and its policy server, which is a massive improvement on large installations.

This evolution is the first step toward a new policy generation system, which will be node per node in the future, with a validation on the node-side rather than server-side, to allow for fast policy generation, regardless of the number of managed nodes.

Rudder agent status/start/stop commands (+ new options)

We continued our work on making the rudder CLI command ever more useful.

First, let’s have a look at the new agent commands:

rudder agent factory-reset: Previously “reinit”; the new name is clearer and harder to confuse with a simple reset. This will reset your node completely and make it appear like a new one.
rudder agent health: Output current Rudder status and detect common problems. If you want to use it as a Nagios compatible NRPE plugin, just call it with the -n option and it will produce the usual 0/1/2 exit code and a single line output.
rudder agent {start,stop,status}: Manage the Rudder agent as you would with the service command.
rudder agent run -u: Update policies just before running the agent. No need to run a separate update command anymore.
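To show how the Nagios-compatible output of “rudder agent health -n” could be consumed from a script, here is a minimal Python sketch. The function names are hypothetical. The mapping follows the usual Nagios plugin convention (0 is OK, 1 is WARNING, 2 is CRITICAL, anything else is treated as UNKNOWN), and the sketch assumes the rudder command is available on the machine where it runs.

```python
import subprocess

# Usual Nagios plugin exit codes; any other value is treated as UNKNOWN.
NAGIOS_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL"}


def nagios_state(exit_code):
    """Translate a plugin exit code into its Nagios state name."""
    return NAGIOS_STATES.get(exit_code, "UNKNOWN")


def check_rudder_health(command=("rudder", "agent", "health", "-n")):
    """Run the health check and return (state, single-line message)."""
    completed = subprocess.run(command, capture_output=True, text=True)
    return nagios_state(completed.returncode), completed.stdout.strip()
```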

The command “rudder remote run” now takes new parameters:

-j to run remote agents in parallel
-a to run on all nodes at once
-g to run on all nodes in a given group

The last two options need a direct connection to the server with an API token to work – provide this with the -t option or in ~/.rudder.
This command doesn’t work yet for nodes that are behind relay servers and not directly reachable via a network connection on port 5309 from the server – but this is a work in progress for 4.1.

On the server side, we now have:

rudder server disable-policy-distribution: Immediately disable policy distribution. If you know there is a problem somewhere and the agents should not update their policies because the policies are broken, you can stop policy distribution.
rudder server enable-policy-distribution: The reverse of the previous command.
rudder server reload-techniques: Reload technique definitions. This is particularly useful for people developing new techniques.

Settings API

The Rudder REST API lets you control and integrate with Rudder without using the web interface, and makes it possible to automate complex tasks that would otherwise take a long time. It’s quite complete, but until now there were still some items you couldn’t manage through it, like the general settings of Rudder.

There are several types of parameters:

How the agent behaves on your nodes (audit or enforce policy, run frequency, …) and communicates with the Rudder server (reporting protocol, authentication, …)
Enable/disable features in Rudder and configure them

It’s a classic API with a few methods:

A GET request to get all settings: /api/latest/settings
A GET request to get one value: /api/latest/settings/key
A POST request to set a parameter: /api/latest/settings/key
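The three methods above map naturally onto one small helper. This Python sketch only builds the requests, and several details are assumptions: the function name is hypothetical, the /rudder prefix is taken from the curl examples earlier in this post, and sending the new value as a form field named “value” should be checked against the API documentation.

```python
import urllib.parse
import urllib.request


def settings_request(base_url, token, key=None, value=None):
    """Build (but do not send) a Settings API request.

    No key: GET all settings. Key only: GET one value. Key and value: POST.
    """
    if value is not None and key is None:
        raise ValueError("a key is required to set a value")
    url = f"{base_url.rstrip('/')}/rudder/api/latest/settings"
    if key is not None:
        url += "/" + urllib.parse.quote(key, safe="")
    data = None
    if value is not None:
        # Assumption: the new value is sent as a form field named "value".
        data = urllib.parse.urlencode({"value": value}).encode("utf-8")
    method = "POST" if data is not None else "GET"
    return urllib.request.Request(
        url, data=data, method=method, headers={"X-API-Token": token}
    )
```

For example, `settings_request("https://server", "YOUR-TOKEN", "some_setting", "audit")` builds a POST to /rudder/api/latest/settings/some_setting, where “some_setting” stands in for a real setting name.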

There are still some settings (authorized networks, database cleaning, …) that can’t be managed yet, but they will be included in the REST API soon.

Another next step will be to rebase our settings page on this new API, so that we get a more reactive interface that is easier to maintain. We have already done this for the new Policy mode setting (Audit/Enforce) form on the General Settings page.

Check it out!

Rudder 4.0 is available now. Here are a few easy ways to try it out:

Online demo: You can see a demo version of Rudder on http://demo.rudder-project.org/
Vagrant: Just want to test Rudder without worrying about a full installation, but with real virtual machines so that you can actually configure and test things, unlike the demo? We provide a Vagrant set-up. See Rudder Vagrant for getting started with Rudder on Vagrant.
Linux server packages: The easiest way to test and use Rudder for prototyping or production is to install it from the provided Linux packages. For the server, the main current .rpm-based (RedHat, CentOS, SLES) and .deb-based (Debian, Ubuntu) distributions are supported. See https://www.rudder-project.org/site/get-rudder/downloads/ for details about repository and available versions.