CFEngine Inventory of Windows Server 2012

I am working on setting up a “reporting portal” CFEngine Enterprise hub to aggregate inventory from several hubs in different parts of a company (managed by different organizations). This one “superhub” would allow executives instant insight into infrastructure integrity.

While demonstrating my prototype, an executive liked the idea of having data at her fingertips so much that she asked, “Can we put our Windows servers into CFEngine?”

I said sure, but CFEngine inventory on Windows is not as detailed as it is for UNIX and Linux. The natural next question is: how detailed is it?

To answer, I spun up a Windows Server 2012 VM in the Joyent public cloud (the Joyent UI is a delight to use, BTW, and I had my VM up in less than a minute) and bootstrapped it to a CFEngine hub in the same cloud. While I was able to pull policy immediately, the hub couldn’t connect to the Windows server on port 5308 to collect reports until I went into the Windows Firewall with Advanced Security control panel and opened up port 5308. ( has a decent write-up.)

Here is what you get out of the box in the way of inventory.

Name                              Value
Windows roles                     WinServer
System version                    BOCHS – 1
Host name                         ownerco-18v4p42
Hardware addresses                90:b8:d0:52:7c:09, 90:b8:d0:b5:c7:94
System manufacturer               Joyent
Disk free (%) on main drive (C:)  69
BIOS version                      Bochs
Architecture                      x86_64
OS type                           windows
IPv4 addresses                    ,
OS kernel                         Windows Server 2012
CFEngine ID                       SHA=1f6666e1e88b05a4c7a98604ffa429bc452dc209a22e78072abd2d6eccb5170c
System serial number              720f2caa
BIOS vendor                       Bochs
CFEngine version                  3.7.4
Server class                      windows
Uptime minutes                    46
OS                                Windows Server 2012

The basics are there – hostname, OS version, disk utilization, network addresses. And, just like the UNIX/Linux inventory, the Windows inventory is extensible.

And just for fun, here is a screenshot showing the CFEngine processes running on Windows (the first three in the “ps” output).

CFE on Windows screenshot

Posted in Uncategorized | Leave a comment

You think our training is expensive?

I charge US $3,000 per training day, plus a US $2,000 admin fee, to come on-site and train up to 12 staff using a training methodology that ensures that deep learning occurs. Some people have pushed back on the price as too expensive.

As Red Adair, the firefighter specializing in putting out oil well fires, once said:

“If you think it’s expensive to hire a professional to do the job, wait until you hire an amateur.”

I have heard horror stories of five-day classes where the “Instructor” sat at the front and droned through a PowerPoint presentation. He wouldn’t answer questions because he had hundreds of slides to get through. I’ve heard of Instructors dismissing class on Friday morning because they’ve “covered the material” already, yet the students still can’t do the actions required because they lack complete understanding.

When I train, I look at the faces of the students to see if they understand. You can see it in their eyes if you care to look. I don’t move on to the next module until everybody understands the current one.

The hallmark of our training is balancing theory with practice, so there are lab exercises after every module. It’s one thing to learn about engines in a textbook, but you get a completely different level of understanding after you put one together with your own two hands!

Our materials are carefully laid out to cover all the basics and define all the terms and then, and only then, start on intermediate and advanced topics. Careful attention to fundamentals is why even experienced users come out raving about how much they’ve learned.

I have never had anyone complain about price after our training. I have had a couple of people express that ours was the best training they ever had, anywhere.


Graphing within psql

I mentioned this on HN years ago, but it’s nifty, so I’m adding it here.

You can graph SQL output with gnuplot without leaving the psql (Postgres client) command-line.

Because @fusiongyro commented “This is incredible! I only wish it were a little easier to do on the fly,” I inquired on the amazingly helpful pgsql-general list.

There are two approaches: client-side and server-side.

  • Ian Barwick explained how to put all the prep stuff into a psql script, define your query and invoke the script.
    barwick@localhost:~$ psql -U postgres testdb
      psql (9.2.3)
      Type "help" for help.
      testdb=# \set plot_query 'SELECT * FROM plot'
      testdb=# \i tmp/plot.psql
                                        My Graph
        4 ++---------+-----------+----------+----------+-----------+---------**
          +          +           +          +          +           +     **** +
          |                                                          ****     |
      3.5 ++                                                     ****        ++
          |                                                  ****             |
          |                                              ****                 |
        3 ++                                         ****                    ++
          |                                      ****                         |
      2.5 ++                                *****                            ++
          |                             ****                                  |
          |                         ****                                      |
        2 ++                    ****                                         ++
          |                 ****                                              |
          |             ****                                                  |
      1.5 ++        ****                                                     ++
          |     ****                                                          |
          + ****     +           +          +          +           +          +
        1 **---------+-----------+----------+----------+-----------+---------++
          1         1.5          2         2.5         3          3.5         4
  • Sergey Konoplev explained how to do it with a server-side function – you need gnuplot installed on the db server and then you can use

select just_for_fun_graph('select ... from ...', 'My Graph', 78, 24, ...)


Binding an SSH launcher to a GNU Screen hotkey

I have a confession to make. I use SSH to access servers.

I tell the sysadmins I teach to make changes to their servers using configuration management, but:

(a) most clients I work with are just starting to use configuration management, so we use SSH to access the systems that aren’t under configuration management yet, and

(b) I enjoy troubleshooting issues rather than just shooting my IT infrastructure in the head and instantiating a new one that might have the same issue. But this post isn’t about immutable infrastructures. It’s about SSHing to servers.

From “things that make me happy”, I added two lines near the top of my GNU Screen config file, .screenrc:

# start ssh launcher loop

screen -t launcher /bin/sh -c 'while true; do echo -n "Hostname: "; read host;  screen -t $host ssh $host; clear; done'

# bind ctrl-K to "switch to window 0 which contains the SSH launcher"

bindkey "\013" select 0

Now when I want to open a new session, I press Ctrl-K and enter the hostname, and GNU Screen will start a new window, titled with the name of the host, running an SSH session to that host.

It’s the little things in life.


Now that I’ve used this for a day, I remembered the problem with this setup — after you launch an ssh session, if you press the screen command key twice to go back to the previous window, you end up in the launcher window instead.

When I worked at EarthLink, I wrote a little shell script similar to this that I called with screen’s “exec” command. It did some gnarly input/output redirection: the script took my input, and its output, which contained the command to launch the ssh session, was fed back to screen as if it were user input. I didn’t save it; looks like I’ll have to reconstruct it.

Also, the latest version of GNU Screen is rather improved: you can renumber windows, split windows vertically, etc.

Yes, I know about tmux. I’m just used to screen. 🙂


Time Management and Git training at Ohio Linux Fest

I am pleased to announce that Mike Weilgart and I will be delivering professional training for Ohio Linux Fest Institute in October.

I will teach “Time Management for System Administrators” and Mike will teach “Git Foundations: Unlocking the Mysteries”.

You can now register for Ohio Linux Fest.

On a personal note, I enjoy walking about Columbus, lots of history there, and I love walking into German Village.


Using Ansible to change sshd configuration

One of my clients is at the ssh “for” loop stage of automation maturity, so I installed Ansible. Because of SELinux and Python version issues, I’m using the “raw” mode (which doesn’t require Python on the hosts; it just runs raw shell commands).

What follows is an example of using Ansible raw mode to make changes at scale.


A developer requested:

Please, activate these option on XYZ servers in the /etc/ssh/sshd_config
so I can stay connected while debugging:

ClientAliveInterval 15
ClientAliveCountMax 3


First, check current setting, so that I know what we have in place now (starting point):

$ ansible all -i /tmp/hosts  -m raw -a "grep ClientAliveCountMax /etc/ssh/sshd_config"  --ask-pass --one-line --user=root
SSH password:
X | success | rc=0 | (stdout) #ClientAliveCountMax 3

Y | success | rc=0 | (stdout) #ClientAliveCountMax 3

Z | success | rc=0 | (stdout) #ClientAliveCountMax 3


Uncomment the line, enabling the setting:

$ ansible all -i /tmp/hosts -m raw -a "sed -i /etc/ssh/sshd_config -e 's:^#ClientAliveCountMax 3:ClientAliveCountMax 3:'"  --ask-pass --one-line --user=root

$ ansible all -i /tmp/hosts -m raw -a "grep ClientAliveCountMax /etc/ssh/sshd_config" --ask-pass --one-line --user=root
SSH password:
X | success | rc=0 | (stdout) ClientAliveCountMax 3

Y | success | rc=0 | (stdout) ClientAliveCountMax 3

Z | success | rc=0 | (stdout) ClientAliveCountMax 3


Summary of changes:
Before: #ClientAliveCountMax 3
After: ClientAliveCountMax 3

Rinse and repeat for ClientAliveInterval.
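
For ClientAliveInterval, uncommenting alone may not be enough. If the stock file carries `#ClientAliveInterval 0` (an assumption about the default sshd_config on these hosts), removing the `#` would leave the value at 0 rather than the requested 15. Here is a minimal sketch of a sed invocation that uncomments and sets both values in one pass. It runs against a scratch file rather than the real config, and assumes GNU sed's `-i` as in the commands above:

```shell
# Work on a scratch copy, not the real /etc/ssh/sshd_config.
cfg=$(mktemp)
printf '%s\n' '#ClientAliveInterval 0' '#ClientAliveCountMax 3' > "$cfg"

# Match the setting with or without a leading '#', whatever its old value,
# and replace the whole line with the requested value.
sed -i \
  -e 's/^#\{0,1\}ClientAliveInterval .*/ClientAliveInterval 15/' \
  -e 's/^#\{0,1\}ClientAliveCountMax .*/ClientAliveCountMax 3/' \
  "$cfg"

# Show the result.
grep '^ClientAlive' "$cfg"
```

The same two `-e` expressions could be passed to the raw module in place of the single substitution used earlier. Anchoring on `^` keeps sed from touching other lines that merely mention the setting, such as explanatory comments.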

Now reload the sshd config:

$ ansible all -i /tmp/hosts -m raw -a "/etc/init.d/sshd reload"  --ask-pass --one-line --user=root
SSH password:
X | success | rc=0 | (stdout) Reloading sshd: [  OK  ]

Y | success | rc=0 | (stdout) Reloading sshd: [  OK  ]

Z | success | rc=0 | (stdout) Reloading sshd: [  OK  ]


Infrastructure Management at Scale

I recently spoke at the Digital Media Educators Conference (DMEC) on Infrastructure Management at Scale and the skills educators need to impart to up-and-coming system administrators.

This conference serves the California community college system, which is dear to my heart. My mother worked at West Los Angeles College library her entire professional life in America, since we arrived in 1988. I used to volunteer and help her out with shelving in the summer. I was a very poor helper since I kept getting distracted by all the delicious books and did more reading than shelving.

While in high school I took computer programming, math and English at West Los Angeles College and at Santa Monica Community College, at first during summer break and then concurrent with eleventh grade, which allowed me to go to University instead of going to 12th grade.

So I have a personal connection to the California community college system and I jumped at the chance to contribute a talk:

Cover slide

Because my presentation was in the Data Representation track, I focused on Inventory and Compliance Reporting so I could show off CFEngine’s slick UI.

I started by laying out CFEngine’s philosophic groundwork:
– Promise Theory and the advantages of voluntary cooperation and distributed work over the limitations of imposed direct control,
– The advantages of pull over push (see “Push versus pull” in Deconstructing the `CAP theorem’ for CM and DevOps by the author of CFEngine for more on this), and
– The Dunbar numbers which constrain the quality and quantity of relationships sysadmins are able to have with their infrastructures. The rest of the talk demonstrated how the design of CFEngine uses Dunbar numbers to focus the information it presents.

Dunbar numbers

We also talked about what computer system administration IS, and what the challenges are and how we handle them.

Then I introduced the CFEngine dashboard:

Dashboard 1

I pointed out the header, which holds the host count (2, including the hub itself) and the health indicator (OK); the graph of changes made by CFEngine; the fact that both of our hosts have Software Updates available (1 alert triggered on 2 hosts); and that we have 100% on promise compliance and system health (green check marks).

The next slide, after adding a third host (notice the host count up top), shows how the Software Updates alert changes to a 2/3 arc: right after the host is added, the hub knows only that 2 out of 3 hosts are missing software updates. Once the agent runs on the third host and the hub collects its report, the alert changes back to a full circle, with 3 out of 3 hosts missing software updates.

Dashboard 2

The next slide illustrates how CFEngine communicates the severity of the alert: critical issues are indicated in red, less severe ones in orange (amber for you Aussies), and the mildest in yellow. I induced a policy non-compliance situation on one of the three hosts (e.g., promised a file edit but prevented CFEngine from accessing the file by filling up the disk), so the Promise Compliance alert spans 1/3 of the circle (1 out of 3 hosts).

Dashboard 3

Notice also that if CFEngine is unable to collect reports from a host or if an agent stops running on a host, the health indicator at the top of the screen changes from OK to a red number indicating the number of issues:

Dashboard health indicator

You can see the number and type of issues:

Dashboard health detail

Notice that the Dunbar numbers are in play here: CFEngine tells you there are issues, and if you want more data, you can have it. But it doesn’t throw all the detail at you at once; that would be too much.

You can get more detail on which hosts are not reporting by selecting “Hosts not reporting” from the health indicator menu:

Hosts not reporting

You can then select a host in the list of hosts not reporting to see the info for that host (host detail).

Health issues host list

Host detail

That actually takes us to the “Hosts” tab.

The “Hosts” tab starts in the “all hosts” view, where you see the promise compliance summary for your infrastructure:

All Hosts

You can list the hosts that have less than 100% compliance:

Non-compliant hosts

You can see which promises were not kept on each host:

Promises not kept by host

And that takes us to the “Reports” tab. There are many reports available but let’s take a look at the Inventory Report. It starts out with four basic columns but you can add more:

Inventory 1 - Start with 4 columns

You can extend inventory collection by writing CFEngine promises. For example, here I’ve added inventory of the host’s timezone:

Inventory 2 - User Defined

Let’s say our company policy says all hosts must be in the UTC timezone. But in reality we have this:

Inventory 3 - Timezone Detail

You can sort the column contents by selecting the column heading; this groups the outliers and brings them into view:

Inventory 4 - Sort

You can graphically summarize column contents by selecting “Chart Data”:

Inventory 5 - Summary Chart Dialog


Inventory 6 - Summary Chart

Hover over a slice to get more detail:

Inventory 7 - Chart detail

Or switch to column view:

Inventory 8 - Column View

Here is another example:

Inventory 9 - AD status

The charts can be exported and embedded in reports to management, auditors, etc.

Want to give CFEngine Enterprise a try? It’s very easy to download and install the hub package.

Feel free to email me if you have any questions!


Why I credit my career success to USENIX LISA training

I encourage *nix sysadmins to go to the annual USENIX LISA conference and avail themselves of the training there.

USENIX is the UNIX Users Group. Around since the seventies, it is now a global professional society spanning industry and academia.

The LISA (Large Installation System Administration) conference offers a mix of talks, presentations, expo hall, social activities and formal tutorials (e.g., last year’s program).

I attribute my professional success to being connected with USENIX, to attending every LISA conference I can, getting as much training as I can and to the amazing people I’ve met at LISA, many of whom are now dear friends.

The training is key to broadening my knowledge base, increasing my skill set, and making me more valuable as an individual contributor and executive.

Often the person doing the training literally wrote the book on the subject. I’ve been trained on UNIX system administration by Æleen Frisch, on RRDTool by Tobias Oetiker, on CFEngine by Mark Burgess and so on.

The conference is organized by the community — it is truly a conference by sysadmins, for sysadmins.

I want to thank the staff of USENIX and the community for nurturing this resource and commend them for keeping it relevant. I want to express my heartfelt appreciation to the sysadmin executives of EarthLink in the 1990s who insisted their staff get trained at LISA to get professional.

See you in Boston, December 4 – 9, 2016!



CFEngine Enterprise tip: showing hosts that have a broken RPM database

RPM database corruption is a common problem on Red Hat Linux systems at scale.

When it happens, you have to rebuild the RPM database.

I am working on automating this repair with CFEngine. In the meantime, here is a Custom Report to identify these systems:

-- Aleksey Tsalolikhin, 12 July 2016
-- Show hosts that have broken RPM databases

select hosts.hostname,changetimestamp
from promiseexecutions
inner join hosts on promiseexecutions.hostkey = hosts.hostkey
where logmessages::text ilike '%rpm%db3%' 
and changetimestamp >  current_timestamp - '24 hours'::interval
order by changetimestamp;

Dumping RPM metadata

The other day, I wanted to find out which RPM metadata field was used to store the “el6” value in the “rpm -q” output for a package, e.g.:

[root@as-ws-pr-la-01 Desktop]# rpm -q kernel-2.6.32-504.el6.x86_64
[root@as-ws-pr-la-01 Desktop]# 

This came up because this output looks identical for CFEngine’s RHEL 4, 5 and 6 packages, even though there are two different packages involved.

So I used this one-liner to dump all RPM metadata, which showed that RELEASE was most likely the proper field to update:

[root@as-ws-pr-la-01 Desktop]# echo  rpm  -q kernel-2.6.32-504.el6.x86_64 --queryformat \" $(for f in `rpm --querytags`; do echo \\n $f = %{${f}} ' '  ; done  )  \" |sh |grep el6
BASENAMES = .vmlinuz-2.6.32-504.el6.x86_64.hmac
 CHANGELOGNAME = Johnny Hughes <> [2.6.32-504.el6.centos]
 EVR = 2.6.32-504.el6
 FILENAMES = /boot/.vmlinuz-2.6.32-504.el6.x86_64.hmac
 NEVR = kernel-2.6.32-504.el6
 NEVRA = kernel-2.6.32-504.el6.x86_64
 NVR = kernel-2.6.32-504.el6
 NVRA = kernel-2.6.32-504.el6.x86_64
/sbin/new-kernel-pkg --package kernel --install 2.6.32-504.el6.x86_64 || exit $?
/sbin/new-kernel-pkg --package kernel --mkinitrd --dracut --depmod --update 2.6.32-504.el6.x86_64 NEWKERNARGS || exit $?
/sbin/new-kernel-pkg --package kernel --rpmposttrans 2.6.32-504.el6.x86_64 || exit $?
    /sbin/weak-modules --add-kernel 2.6.32-504.el6.x86_64 || exit $?
PREUN = /sbin/new-kernel-pkg --rminitrd --rmmoddep --remove 2.6.32-504.el6.x86_64 || exit $?
    /sbin/weak-modules --remove-kernel 2.6.32-504.el6.x86_64 || exit $?
PROVIDEVERSION = 2.6.32-504.el6
 R = 504.el6
 RELEASE = 504.el6
 SOURCERPM = kernel-2.6.32-504.el6.src.rpm
[root@as-ws-pr-la-01 Desktop]#

Does anybody know a better way to find out what RPM fields are used to construct the “rpm -q” output?
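
One possible simplification, offered as a sketch rather than a definitive answer: build the query format directly from the `rpm --querytags` list instead of echoing a command line into `sh`. The helper name `build_queryformat` is hypothetical. Also note that the NVRA line in the output above matches the “rpm -q” string exactly, which suggests the default output is assembled from the name, version, release and architecture fields.

```shell
# Hypothetical helper: read tag names on stdin (one per line) and emit an
# rpm query format with one "TAG = %{TAG}\n" entry per tag.
# The \n is left as a literal backslash-n for rpm itself to interpret.
build_queryformat() {
  while read -r tag; do
    printf '%s = %%{%s}\\n' "$tag" "$tag"
  done
}

# Usage on an RPM-based host (not run here):
#   rpm -q kernel --queryformat "$(rpm --querytags | build_queryformat)" | grep el6
```

Like the original one-liner, this prints only the first element of array-valued tags such as FILENAMES, which is enough for spotting where a string like “el6” is stored.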
