Time Management and Git training at Ohio Linux Fest

I am pleased to announce that Mike Weilgart and I will be delivering professional training for Ohio Linux Fest Institute in October.

I will teach “Time Management for System Administrators” and Mike will teach “Git Foundations: Unlocking the Mysteries”.

You can now register for Ohio Linux Fest.

On a personal note, I enjoy walking around Columbus (there is lots of history there), and I love walking into German Village.

Posted in Uncategorized | Leave a comment

Using Ansible to change sshd configuration

One of my clients is at the ssh “for” loop stage of automation maturity, so I installed Ansible. Because of SELinux and Python version issues, I’m using the “raw” mode (which doesn’t require Python on the hosts; it just runs raw shell commands).

What follows is an example of using Ansible raw mode to make changes at scale.


A developer requested:

Please, activate these option on XYZ servers in the /etc/ssh/sshd_config
so I can stay connected while debugging:

ClientAliveInterval 15
ClientAliveCountMax 3


First, check the current setting to establish the starting point:

$ ansible all -i /tmp/hosts  -m raw -a "grep ClientAliveCountMax /etc/ssh/sshd_config"  --ask-pass --one-line --user=root
SSH password:
X | success | rc=0 | (stdout) #ClientAliveCountMax 3

Y | success | rc=0 | (stdout) #ClientAliveCountMax 3

Z | success | rc=0 | (stdout) #ClientAliveCountMax 3


Uncomment the line, enabling the setting:

$ ansible all -i /tmp/hosts -m raw -a "sed -i -e 's:^#ClientAliveCountMax 3:ClientAliveCountMax 3:' /etc/ssh/sshd_config; grep ClientAliveInterval /etc/ssh/sshd_config"  --ask-pass --one-line --user=root

$ ansible all -i /tmp/hosts -m raw -a "grep ClientAliveCountMax /etc/ssh/sshd_config" --ask-pass --one-line --user=root
SSH password:
X | success | rc=0 | (stdout) ClientAliveCountMax 3

Y | success | rc=0 | (stdout) ClientAliveCountMax 3

Z | success | rc=0 | (stdout) ClientAliveCountMax 3


Summary of changes:
Before: #ClientAliveCountMax 3
After: ClientAliveCountMax 3

Rinse and repeat for ClientAliveInterval.
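
For repeated runs like this, the uncomment-and-set step can be wrapped in a small helper. What follows is a minimal sketch, not the exact commands used above: it assumes GNU sed (for -i and -E), set_sshd_option is a made-up name, and the two sample lines stand in for a real sshd_config. Unlike a plain uncomment, it also rewrites the value, which matters for ClientAliveInterval if the commented-out value differs from the requested one. It works on a scratch file instead of the live /etc/ssh/sshd_config.

```shell
#!/bin/sh
# Sketch: enable an sshd_config option whether it is commented out or already set.
# set_sshd_option is a hypothetical helper name, not part of Ansible or OpenSSH.
set_sshd_option() {
    file=$1
    key=$2
    value=$3
    if grep -Eq "^#?[[:space:]]*${key}[[:space:]]" "$file"; then
        # Rewrite the existing (possibly commented-out) line with the desired value.
        sed -i -E "s/^#?[[:space:]]*${key}[[:space:]].*/${key} ${value}/" "$file"
    else
        # The option does not appear at all: append it.
        printf '%s %s\n' "$key" "$value" >> "$file"
    fi
}

# Demonstrate on a scratch file with sample content.
demo=$(mktemp)
printf '%s\n' '#ClientAliveInterval 0' '#ClientAliveCountMax 3' > "$demo"
set_sshd_option "$demo" ClientAliveInterval 15
set_sshd_option "$demo" ClientAliveCountMax 3
cat "$demo"
```

Running the helper a second time leaves the file unchanged, so it is safe to repeat across hosts. Appending is only reasonable for files without Match blocks, since a line added at the end would land inside the last block.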

Now reload the sshd configuration:

$ ansible all -i /tmp/hosts -m raw -a "/etc/init.d/sshd reload"  --ask-pass --one-line --user=root
SSH password:
X | success | rc=0 | (stdout) Reloading sshd: [  OK  ]

Y | success | rc=0 | (stdout) Reloading sshd: [  OK  ]

Z | success | rc=0 | (stdout) Reloading sshd: [  OK  ]


Infrastructure Management at Scale

I recently spoke at the Digital Media Educators Conference (DMEC) on Infrastructure Management at Scale and the skills educators need to impart to up-and-coming system administrators.

This conference serves the California community college system, which is dear to my heart. My mother worked at the West Los Angeles College library her entire professional life in America, since we arrived in 1988. I used to volunteer and help her out with shelving in the summer. I was a very poor helper, since I kept getting distracted by all the delicious books and did more reading than shelving.

While in high school I took computer programming, math and English at West Los Angeles College and at Santa Monica Community College, at first during summer break and then concurrent with eleventh grade, which allowed me to go to University instead of going to 12th grade.

So I have a personal connection to the California community college system and I jumped at the chance to contribute a talk:

Cover slide

Because my presentation was in the Data Representation track, I focused on Inventory and Compliance Reporting so I could show off CFEngine’s slick UI.

I started by laying out CFEngine’s philosophic groundwork:
– Promise Theory and the advantages of voluntary cooperation and distributed work over the limitations of imposed direct control,
– The advantages of pull over push (see “Push versus pull” in Deconstructing the ‘CAP theorem’ for CM and DevOps by the author of CFEngine for more on this), and
– The Dunbar numbers which constrain the quality and quantity of relationships sysadmins are able to have with their infrastructures. The rest of the talk demonstrated how the design of CFEngine uses Dunbar numbers to focus the information it presents.

Dunbar numbers

We also talked about what computer system administration IS, and what the challenges are and how we handle them.

Then I introduced the CFEngine dashboard:

Dashboard 1

I pointed out the header, which holds the host count (2, including the hub itself) and the health indicator (OK); the graph of Changes made by CFEngine; the fact that both of our hosts have Software Updates available (1 alert triggered on 2 hosts); and that we have 100% compliance on promise compliance and system health (green check marks).

The next slide, taken after adding a third host (notice the hosts indicator up top), shows how the Alert for Software Updates changes to a 2/3 arc: right after the host is added, the hub knows only that 2 out of 3 hosts are missing software updates. Once the agent runs on the third host and the hub collects the report, the Alert changes back to a full circle, with 3 out of 3 hosts missing software updates.

Dashboard 2

The next slide illustrates how CFEngine communicates the severity of the alert: critical issues are indicated in red, less severe ones in orange (amber for you Aussies), and the mildest in yellow. I induced a policy non-compliance situation on one of the three hosts (e.g., promised a file edit but prevented CFEngine from accessing the file by filling up the disk), so the Promise Compliance alert spans 1/3 of the circle (1 out of 3 hosts).

Dashboard 3

Notice also that if CFEngine is unable to collect reports from a host or if an agent stops running on a host, the health indicator at the top of the screen changes from OK to a red number indicating the number of issues:

Dashboard health indicator

You can see the number and type of issues:

Dashboard health detail

Notice that the Dunbar numbers are in play here: CFEngine tells you there are issues, and if you want more data, then you can have it. But it doesn’t throw all the detail at you at once; that would be too much.

You can get more detail on which hosts are not reporting by selecting “Hosts not reporting” from the health indicator menu:

Hosts not reporting

You can then select a host in the list of hosts not reporting to see the info for that host (host detail).

Health issues host list

Host detail

That actually takes us to the “Hosts” tab.

The “Hosts” tab starts in the “all hosts” view, where you see the promise compliance summary for your infrastructure:

All Hosts

You can list the hosts that have less than 100% compliance:

Non-compliant hosts

You can see which promises were not kept on each host:

Promises not kept by host

And that takes us to the “Reports” tab. There are many reports available but let’s take a look at the Inventory Report. It starts out with four basic columns but you can add more:

Inventory 1 - Start with 4 columns

You can extend inventory collection by writing CFEngine promises. For example, here I’ve added an inventory of the host’s timezone:

Inventory 2 - User Defined

Let’s say our company policy says all hosts must be in the UTC timezone. But in reality we have this:

Inventory 3 - Timezone Detail

You can sort the column contents by selecting the column heading; this groups the outliers and brings them into view:

Inventory 4 - Sort

You can graphically summarize column contents by selecting “Chart Data”:

Inventory 5 - Summary Chart Dialog


Inventory 6 - Summary Chart

Hover over a slice to get more detail:

Inventory 7 - Chart detail

Or switch to column view:

Inventory 8 - Column View

Here is another example:

Inventory 9 - AD status

The charts can be exported and embedded in reports to management, auditors, etc.

Want to give CFEngine Enterprise a try? It’s very easy to download and install the hub package.

Feel free to email me if you have any questions!


Why I credit my career success to USENIX LISA training

I encourage *nix sysadmins to go to the annual USENIX LISA conference and avail themselves of the training there.

USENIX began as the UNIX Users Group. Around since the seventies, it is now a global professional society spanning industry and academia.

The LISA (Large Installation System Administration) conference offers a mix of talks, presentations, expo hall, social activities and formal tutorials (e.g., last year’s program).

I attribute my professional success to being connected with USENIX, to attending every LISA conference I can, to getting as much training as I can, and to the amazing people I’ve met at LISA, many of whom are now dear friends.

The training is key to broadening my knowledge base, increasing my skill set, and making me more valuable as an individual contributor and executive.

Often the person doing the training literally wrote the book on the subject. I’ve been trained on UNIX system administration by Æleen Frisch, on RRDTool by Tobias Oetiker, on CFEngine by Mark Burgess and so on.

The conference is organized by the community — it is truly a conference by sysadmins, for sysadmins.

I want to thank the staff of USENIX and the community for nurturing this resource and commend them for keeping it relevant. I also want to express my heartfelt appreciation to the sysadmin executives of EarthLink in the 1990s who insisted that their staff get trained at LISA to become professionals.

See you in Boston, December 4 – 9, 2016!



CFEngine Enterprise tip: showing hosts that have a broken RPM database

RPM database corruption is a common problem on Red Hat Linux systems at scale.

When it happens, you have to rebuild the RPM database:
– https://access.redhat.com/solutions/6903
– http://www.cyberciti.biz/tips/rebuilding-corrupted-rpm-database.html

I am working on automating this repair with CFEngine. In the meantime, here is a Custom Report to identify these systems:

-- Aleksey Tsalolikhin, 12 July 2016
-- Show hosts that have broken RPM databases

select hosts.hostname,changetimestamp
from promiseexecutions
inner join hosts on promiseexecutions.hostkey = hosts.hostkey
where logmessages::text ilike '%rpm%db3%' 
and changetimestamp >  current_timestamp - '24 hours'::interval
order by changetimestamp;

Dumping RPM metadata

The other day, I wanted to find out which RPM metadata field was used to store the “el6” value in the “rpm -q” output for a package, e.g.:

[root@as-ws-pr-la-01 Desktop]# rpm -q kernel-2.6.32-504.el6.x86_64
[root@as-ws-pr-la-01 Desktop]# 

This came up because this output looks identical for CFEngine’s RHEL 4, 5 and 6 packages, even though there are two different packages involved.

So I used this one-liner to dump all RPM metadata, which showed that RELEASE was most likely the proper field to update:

[root@as-ws-pr-la-01 Desktop]# echo  rpm  -q kernel-2.6.32-504.el6.x86_64 --queryformat \" $(for f in `rpm --querytags`; do echo \\n $f = %{${f}} ' '  ; done  )  \" |sh |grep el6
BASENAMES = .vmlinuz-2.6.32-504.el6.x86_64.hmac
 CHANGELOGNAME = Johnny Hughes <johnny@centos.org> [2.6.32-504.el6.centos]
 EVR = 2.6.32-504.el6
 FILENAMES = /boot/.vmlinuz-2.6.32-504.el6.x86_64.hmac
 NEVR = kernel-2.6.32-504.el6
 NEVRA = kernel-2.6.32-504.el6.x86_64
 NVR = kernel-2.6.32-504.el6
 NVRA = kernel-2.6.32-504.el6.x86_64
/sbin/new-kernel-pkg --package kernel --install 2.6.32-504.el6.x86_64 || exit $?
/sbin/new-kernel-pkg --package kernel --mkinitrd --dracut --depmod --update 2.6.32-504.el6.x86_64 NEWKERNARGS || exit $?
/sbin/new-kernel-pkg --package kernel --rpmposttrans 2.6.32-504.el6.x86_64 || exit $?
    /sbin/weak-modules --add-kernel 2.6.32-504.el6.x86_64 || exit $?
PREUN = /sbin/new-kernel-pkg --rminitrd --rmmoddep --remove 2.6.32-504.el6.x86_64 || exit $?
    /sbin/weak-modules --remove-kernel 2.6.32-504.el6.x86_64 || exit $?
PROVIDEVERSION = 2.6.32-504.el6
 R = 504.el6
 RELEASE = 504.el6
 SOURCERPM = kernel-2.6.32-504.el6.src.rpm
[root@as-ws-pr-la-01 Desktop]#

Does anybody know a better way to find out what RPM fields are used to construct the “rpm -q” output?
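
As a more readable variant of the one-liner above (not an answer to the question), the query format can be built in a separate step. This is a sketch under assumptions: build_queryformat is a made-up helper name, and the block only demonstrates the string it produces, since running the actual query needs a host with rpm installed.

```shell
#!/bin/sh
# Sketch: build an rpm --queryformat string with one "TAG = value" line per tag.
# build_queryformat is a hypothetical helper; it reads tag names on stdin.
build_queryformat() {
    while read -r tag; do
        # Emit the literal text "TAG = %{TAG}\n"; rpm itself expands %{TAG} and \n.
        printf '%s = %%{%s}\\n' "$tag" "$tag"
    done
}

# Offline demonstration with two tag names:
printf '%s\n' NAME RELEASE | build_queryformat
echo

# On a host with rpm installed (not run here), the full dump would look like:
#   rpm -q --queryformat "$(rpm --querytags | build_queryformat)" \
#       kernel-2.6.32-504.el6.x86_64 | grep el6
```

As with the original one-liner, array-valued tags such as BASENAMES print only their first element in this form.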


Feedback from “Taming the Git Filesystem” talk on June 2nd, 2016

Mike Weilgart is going to repeat his “Taming the Git Filesystem” talk on June 23rd in Burbank. In the meantime, here is what attendees said about the debut of this talk on June 2nd at the UUASC-LA/LOPSA-LA meetup:

“You definitely filled out some conceptual holes.”
Stephen Franklin
Systems Engineer

“I liked that it was approachable and assumed no prior knowledge.”
Eric White
Senior System Engineer

“Good foundational talk, Michael! Just as the description says, I’ve just learned enough about git to get my development work done. I occasionally find my ignorance of git’s foundation comes back to bite me in the rear. With this talk I can hope that happens less often.”
George Wu
Vice President of Engineering

“I was able to understand the explanations even as a student with limited technical knowledge.”
Eric, Community College Student

“This talk has something to offer for most git users. It’s not a run of the mill ‘how to use’ git presentation.”
Jordan Schwartz
Systems / Storage Engineer


Identifying critical unpatched vulnerabilities on a Red Hat system

These are some working notes for identifying critical unpatched vulnerabilities on a Red Hat Enterprise Linux system (version 6).

If you install the yum-security plugin, you can list the available security updates and the CVEs they relate to, as well as their severity according to the Red Hat rating system:

yum update-info list cves available

The goal is to identify which unpatched CVEs (as returned by the yum-security plugin) are critical according to CVSS (Common Vulnerability Scoring System), that is, have a score > 7.

Scored CVEs are available from the National Vulnerability Database through a set of XML feeds. The NIST web site says:

A common way to use the feeds is to perform a one-time import of all of the
main XML vulnerability feeds and then use the “modified” feeds to keep

The “xml2” package converts from XML to various formats. I started by converting the 2015 XML to CSV:

$ xml2 < nvdcve-2.0-2015.xml > nvdcve-2.0-2015.flat
$ 2csv entry vuln:cve-id vuln:cvss/cvss:base_metrics/cvss:score vuln:summary < nvdcve-2.0-2015.flat > nvdcve-2.0-2015.csv

The next step, looking up the CVSS scores for the CVEs returned by yum-security, is left as an exercise for the reader.
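
One way to approach that lookup, sketched under assumptions: the CVE IDs from the yum-security listing have been saved one per line (for example by filtering the listing through grep -Eo 'CVE-[0-9]{4}-[0-9]+' and sort -u), and the first two CSV fields are the CVE ID and the CVSS score, matching the field order given to 2csv above. high_score_cves is a hypothetical helper name, and the sample rows are made up for illustration.

```shell
#!/bin/sh
# Sketch: list unpatched CVEs whose CVSS base score exceeds a threshold.
# high_score_cves is a hypothetical helper. Arguments:
#   $1  file of CVE IDs, one per line
#   $2  CSV whose first two fields are CVE ID and CVSS score
#   $3  score threshold (defaults to 7)
high_score_cves() {
    awk -F, -v min="${3:-7}" '
        FILENAME == ARGV[1] { wanted[$1] = 1; next }   # first file: remember the IDs
        ($1 in wanted) && ($2 + 0 > min) { print $1 "," $2 }
    ' "$1" "$2"
}

# Demonstration with made-up IDs and scores (not real NVD data).
ids=$(mktemp)
csv=$(mktemp)
printf '%s\n' CVE-0000-0001 CVE-0000-0003 > "$ids"
printf '%s\n' 'CVE-0000-0001,9.3,made-up summary' \
              'CVE-0000-0002,10.0,made-up summary' \
              'CVE-0000-0003,4.3,made-up summary' > "$csv"
high_score_cves "$ids" "$csv"    # prints CVE-0000-0001,9.3
```

Only the first two fields are read, so commas inside the summary text do not affect the result.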

xml2 home
– “xml2” is available through sudo apt install xml2 on Ubuntu


Mario Obejas on “a replacement for bash?” and on writing production-grade code in any language.

Recently Yves Dorfsman asked on the lopsa-tech mailing list:

A lot of people love to hate bash, and there are good reasons for it, but it
seems that there isn’t an obvious replacement for it.

What do you use? Do you see any clear winner to replace it on the horizon?

Mario Obejas, a popular occasional Instructor for Vertical Sysadmin, and living proof that there is life after system administration, answered with this gem. This is the voice of 34 years of Software Development, Infrastructure, and Information Security experience, folks. (Thanks for allowing us to re-post it here, Mario!)

bash replaced sh the same way vim replaced vi, for many of the same reasons. They took what was working, kept the solid parts of the base and improved it. The jury is back – bash works well enough for a lot of production work.

My prediction, Yves, is that the only thing that will replace vim and bash to the same degree will similarly be another iteration that improves upon the already existing widely installed vim and bash base, and leverages the same widely held skill sets.

Some things, like a Ted Cruz, no language can deal with. The last production bash scripts I wrote were to provide data traffic management between the VIIRS satellite ground station and NASA’s net, through their itty bitty data pipe. Our requirements were to be able to recover from up to a catastrophic three day data clog or outage, without losing any of the satellite data. We did very well through 2013 until Ted affected us. Nobody expected a 17 day US government shutdown …..

I’m an odd data point with respect to languages. I loved PASCAL and ADA. But here’s some heresy – I hate C. That’s probably due to the way a lot of people write C, which I characterize as, “as obfuscated and comment free as possible”. I think most people write C with speed in mind (filling in the diamond or checkbox on the schedule) versus maintainability (write like you will not see the code for a year, and then you will have to come back and upgrade/extend it).

We wrote Theater High Altitude Air Defense in Ada. When the day came to integrate our code (including that of our partners and contract vendors) in the lab, we were all shocked to find the system cycling, acquiring and tracking targets, not crashing as expected. We had to go back to our offices and retrieve the formal tests, nobody expected that level of robustness the first week, much less the first 15 minutes.

For longevity, my oldest production code is the Jovial and assembly I wrote for three Middle East mountaintop radars. I participated in the customer selloff in 1984, and 29 years later I received a call in December 2013 – no kidding – about helping the team to do an upgrade in 2014. It was still working(!). The only other longer lived code I know about is off world (satellites) or in the “Don’t touch it!” holy COBOL shrines.

Tangential moral of the story: regardless of language, put some damn comments in the code, like you are talking to somebody years later and explaining what the code does. That somebody might be you, years later.


Ansible now available as an RPM

Ansible is now available as an RPM from the EPEL repo. It pulls in all the dependencies needed by Ansible (Python libraries, libyaml and sshpass). The RPM provides Ansible version which includes a number of bug fixes including a security one (CVE-2016-3096). This is a lot easier to handle than fiddling with Python’s package manager of the day.

$ sudo yum install ansible
Package                         Arch                 Version                       Repository          Size
ansible                         noarch                      epel               2.9 M
Installing for dependencies:
PyYAML                          x86_64               3.10-3.1.el6                  base               157 k
libyaml                         x86_64               0.1.3-4.el6_6                 base                52 k
python-babel                    noarch               0.9.4-5.1.el6                 base               1.4 M
python-crypto                   x86_64               2.0.1-22.el6                  base               159 k
python-crypto2.6                x86_64               2.6.1-2.el6                   epel               513 k
python-httplib2                 noarch               0.7.7-1.el6                   epel                70 k
python-jinja2-26                noarch               2.6-3.el6                     epel               527 k
python-keyczar                  noarch               0.71c-1.el6                   epel               219 k
python-paramiko                 noarch               1.7.5-2.1.el6                 base               728 k
python-pyasn1                   noarch               0.0.12a-1.el6                 base                70 k
python-simplejson               x86_64               2.0.9-3.1.el6                 base               126 k
python-six                      noarch               1.9.0-2.el6                   base                28 k
sshpass                         x86_64               1.05-1.el6                    epel                19 k

Transaction Summary
Install      14 Package(s)

Total download size: 6.9 M
Installed size: 33 M
Is this ok [y/N]: