Saturday, March 28, 2009

Our old LDAP and its problems

Okay, so I'm also a very irregular blogger. :-) Hopefully I can get this series done, though. I've vaguely promised to give a presentation on this, and should the askers ever get their act together and schedule me, it'd be nice to have it ready.

I promised to talk about several things this time: our old LDAP, a more detailed explanation of what we were trying to do and why, and the process we used to select our software. I probably won't finish them all in this post, but let's see how far I get.

Our old LDAP

Our old LDAP service consisted of (actually, still consists of) three Red Hat Enterprise Linux servers running OpenLDAP. It's still in place, in an upgraded form, as I work to migrate us from this old infrastructure to the new one. What follows is basically a litany of the issues we had with the old service.

Replication issues

First, replication. At least when this was installed, before my tenure, OpenLDAP had no mechanism for multi-master replication. Nowadays there's syncrepl, but not then. This means that our old setup uses a single master, and is vulnerable to missing important changes (e.g. password updates) if the master is down.

Custom Schema

Next up, a poorly thought-out custom schema. This LDAP used an additional attribute, userClass, to restrict account visibility to server machines by way of an NSS service search descriptor. The method in general turns out to be about the best way to do this (more on this in later posts), but the problem was the ugliness used in shimming this attribute into a posixAccount objectclass object.

An early version of this shim overrode and extended the posixAccount objectclass via a custom schema. A later version (still in place) created a new object class, 'ourAccount', declared with SUP pilotPerson, inetOrgPerson, and account, and declared all account objects as 'objectclass ourAccount' and 'objectclass posixAccount' (and a few others).

Both solutions are a bit ugly. First, 'userClass' might not be the best semantic match for the function it was used for: membership in one or more 'groups'. In particular, there's no easy, one-stop way to see a list of all userClass values. There are also advantages to having both a forward and a reverse map (e.g. posixGroup/groupOfNames *and* memberOf attributes), and userClass has no mechanism for this. Next, the schema hacks above were a bit ugly and, worse, not documented anywhere.

My current one, I admit, still uses a schema hack, but it does make all the critical operations easy: list all the 'groups' of users, list everyone in a group, list the groups a given user is in, and check whether a given user is in a given group. Further, it's documented. :-)
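To make those four operations concrete, here's a minimal sketch in Python. It is not my actual schema or tooling: the account names, class names, and the in-memory dictionaries are all made up for illustration. It just shows why having both a forward map (group to members) and a reverse map (user to groups) makes each question a cheap lookup, whereas a per-user attribute alone forces a scan of every account to answer the first two.

```python
from collections import defaultdict

# Hypothetical data: each account's userClass-style values, as they
# might look after being read out of the directory.
ACCOUNTS = {
    "alice": ["webservers", "dbservers"],
    "bob": ["webservers"],
    "carol": [],
}


def build_maps(accounts):
    """Return (forward, reverse): group -> members and user -> groups."""
    forward = defaultdict(set)
    reverse = {}
    for user, classes in accounts.items():
        reverse[user] = set(classes)
        for cls in classes:
            forward[cls].add(user)
    return dict(forward), reverse


def all_groups(forward):
    """List every group that has at least one member."""
    return sorted(forward)


def members_of(forward, group):
    """List everyone in a group (empty list if the group is unknown)."""
    return sorted(forward.get(group, set()))


def groups_of(reverse, user):
    """List the groups a user is in (empty list if the user is unknown)."""
    return sorted(reverse.get(user, set()))


def is_member(reverse, user, group):
    """Check whether a user is in a group."""
    return group in reverse.get(user, set())


forward, reverse = build_maps(ACCOUNTS)
```

In a real directory the two maps would live in the entries themselves (group objects plus a memberOf-style attribute) rather than in a script, but the lookups are the same shape.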

Management tool(s)

The set of management tools in place for this service was similarly hacked together, not well supported, and undocumented. There were two scripts in an obscure location to reset passwords and create new accounts. The account creation script also attempted to create an email account on an IMAP server that never really existed. There was also an X11 LDAP management GUI on a separate server, which didn't manage any of the custom schema elements and, because of the custom schema, couldn't be used to create any new objects. Finally, there was a half-broken Perl script intended to be placed on LDAP client hosts to create newly provisioned users' home directories.

Another serious issue is that these tools had no provision for doing proper UID/GID allocation for new accounts. In fact, we had several accounts that shared the same UID, and several groups that shared the same GID. There was also no provision for automating user addition or, importantly, user deactivation and retirement.
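For what it's worth, the checks that were missing are not complicated. Here's a rough Python sketch of the two pieces: spotting IDs that are already shared, and picking the lowest unused ID in a range. The function names, the sample data, and the 1000 to 60000 range are my own assumptions for illustration, not anything from the old scripts.

```python
from collections import defaultdict


def find_duplicate_ids(entries):
    """Map each numeric ID used more than once to the names that use it.

    `entries` is an iterable of (name, numeric_id) pairs, e.g. pulled
    from uid/uidNumber or cn/gidNumber.
    """
    by_id = defaultdict(list)
    for name, num in entries:
        by_id[num].append(name)
    return {num: names for num, names in by_id.items() if len(names) > 1}


def next_free_id(used, start=1000, end=60000):
    """Return the lowest ID in [start, end] that is not already in use."""
    used = set(used)
    for candidate in range(start, end + 1):
        if candidate not in used:
            return candidate
    raise RuntimeError("ID range exhausted")
```

A real allocator also has to cope with two people creating accounts at the same moment, so the "pick a free ID" step needs some kind of locking or an atomic update against the directory. This sketch ignores that.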

There were also a few outright bugs. The worst was the way password updates were (or weren't) managed. The system depended on each client's 'passwd' command being properly configured via pam_ldap such that it would send password updates to the LDAP replica, get a referral to the master, follow that to the master, bind as the password changing user, and update the userPassword attribute. In addition to the obvious possibility for breakage with any of that configuration, there were two other serious problems.

First, not all systems properly followed the referral, and thus might update the password only in the local replica. The best we could hope for in that case was that the update failed outright, so at least the user got an error instead of a password that was inconsistent from one replica to the next.

Second, none of the implementations of pam_ldap bothered to update the 'shadowLastChange' attribute. This meant that if a user was being forced to change their password because it had expired, then no matter how often they updated it, it would always remain expired. This caused no end of on-call support issues for us.
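To see why that bites, recall that the RFC 2307 attribute `shadowLastChange` holds the day of the last password change, counted in days since January 1, 1970, and expiry is judged against it plus `shadowMax`. The Python sketch below is a simplified model of that check, with helper names I've made up. It ignores special values and the warning and inactivity fields, so treat it as an illustration rather than what any particular PAM module does.

```python
from datetime import date

EPOCH = date(1970, 1, 1)


def days_since_epoch(day):
    """Convert a calendar date to the day count used by shadowLastChange."""
    return (day - EPOCH).days


def is_expired(shadow_last_change, shadow_max, today):
    """True if more than shadow_max days have passed since the last change."""
    return days_since_epoch(today) - shadow_last_change > shadow_max


def change_password(entry, today, update_last_change=True):
    """Model a password change on a dict-based entry.

    With update_last_change=False this mimics the bug: the password is
    replaced but the entry still looks as old as it did before.
    """
    entry = dict(entry, userPassword="new-hash-placeholder")
    if update_last_change:
        entry["shadowLastChange"] = days_since_epoch(today)
    return entry
```

With `update_last_change=False`, the entry comes back with a new password and the old `shadowLastChange`, so the expiry check gives the same answer as before the change, which is exactly the loop our users got stuck in.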

All right, I'm about blogged out. More later on our old LDAP service and other stuff, too, like some of the things we did to make our old LDAP more usable, and our attempts to make it more widespread than just our Linux boxes.

-- Pat