Wednesday, July 15, 2009

Funky hardware linux install failures and making redhat enterprise install/rescue isos

So, at work we've got several racks' worth of 5-year-old IBM 8832 blades, with dual Intel Xeon 3.2 GHz CPUs, Broadcom CSB6 IDE RAID controllers based on an LSI Logic 1030 chip, and Broadcom CIOB-E Ethernet adaptors.

The problem is that the IDE RAID is old and whacky enough that the standard Red Hat boot images don't auto-load the correct modules when they scan the hardware. Ergo, the RAID set isn't seen, only the component drives, and whacky and evil things happen.

However, it looks like these controllers work happily with the LSI Fusion-MPT mptbase/mptspi/mptscsih series of modules. So, the challenge is to get a base Red Hat install to load these modules, so I can remotely kickstart the blades for provisioning.

To document my first, failed experiment, here's what I did.

As a starting point, I grabbed the boot ISO image from images/boot.iso in the unpacked Red Hat 5.3 CDs.

Mounted the image
  • mount -o loop boot.iso /mnt/iso

and copied the contents to a scratch area to work with
  • mkdir /var/tmp/iso ; cp -rv /mnt/iso/* /var/tmp/iso

Copied the most recent kernel my current box happened to be running into the new iso directory
  • cp -v /boot/vmlinuz-$(uname -r) /var/tmp/iso/isolinux/vmlinuz

Now, I also have to re-create an initrd for this ISO with the appropriate modules preloaded. I wanted to leave most of the auto-probing in place, though, so I'm only going to specify --preload= for the mpt* modules I care about.

First, extract the initrd image, and get the list of modules it currently has. This is a two-step process, since the modules appear to be a compressed cpio image inside the initrd, which is itself a compressed cpio image.
  • cd /var/tmp ; gunzip -c iso/isolinux/initrd.img | cpio -idv '*modules.cgz'
  • modules=$(gunzip -c modules/modules.cgz | cpio -it | cut -d'/' -f3 | sed -e 's/\.ko$//' | sort)

Note: I'm using bash's $( ) command substitution syntax above instead of the traditional backticks.
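If the directory depth inside modules.cgz ever differs from what a fixed `cut -f3` expects, a variant that keeps only the `.ko` entries and strips everything up to the last slash is a bit less brittle. This is a minimal sketch, assuming the listing comes from the same `cpio -it` pipeline; `module_names` is a hypothetical helper name, not part of the Red Hat tooling.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn a cpio listing of kernel modules into bare
# module names. Reads one path per line on stdin, e.g. from
#   gunzip -c modules/modules.cgz | cpio -it
module_names() {
    # Keep only *.ko entries, drop everything up to the last "/",
    # strip the .ko suffix, then sort and de-duplicate.
    grep '\.ko$' | sed -e 's#.*/##' -e 's/\.ko$//' | sort -u
}
```

Usage would be something like `modules=$(gunzip -c modules/modules.cgz | cpio -it | module_names)`.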

Now the shell variable 'modules' holds a list of modules. I still need a few preloaded modules, and to convert the module list into a usable '--with=... --with=...' argument list for mkinitrd:
  • with="--with=$(echo $modules | sed -e 's/ / --with=/g')"
  • preload="--preload=mptscsih --preload=mptspi --preload=mptbase"
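Building the --with= list with sed depends on getting the spaces in the replacement exactly right. A bash array sidesteps that. Here's a minimal sketch, assuming bash 3.1 or later (for `+=` on arrays); `make_with_args`, `WITH_ARGS`, and `PRELOAD_ARGS` are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical helper: fill the global array WITH_ARGS with one
# "--with=NAME" element per module name passed in.
make_with_args() {
    WITH_ARGS=()
    local m
    for m in "$@"; do
        WITH_ARGS+=("--with=$m")
    done
}

# Usage sketch (as root). $modules is deliberately unquoted so the
# shell splits it into one argument per module name:
#   make_with_args $modules
#   PRELOAD_ARGS=(--preload=mptscsih --preload=mptspi --preload=mptbase)
#   mkinitrd -f -v "${WITH_ARGS[@]}" "${PRELOAD_ARGS[@]}" \
#       iso/isolinux/initrd.img "$(uname -r)"
```

Because each array element is expanded as its own quoted word, no module name can be merged with its neighbour or split apart.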

Now, finally, make the initrd:
  • mkinitrd -f -v $with $preload iso/isolinux/initrd.img $(uname -r)

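And package it up into an ISO. Since -boot-info-table patches isolinux.bin in place, make sure that file is writable first:
  • chmod u+w iso/isolinux/isolinux.bin
  • mkisofs -J -R -T -v -b isolinux/isolinux.bin -c isolinux/boot.cat -no-emul-boot -boot-load-size 4 -boot-info-table -o blade-rhel53-boot.iso iso/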

The problem is, when I do a "linux rescue" boot using this CD image, it still doesn't load the mpt* modules! Can anyone point me to what I'm doing wrong?

Thanks!
-- Pat

Edit:

Re-reading this this morning, I noted several typos in the commands as I typed them up last night. Also, the fonts looked like crap.

In the interest of hopefully better accuracy, here's just a (trimmed) copy and paste of my shell history:

  • 2103 mkdir iso; cp -rv /mnt/iso/* iso
  • 2109 gunzip -c iso/isolinux/initrd.img | cpio -idv "*modules.cgz"
  • 2112 modules=$(gunzip -c modules/modules.cgz | cpio -it | cut -d'/' -f3 | sed -e 's/\.ko$//' | sort)
  • 2113 echo $modules
  • 2116 with="--with=$(echo $modules | sed -e 's/ / \-\-with=/g')"
  • 2117 echo $with
  • 2120 uname -a
  • 2121 cp /boot/vmlinuz-2.6.18-128.1.10.el5 iso/isolinux/vmlinuz
  • 2123 preload="--preload=mptscsih --preload=mptspi --preload=mptbase"
  • 2124 sudo mkinitrd -f -v $with $preload iso/isolinux/initrd.img 2.6.18-128.1.10.el5
  • 2129 chmod u+w iso/isolinux/isolinux.bin
  • 2130 mkisofs -J -R -T -v -b isolinux/isolinux.bin -c isolinux/boot.cat -no-emul-boot -boot-load-size 4 -boot-info-table -o blade-rhel53-boot.iso iso/


Saturday, July 11, 2009

St John's Block party review - Cloud Cult and Rince na Chroi Irish Dancing

Okay, enough of this tedious LDAP shite, I'm not getting anything done on that series anyway. So, here's something a bit more fun.

I just got home from the 2009 St John's Block Party, and they had another really good lineup this year. I dunno who puts together the music for them, but I'm continually impressed.

While I didn't get to see one of my old favorites, Boiled in Lead, this year, two really fun acts were Rince na Chroi (pronounced Rink-a na Cree) and Cloud Cult.

Rince na Chroi had some good Irish music from two of the members of Two Tap Trio, played on guitar, penny whistle, and flute. About half the dances were performed in soft shoes, and half in tap. The approximately 12-15 dancers were all students at the school, which has 170-plus students, and ranged from older teens down to about 8-year-olds. There were two younger boys, but the majority were girls.

Irish dance is fascinating because of the intricate and amazingly energetic footwork, yet the torso is nearly completely still. The dancers did some really neat and well choreographed and coordinated movements, too, joining and releasing hands and dancing around each other in patterns and figures.

I tried to grab a few short videos of some of the dances and put them up on qik.com, but fair warning -- the quality is crappy. (I'm really disappointed in the qik application for Android, but I'll add that to the Android / G1 phone review I'm writing.)

The other group I spent a lot of time on was Cloud Cult. My end opinion: their status as one of the current top rock bands in the Twin Cities area, if not the top one, is well deserved. I'm really glad I got to see them, because I wasn't able to stay for their gig at the block party last year, and they're stopping touring after this summer. (I'm guessing that might have to do with the band's lead singer and his wife, also in the band, expecting a child. Congrats to them!)

For those unfamiliar with them, well, start with this YouTube video of one of my favorite songs of theirs. Here's another good one of them live last year. Also, here are a few still photos of them from their show tonight.

Musically, they do several things I like. They range from quiet, nearly spiritual strings which creep up my spine in tingles to eardrum-cracking jam sessions where I could literally feel the wind from the subwoofers 150 feet away, with a crowd in between. Their lyrics are imaginative and poetic. Their choice of instrument combinations and musical styles is eclectic, but original and effective, combining brass, strings, glockenspiel, voice, synthesiser, and traditional rock-driving bass, guitar, and drums. They occasionally drop back into a cappella or minimal accompaniment with great harmonies.

Additionally, they did some neat stuff in their live show. Most of the songs were cleanly segued one into the next with no pause, making it non-obvious where one song ended and the next began. This isn't entirely unheard of, but is still unusual. Of course, they had great energy. The other cool and unusual thing was that two of the band members, who are artists, set up large canvases and painted during the show, only taking short breaks to come in and sing parts or play an instrument. The paintings are apparently auctioned at the end of each show.

My recommendation: go see one of their last live shows this summer (listed on their web site). Make a strong effort. It's definitely worth it. Also, buy their music -- I've greatly enjoyed listening to their albums, and you likely will, too.

-- Pat

Friday, April 3, 2009

Continuing our old LDAP

So I'm realizing as I'm writing these posts that this is really not quite the straightforward whitepaper style I'd originally envisioned, but is instead somewhat blended with chronological writing. As it turns out, a lot of what I'm talking about is stuff I worked on in rough chronological order.

Anyway, on to the next topic.

Updates to our old OpenLDAP service

So, obviously, my first choice was not to throw out the existing openldap servers wholesale. Instead, I first implemented a number of improvements to the service.

The obvious first improvement is to just do a software update. I upgraded from Red Hat Enterprise Linux 3 to 4 on new hardware, and along with that, updated OpenLDAP from 2.1 to 2.2. This had no massive impact in and of itself, but proved a great learning experience for me. I had to really work to grok some of the basic LDAP concepts as well as the funky schema mods.

More importantly, now I was ready to start making more substantive changes and fixes.

New management tools

One of the first things I addressed was our management tools and internal procedures and documents (or lack thereof). I started by putting in an installation of phpLDAPAdmin 0.96, and making a few customizations for our locally cracked schema. Fortunately, phpLDAPAdmin makes it easy to modify its scripts to assign different userclasses to new accounts and groups, so it was fairly trivial to use our customized objectclasses.

Procedurally, though, I still had to establish a consistent procedure for most of our common issues: mostly creating new accounts and groups and provisioning users to machines. In particular, new UID allocation was still an issue.

Account Creation and UID/GID allocation

I began by cleaning up any UID or GID conflicts I could find. This was a tedious process that involved first some scripting just to pull out the conflicts, then finding any machines those conflicting accounts were on, changing one of the conflicting accounts to a new number, and finally changing ownership on any affected files. There were fewer than a dozen instances of this problem, but the legwork still took several weeks. Now, hopefully, I could move forward.
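The "scripting to pull out the conflicts" step can be as small as one awk pass over passwd- or group-format data. This is a minimal sketch rather than the original scripts, assuming the input comes from something like `getent passwd` on an LDAP client; `find_dup_ids` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report numeric IDs shared by more than one entry.
# Reads passwd- or group-format lines (name:pw:ID:...) on stdin; field 3
# is the UID in passwd format and the GID in group format.
find_dup_ids() {
    awk -F: '
        {
            # Collect the names that use each ID, in input order.
            if ($3 in names) names[$3] = names[$3] "," $1
            else             names[$3] = $1
            count[$3]++
        }
        END {
            for (id in count)
                if (count[id] > 1) print id ": " names[id]
        }
    ' | sort -n
}
```

For example, `getent passwd | find_dup_ids` would print lines like `1001: alice,bob`, and `getent group | find_dup_ids` does the same for GIDs.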

UIDs in this system were still somewhat of a mess. Most of the accounts had an account-specific group, but its GID was rarely the same as the UID. Nor was this universal: many of the accounts had a shared primary GID. Having a purpose-specific GID (e.g. staff, or webspher) works well if that person only ever does one thing, but as soon as they use or develop multiple applications the model falls apart.

Given how much time it took me to fix the UID conflicts in just a dozen or so instances, I was terrified of ever attempting to fix the approximately 500 accounts in our old LDAP. There were other messes as well; for instance, a person would often have multiple accounts.

I decided to let that problem sit for now, and concentrate on the procedure for creating new accounts. I started by stating a policy of creating an account-specific group for each account, and ensuring the two would have the same name and the same UID and GID number. I allocated new UID/GID ranges for accounts owned by real people, accounts used for applications (e.g. websphere, oracle), and non-account-specific groups.
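A policy like this is easy to check mechanically. Here's a minimal sketch, assuming passwd- and group-format dumps (e.g. from `getent passwd` and `getent group`) as input; `check_upg` is a hypothetical name, and it only checks for a same-named group with a matching number, not which range the number falls in.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check_upg PASSWD_FILE GROUP_FILE
# Prints one line per account that has no same-named group, or whose
# same-named group's GID differs from the account's UID.
check_upg() {
    # The group file is passed first (ARGV[1]) to build a name -> GID map,
    # then each passwd entry is checked against it.
    awk -F: '
        FILENAME == ARGV[1] { gid_of[$1] = $3; next }
        !($1 in gid_of)     { print $1 ": no matching group"; next }
        gid_of[$1] != $3    { print $1 ": UID " $3 " != GID " gid_of[$1] }
    ' "$2" "$1"
}
```

Usage could be `check_upg <(getent passwd) <(getent group)`; no output means every account follows the policy.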

I investigated whether phpLDAPAdmin could be easily made to do these things for me as part of a single unified operation, and pretty much determined that it wouldn't be worth the time to make the attempt. Even attempting to just use phpLDAPAdmin's id number allocations would mess things up with several different id number ranges.

Ergo, manual procedures were the answer, at least for now. I made a script to pluck the next available uid and gid number pair from the appropriate range, and wrote a procedure instructing how to use that script combined with two or three different phpLDAPAdmin operations to create new accounts.
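The "next available pair" lookup can be sketched like this, assuming the range is passed in explicitly and that passwd- and group-format dumps of the directory are available. `next_free_pair` and the 5000-5999 range in the usage line are hypothetical, not the actual script or ranges.

```bash
#!/usr/bin/env bash
# Hypothetical helper: next_free_pair LOW HIGH PASSWD_FILE GROUP_FILE
# Prints the lowest number in [LOW, HIGH] that is used neither as a UID
# in PASSWD_FILE nor as a GID in GROUP_FILE, so it can serve as both the
# new account's UID and its private group's GID. Exits 1 if the range is full.
next_free_pair() {
    awk -F: -v low="$1" -v high="$2" '
        { used[$3] = 1 }    # field 3 is the UID (passwd) or GID (group)
        END {
            for (id = low; id <= high; id++)
                if (!(id in used)) { print id; exit 0 }
            exit 1
        }
    ' "$3" "$4"
}
# Usage: next_free_pair 5000 5999 <(getent passwd) <(getent group)
```

Checking both files in one pass is what keeps the UID and GID numbers in lockstep for a new account-specific group.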

Home directory provisioning

Another improvement was to replace the old user home directory creation method. The original method was a perl script, scheduled to run nightly via cron, which directly used Net::LDAP to query the LDAP servers and see which accounts needed home directories. It had several limitations: it needed the Net::LDAP module and all its dependencies installed, and it needed to be configured separately for each host, and also separately from the service search descriptors in /etc/ldap.conf, which left room for configuration mismatches. And, of course, it didn't cover the case of what to do when a user is deprovisioned.

One of the first things I considered was attempting to use pam_mkhomedir. That's a PAM module which, when you log into a configured host where you do not have a home directory, will attempt to create one for you. It's a decent partial solution. It doesn't cover any case where a home directory may be expected before the user has logged in, such as mail delivery, and it still doesn't cover deprovisioning. Both of those could be lived with, but the kicker was that it wasn't universally available on all our target platforms. Only the various Linuxes supported it, and only recent installations of those.

I ended up writing a much simpler script based on the operating system's 'list the user registry' tool ('getent' on Linux and Solaris, 'lsuser' on AIX), parsing the output, and creating home dirs where necessary. Like the original script, it was run from cron. This script resolves the perl dependency issue, and effectively shares the OS's service search descriptor configuration. It still shares the original's other limitations, though: it only runs once a day, and it doesn't handle deprovisioning.
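The shape of a getent-based script like that is roughly the following. It's a minimal sketch, assuming bash, a `getent` that enumerates the whole passwd map when given no key (Linux and Solaris), and /etc/skel as the skeleton directory. The AIX `lsuser` side isn't covered, and `missing_homes`, `create_homes`, and the UID cutoff are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: missing_homes MIN_UID
# Reads passwd-format lines (e.g. `getent passwd`) on stdin and prints
# "user uid gid home" for each account with UID >= MIN_UID whose home
# directory does not exist yet.
missing_homes() {
    local min_uid=$1 user pw uid gid gecos home shell
    while IFS=: read -r user pw uid gid gecos home shell; do
        # Skip system accounts and any line with a non-numeric UID.
        [ "$uid" -ge "$min_uid" ] 2>/dev/null || continue
        # Skip entries with no home field, or whose home already exists.
        [ -n "$home" ] && [ ! -d "$home" ] || continue
        printf '%s %s %s %s\n' "$user" "$uid" "$gid" "$home"
    done
}

# Hypothetical helper: create_homes
# Reads "user uid gid home" lines and creates each home from /etc/skel.
# Must run as root so that chown works.
create_homes() {
    local user uid gid home
    while read -r user uid gid home; do
        mkdir -p -- "$home" &&
            cp -R /etc/skel/. "$home"/ &&
            chown -R "$uid:$gid" -- "$home" &&
            chmod 700 -- "$home"
    done
}
# Usage from root's crontab:  getent passwd | missing_homes 1000 | create_homes
```

Splitting "what's missing" from "create it" makes it easy to do a dry run by leaving off the last stage of the pipeline.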

Password management


The last major change I made was to hook this LDAP up to the institution-wide password sync service. To prevent passwords getting out of sync, as well as some of the password-change problems I mentioned in my earlier posts, I set an ACL such that users couldn't change their own passwords. This prevented people from using either the local system's 'passwd' command, or even 'ldappasswd' if they were really savvy. Instead, we began referring everyone to our institution's central password change application.

This actually was a huge thing for us, and by itself significantly reduced our LDAP related oncall issues.

Going forward

After reaching this point in my improvements, though, I stalled. There were still a large number of problems with this system that prevented expanding it to serve our Solaris and AIX hosts, or even continuing to expand it to serve other Linux hosts.

  • The first and biggest issue is that I wasn't able to get our Solaris or AIX hosts to work with this LDAP at all.
  • Also, I was never able to upgrade to a more recent version of phpLDAPAdmin, which I needed because the existing one had some serious performance issues. For whatever reason, I never got an upgraded version to connect to the old LDAP.
  • The existing UIDs and GIDs were still a mess.
  • I faced the difficult prospect of future UID leveling: there was no easy way out of UID conflicts between the existing LDAP and the same UIDs used for local accounts on the non-LDAP machines we wanted to bring in.
  • Finally, multi-master replication would be sweet.
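The UID leveling problem boils down to two kinds of clash between a host's local accounts and the directory: the same UID under different names, and the same name under different UIDs. A minimal sketch for finding both, assuming passwd-format input on each side (e.g. a copy of the host's /etc/passwd and a passwd-format dump from an LDAP client); `uid_collisions` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: uid_collisions LOCAL_PASSWD LDAP_PASSWD
# Prints every UID that maps to different usernames in the two files,
# and every username that maps to different UIDs.
uid_collisions() {
    awk -F: '
        # First file (local accounts): remember UID -> name and name -> UID.
        FILENAME == ARGV[1] { name_of[$3] = $1; uid_of[$1] = $3; next }
        # Second file (LDAP accounts): compare against the local maps.
        ($3 in name_of) && name_of[$3] != $1 {
            print "UID " $3 ": local " name_of[$3] ", LDAP " $1
        }
        ($1 in uid_of) && uid_of[$1] != $3 {
            print "user " $1 ": local UID " uid_of[$1] ", LDAP UID " $3
        }
    ' "$1" "$2"
}
```

Each reported line is one account that would need renumbering (and a file ownership sweep) before that host could be brought onto the directory.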

So, we still had quite a ways to go. More on this in future posts.

-- Pat

Saturday, March 28, 2009

Our old LDAP and its problems

Okay, so I'm also a very irregular blogger. :-) Hopefully I can get this series done, though. I've vaguely promised to give a presentation on this, and should the askers ever get their act together and schedule me, it'd be nice to have it ready.

I promised to talk about several things this time: our old LDAP, a more detailed explanation of what we were trying to do and why we were trying to do it, and the process we used to select our software. I probably won't finish them all in this post, but let's see how far I get.

Our old LDAP

Our old LDAP service consisted of (actually, still consists of) three Red Hat Enterprise Linux servers running OpenLDAP. It's still in place in an upgraded form as I work to migrate us from this old infrastructure to the new one. What follows is basically a litany of the issues we had with the old service.

Replication issues

First, replication. OpenLDAP, at least when this was installed before my tenure, had no mechanism for multi-master replication schemes. Nowadays there's syncrepl, but not then. This means that our old setup used a single master, and is vulnerable to missing important changes (e.g. password updates) if the master is down.

Custom Schema

Next up, poorly thought out custom schema. This LDAP used an additional attribute, userClass, to restrict account visibility on server machines by way of an NSS service search descriptor. The method in general turns out to be about the best way to do this (more on this in later posts), but the problem was the ugliness used in shimming this attribute into a posixAccount objectclass object.

An early version of this shim overrode and extended the posixAccount objectclass via a custom schema. A later version (still in place) created a new object class, 'ourAccount', with pilotPerson, inetOrgPerson, and account as its SUPs, and declared all account objects as 'objectclass ourAccount' and 'objectclass posixAccount' (and a few others).

Both solutions are a bit ugly. First, 'userClass' might not be the best semantic match for the function it was used for: membership in one or more 'groups'. In particular, there's no easy, one-stop way to see a list of all userClass values, and there are advantages to having both a reverse and a forward map (e.g. posixgroup/groupOfNames *and* memberOf attributes), which userClass has no mechanism for. Next, the schema hacks above were a bit ugly and, worse, not documented anywhere.

My current one, I admit, still uses a schema hack, but it does allow all the critical operations easily: what's a list of all the 'groups' of users, give me a list of everyone in a group, which groups is this user in, and is this user in this group. Further, it's documented. :-)

Management tool(s)

The set of management tools in place for this service was similarly hacked together, not well supported, and undocumented. There were two scripts in an obscure location to reset passwords and create new accounts. The create-new-account script also attempted to create an email account on an IMAP server that never really existed. There was also an X11 LDAP management GUI on a separate server which didn't manage any of the custom schema elements, nor, because of the custom schema, could it be used to create any new objects. Finally, there was a half-broken perl script intended to be placed on LDAP client hosts to create newly provisioned users' home directories.

Another serious issue is that these tools had no provision for doing proper UID / GID allocation for new accounts. In fact, we had several accounts that had the same UIDs and groups that had the same GIDs. There was also no provision for automating user addition and, importantly, user deactivation and retirement.

There were also a few outright bugs. The worst was the way password updates were (or weren't) managed. The system depended on each client's 'passwd' command being properly configured via pam_ldap such that it would send password updates to the LDAP replica, get a referral to the master, follow that to the master, bind as the password changing user, and update the userPassword attribute. In addition to the obvious possibility for breakage with any of that configuration, there were two other serious problems.

First, not all systems properly followed the referral, and thus might only update the password in the local replica, or, hopefully, fail outright, so that at least the user got an error instead of a password that differed depending on which replica they hit.

Second, none of the implementations of pam_ldap bothered to update the 'shadowLastChange' attribute. This meant that if a user was changing their password because it had expired, then no matter how often they updated their password, it would always remain expired. This caused no end of oncall support issues for us.
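For reference, shadowLastChange (from the RFC 2307 shadowAccount object class) counts days since 1 January 1970, and in the usual shadow semantics a password needs changing once today is more than shadowMax days past it. If nothing bumps shadowLastChange, that condition stays true forever. A minimal sketch of the check and of an LDIF record that bumps the attribute, assuming GNU `date` (for `+%s`); the helper names, DN, and bind DN are hypothetical placeholders.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print today's date as days since 1970-01-01 (UTC),
# the unit shadowLastChange uses.
days_since_epoch() {
    echo $(( $(date -u +%s) / 86400 ))
}

# Hypothetical helper: is_expired LAST MAX TODAY
# Succeeds if a password last changed on day LAST, with a maximum age of
# MAX days, is expired on day TODAY. An empty or negative MAX is treated
# as "never expires" in this sketch.
is_expired() {
    local last=$1 max=$2 today=$3
    [ -n "$max" ] && [ "$max" -ge 0 ] && [ "$today" -gt $(( last + max )) ]
}

# Hypothetical helper: shadow_update_ldif DN DAY
# Prints an LDIF change record setting shadowLastChange, for ldapmodify.
shadow_update_ldif() {
    printf 'dn: %s\nchangetype: modify\nreplace: shadowLastChange\nshadowLastChange: %s\n' "$1" "$2"
}
# Example (placeholder DNs):
#   shadow_update_ldif "uid=jdoe,ou=People,dc=example,dc=com" "$(days_since_epoch)" |
#       ldapmodify -x -D "cn=admin,dc=example,dc=com" -W
```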

All right, I'm about blogged out. More later on our old LDAP service and other stuff, too, like some of the things we did to make our old LDAP more usable, and our attempts to make it more widespread than just our linux boxes.

-- Pat

Saturday, January 24, 2009

Late blooming blogger - beginning of our LDAP saga

I don't know if anyone else is like this, but I, at least, don't follow blogs. I search them. That is, when I'm interested in a specific topic, I'll google it. Often, I find an interesting blog post or two which I'll read, but I never bother to go back and scan those blogs again.

Ergo, I'd never bothered to blog before, largely because I'd not had anything to say worth anyone's time. I'd always thought that, to be worth reading, I should have something informative to say.

Well, I do.

Specifically, I thought it'd be helpful if I were to document a bunch of what I've gone through to create our heterogeneous unix LDAP environment at work. There was a pretty fair amount of patching stuff together in ways that weren't documented anywhere. So, I'm going to start a series of blog posts, titled, roughly:

LDAP in a heterogeneous environment - What they don't tell you.

First up, some background on our situation.

I work as a unix admin for a medium-sized company. We have about 600 unix servers split roughly in thirds between AIX, Linux, and Solaris, with a few oddball HP-UX and VMS boxes. There are also about 3000 Windows servers, some Tandem NonStop, a pair of IBM mainframes, some IBM eSeries, a few racks of OS X high performance compute cluster boxes, and lots of other odds and ends I'm only peripherally involved in, if at all.

When I transferred to the Unix admin spot from a DBA team about 4 years ago, we had a partial LDAP infrastructure, consisting of an OpenLDAP master server and two slaves. However, this was only used for our Linux boxes, so it had limited penetration, and was in other ways a bit of a mess:
  • UIDs and GIDs were pretty much all over the map, including some duplicates;
  • There was no automation of provisioning, users had to be manually created in LDAP, and home directories had to be manually created for users wherever they had a login;
  • Worse, there was no automation of account de-provisioning for when people transferred or left;
  • Password changes were haphazard, didn't always work, and weren't synchronized with our institutional password management infrastructure;
  • The schema loaded on the server just plain didn't work with most AIX and Solaris clients.
In short, I could make a few patches to limp this system along, but in the end I was looking at a complete replacement. To start with, I came up with the following set of goals and requirements:
  1. Central Authentication and Authorization database/repository

    1. All client platforms & OSes can use (AIX, Solaris, Linux, HP-UX)
    2. Highly available
    3. Meets institution security and access policies
    4. Uses same username, passwords, and unix specific identifiers (uids, gids) across all systems

  2. Repository information is replicated from core sources, including:

    1. Username will be the historical Unix and Mainframe format, not Active Directory
    2. Real name & information (office, phone#...)
    3. Password
    4. Account disable and account removal

  3. Automation, application access and authorization functions offloaded from unix team

  4. Meets auditing requirements for user management

    1. Determine for each account what level of access and authorization it has across all systems
    2. Maintain access history for each account
      • security changes
      • access level history

  5. Only those accounts valid for the application(s) hosted on a specific machine are presented on that machine.

  6. Client machines can be configured without reference to user accounts contained in the repository. E.g. if account 'foo' is to be used on machine 'bar', no configuration file on 'bar' should need a specific entry for 'foo'.

  7. Secure on the wire network transactions between client machines and repository server
In later posts, I'll give more detailed information about our old LDAP setup, explain these goals a bit more and talk about the process we used to design and select our new architecture.

Bye for now.
-- Pat