Monday, May 12, 2008
DataProtector 6 has a problem, continued
I posted last week about DataProtector and its Enhanced Incremental Backup. Remember that "enhincrdb" directory I spoke of? Take a look at this:

See? This is an in-progress count of one of these directories. 1.1 million files, 152MB of space consumed. That comes to an average file-size of 133 bytes. This is significantly under the 4kb block-size for this particular NTFS volume. On another server with a longer serving enhincrdb hive, the average file-size is 831 bytes. So it probably increases as the server gets older.
On the up side, these millions of weensy files won't actually consume more space for quite some time as they expand into the blocks the files are already assigned to. This means that fragmentation on this volume isn't going to be a problem for a while.
On the down side, it's going to park (in this case) 152MB of data on 4.56GB of disk space. It'll get better over time, but in the next 12 months or so it's still going to be horrendous.
This tells me two things:
Since it is highly likely that I'll be using DataProtector for OES2 systems, this is something I need to keep in mind.

See? This is an in-progress count of one of these directories. 1.1 million files, 152MB of space consumed. That comes to an average file-size of 133 bytes. This is significantly under the 4kb block-size for this particular NTFS volume. On another server with a longer serving enhincrdb hive, the average file-size is 831 bytes. So it probably increases as the server gets older.
On the up side, these millions of weensy files won't actually consume more space for quite some time as they expand into the blocks the files are already assigned to. This means that fragmentation on this volume isn't going to be a problem for a while.
On the down side, it's going to park (in this case) 152MB of data on 4.56GB of disk space. It'll get better over time, but in the next 12 months or so it's still going to be horrendous.
This tells me two things:
- When deciding where to host the enhincrdb hive on a Windows server, format that particular volume with a 1k block size.
- If HP supported NetWare as an Enhanced Incremental Backup client, the 4kb block size of NSS would cause this hive to grow beyond all reasonable proportions.
Since it is highly likely that I'll be using DataProtector for OES2 systems, this is something I need to keep in mind.
Wednesday, May 07, 2008
DataProtecter 6 has a problem
We're moving our BackupExec environment to HP DataProtector. Don't ask why, it made sense at the time.
Once of the niiiice things about DP is what's called, "Enhanced Incremental Backup". This is a de-duplication strategy, that only backs up files that have changed, and only stores the changed blocks. From these incremental backups you can construct synthetic full backups, which are just pointer databases to the blocks for that specified point-in-time. In theory, you only need to do one full backup, keep that backup forever, do enhanced incrementals, then periodically construct synthetic full backups.
We've been using it for our BlackBoard content store. That's around... 250GB of file store. Rather than keep 5 full 275GB backup files for the duration of the backup rotation, I keep 2 and construct synthetic fulls for the other 3. In theory I could just go with 1, but I'm paranoid :). This greatly reduces the amount of disk-space the backups consume.
Unfortunately, there is a problem with how DP does this. The problem rests on the client side of it. In the "$InstallDir$\OmniBack\enhincrdb" directory it constructs a file hive. An extensive file hive. In this hive it keeps track of file state data for all the files backed up on that server. This hive is constructed as follows:
The last real full backup I took of the content store backed up just under 1.7 million objects (objects = directory entries in NetWare, or IIRC inodes in unix-land). Yet the enhincrdb hive had 2.7 million objects. Why the difference? I'm not sure, but I suspect it was keeping state data for 1 million objects that no longer were present in the backup. I have trouble believing that we managed to churn over 60% of the objects in the store in the time I have backups, so I further suspect that it isn't cleaning out state data from files that no longer have a presence in the backup system.
DataProtector doesn't support Enhanced Incrementals for NetWare servers, only Windows and possibly Linux. Due to how this is designed, were it to support NetWare it would create absolutely massive directory structures on my SYS: volumes. The FACSHARE volume has about 1.3TB of data in it, in about 3.3 directory entries. The average FacStaff User volume (we have 3) has about 1.3 million, and the average Student User volume has about 2.4 million. Due to how our data works, our Student user volumes have a high churn rate due to students coming and going. If FACSHARE were to share a cluster node with one Student user volume and one FacStaff user volume, they have a combined directory-entry count of 7.0 million directory entries. This would generate, at first, a \enhincrdb directory with 7.0 million files. Given our regular churn rate, within a year it could easily be over 9.0 million.
When you move a volume to another cluster node, it will create a hive for that volume in the \enhincrdb directory tree. We're seeing this on the BlackBoard Content cluster. So given some volumes moving around, and it is quite conceivable that each cluster node will have each cluster volume represented in its own \enhincrdb directory. Which will mean over 15 million directory-entries parked there on each SYS volume, steadily increasing as time goes on taking who knows how much space.
And as anyone who has EVER had to do a consistency check of a volume that size knows (be it vrepair, chkdsk, fsck,or nss /poolrebuild), it takes a whopper of a long time when you get a lot of objects on a file-system. The old Traditional File System on NetWare could only support 16 million directory entries, and DP would push me right up to that limit. Thank heavens NSS can support w-a-y more then that. You better hope that the file-system that the \enhincrdb hive is on never has any problems.
But, Enhanced Incrementals only apply to Windows so I don't have to worry about that. However.... if they really do support Linux (and I think they do), then when I migrate the cluster to OES2 next year this could become a very real problem for me.
DataProtector's "Enhanced Incremental Backup" feature is not designed for the size of file-store we deal with. For backing up the C: drive of application servers or the inetpub directory of IIS servers, it would be just fine. But for file-servers? Good gravy, no! Unfortunately, those are the servers in most need of de-dup technology.
Once of the niiiice things about DP is what's called, "Enhanced Incremental Backup". This is a de-duplication strategy, that only backs up files that have changed, and only stores the changed blocks. From these incremental backups you can construct synthetic full backups, which are just pointer databases to the blocks for that specified point-in-time. In theory, you only need to do one full backup, keep that backup forever, do enhanced incrementals, then periodically construct synthetic full backups.
We've been using it for our BlackBoard content store. That's around... 250GB of file store. Rather than keep 5 full 275GB backup files for the duration of the backup rotation, I keep 2 and construct synthetic fulls for the other 3. In theory I could just go with 1, but I'm paranoid :). This greatly reduces the amount of disk-space the backups consume.
Unfortunately, there is a problem with how DP does this. The problem rests on the client side of it. In the "$InstallDir$\OmniBack\enhincrdb" directory it constructs a file hive. An extensive file hive. In this hive it keeps track of file state data for all the files backed up on that server. This hive is constructed as follows:
- The first level is the mount point. Example: enhincrdb\F\
- The 2nd level are directories named 00-FF which contain the file state data itself
The last real full backup I took of the content store backed up just under 1.7 million objects (objects = directory entries in NetWare, or IIRC inodes in unix-land). Yet the enhincrdb hive had 2.7 million objects. Why the difference? I'm not sure, but I suspect it was keeping state data for 1 million objects that no longer were present in the backup. I have trouble believing that we managed to churn over 60% of the objects in the store in the time I have backups, so I further suspect that it isn't cleaning out state data from files that no longer have a presence in the backup system.
DataProtector doesn't support Enhanced Incrementals for NetWare servers, only Windows and possibly Linux. Due to how this is designed, were it to support NetWare it would create absolutely massive directory structures on my SYS: volumes. The FACSHARE volume has about 1.3TB of data in it, in about 3.3 directory entries. The average FacStaff User volume (we have 3) has about 1.3 million, and the average Student User volume has about 2.4 million. Due to how our data works, our Student user volumes have a high churn rate due to students coming and going. If FACSHARE were to share a cluster node with one Student user volume and one FacStaff user volume, they have a combined directory-entry count of 7.0 million directory entries. This would generate, at first, a \enhincrdb directory with 7.0 million files. Given our regular churn rate, within a year it could easily be over 9.0 million.
When you move a volume to another cluster node, it will create a hive for that volume in the \enhincrdb directory tree. We're seeing this on the BlackBoard Content cluster. So given some volumes moving around, and it is quite conceivable that each cluster node will have each cluster volume represented in its own \enhincrdb directory. Which will mean over 15 million directory-entries parked there on each SYS volume, steadily increasing as time goes on taking who knows how much space.
And as anyone who has EVER had to do a consistency check of a volume that size knows (be it vrepair, chkdsk, fsck,or nss /poolrebuild), it takes a whopper of a long time when you get a lot of objects on a file-system. The old Traditional File System on NetWare could only support 16 million directory entries, and DP would push me right up to that limit. Thank heavens NSS can support w-a-y more then that. You better hope that the file-system that the \enhincrdb hive is on never has any problems.
But, Enhanced Incrementals only apply to Windows so I don't have to worry about that. However.... if they really do support Linux (and I think they do), then when I migrate the cluster to OES2 next year this could become a very real problem for me.
DataProtector's "Enhanced Incremental Backup" feature is not designed for the size of file-store we deal with. For backing up the C: drive of application servers or the inetpub directory of IIS servers, it would be just fine. But for file-servers? Good gravy, no! Unfortunately, those are the servers in most need of de-dup technology.
Tuesday, May 06, 2008
Being annoyed by rug?
Rug/zmd in SLES10-SP1 is still a headache maker. Novell knows this, but I strongly suspect that we'll have to wait until SLES11 before we get anything improved. OpenSUSE now has zypper which works pretty good, and I think you can do it in SLES if you want, but I haven't tried.
One of the chief annoyances of rug is that the zmd.db file kept in /var/lib/zmd/zmd.db gets corrupted far too easily. And when that happens, rug can take HOURS to return anything. If it returns anything at all.
The fix for it is easy, stop zmd, delete the zmd.db file, restart zmd. Since I'm doing this fairly often, I've whipped up a bash script to do it for me.
nukezmd
One of the chief annoyances of rug is that the zmd.db file kept in /var/lib/zmd/zmd.db gets corrupted far too easily. And when that happens, rug can take HOURS to return anything. If it returns anything at all.
The fix for it is easy, stop zmd, delete the zmd.db file, restart zmd. Since I'm doing this fairly often, I've whipped up a bash script to do it for me.
nukezmd
#!/bin/shSimple, to the point. Works.
#
# For killing ZMD when it is clearly hung. An all too often occurance.
#
declare PIDZMD
# First get the PID of ZMD
printf "Getting PID... "
let PIDZMD=`rczmd showpid`
printf "$PIDZMD\n"
# Then unconditionally kill it
printf "Killing zmd hard... \n"
kill -9 $PIDZMD
# Remove the old, inconsistent database
printf "Nuking old database... \n"
rm /var/lib/zmd/zmd.db
# Restart ZMD, which will build a new, consistent database
printf "Restarting ZMD\n"
rczmd start
Monday, May 05, 2008
Back-scatter spam
There was a recent slashdot post on this. We've had a fair amount of this sort of spam. And the victims are at pretty high levels of our organization, too. Last week the person who is responsible for us even having a Blackberry Enterprise Server asked us to figure out a way to prevent these emails from being forwarded to their blackberry. When a spam campaign is rolling, that person can get a bounce-message every 5-15 minutes for up to 8 hours, into the wee hours of the night. And that's just the mails that get PAST our anti-spam appliance. We set up some forwarding filters, but we haven't heard back about how effective they are.
This is a hard thing to guard against. You can't use the reputation of the sender IP address, since they're all legitimate mailers being abused by the spam campaign and are returning delivery service notices per spec. So the spam filtering has to be by content, which is a bit less effective. In one case, of the 950-odd DSN's we received for a specific person during a specific spam campaign, only 15 made it to the inbox. But that 15 was enough above what they normally saw (about 3 a day) that they complained.
Backscatter is a problem. However, our affected users have so far been sophisticated enough users of email to realize that this was more likely forgery than something wrong with their computer. So, we haven't been asked to "track down those responsible." This is a relief for us, as we've been asked that in the past when forged spams have come to the attention of higher level executives.
If it becomes a more wide-spread problem, we will be told to Do Something by the powers that be. Unfortunately, there isn't a lot that can be done. Blocking these sorts of DSNs is doable, but that's an expensive thing to manage in terms of people time. In 6-12 months we can expect the big anti-spam vendors to include options to just block DSN's uniformly, but until that time comes (and we have the budget for the added expenses) we'd have to do it through dumb keyword filters. Not a good solution. And it would also cause legitimate bounce messages to fail to arrive.
This is a hard thing to guard against. You can't use the reputation of the sender IP address, since they're all legitimate mailers being abused by the spam campaign and are returning delivery service notices per spec. So the spam filtering has to be by content, which is a bit less effective. In one case, of the 950-odd DSN's we received for a specific person during a specific spam campaign, only 15 made it to the inbox. But that 15 was enough above what they normally saw (about 3 a day) that they complained.
Backscatter is a problem. However, our affected users have so far been sophisticated enough users of email to realize that this was more likely forgery than something wrong with their computer. So, we haven't been asked to "track down those responsible." This is a relief for us, as we've been asked that in the past when forged spams have come to the attention of higher level executives.
If it becomes a more wide-spread problem, we will be told to Do Something by the powers that be. Unfortunately, there isn't a lot that can be done. Blocking these sorts of DSNs is doable, but that's an expensive thing to manage in terms of people time. In 6-12 months we can expect the big anti-spam vendors to include options to just block DSN's uniformly, but until that time comes (and we have the budget for the added expenses) we'd have to do it through dumb keyword filters. Not a good solution. And it would also cause legitimate bounce messages to fail to arrive.
Wednesday, April 30, 2008
Legal processes
Yesterday we received a Litigation Hold request. For those of you who don't know, this is the order given as part of a lawsuit ordering us to take steps to preserve data that could be used as part of the Discovery process of the suit. This is something that is becoming more and more common these days.
Our department has been pretty lucky so far. Since I started here in late 2003 this is the first Litigation Hold request we've had to deal with. We've had a few "public records requests" come through which are handled similarly, but this is the first one involving data that may be introduced under sworn testimony.
This morning we had an article pointed out to us by the Office of Finance Management at the state. WWU is a State agency, so OFM is in our chain of bureaucracy.
Case Law/Rule Changes Thrust Electronic Document Discovery into the Spotlight.
It's an older PDF, but it does give a high level view of the sorts of things we should be doing when these requests come in. One of the things that we don't have any processes for are the sequestration of held data and chain of custody preservation. We are now building those.
Guideline #4 has the phrase, "Consultants are particularly useful in this role," referring to overseeing the holding process and standing up before a court to testify that the data was handled correctly. This is very true! Trained professionals are the kind of people to know the little nuances that hostile lawyers can use to invalidate gathered evidence. Someone who has done a lot of reading and been to a few SANS classes is not that person.
Just because it is possible to self represent yourself in court as your own lawyer, doesn't make it a good idea. In fact, it generally is a very bad idea. Same thing applies to the above phrase. You want someone who knows what the heck they're doing when they climb up there onto the witness stand.
This is going to be an interesting learning experience.
Our department has been pretty lucky so far. Since I started here in late 2003 this is the first Litigation Hold request we've had to deal with. We've had a few "public records requests" come through which are handled similarly, but this is the first one involving data that may be introduced under sworn testimony.
This morning we had an article pointed out to us by the Office of Finance Management at the state. WWU is a State agency, so OFM is in our chain of bureaucracy.
Case Law/Rule Changes Thrust Electronic Document Discovery into the Spotlight.
It's an older PDF, but it does give a high level view of the sorts of things we should be doing when these requests come in. One of the things that we don't have any processes for are the sequestration of held data and chain of custody preservation. We are now building those.
Guideline #4 has the phrase, "Consultants are particularly useful in this role," referring to overseeing the holding process and standing up before a court to testify that the data was handled correctly. This is very true! Trained professionals are the kind of people to know the little nuances that hostile lawyers can use to invalidate gathered evidence. Someone who has done a lot of reading and been to a few SANS classes is not that person.
Just because it is possible to self represent yourself in court as your own lawyer, doesn't make it a good idea. In fact, it generally is a very bad idea. Same thing applies to the above phrase. You want someone who knows what the heck they're doing when they climb up there onto the witness stand.
This is going to be an interesting learning experience.
Thursday, April 17, 2008
And a gripe
2.5 hours is too freakin' long for "rug lu" to tell me which patches need application to this particular OES2 server. This needs fixing. I hope its fixed in SLES10 SP2.
Tuesday, April 15, 2008
Beta attitudes
One thing I've noticed while working on this beta is a change in attitude. Specifically, attitude regarding problems. I've run into problems so far that would have had me throwing things across the room by now. Yet, instead I get that 'ahah!' feeling and proceed to figure out how it went poink exactly like that. And then report it. That feels good.
All of my prior bug-hunting has been post-release, when we ran into issues in production. Now, it's in pre-release and the bugs and issues I find now will be fixed by release (or at least documented so people know to expect it to break that way).
It's an interesting change in attitude.
All of my prior bug-hunting has been post-release, when we ran into issues in production. Now, it's in pre-release and the bugs and issues I find now will be fixed by release (or at least documented so people know to expect it to break that way).
It's an interesting change in attitude.
Friday, April 11, 2008
On email, what comes in it
A friend recently posted the following:
Looking at statistics on the mail filter in front of Exchange, it looks like 5.9% of incoming messages for the last 7 days are clean. That is a LOT of messages getting dropped on the floor. This comes to just shy of 40,000 legitimate mail messages a day. For comparison, the number of mail messages coming in from Titian (the student email system, and unpublished backup MTA) has a 'clean' rate of 42.5%, or 2800ish legit messages a day.
People expect their email to be legitimate. Directory-harvesting attacks do constitute the majority to discrete emails; these are the messages you receive that have weird subjects, come from people you don't know, but don't have anything in the body. They're looking to see which addresses result in 'no person by that name here' messages and those that seemingly deliver. This is also why people unfortunate enough to have usernames or emails like "fred@" or "cindy@" have the worst spam problems of any organization.
As I've mentioned many times, we're actively considering migrating student email to one of the free email services offered by Google or Microsoft. This is because historically student email has had a budget of "free", and our current strategy is not working. The way it is not working is because the email filters aren't robust enough to meet expectation. Couple that with the expectation of effectively unlimited mail quota (thank you Google) and student email is no longer a "free" service. We can either spend $30,000 or more on an effective commercial anti-spam product, or we can give our email to the free services in exchange for valuable demographic data.
It's very hard to argue with economics like that.
One thing that you haven't seen yet in this article are viruses. In the last 7 days, our border email filter saw that 0.108% of incoming messages contain viruses. This is a weensy bit misleading, since the filter will drop connections with bad reputations before even accepting mail and that may very well cut down the number of reported viruses. But the fact remains that viruses in email are not the threat they once were. All the action these days are on subverted and outright evil web-sites, and social engineering (a form of virus of the mind).
This is another example of how expectation and reality differ. After years of being told, and in many cases living through the after-effects of it, people know that viruses come in email. The fact that the threat is so much more based on social engineering hasn't penetrated as far, so products aimed at the consumer call themselves anti-virus when in fact most of the engineering in them was pointed at spam filtering.
Anti-virus for email is ubiquitous enough these days that it is clear that the malware authors out there don't bother with email vectors for self-propagating software any more. That's not where the money is. The threat had moved on from cleverly disguised .exe files to cunningly wrought (in their minds) emails enticing the gullible to hit a web site that will infest them through the browser. These are the emails that border filters try to keep out, and it is a fundamentally harder problem than .exe files were.
The big commercial vendors get the success rate they do for email cleaning in part because they deploy large networks of sensors all across the internet. Each device or software-install a customer turns on can potentially be a sensor. The sensors report back to the mother database, and proprietary and patented methods are used to distill out anti-spam recipes/definitions/modules for publishing to subscribed devices and software. There is nothing saying that an open-source product can't do this, but the mother-database is a big cost that someone has to pay for and is a very key part of this spam fighting strategy. Bayesian filtering only goes so far.
And yet, people expect email to just be clean. Especially at work. That is a heavy expectation to meet.
80-90% of ALL email is directory harvesting attacks. 60-70% of the rest is spam or phishing. 1-5% of email is legit. Really makes you think about the invisible hand of email security, doesn't it?Those of us on the front lines of email security (which isn't quite me, I'm more of a field commander than a front line researcher) suspected as much. And yes, most people, nay, the vast majority, don't realize exactly what the signal-to-noise ratio is for email. Or even suspect the magnitude. I suspect that the statistic of, "80% of email is crap," is well known, but I don't think people even realize that the number is closer to, "95% of email is crap."
Looking at statistics on the mail filter in front of Exchange, it looks like 5.9% of incoming messages for the last 7 days are clean. That is a LOT of messages getting dropped on the floor. This comes to just shy of 40,000 legitimate mail messages a day. For comparison, the number of mail messages coming in from Titian (the student email system, and unpublished backup MTA) has a 'clean' rate of 42.5%, or 2800ish legit messages a day.
People expect their email to be legitimate. Directory-harvesting attacks do constitute the majority to discrete emails; these are the messages you receive that have weird subjects, come from people you don't know, but don't have anything in the body. They're looking to see which addresses result in 'no person by that name here' messages and those that seemingly deliver. This is also why people unfortunate enough to have usernames or emails like "fred@" or "cindy@" have the worst spam problems of any organization.
As I've mentioned many times, we're actively considering migrating student email to one of the free email services offered by Google or Microsoft. This is because historically student email has had a budget of "free", and our current strategy is not working. The way it is not working is because the email filters aren't robust enough to meet expectation. Couple that with the expectation of effectively unlimited mail quota (thank you Google) and student email is no longer a "free" service. We can either spend $30,000 or more on an effective commercial anti-spam product, or we can give our email to the free services in exchange for valuable demographic data.
It's very hard to argue with economics like that.
One thing that you haven't seen yet in this article are viruses. In the last 7 days, our border email filter saw that 0.108% of incoming messages contain viruses. This is a weensy bit misleading, since the filter will drop connections with bad reputations before even accepting mail and that may very well cut down the number of reported viruses. But the fact remains that viruses in email are not the threat they once were. All the action these days are on subverted and outright evil web-sites, and social engineering (a form of virus of the mind).
This is another example of how expectation and reality differ. After years of being told, and in many cases living through the after-effects of it, people know that viruses come in email. The fact that the threat is so much more based on social engineering hasn't penetrated as far, so products aimed at the consumer call themselves anti-virus when in fact most of the engineering in them was pointed at spam filtering.
Anti-virus for email is ubiquitous enough these days that it is clear that the malware authors out there don't bother with email vectors for self-propagating software any more. That's not where the money is. The threat had moved on from cleverly disguised .exe files to cunningly wrought (in their minds) emails enticing the gullible to hit a web site that will infest them through the browser. These are the emails that border filters try to keep out, and it is a fundamentally harder problem than .exe files were.
The big commercial vendors get the success rate they do for email cleaning in part because they deploy large networks of sensors all across the internet. Each device or software-install a customer turns on can potentially be a sensor. The sensors report back to the mother database, and proprietary and patented methods are used to distill out anti-spam recipes/definitions/modules for publishing to subscribed devices and software. There is nothing saying that an open-source product can't do this, but the mother-database is a big cost that someone has to pay for and is a very key part of this spam fighting strategy. Bayesian filtering only goes so far.
And yet, people expect email to just be clean. Especially at work. That is a heavy expectation to meet.
Wednesday, April 02, 2008
From Slashdot: Should users manage their own PC's?
Should IT Shops Let Users Manage Their Own PCs?
It's a very Web 2.0 concept. And there is some merit to it. Back in the day when workstation lock-downs were getting common in workplace settings (ZENworks was good for that), there was a debate about some of this. At my old job one thing we wanted to lock down was the wall-paper. That one thing would help reinforce the idea that this was a WORK Pc, not a home PC. The counter argument to this is that such user environment things are mostly harmless, so permitting them allows the lock-down to be less intrusive on the user.
This is another step in that direction. Workplaces have PC configuration standards for a variety of good reasons. You want all machines plugged into your network to not be festering hives of scum and malware, and these sorts of standards can prevent that. On the other end of the scale, high end users know the tools of their field better than your general IT desktop support person does and in theory can do more with the tools they know versus the tools forced upon them.
On the control end of the spectrum, you keep IT costs down by standardizing the configs in your enterprise. This keeps the Total Cost of Ownership down, a big thing for companies with the right internal costing controls (*nudge nudge*). One tech can support many more end users that way, since the range of things they support is kept to a minimum.
On the freedom end of the spectrum, the end user gets exactly the tools they want to do their job. They're happier that way. And since they support themselves, IT costs are controlled. One tech can support many more end users that way, since the bits they're supporting are significantly reduced.
The 'freedom' end of things runs smack into some standard industry practices, such as volume licensing and big-buy discounts. Dell, for instance, sells PCs cheaper if you buy them by the gross rather than in singles as users are onboarded. Specialized packages like AutoCAD also come cheaper if you buy them in packs of 10 rather than one at a time. Licenses all too often these days are timed and enforced, so you could have end users forgetting to renew the license on their Scrivener install and being non-productive for a few days while purchasing gets them a renewed license. The big 'endpoint management suites', what they seem to be calling the AntiVirus/Firewall package these days, all assume enterprise central control.
On the other hand, users liked being treated like reasoning, intelligent people who are capable of making choices about their work environment. This makes for happier workers.
Also working in this favor is the trend to webify everything in the workplace. The days when you have a whonking big file-server to store all the company data on are slowly going away, and being replaced with things like SharePoint (which can get just as big, don't get me wrong). The fights we've had in the past about how to roll out a new Novell Client to all our desktops would be moot in such an environment as the 'client' is called 'Firefox' (or Gnome, or Office 2007).
On the downside of the 'freedom' end of things is piracy. Tools like Zen Asset Management are there to make sure that the software in use is actually legal. In this freedom environment there is the significantly increased probability of someone bringing their 'backup' copy of something from home to install on their work machine and creating legal liability for the company if they get audited.
Another downside is interoperability problems. The Microsoft Office users create document-macros that the WordPerfect Office users can't run, and the OpenOffice users can't read the WordPerfect files. The Microsoft Office users publish things to SharePoint, where the OpenOffice users drop their stuff onto a handy WebDAV server somewhere. Office peer-pressure will still work on software selection to a point, even if you absolutely love Package Q for your day-to-day work you won't use it if the software everyone else in the office uses can't do a thing with it.
The trade-off here is balancing the chaos and increased direct costs 'freedom' will introduce to the IT environment versus the productivity bonuses and intangible benefits (morale). That will decidedly depend on the culture of the office, and what it is that they do. I know some people who would leave their current jobs just to get the freedom to order the machine they want and use the software they want to use, even if it means somewhat less benefits.
A friend of mine recently changed jobs. The old job was was Microsoft. Since Microsoft is a software development firm of some significant size, they try to dog-food their own stuff wherever possible; even if the tool is a poor fit for the task at hand. She spent a lot of time clubbing her software to do what it didn't really want to do, all the while knowing that there were two non-Microsoft packages that did exactly what she wanted. The new job is not with Microsoft, and the first day there they gave her an order sheet to order the software she wanted; they wanted results and trusted her to turn them in in an understandable format. Thus, the joys of freedom.
So, to answer the question, it depends. It depends on corporate culture to a significant degree, as well as the sector the company is in, as well as the work being done. In highly creative areas such as design, the benefits can be great. In highly regimented areas such as accounting, perhaps not so much or at least a high degree of freedom won't be worth the ultimate costs.
It's a very Web 2.0 concept. And there is some merit to it. Back in the day when workstation lock-downs were getting common in workplace settings (ZENworks was good for that), there was a debate about some of this. At my old job one thing we wanted to lock down was the wall-paper. That one thing would help reinforce the idea that this was a WORK Pc, not a home PC. The counter argument to this is that such user environment things are mostly harmless, so permitting them allows the lock-down to be less intrusive on the user.
This is another step in that direction. Workplaces have PC configuration standards for a variety of good reasons. You want all machines plugged into your network to not be festering hives of scum and malware, and these sorts of standards can prevent that. On the other end of the scale, high end users know the tools of their field better than your general IT desktop support person does and in theory can do more with the tools they know versus the tools forced upon them.
On the control end of the spectrum, you keep IT costs down by standardizing the configs in your enterprise. This keeps the Total Cost of Ownership down, a big thing for companies with the right internal costing controls (*nudge nudge*). One tech can support many more end users that way, since the range of things they support is kept to a minimum.
On the freedom end of the spectrum, the end user gets exactly the tools they want to do their job. They're happier that way. And since they support themselves, IT costs are controlled. One tech can support many more end users that way, since the bits they're supporting are significantly reduced.
The 'freedom' end of things runs smack into some standard industry practices, such as volume licensing and big-buy discounts. Dell, for instance, sells PCs cheaper if you buy them by the gross rather than in singles as users are onboarded. Specialized packages like AutoCAD also come cheaper if you buy them in packs of 10 rather than one at a time. Licenses all too often these days are timed and enforced, so you could have end users forgetting to renew the license on their Scrivener install and being non-productive for a few days while purchasing gets them a renewed license. The big 'endpoint management suites', what they seem to be calling the AntiVirus/Firewall package these days, all assume enterprise central control.
On the other hand, users liked being treated like reasoning, intelligent people who are capable of making choices about their work environment. This makes for happier workers.
Also working in this favor is the trend to webify everything in the workplace. The days when you have a whonking big file-server to store all the company data on are slowly going away, and being replaced with things like SharePoint (which can get just as big, don't get me wrong). The fights we've had in the past about how to roll out a new Novell Client to all our desktops would be moot in such an environment as the 'client' is called 'Firefox' (or Gnome, or Office 2007).
On the downside of the 'freedom' end of things is piracy. Tools like Zen Asset Management are there to make sure that the software in use is actually legal. In this freedom environment there is the significantly increased probability of someone bringing their 'backup' copy of something from home to install on their work machine and creating legal liability for the company if they get audited.
Another downside is interoperability problems. The Microsoft Office users create document-macros that the WordPerfect Office users can't run, and the OpenOffice users can't read the WordPerfect files. The Microsoft Office users publish things to SharePoint, where the OpenOffice users drop their stuff onto a handy WebDAV server somewhere. Office peer-pressure will still work on software selection to a point, even if you absolutely love Package Q for your day-to-day work you won't use it if the software everyone else in the office uses can't do a thing with it.
The trade-off here is balancing the chaos and increased direct costs 'freedom' will introduce to the IT environment versus the productivity bonuses and intangible benefits (morale). That will decidedly depend on the culture of the office, and what it is that they do. I know some people who would leave their current jobs just to get the freedom to order the machine they want and use the software they want to use, even if it means somewhat less benefits.
A friend of mine recently changed jobs. The old job was was Microsoft. Since Microsoft is a software development firm of some significant size, they try to dog-food their own stuff wherever possible; even if the tool is a poor fit for the task at hand. She spent a lot of time clubbing her software to do what it didn't really want to do, all the while knowing that there were two non-Microsoft packages that did exactly what she wanted. The new job is not with Microsoft, and the first day there they gave her an order sheet to order the software she wanted; they wanted results and trusted her to turn them in in an understandable format. Thus, the joys of freedom.
So, to answer the question, it depends. It depends on corporate culture to a significant degree, as well as the sector the company is in, as well as the work being done. In highly creative areas such as design, the benefits can be great. In highly regimented areas such as accounting, perhaps not so much or at least a high degree of freedom won't be worth the ultimate costs.
Tuesday, March 25, 2008
IPv6 vs IPX
In a session last week came the following comment from a presenter (paraphrased):
00008021:0002a540d0e1
Where 'node number' is the MAC address of the network card in question. It's up to the routers to figure out where network-numbers live, and advertised services issue full-network broadcasts to advertise said service, which is the primary reason that IPX just doesn't scale if WAN links are in the mix. But that's by the by.
IPv6 addresses work similarly:
2001:0db8:85a3:08d3:1319:8a2e:0370:7334
The last 48 bits are the MAC address and the bits ahead of it constitute the network number. Except... the IPv6 designers knew about the failings of IPX and worked around them. The last 48 bits don't have to be the MAC address, though as I understand it that address has to exist for each physical interface. Unlike IPX, IPv6 has the ability to have 'secondary' addresses. The lack of this ability was the main reason that Novell Cluster Services only worked on IP networks, which caused its own wave of grief when clustering was introduced in the NetWare 5.1 era. Secondary IPv6 numbers don't have to follow the MAC format, which in my opinion is a good thing!
Yes, when I first read about IPv6 addressing I had that same, "wow, this is just like IPX," moment the BrainShare presenter had. Only, more scalable, and more flexible.
How may of you in the room have been at this long enough to do IPX? Ok, great. Now how many of you have done anything with IPv6? Doesn't that look JUST like IPX?And he's right, to a point. IPX addresses are of the form network-number:node-number, such as:
00008021:0002a540d0e1
Where 'node number' is the MAC address of the network card in question. It's up to the routers to figure out where network-numbers live, and advertised services issue full-network broadcasts to advertise said service, which is the primary reason that IPX just doesn't scale if WAN links are in the mix. But that's by the by.
IPv6 addresses work similarly:
2001:0db8:85a3:08d3:1319:8a2e:0370:7334
The last 48 bits are the MAC address and the bits ahead of it constitute the network number. Except... the IPv6 designers knew about the failings of IPX and worked around them. The last 48 bits don't have to be the MAC address, though as I understand it that address has to exist for each physical interface. Unlike IPX, IPv6 has the ability to have 'secondary' addresses. The lack of this ability was the main reason that Novell Cluster Services only worked on IP networks, which caused its own wave of grief when clustering was introduced in the NetWare 5.1 era. Secondary IPv6 numbers don't have to follow the MAC format, which in my opinion is a good thing!
Yes, when I first read about IPv6 addressing I had that same, "wow, this is just like IPX," moment the BrainShare presenter had. Only, more scalable, and more flexible.
Labels: brainshare, clustering, netware, novell, sysadmin
Tuesday, March 18, 2008
BrainShare Tuesday
Today started off with a bit of panic, as I hadn't set my alarm. Me being a west-coaster, 7:20 (when I woke up) is an entirely reasonable time to get up as far as my body is concerned. Only, I needed to get dressed and breakfasted before my first session at 8:30. Aie! I had to eat quick, but I got there. Didn't get a chance to check work email, though.
ATT326: Advanced Linux Troubleshooting
An ATT, therefore hard to summarize. But I learned about a few new commands I didn't know about before. Like strace. And vimdiff.
TUT130: Challenges in Storage I/O in Virtualization
Another nice one, but an emergency at work (printing down in a dorm, during finals week) distracted me heavily during the first half of it. Which resulted in the following note in my notes:
TUT216: OES2 SP1 Architectural Overview
There is a LOT of new stuff in SP1.
TUT211: Enhanced Protocol Support in OES2 SP1
This is the session where they went into detail about the AFP and CIFS support. They said that netatalk, the existing AFP stack on Linux, gets really slow once you go over the 20 concurrent users. Whoa! I can soooo understand why Novell felt the need to make a new one.
Tonight was the night formerly known as 'Sponsor Night,' but has a new name now that everyone who gets a booth is no longer a 'sponsor'. Some are sponsors, some are exhibiters. I can't keep track. Anyway, today was their party. "World of Novellcraft!" Homage to vid-gaming.
Lots of Wii, lots of Rock Band, some Halo, lots of women dressed in Renaissance Festival gear getting their pictures taken by the 90%+ male audience. I've blogged before about my ambivalence about Sponsor Night. I lasted until about 7, when I came back to the hotel.
Tomorrow I have an actual LUNCH BREAK in my schedule! Ooo! AndSoul Asylum Soul Coughing Collective Soul plays the concert! I've been listening to two of their CD's for the past two months so I think I may even know a few songs by now.
ATT326: Advanced Linux Troubleshooting
An ATT, therefore hard to summarize. But I learned about a few new commands I didn't know about before. Like strace. And vimdiff.
TUT130: Challenges in Storage I/O in Virtualization
Another nice one, but an emergency at work (printing down in a dorm, during finals week) distracted me heavily during the first half of it. Which resulted in the following note in my notes:
NPIV looks really nifty. Look into it.NPIV being how you can use fibre-channel zoning to zone off VM's, rather than HBA's. Highly useful. I also learned about a neat new thing called Virtual Fabrics. Virtual Fabrics work kind of like VLANS for fabrics. You can segregate your fabrics into fabrics that share hardware but nothing else. Handy if your, say, Solaris admins don't want you mucking about with their zoning, while saving money through consolidated hardware.
TUT216: OES2 SP1 Architectural Overview
There is a LOT of new stuff in SP1.
- It will include eDir 8.8.4 (8.8.3 will ship this summer sometime)
- NCP and eDir will be fully 64-bit
- OES2 SP1 will be based on SLES SP2, which will be releasing about the same time
- AFP Support
- AFP 3.1
- Uses Diffie-Helman 1 for password exchange, meaning the 8-character password problem is solved.
- Fully SMP-safe
- Has cross-protocol locking with NCP. CIFS doesn't have cross-protocol locking yet, but IIRC, Samba does
- Does not need LUM enabled users
- CIFS Support
- NTLMv1, but v2 is a possibility if enough people ask, so file those enhancement requests!!
- CIFS is separate from Samba, therefore can not be used in conjunction with Domain Services for Windows
- As with AFP, fully SMP safe
- EDir 8.8.4
- LDAP auditing enhanced
- "newer auth protocols", but they didn't say what.
TUT211: Enhanced Protocol Support in OES2 SP1
This is the session where they went into detail about the AFP and CIFS support. They said that netatalk, the existing AFP stack on Linux, gets really slow once you go over the 20 concurrent users. Whoa! I can soooo understand why Novell felt the need to make a new one.
- The 8 character password limit has been fixed! They now support DH1 for passing passwords.
- The 'afptcp' daemon can use one password protocol at a time, so you can only use DH1, or one of the other three I can't remember.
- Support for OSX 10.1 and 10.2 is scanty, and 10.5 is limited but users may not notice anyway.
- Passwords will be case sensitive.
- Kerberos will be in a future release
- Performance is faster than NetWare, partly due to the ability to multi-thread
- Can register services by way of SLP
- Only supports NSS for the time being, the other Linux file-systems will be a future feature.
- Can support 500 concurrent users, and 1000+ in the future. This fits our current AFP loads.
- We can configure more about how it works than we could on NetWare, such as how many worker threads to spawn.
- Has meaninful debug logs!
- Has a new command, 'afpstat' that works like 'netstat' for giving a snapshot of afp connections.
Tonight was the night formerly known as 'Sponsor Night,' but has a new name now that everyone who gets a booth is no longer a 'sponsor'. Some are sponsors, some are exhibiters. I can't keep track. Anyway, today was their party. "World of Novellcraft!" Homage to vid-gaming.
Lots of Wii, lots of Rock Band, some Halo, lots of women dressed in Renaissance Festival gear getting their pictures taken by the 90%+ male audience. I've blogged before about my ambivalence about Sponsor Night. I lasted until about 7, when I came back to the hotel.
Tomorrow I have an actual LUNCH BREAK in my schedule! Ooo! And
Labels: brainshare, edir, linux, netware, novell, OES, sysadmin
Monday, March 17, 2008
Today at Brainshare
Monday. Opening day. I had trouble getting to sleep last night due to a poor choice of bed-time reading (don't read action, don't read action, don't read action). And had to get up at 6am body time in order to get breakfast before the morning keynote. There be zombies.
Breakfast was uninspired. As per usual, the hashbrowns had cooled to a gellid mass before I found everything and got a seat.
The Monday keynotes are always the CxO talks about strategy and where we're going. Today a mess of press releases from Novell give a good idea what the talks were about. Hovsepian was first, of course, and was actually funny. He gave some interesting tid-bits of knowledge.
Another thing he mentioned several times in association with Fossa and agility, is mergers and acquisitions. This is not something us Higher Ed types ever have to deal with, but it is an area in .COM land that requires a certain amount of IT agility to accommodate successfully. He mentioned this several times, which suggests that this strategy is aimed squarely at for-profit industry.
Also, SAP has apparently selected SLES as their primary platform for the SMB products.
Pat Hume from SAP also spoke. But as we're on Banner, and it'll take a sub-megaton nuclear strike to get us off of it, I didn't pay attention and used the time to send some emails.
Oh, and Honeywell? They're here because they have hardware that works with IDM. That way the same ID you use for your desktop login can be tied to the RFID card in your pocket that gets you into the datacenter. Spiffy.
ATT375 Advanced Tips & Tricks for Troubleshooting eDir 8.8
A nice session. Hard to summarize. That said, they needed more time as the Laptops with VMWare weren't fast enough for us to get through many of the exercises. They also showed us some nifty iMonitor tricks. And where the high-yield shoot-your-foot-off weapons are kept.
BUS202 Migrating a NetWare Cluster to OES2
Not a good session. The presenter had a short slide deck, and didn't really present anything new to me other than areas where other people have made major mistakes. And to PLAN on having one of the linux migrations go all lost-data on you. He recommended SAN snapshots. It shortly digressed into "Migrating a NetWare Cluster to Linux HA", which is a different session all together. So I left.
TUT215 Integrating Macintosh with Novell
A very good session. The CIO of Novell Canada was presenting it, and he is a skilled speaker. Apparently Novell has written a new AFP stack from scratch for OES2 Sp1, since NETATALK is comparatively dog slow. And, it seems, the AFP stack is currently out performing the NCP stack on OES2 SP1. Whoa! Also, the Banzai GroupWise client for Mac is apparently gorgeous. He also spent quite a long time (18 minutes) on the Kanaka client from Condrey Consulting. The guy who wrote that client was in the back of the room and answered some questions.
Breakfast was uninspired. As per usual, the hashbrowns had cooled to a gellid mass before I found everything and got a seat.
The Monday keynotes are always the CxO talks about strategy and where we're going. Today a mess of press releases from Novell give a good idea what the talks were about. Hovsepian was first, of course, and was actually funny. He gave some interesting tid-bits of knowledge.
- Novell's group of partners is growing, adding a couple hundred new ones since last year. This shows the Novell 'ecosystem' is strong.
- 8700 new customers last year
- Novell press mentions are now only 5% negative.
- High Capacity Computing
- Policy Engines
- Orchestration
- Convergence
- Mobility
Another thing he mentioned several times in association with Fossa and agility, is mergers and acquisitions. This is not something us Higher Ed types ever have to deal with, but it is an area in .COM land that requires a certain amount of IT agility to accommodate successfully. He mentioned this several times, which suggests that this strategy is aimed squarely at for-profit industry.
Also, SAP has apparently selected SLES as their primary platform for the SMB products.
Pat Hume from SAP also spoke. But as we're on Banner, and it'll take a sub-megaton nuclear strike to get us off of it, I didn't pay attention and used the time to send some emails.
Oh, and Honeywell? They're here because they have hardware that works with IDM. That way the same ID you use for your desktop login can be tied to the RFID card in your pocket that gets you into the datacenter. Spiffy.
ATT375 Advanced Tips & Tricks for Troubleshooting eDir 8.8
A nice session. Hard to summarize. That said, they needed more time as the Laptops with VMWare weren't fast enough for us to get through many of the exercises. They also showed us some nifty iMonitor tricks. And where the high-yield shoot-your-foot-off weapons are kept.
BUS202 Migrating a NetWare Cluster to OES2
Not a good session. The presenter had a short slide deck, and didn't really present anything new to me other than areas where other people have made major mistakes. And to PLAN on having one of the linux migrations go all lost-data on you. He recommended SAN snapshots. It shortly digressed into "Migrating a NetWare Cluster to Linux HA", which is a different session all together. So I left.
TUT215 Integrating Macintosh with Novell
A very good session. The CIO of Novell Canada was presenting it, and he is a skilled speaker. Apparently Novell has written a new AFP stack from scratch for OES2 Sp1, since NETATALK is comparatively dog slow. And, it seems, the AFP stack is currently out performing the NCP stack on OES2 SP1. Whoa! Also, the Banzai GroupWise client for Mac is apparently gorgeous. He also spent quite a long time (18 minutes) on the Kanaka client from Condrey Consulting. The guy who wrote that client was in the back of the room and answered some questions.
Labels: brainshare, clustering, netware, novell, OES, sysadmin, virtualization
Tuesday, February 26, 2008
The future of the IT career path
There was an article in Computerworld a week or so ago that just caught my eye.
IT career paths you never dreamed of
The short of it is that IT as we've known it, a separate stack, is being integrated into the general business functions. Things like software-as-a-service, outsourcing, and freakishly fast WAN pipes mean there is less call for people like internal application developers, systems analysts, and system administrators. Those that remain, have a decided focus on project management, and focus on the business.
I see some truth to this. I've known for years now that the kind of job I fit best in, only exists in organizations larger than a certain size. Organizations smaller than a certain size tend to be subject to, "the computer guy," being in charge of everything computery. WWU is large enough that I can specialize in one field, file-server maintenance and upkeep, without having to be 'the computer guy' to a bunch of people.
This also means that my desktop support skills have atrophied from where they once were. Since everyone thinks that, "working in computers," means in reality, "desktop support," I have a hard time convincing family that I only know a little more than they do about why their Thunderbird broke in just that way. Doctors have this problem too, I hear.
Anyway. The article mentions that newer job titles are including the word, "architect," in them. And I really agree with this, since any company needs people with an enterprise view of their IT infrastructure. I'm one of those people for Western, especially when it comes to the file servers. It is people like us who sheepdog consultants hired to implement new technologies.
Which brings up another thing about the article. The article is rather .COM centered, which I understand. Us .EDU types really do live in a different world (where ELSE are you going to get 4000 people pounding the exact same file server at the exact same time?). The idea of hiring consultants (very expensive temp workers) to do the heavy lifting during upgrades is something we laugh ourselves silly over, since we barely have the money to BUY the new upgrade (even with our hefty .EDU discounts) much less pay someone else to put it in for us. Something simpler like outsourcing 90% of our on-site helpdesk work through a SE Asian call-center and remote-control apps is something we could possibly do, but the union those helpdesk techs belong to would pitch a fit. The same thing applies for a contract service to manage printers. Similar sorts of things apply to the non-profits of the world (the .ORG world), though perhaps not the union angle.
But out there in the for-profit world, and the for-profits larger than SOHO or SMB, that's another story entirely. I don't know how much longer there is going to be a call for file-server jocks.
IT career paths you never dreamed of
The short of it is that IT as we've known it, a separate stack, is being integrated into the general business functions. Things like software-as-a-service, outsourcing, and freakishly fast WAN pipes mean there is less call for people like internal application developers, systems analysts, and system administrators. Those that remain, have a decided focus on project management, and focus on the business.
I see some truth to this. I've known for years now that the kind of job I fit best in, only exists in organizations larger than a certain size. Organizations smaller than a certain size tend to be subject to, "the computer guy," being in charge of everything computery. WWU is large enough that I can specialize in one field, file-server maintenance and upkeep, without having to be 'the computer guy' to a bunch of people.
This also means that my desktop support skills have atrophied from where they once were. Since everyone thinks that, "working in computers," means in reality, "desktop support," I have a hard time convincing family that I only know a little more than they do about why their Thunderbird broke in just that way. Doctors have this problem too, I hear.
Anyway. The article mentions that newer job titles are including the word, "architect," in them. And I really agree with this, since any company needs people with an enterprise view of their IT infrastructure. I'm one of those people for Western, especially when it comes to the file servers. It is people like us who sheepdog consultants hired to implement new technologies.
Which brings up another thing about the article. The article is rather .COM centered, which I understand. Us .EDU types really do live in a different world (where ELSE are you going to get 4000 people pounding the exact same file server at the exact same time?). The idea of hiring consultants (very expensive temp workers) to do the heavy lifting during upgrades is something we laugh ourselves silly over, since we barely have the money to BUY the new upgrade (even with our hefty .EDU discounts) much less pay someone else to put it in for us. Something simpler like outsourcing 90% of our on-site helpdesk work through a SE Asian call-center and remote-control apps is something we could possibly do, but the union those helpdesk techs belong to would pitch a fit. The same thing applies for a contract service to manage printers. Similar sorts of things apply to the non-profits of the world (the .ORG world), though perhaps not the union angle.
But out there in the for-profit world, and the for-profits larger than SOHO or SMB, that's another story entirely. I don't know how much longer there is going to be a call for file-server jocks.
Monday, February 04, 2008
Today's 18 year olds...
Over the time I've been here there has occasionally been a list posted in the break-room. This list is the, "Incoming freshman today...." list of things they know, experience, or haven't experienced. It contains things like:
Which got me thinking about a few things. One of the items that is frequently put forth about Kids These Days (tm) is that they don't KNOW anything, they just know how to FIND things. There is some debate about this, but it is a common sentiment. I believe that kids these days (KTD) have figured out keyword based searching, and the search engines have gotten good enough at mind-reading that arcane search incantations aren't needed nearly as often as they were in the past.
Before Google, there was AltaVista. This was an era of the internet where boolean search incantations were needed to really narrow down to what you wanted. I didn't switch to Google for a long time because Google didn't have the NEAR search term, which I used on AltaVista as a way to narrow results to be more relevant. I didn't know at the time that Google effectively threw that term in on every search.
Those of us who lived through that era of the internet built up searching skills. I remember some searches I did back then that were pretty complex. I can't remember the exact terms used, but they looked like this:
bootes AND (antaries OR proxima) AND (fulcrum NEAR pinnacle)
I had a logic class in college, so these sorts of parenthetical statements made sense to me. Still do, I just don't end up needing to uncork the boolean logic to find what I need anymore as the search engines have gotten good enough that I don't NEED to do it. I know google allows much of the above, but I haven't had to do it so I don't know the syntax for it.
So I posit that yes, KTD don't know anything, but neither are their search skills robust.
Which brings me to Novell. I got to thinking what a NetWare administrator in 1990 had to know to do their job, and how I could fit into such a hypothetical time.
Right now if I don't know the answer to a problem I have a few methods to figure it out.
As I see it, a NetWare admin of 1990 was on average more knowledgeable about their product than the NetWare admin of 2008. Such administrators avoided the cost of paying for support incidents by having the manuals in hard-copy form, and plonking down real money for CompuServe accounts. If I have a weird problem I'll hit up the Novell KB to see if there is a TID on it, then check the support forums to see if it is mentioned there, before I'll expend an incident on the thing. In time I've managed to teach myself how NetWare works in some very basic ways, simply by troubleshooting oddball problems. This is why I typically end up talking to backline support when I call in, unless the problem is a known issue in the private KB. My skills are probably on par with what was normal 'back in the day'.
I think this holds true for a lot of the tech field. Back then there was a lot of stuff you just had to KNOW. Or failing that, have spent the money to get the backup resources in place (manuals, support contracts). These days a base understanding of how things work is the key to phrasing the right search queries in the online knowledge bases, and less rote memorization (training) can be effective in solving a greater list of problems.
Prosthetic memory! Prosthetic training! The tools of geeks everywhere.
- Were born in 1990
- ...have never known life without cable or satellite TV.
- ...probably have never seen a rotary dial phone.
- ...have had internet access for most of their school life.
Which got me thinking about a few things. One of the items that is frequently put forth about Kids These Days (tm) is that they don't KNOW anything, they just know how to FIND things. There is some debate about this, but it is a common sentiment. I believe that kids these days (KTD) have figured out keyword based searching, and the search engines have gotten good enough at mind-reading that arcane search incantations aren't needed nearly as often as they were in the past.
Before Google, there was AltaVista. This was an era of the internet where boolean search incantations were needed to really narrow down to what you wanted. I didn't switch to Google for a long time because Google didn't have the NEAR search term, which I used on AltaVista as a way to narrow results to be more relevant. I didn't know at the time that Google effectively threw that term in on every search.
Those of us who lived through that era of the internet built up searching skills. I remember some searches I did back then that were pretty complex. I can't remember the exact terms used, but they looked like this:
bootes AND (antaries OR proxima) AND (fulcrum NEAR pinnacle)
I had a logic class in college, so these sorts of parenthetical statements made sense to me. Still do, I just don't end up needing to uncork the boolean logic to find what I need anymore as the search engines have gotten good enough that I don't NEED to do it. I know google allows much of the above, but I haven't had to do it so I don't know the syntax for it.
So I posit that yes, KTD don't know anything, but neither are their search skills robust.
Which brings me to Novell. I got to thinking what a NetWare administrator in 1990 had to know to do their job, and how I could fit into such a hypothetical time.
Right now if I don't know the answer to a problem I have a few methods to figure it out.
- Hit the online Novell Knowledge Base over at novell.com/support
- Hit the peer-support forums over at forums.novell.com (or nntp://forums.novell.com/ if you prefer old-school)
- Pay for a support incident
- Ask around the office
- Hit the peer-support forums over on CompuServe, which required a modem and a CompuServe account.
- See if the problem is mentioned in the book-shelf of manuals, which was a big investment to own.
- Pay for a support incident.
- Ask around the office.
As I see it, a NetWare admin of 1990 was on average more knowledgeable about their product than the NetWare admin of 2008. Such administrators avoided the cost of paying for support incidents by having the manuals in hard-copy form, and plonking down real money for CompuServe accounts. If I have a weird problem I'll hit up the Novell KB to see if there is a TID on it, then check the support forums to see if it is mentioned there, before I'll expend an incident on the thing. In time I've managed to teach myself how NetWare works in some very basic ways, simply by troubleshooting oddball problems. This is why I typically end up talking to backline support when I call in, unless the problem is a known issue in the private KB. My skills are probably on par with what was normal 'back in the day'.
I think this holds true for a lot of the tech field. Back then there was a lot of stuff you just had to KNOW. Or failing that, have spent the money to get the backup resources in place (manuals, support contracts). These days a base understanding of how things work is the key to phrasing the right search queries in the online knowledge bases, and less rote memorization (training) can be effective in solving a greater list of problems.
Prosthetic memory! Prosthetic training! The tools of geeks everywhere.
Friday, January 25, 2008
A needed patch.
Novell has released a patch for the "ConsoleOne sorting problem."
The sorting problem happens when you have eDir 8.8 installed. Suddenly C1 starts sorting things by creation date rather than as you've ever seen it before. This is... confusing. ConsoleOne 1.3h helped some of it for us, but not all. And now, we have a patch!
Let ConsoleOne Sort Correctly!
The sorting problem happens when you have eDir 8.8 installed. Suddenly C1 starts sorting things by creation date rather than as you've ever seen it before. This is... confusing. ConsoleOne 1.3h helped some of it for us, but not all. And now, we have a patch!
Let ConsoleOne Sort Correctly!
Saturday, January 19, 2008
Good migration
At home I just migrated the linux server to new hardware. This has to be one of the easiest migrations I've ever done for that service. Now just the obsessive tweaking needs doing, all the major functions are moved.
That server is running Slackware. I'm not using SuSE at home for a couple of reasons:
I got my first wireless access point in November of 2000. Way back then, they hadn't quite figured out all the short-cuts to cracking WEP so it required a certain amount of traffic to analyze. This was a Linksys B AP, and a Linksys wireless card. Together they had el-crappo for range (to today's standards).
With that in mind I segregated my network.
Internet <- Cisco 675 DSL -> Wired network <- Linux server -> Wireless network
Didn't have cable in our area yet back then. The Cisco handled everything I needed. Unfortunately, it was badly behaved. It had the nasty habit of ARPing through the whole dhcp range, one addr per second, continually.
At that point in time I had one wireless device. The always-on windows server was on the wired network, and the linux server configured to proxy things. So the only traffic on the wireless network was from my laptop; no ARP ARP ARP ARP ARP and no windows browse packets. In other words, it was a network that was hard to crack. Oh yeah, baby.
Fast forward a couple years. I move out here, we get cable instead of DSL.
Another year or two, and the 802.11b AP died so we moved to a G AP.
Another year, and I added a certain linux-based media server (wireless for long reasons) and my wife got a PowerBook.
The 10MB ethernet card in the back of that Linux machine (a Pentium 2 450MHz machine) was really... concerning me. Comcast is still under 10MB, but... it's the principle of the thing. It was a bloody ISA card for pete's sake.
So today I flattened the network. It's structured the same, but rather than have separate subnets I'm just using brctl to bridge the two; I like being able to easily sniff my wireless traffic. We no longer have an always-on Windows box. And WPA-PSK is a heckova lot harder to crack than WPA ever was. So, I figure it's safe. Plus, if the linux machine ever dies I only have to move one cable to get things back online.
Now the internet seems faster when browsing on the laptops. I guess that 10MB card was actually slowing things down a bit.
That server is running Slackware. I'm not using SuSE at home for a couple of reasons:
- I've been using Slack since college
- Diversity is good when figuring out how to run Linux
- Slackware doesn't have anything approaching YaST.
- Getting a new service online with Slackware takes about five times longer than it does with SuSE, but at the end of it you know how it bloody well works.
- It's easier to crib from existing config files that way.
I got my first wireless access point in November of 2000. Way back then, they hadn't quite figured out all the short-cuts to cracking WEP so it required a certain amount of traffic to analyze. This was a Linksys B AP, and a Linksys wireless card. Together they had el-crappo for range (to today's standards).
With that in mind I segregated my network.
Internet <- Cisco 675 DSL -> Wired network <- Linux server -> Wireless network
Didn't have cable in our area yet back then. The Cisco handled everything I needed. Unfortunately, it was badly behaved. It had the nasty habit of ARPing through the whole dhcp range, one addr per second, continually.
At that point in time I had one wireless device. The always-on windows server was on the wired network, and the linux server configured to proxy things. So the only traffic on the wireless network was from my laptop; no ARP ARP ARP ARP ARP and no windows browse packets. In other words, it was a network that was hard to crack. Oh yeah, baby.
Fast forward a couple years. I move out here, we get cable instead of DSL.
Another year or two, and the 802.11b AP died so we moved to a G AP.
Another year, and I added a certain linux-based media server (wireless for long reasons) and my wife got a PowerBook.
The 10MB ethernet card in the back of that Linux machine (a Pentium 2 450MHz machine) was really... concerning me. Comcast is still under 10MB, but... it's the principle of the thing. It was a bloody ISA card for pete's sake.
So today I flattened the network. It's structured the same, but rather than have separate subnets I'm just using brctl to bridge the two; I like being able to easily sniff my wireless traffic. We no longer have an always-on Windows box. And WPA-PSK is a heckova lot harder to crack than WPA ever was. So, I figure it's safe. Plus, if the linux machine ever dies I only have to move one cable to get things back online.
Now the internet seems faster when browsing on the laptops. I guess that 10MB card was actually slowing things down a bit.
Wednesday, January 16, 2008
NetWare library patches
Novell recently split the libc and clib patches for NetWare. For a long time patches like "nwlib6a" included both. Now, they're split.
This just caused me a problem. It turns out that if you have libcsp6b (the LibC patch) applied and not nwlib6k (the CLib patch), there is an abend possibility. It happened yesterday. It turns out that in that case, a badly formed network broadcast can cause an abend. This caused three of my six cluster nodes to fall on their butts at the same time. That was fun. Strange (but good) thing is, I had already applied both patches to these three servers but hadn't gotten around to rebooting them yet. So, by killing themselves they actually fixed the problem.
The abend, key details:
Heh heh heh. Oops.
And now a bit of history. Long time NetWare admins can ignore this part.
Q: Why are there two C libraries?
CLIB is the library NetWare started with. It began life in the dark and misty past, probably in the late 1980's. It is the deepest, darkest bowels of NetWare from the era when Novell was it when it came to office networking. Being so old, its APIs are very mature. Applications developed against CLIB generally speaking just plain work.
CLIB is also depreciated since it is highly proprietary, and doesn't play well with others. "Just plain works" in this instance means an assumption of 8.3 names, with kludging to support long file names if at all possible. CLIB applications have a tendency to have IPX dependencies for no good reason.
LIBC was created, IIRC, around the release of NetWare 5.0 when it became possible for NetWare to operate in a "pure IP" environment. LIBC was designed with the concept of POSIX semantics in mind, which CLIB was not. LIBC was created from scratch with long file name support. By now, as of NetWare 6.5 SP7, most of the NetWare kernel is written against LIBC rather than CLIB.
As an example of LIBC vs CLIB, take the 'MyWeb' service this blog is served by. When I did this the first time, it was on NetWare 6.0, using Apache 1.3. Apache 1.3 was linked against CLIB and was very stable. The service notes for the Apache Modules I needed to run to make it work made it clear that supporting long file-names on remote servers was something that only recently started working.
When the migration to NetWare 6.5 came around, it meant I had to migrate MyWeb to Apache 2.0. Apache 2.0 is linked against LIBC and used a different apache module to make things work. I had troubles. The LibC functions were not nearly as mature as their CLIB counterparts, and it showed. 3.5 years later things are now a lot more stable then back then.
This just caused me a problem. It turns out that if you have libcsp6b (the LibC patch) applied and not nwlib6k (the CLib patch), there is an abend possibility. It happened yesterday. It turns out that in that case, a badly formed network broadcast can cause an abend. This caused three of my six cluster nodes to fall on their butts at the same time. That was fun. Strange (but good) thing is, I had already applied both patches to these three servers but hadn't gotten around to rebooting them yet. So, by killing themselves they actually fixed the problem.
The abend, key details:
EIP in SERVER.NLM at code start +0015FD27hHeh heh heh. Oops.
And now a bit of history. Long time NetWare admins can ignore this part.
Q: Why are there two C libraries?
CLIB is the library NetWare started with. It began life in the dark and misty past, probably in the late 1980's. It is the deepest, darkest bowels of NetWare from the era when Novell was it when it came to office networking. Being so old, its APIs are very mature. Applications developed against CLIB generally speaking just plain work.
CLIB is also depreciated since it is highly proprietary, and doesn't play well with others. "Just plain works" in this instance means an assumption of 8.3 names, with kludging to support long file names if at all possible. CLIB applications have a tendency to have IPX dependencies for no good reason.
LIBC was created, IIRC, around the release of NetWare 5.0 when it became possible for NetWare to operate in a "pure IP" environment. LIBC was designed with the concept of POSIX semantics in mind, which CLIB was not. LIBC was created from scratch with long file name support. By now, as of NetWare 6.5 SP7, most of the NetWare kernel is written against LIBC rather than CLIB.
As an example of LIBC vs CLIB, take the 'MyWeb' service this blog is served by. When I did this the first time, it was on NetWare 6.0, using Apache 1.3. Apache 1.3 was linked against CLIB and was very stable. The service notes for the Apache Modules I needed to run to make it work made it clear that supporting long file-names on remote servers was something that only recently started working.
When the migration to NetWare 6.5 came around, it meant I had to migrate MyWeb to Apache 2.0. Apache 2.0 is linked against LIBC and used a different apache module to make things work. I had troubles. The LibC functions were not nearly as mature as their CLIB counterparts, and it showed. 3.5 years later things are now a lot more stable then back then.
Labels: clustering, netware, novell, sysadmin
Monday, December 17, 2007
Not dead.
Wow, last post was the 30th? Jeez. I was on vacation all last week, which accounts for some of it. And it's looking like I'll be out sick for at least a pair of days with a crud I got while wandering about. Not sharing that with work, nosir.
On my list of things to do during the winter inter-session is to get eDir 8.8 deployed in the production tree. I just need to have ALL the servers in the tree (all, not just replica holders due to backlink updates) up and talking when I do the first one, and that could take some scheduling. This is the first step to OES2, which will be deployed on the eDir servers first.
As soon as I get some new hardware, since they're getting old.
On my list of things to do during the winter inter-session is to get eDir 8.8 deployed in the production tree. I just need to have ALL the servers in the tree (all, not just replica holders due to backlink updates) up and talking when I do the first one, and that could take some scheduling. This is the first step to OES2, which will be deployed on the eDir servers first.
As soon as I get some new hardware, since they're getting old.
Friday, November 30, 2007
OES2 SP1 timing
Novell just posted the third draft of their OES2 Best Practices guide. Which you can locate here. In that guide is this text:
Another 21 months of a 32-bit operating system on the single biggest storage consumer on campus. We'll have at least one hardware refresh before then for some of the nodes, and... boy I hope they have NetWare drivers for that. The very limited testing I did with NetWare-in-Xen was not encouraging from a performance stand-point. If it looks like I'll have to deploy that way for the next servers we get in the cluster, I'll have to do more real testing to characterize the performance hit (if any). The idea of a 64-bit memory space for file-caching makes me drool. Not getting it for 21 months is painful.
That said, if Novell releases the eDirectory enabled AFP server for OES2-Linux outside of the service-pack I could still make the 2008 window. That's our only dependency for SP1
Domain Services for Windows, which is scheduled to ship with OES 2 SP1 (currently scheduled for late 2008), will also offer some clear advantages."Late 2008" means they WILL NOT have SP1 out by August of 2008. This means that the upgrade of our 6 node cluster to OES will have to wait until 2009. Grrarrr!
Another 21 months of a 32-bit operating system on the single biggest storage consumer on campus. We'll have at least one hardware refresh before then for some of the nodes, and... boy I hope they have NetWare drivers for that. The very limited testing I did with NetWare-in-Xen was not encouraging from a performance stand-point. If it looks like I'll have to deploy that way for the next servers we get in the cluster, I'll have to do more real testing to characterize the performance hit (if any). The idea of a 64-bit memory space for file-caching makes me drool. Not getting it for 21 months is painful.
That said, if Novell releases the eDirectory enabled AFP server for OES2-Linux outside of the service-pack I could still make the 2008 window. That's our only dependency for SP1
Wednesday, November 28, 2007
I/O starvation on NetWare, HP update
Last week I talked about a problem we're having with the HP MSA1500cs and our NetWare cluster. The problem is still there, of course. I've opened cases with both HP and Novell to handle this one. HP because I really thing that such command latencies are a defect, and Novell since they're having starvation issues with clusters.
This morning I got a voice-mail from HP, an update for our case. Greatly summarized:
Especially since I don't think I've made back-line on the Novell case yet. They're involved, but I haven't been referred to a new support engineer yet.
This morning I got a voice-mail from HP, an update for our case. Greatly summarized:
The MSA team has determined that your device is working perfectly, and can find no defects. They've referred the case to the NetWare software team.Or...
Working as designed. Fix your software. Talk to Novell.Which I'm doing. Now to see if I can light a fire on the back-channels, or if we've just made HP admit that these sorts of command latencies are part of the design and need to be engineered around in software. Highly frustrating.
Especially since I don't think I've made back-line on the Novell case yet. They're involved, but I haven't been referred to a new support engineer yet.
Labels: clustering, hp, msa, netware, novell, OES, storage, sysadmin
Monday, November 26, 2007
Adding attachments to an open HP Support case
I don't think this is documented anywhere. But I just learned how to add updates to the HP case-file. Including attachments.
To: support_am@hp.comAny attachments to it will be automatically imported into the case. LOOKING at the case itself is a lot more complicated, and I'm still not sure of the steps. But this should be of use to some of you.
Subject:
CASE_ID_NUM: [case number, such as 36005555555]
MESSAGE: [text]
Wednesday, November 21, 2007
I/O starvation on NetWare
The MSA1500cs we've had for a while has shown a bad habit. It is visible when you connect a serial cable to the management port on the MSA1000 controller, and doing a "show perf" after starting performance tracking. The line in question is "Avg Command Latency:", which is a measure of how long it takes to execute an I/O operation. Under normal circumstances this metric stays between 5-30ms. When things go bad, I've seen it as far as 270ms.
This is a problem with our cluster nodes. Our cluster nodes can seen LUNs on both the MSA1500cs and the EVA3000. The EVA is where the cluster has been housed since it started, and the MSA has taken up two low-I/O-volume volumes to make space on the EVA.
IF the MSA is in the high Avg Command Latency state, and
IF a cluster node is doing a large Write to the MSA (such as a DVD ISO image, or B2D operation),
THEN "Concurrent Disk Requests" in Monitor go north of 1000
This is a dangerous state. If this particular cluster node is housing some higher trafficked volumes, such as FacShare:, the laggy I/O is competing with regular (fast) I/O to the EVA. If this sort of mostly-Read I/O is concurrent with the above heavy Write situation it can cause the cluster node to not write to the Cluster Partition on time and trigger a poison-pill from the Split Brain Detector. In short, the storage heart-beat to the EVA (where the Cluster Partition lives) gets starved out in the face of all the writes to the laggy MSA.
Users definitely noticed when the cluster node was in such a heavy usage state. Writes and Reads took a loooong time on the LUNs hosted on the fast EVA. Our help-desk recorded several "unable to map drive" calls when the nodes were in that state, simply because a drive-mapping involves I/O and the server was too busy to do it in the scant seconds it normally does.
This is sub-optimal. This also doesn't seem to happen on Windows, but I'm not sure of that.
This is something that a very new feature in the Linux kernel could help out, that that's to introduce the concept of 'priority I/O' to the storage stack. I/O with a high priority, such as cluster heart-beats, gets serviced faster than I/O of regular priority. That could prevent SBD abends. Unfortunately, as the NetWare kernel is no longer under development and just under Maintenance, this is not likely to be ported to NetWare.
I/O starvation. This shouldn't happen, but neither should 270ms I/O command latencies.
This is a problem with our cluster nodes. Our cluster nodes can seen LUNs on both the MSA1500cs and the EVA3000. The EVA is where the cluster has been housed since it started, and the MSA has taken up two low-I/O-volume volumes to make space on the EVA.
IF the MSA is in the high Avg Command Latency state, and
IF a cluster node is doing a large Write to the MSA (such as a DVD ISO image, or B2D operation),
THEN "Concurrent Disk Requests" in Monitor go north of 1000
This is a dangerous state. If this particular cluster node is housing some higher trafficked volumes, such as FacShare:, the laggy I/O is competing with regular (fast) I/O to the EVA. If this sort of mostly-Read I/O is concurrent with the above heavy Write situation it can cause the cluster node to not write to the Cluster Partition on time and trigger a poison-pill from the Split Brain Detector. In short, the storage heart-beat to the EVA (where the Cluster Partition lives) gets starved out in the face of all the writes to the laggy MSA.
Users definitely noticed when the cluster node was in such a heavy usage state. Writes and Reads took a loooong time on the LUNs hosted on the fast EVA. Our help-desk recorded several "unable to map drive" calls when the nodes were in that state, simply because a drive-mapping involves I/O and the server was too busy to do it in the scant seconds it normally does.
This is sub-optimal. This also doesn't seem to happen on Windows, but I'm not sure of that.
This is something that a very new feature in the Linux kernel could help out, that that's to introduce the concept of 'priority I/O' to the storage stack. I/O with a high priority, such as cluster heart-beats, gets serviced faster than I/O of regular priority. That could prevent SBD abends. Unfortunately, as the NetWare kernel is no longer under development and just under Maintenance, this is not likely to be ported to NetWare.
I/O starvation. This shouldn't happen, but neither should 270ms I/O command latencies.
Labels: clustering, hp, msa, netware, novell, OES, storage, sysadmin
Monday, November 19, 2007
I didn't realize it was this bad.
A while back Novell held an online survey about YaST usage. They've just released results.
Right at the top, in the demographics section are the results for the 'gender' question.

Ow. Women are 2.3%? Jeez.
These sorts of surveys are FAR from scientific. But still, such a STRONG bias is rather disheartening. I know the BrainShare crowd is somewhere between 4:1 to 6:1 Men-to-Women (don't have exact numbers). That said, most of the women I meet there are there for either Identity Management or GroupWise. The audiences for sessions on high Linux geekery (like for Clusters or HA computing) are... very male.
Just looking at that chart makes me wince. Yeesh.
Right at the top, in the demographics section are the results for the 'gender' question.

Ow. Women are 2.3%? Jeez.
These sorts of surveys are FAR from scientific. But still, such a STRONG bias is rather disheartening. I know the BrainShare crowd is somewhere between 4:1 to 6:1 Men-to-Women (don't have exact numbers). That said, most of the women I meet there are there for either Identity Management or GroupWise. The audiences for sessions on high Linux geekery (like for Clusters or HA computing) are... very male.
Just looking at that chart makes me wince. Yeesh.
Thursday, November 15, 2007
Encryption & key demands
As some of you know, the UK has passed a law which authorizes jail time for people who refuse to turn over encryption keys. If I'm remembering right, 2-3 years. This is a bill that's been making the rounds for quite some time, and got passage as a terror bill. Nefarious elements have figured out that modern encryption technologies really can flummox even the US National Security Agency deep crack mainframes and they therefore use it. There was a reason that encryption technologies were classified a munition and therefore export-restricted.
Those of you who've been with Novell/NetWare long enough remember this. Back in the day the NICI and other PKI components came in three flavors, Domestic (strong, 128bit), International (weak, 40bit? 56bit?), and basic (none). Things have loosened up since then.
Part of the problem of encryption is that while the private keys may be strong, securing them is tricky. When the feds raid your house and grab every device capable of both digital computation and communication to throw into the evidence locker, their computer forensics people can get your private keys. However, if your private keys are further locked away, such as PGP, it won't do them much good. To gain access to your key-ring they'll need the pass-phrase.
That's where the new law in the UK comes in. Police have two options to figure out your pass-phrase. They can intercept it somehow, or they can point a jail term at your head and demand the the pass-phrase.
That doesn't work in the US thanks to the Bill of Rights, and the 5th Amendment. This is the amendment that states that you have a right to not self incriminate, and by extension this means that police can't force you to divulge information that could be detrimental to you. As it happens, the people who wrote this amendment had the English legal system in mind when they came up with the idea, what with us being an ex-colony and all that. So if you performed safe encryption handling, didn't write the pass phrase anywhere and made a point of making sure it never hit disk in the clear, the US Government can't penalize you for not telling them the pass phrase. A US law similar to the UK law would face a much harder judicial battle than it got in the UK.
Which isn't the case in the UK. As one crypto expert I spoke with once said, the UK law amounts to, "rubber-hose cryptography." Which is an allusion to the fact that a sufficient application of pain (i.e. torture) can cause someone to fork over their own encryption keys, which is a concern in certain totalitarian regimes.
The accepted response to 'rubber-hose' crypto methods is to use a 'duress key'. This key will either destroy the crypted data, or reveal harmless data (40GB of soft porn!). The problem with such a key is that it works best if such a key is not known to exist. Forensics analysis can show what kind of crypto is in use, and if that particular type supports the use of a duress key, the interrogators can work that into their own information extraction methods. Also, any forensics person worth their salt works on a COPY of the data (as the RIAA knows all too freaking well, digital data is very easy to duplicate), so having the duress key destroy the data isn't a loss. In a judicial framework, having the key given destroy the (copy of the) data can earn the person a, "hampering a lawful investigation," charge and even more jail time.
All that said and done, there are still PLENTY of ways for the US Government to gain access to pass-phrases. I've heard of at least one case where a key-logger was installed on a machine for the express purpose of intercepting the key-strokes of the pass-phrase. If the pass phrase exists in the physical realm in any way (outside of your head), they can execute search warrants on it. Some crypto programs don't handle pass-phrase handling well. Also, if you have a Word document that was crypted, then decrypted so you could view it, the temp files Word saves ever 5-10 minutes are in-the-clear and recoverable through sufficient disk analysis. The end-user needs to know about safe handling of in-the-clear data.
All of which is expensive work. If the Government can save several thousand dollars in tech time by simply asking you the pass phrase and throwing you in the clink if you don't give it, that's what they'll do. If the person under investigation is known to be very crypto savvy (uses a Linix machine, with an encrypted file-system that requires a hand-entered password to even load, and uses PGP or similar on top of that to defend against attacks when the file-system is mounted) it becomes WAY cheaper to go the Judicial route than the tech route.
Yeah, 2-3 years may be much better than the 20-life you'd face on a terrorism charge. But you'd be in custody the whole time, and they'll be spending that 2-3 years going over your encrypted data the hard way. And if actual actionable evidence surfaces to support a terrorism charge, you can bet your bippy that you'd be hauled into a court-house for a new trial, only this time facing 20-life. If you're in the UK. Here in the US they'll just keep you under surveillance until they get the pass phrase or enough other evidence to hold you down in custody and give them an excuse to throw everything you've ever touched into evidence lock-up.
Those of you who've been with Novell/NetWare long enough remember this. Back in the day the NICI and other PKI components came in three flavors, Domestic (strong, 128bit), International (weak, 40bit? 56bit?), and basic (none). Things have loosened up since then.
Part of the problem of encryption is that while the private keys may be strong, securing them is tricky. When the feds raid your house and grab every device capable of both digital computation and communication to throw into the evidence locker, their computer forensics people can get your private keys. However, if your private keys are further locked away, such as PGP, it won't do them much good. To gain access to your key-ring they'll need the pass-phrase.
That's where the new law in the UK comes in. Police have two options to figure out your pass-phrase. They can intercept it somehow, or they can point a jail term at your head and demand the the pass-phrase.
That doesn't work in the US thanks to the Bill of Rights, and the 5th Amendment. This is the amendment that states that you have a right to not self incriminate, and by extension this means that police can't force you to divulge information that could be detrimental to you. As it happens, the people who wrote this amendment had the English legal system in mind when they came up with the idea, what with us being an ex-colony and all that. So if you performed safe encryption handling, didn't write the pass phrase anywhere and made a point of making sure it never hit disk in the clear, the US Government can't penalize you for not telling them the pass phrase. A US law similar to the UK law would face a much harder judicial battle than it got in the UK.
Which isn't the case in the UK. As one crypto expert I spoke with once said, the UK law amounts to, "rubber-hose cryptography." Which is an allusion to the fact that a sufficient application of pain (i.e. torture) can cause someone to fork over their own encryption keys, which is a concern in certain totalitarian regimes.
The accepted response to 'rubber-hose' crypto methods is to use a 'duress key'. This key will either destroy the crypted data, or reveal harmless data (40GB of soft porn!). The problem with such a key is that it works best if such a key is not known to exist. Forensics analysis can show what kind of crypto is in use, and if that particular type supports the use of a duress key, the interrogators can work that into their own information extraction methods. Also, any forensics person worth their salt works on a COPY of the data (as the RIAA knows all too freaking well, digital data is very easy to duplicate), so having the duress key destroy the data isn't a loss. In a judicial framework, having the key given destroy the (copy of the) data can earn the person a, "hampering a lawful investigation," charge and even more jail time.
All that said and done, there are still PLENTY of ways for the US Government to gain access to pass-phrases. I've heard of at least one case where a key-logger was installed on a machine for the express purpose of intercepting the key-strokes of the pass-phrase. If the pass phrase exists in the physical realm in any way (outside of your head), they can execute search warrants on it. Some crypto programs don't handle pass-phrase handling well. Also, if you have a Word document that was crypted, then decrypted so you could view it, the temp files Word saves ever 5-10 minutes are in-the-clear and recoverable through sufficient disk analysis. The end-user needs to know about safe handling of in-the-clear data.
All of which is expensive work. If the Government can save several thousand dollars in tech time by simply asking you the pass phrase and throwing you in the clink if you don't give it, that's what they'll do. If the person under investigation is known to be very crypto savvy (uses a Linix machine, with an encrypted file-system that requires a hand-entered password to even load, and uses PGP or similar on top of that to defend against attacks when the file-system is mounted) it becomes WAY cheaper to go the Judicial route than the tech route.
Yeah, 2-3 years may be much better than the 20-life you'd face on a terrorism charge. But you'd be in custody the whole time, and they'll be spending that 2-3 years going over your encrypted data the hard way. And if actual actionable evidence surfaces to support a terrorism charge, you can bet your bippy that you'd be hauled into a court-house for a new trial, only this time facing 20-life. If you're in the UK. Here in the US they'll just keep you under surveillance until they get the pass phrase or enough other evidence to hold you down in custody and give them an excuse to throw everything you've ever touched into evidence lock-up.
Monday, November 05, 2007
HP support problems.
We had another unfortunate incident with HP support this morning. We found some critical infrastructure had quietly expired from warranty, so was not covered. How it is supposed to work is that when things near expiration we add them to a separate Support Contract we have with HP to cover stuff not on warranty.
One of the biggest problems we have is that HP Support verification requires two factor authentication. You need both the serial number of the device (and for multi-device systems like blade racks or SAN racks it isn't always clear which S/N you need) and the model number of the device (ditto, with multi-device systems). The brand new servers we've received have a handy tag on the front with both numbers, but devices older than about a year do NOT.
Having a single S/N key to support is not hard to do. Dell has been doing this for YEARS (the 'express service code'), so it can be done. It also makes the support verification problem a lot easier.
HP also used to inform us when major things were coming off of support. As my boss just pointed out, doing so is a revenue thing for them, as they were always able to talk us into paying them money to keep things supported. A couple years ago they stopped doing that, and since then we've had several instances of key machines quietly going unsupported.
My experiences with HP support:
It hasn't quite gotten to the point where I'll CALL them before trying to find things myself, but it is getting close. Their web-site is THAT BAD.
One of the biggest problems we have is that HP Support verification requires two factor authentication. You need both the serial number of the device (and for multi-device systems like blade racks or SAN racks it isn't always clear which S/N you need) and the model number of the device (ditto, with multi-device systems). The brand new servers we've received have a handy tag on the front with both numbers, but devices older than about a year do NOT.
Having a single S/N key to support is not hard to do. Dell has been doing this for YEARS (the 'express service code'), so it can be done. It also makes the support verification problem a lot easier.
HP also used to inform us when major things were coming off of support. As my boss just pointed out, doing so is a revenue thing for them, as they were always able to talk us into paying them money to keep things supported. A couple years ago they stopped doing that, and since then we've had several instances of key machines quietly going unsupported.
My experiences with HP support:
| General Web Support | Very bad. Hard to find information. Even HP techs have trouble |
| On-site Support | Very good |
| Phone Support | Pretty good |
| Downloading Drivers | Bad. Its on the web-site, so hard to find exactly what you're looking for. |
| Finding Documentation | Mixed. For some things like servers it is OK. For Storage things it is very bad. |
Labels: sysadmin
Monday, October 15, 2007
Peer-to-peer sharing
One feature that has shown up in some applications and widgets lately has gained some traction internally. That is the concept of peer to peer sharing of disk space without going through all the pain of getting things approved and formally set up. The general idea is this one.
I want to share U:\SharedStuff\ApacheGroup\ to five other users. U: is my home directory, which is actually map-rooted so I don't see the top level directory. So I go to a web page and tell it I want to share this directory, to these people, for this long. Go.
It struck me that this sort of thing can be engineered with NetWare and OES. The key components are eDirectory, NSS, and NetStorage.
The web server takes the request and translates $Path into a real path by referencing the HomeDirectory attribute of the user who requested the share. Then, using LDAP it creates two objects:
A Group Object
There is a small constellation of maintenance tasks that also need to be created, such as a janitor process to deal with expirations, a helpdesk view to track who has what shares, a historic view to see what shares got deleted recently that suddenly need to be back RIGHT NOW, something to interface this with whatever disk or directory quota systems are in use.
The use of NetStorage allows WebDAV to be used as an access method, which allows the shares to be seen. The really brave may be able to leverage DFS to create actual directory structures reflecting the shares in the actual directories so drive mappings can be used; unfortunately I have no idea if a DFS database that large is a good idea.
Users would love this. No need to go through management to get a directory set up on the shared space. You just set up and go. Great for adhoc groups, or small private gatherings.
Unfortunately, this sort of share model is one that a lot of sys-admins are familiar with. If you've ever had a chance to examine the network of a small business with under 15 users, all of whom call themselves 'not that good with computers', you know what I'm talking about. This model of sharing is the one that Windows for Workgroups was designed for, and is still the default mode for plain old WinXP. Excessive use of peer to peer sharing like that can lead to one unholy mess, especially if a key person leaves (or in the case of the Windows example, one hard drive crashes hard).
If left unchecked, you can get whole business processes designed with the assumption that [username] will never retire. That already happens to an alarming extent, but this would make the dependency more invisible to those of us charged with making it all work again when it breaks. You can have shared spaces that are business critical to the company living 100% inside a user's self-managed space, and vulnerable to deletion on termination of that employee.
This is all part of the balance we as system administrators have to keep between end user functionality, and data protection. Desktop techs fight a constant battle to get users to save data on the server where it is backed up, and Novell puts out things like iFolder to help that whole thing become more invisible. We created shared directories to draw a big line between 'my stuff' and 'us stuff'.
That said, data-access habits are changing all the time. My own boss prefers to email a 150KB Excel spreadsheet to all of us, even though all of us have ready access a shared directory setup just for that. SharePoint integrates with Office to make the web-server look like a file-server. We still have to adapt with the times.
User-directed sharing is something I can see as highly desirable among the student population and faculty as well. Among staff, I'm less sure its a good idea outside of the 'trivial' personal use we're allowed.
I want to share U:\SharedStuff\ApacheGroup\ to five other users. U: is my home directory, which is actually map-rooted so I don't see the top level directory. So I go to a web page and tell it I want to share this directory, to these people, for this long. Go.
It struck me that this sort of thing can be engineered with NetWare and OES. The key components are eDirectory, NSS, and NetStorage.
The web server takes the request and translates $Path into a real path by referencing the HomeDirectory attribute of the user who requested the share. Then, using LDAP it creates two objects:
A Group Object
- Created and named dynamically
- [AuxClass] Attribute with user-defined name
- [AuxClass] Attribute with the creator
- [AuxClass] Attribute with the expiry date
- Since this is eDirectory, group memberships apply immediately rather than taking a logout/login cycle to refresh the access token like in MS networks.
- Created & named dynamically
- Associated to the created group
- Assigned to the specified users
- This allows the share to show up in NetStorage
There is a small constellation of maintenance tasks that also need to be created, such as a janitor process to deal with expirations, a helpdesk view to track who has what shares, a historic view to see what shares got deleted recently that suddenly need to be back RIGHT NOW, something to interface this with whatever disk or directory quota systems are in use.
The use of NetStorage allows WebDAV to be used as an access method, which allows the shares to be seen. The really brave may be able to leverage DFS to create actual directory structures reflecting the shares in the actual directories so drive mappings can be used; unfortunately I have no idea if a DFS database that large is a good idea.
Users would love this. No need to go through management to get a directory set up on the shared space. You just set up and go. Great for adhoc groups, or small private gatherings.
Unfortunately, this sort of share model is one that a lot of sys-admins are familiar with. If you've ever had a chance to examine the network of a small business with under 15 users, all of whom call themselves 'not that good with computers', you know what I'm talking about. This model of sharing is the one that Windows for Workgroups was designed for, and is still the default mode for plain old WinXP. Excessive use of peer to peer sharing like that can lead to one unholy mess, especially if a key person leaves (or in the case of the Windows example, one hard drive crashes hard).
If left unchecked, you can get whole business processes designed with the assumption that [username] will never retire. That already happens to an alarming extent, but this would make the dependency more invisible to those of us charged with making it all work again when it breaks. You can have shared spaces that are business critical to the company living 100% inside a user's self-managed space, and vulnerable to deletion on termination of that employee.
This is all part of the balance we as system administrators have to keep between end user functionality, and data protection. Desktop techs fight a constant battle to get users to save data on the server where it is backed up, and Novell puts out things like iFolder to help that whole thing become more invisible. We created shared directories to draw a big line between 'my stuff' and 'us stuff'.
That said, data-access habits are changing all the time. My own boss prefers to email a 150KB Excel spreadsheet to all of us, even though all of us have ready access a shared directory setup just for that. SharePoint integrates with Office to make the web-server look like a file-server. We still have to adapt with the times.
User-directed sharing is something I can see as highly desirable among the student population and faculty as well. Among staff, I'm less sure its a good idea outside of the 'trivial' personal use we're allowed.
Tuesday, September 25, 2007
The perils of a manual process
Yesterday I found the root cause of a rather perplexing problem. We had a user, happily for me one of the other sysadmins at WWU, who couldn't get their eDir password changed. No matter how many times he ran the identity management process, his AD PW would change, but eDir would not even though the success on the event was good.
A word of note:
We do not use Novell Identity Management. We've built our own. When Novell first came out with DirXML 1.0, we already had the foundation of what we have right now. So, when I talk about IDM, I'm actually referring to our own self-built system not Novell's IDM.
To troubleshoot, I ran many tests. The longest one was to turn on dstrace logging on the root replica server, and push changes to the object. I'd push a change, watch the logs, then parse through the log for the user's object.
To figure out what that is, I needed to sniff packets. PKTSCAN to the rescue. On the IDM server I turned off connections to all but the server holding the Master replicas of everything. Then on the master replica server I loaded PKTSCAN. I turned on sniffing, make the change, wait 5 seconds just to be safe, turn off the sniff, save the sniff, and load the sniff in Wireshark. The two cases I tested:
Looking at the details of T=WWU, I saw that it had an aux class associated with it. It was posixAccount. Thus, was I illuminated.
This particular sysadmin requested to have his account 'turned on for linux'. Which is code for having the posixAccount aux-class associated and the uid, gid, cn, and shell attributes added. This is still a manual process for us since requests are few and far between, though that is changing. It would seem that when I did it, I right-clicked on the wrong object. Whoopsie poo! Easily fixed, though.
I removed the aux-class from the tree root object, and suddenly... IDM changes started applying to the right object! Hooray! I think the IDM code was keying off of commonName rather than CN for some reason, which is why the aux-class got in the way.
A word of note:
We do not use Novell Identity Management. We've built our own. When Novell first came out with DirXML 1.0, we already had the foundation of what we have right now. So, when I talk about IDM, I'm actually referring to our own self-built system not Novell's IDM.
To troubleshoot, I ran many tests. The longest one was to turn on dstrace logging on the root replica server, and push changes to the object. I'd push a change, watch the logs, then parse through the log for the user's object.
- Changing it via LDAP made a sync.
- Changing it via the IDM did not make a sync.
- Changing it via iManager made a sync.
- Changing it via ConsoleOne on the IDM server made a sync
To figure out what that is, I needed to sniff packets. PKTSCAN to the rescue. On the IDM server I turned off connections to all but the server holding the Master replicas of everything. Then on the master replica server I loaded PKTSCAN. I turned on sniffing, make the change, wait 5 seconds just to be safe, turn off the sniff, save the sniff, and load the sniff in Wireshark. The two cases I tested:
- Change the concurrent connections attribute through IDM
- Change the concurrent connections attribute through ConsoleOne on the IDM server
Looking at the details of T=WWU, I saw that it had an aux class associated with it. It was posixAccount. Thus, was I illuminated.
This particular sysadmin requested to have his account 'turned on for linux'. Which is code for having the posixAccount aux-class associated and the uid, gid, cn, and shell attributes added. This is still a manual process for us since requests are few and far between, though that is changing. It would seem that when I did it, I right-clicked on the wrong object. Whoopsie poo! Easily fixed, though.
I removed the aux-class from the tree root object, and suddenly... IDM changes started applying to the right object! Hooray! I think the IDM code was keying off of commonName rather than CN for some reason, which is why the aux-class got in the way.
Monday, September 24, 2007
Neat eDir trick
One thing that I learned at BrainShare years ago is that eDir 8.7 permits LDAP clients to register against events. Probably the most widely applicable devnet thing is the LDAP Classes for Java. From my understanding, this sort of technology is used in both Novell Identity Manager and NSure Audit.
So, what the heck is it? From the documentation:
Stuff like this is downright handy in identity management situations. If a change is made to "phoneNumber" in the Identity tree, that change can be trapped by the monitor, and propagated to the production eDir tree, Active Directory, and NIS+. What's now a batch process can be event based.
I'm not a java programmer so I'm limited in what *I* can do with it. However, I have coworkers who DO speak java, and can probably do wonderful things with it.
So, what the heck is it? From the documentation:
The event system extension allows the client to specify the events for which it wants to receive notification. This information is sent in the extension request. If the extension request specifies valid events, the LDAP server keeps the connection open and uses the intermediate extended response to notify the client when events occur. Any data associated with an event is also sent in the response. If an error occurs when processing the extended request or during the subsequent processing of events, the server sends an extended response to the client containing error information and then terminates the processing of the request.It's an extension to LDAP that Novell created to permit event monitoring. It monitors events in eDirectory, from object changes, to internal eDirectory statuses like obituary processing. For example, you can set up a connection and tell the LDAP server to tell you of all changes to the "member" attribute, and track all group modifications. Or track the "last login time" attribute, and create a robust login audit log.
Stuff like this is downright handy in identity management situations. If a change is made to "phoneNumber" in the Identity tree, that change can be trapped by the monitor, and propagated to the production eDir tree, Active Directory, and NIS+. What's now a batch process can be event based.
I'm not a java programmer so I'm limited in what *I* can do with it. However, I have coworkers who DO speak java, and can probably do wonderful things with it.
Tuesday, September 04, 2007
Expanding the EVA
Our EVA3000 is full. All shelves have disks in them. In order to add space we need to replace our existing 143GB drives with 300GB drives. This is a rather expensive way to gain more space, as that extra 157GB of space costs the same as 300GB of space. But, that's what we have to do.
And wow does it take a while.
First I have to ungroup the disk. This can take up to two days. Then I pull the drive, and put the new one in. And regroup on top of it, which takes another up to two days. All the group/ungroup operations are competing for I/O with regular production.
Total time to add 157GB to the SAN? Looks to be 3 days and change.
We need a newer EVA.
And wow does it take a while.
First I have to ungroup the disk. This can take up to two days. Then I pull the drive, and put the new one in. And regroup on top of it, which takes another up to two days. All the group/ungroup operations are competing for I/O with regular production.
Total time to add 157GB to the SAN? Looks to be 3 days and change.
We need a newer EVA.
Saturday, August 25, 2007
Measuring sysadmin productivity
There was another thread on Slashdot today that caught my attention:
http://ask.slashdot.org/askslashdot/07/08/25/1753220.shtml
The asker asked:
Productivity at its most abstract is the rate at which an employee adds value to an organization. The tricky part is determining how to measure that rate and the value itself. In manufacturing, it is easier as 'widgets-per-hour' is generally OK. IBM and Microsoft attempted to do this to programming back in the development phase for OS/2, and the infamous "KLOC", or, "thousand lines of code."
System Administration is something that doesn't lend itself well to such quantification. A significant part of our job is quite literally, fire-watch; do nothing until something breaks and then spring into action to contain and correct the damage. While we're waiting for something to break, we're also working on projects to get new or upgraded systems online.
What I have seen done is to have to account for every minute of my day. Every moment of my day has to be chargable against something; a project, a department, or other time-tracking tool. It is also my experience that such managers take a dim view of entries such as these:
9:50-10:00 Bathroom
11:45-12:00 Time-sheet entry
15:45-16:00 Time-sheet entry
The questioner asked, "what is the basic unit of productivity for an *nix admin?"
I could come up with a funny name for this fictional unit, but in essence there isn't one. To fully quantify an admin's productivity requires fully quantified metrics for:
http://ask.slashdot.org/askslashdot/07/08/25/1753220.shtml
The asker asked:
RailGunSally writes "I am a (strictly technical) member of a large *nix systems admin team at a Fortune 150. Our new IT Management Overlord is a hardcore bean-counter from hell. We in the trenches have been tasked with providing 'metrics' on absolutely everything from system utilization to paper clip recycling. Of course, measuring productivity is right up there at the top of the list. We're stumped as to a definition of the basic unit of productivity for a *nix admin. There is a school of thought in our group that holds that if the PHBs are simple enough to want to operate purely from pie charts and spreadsheets, then we should just graph some output from /dev/random and have done with it. I personally love the idea, but I feel the need for due diligence, so I put the question to the Slashdot community: How does one reasonably quantify admin productivity?"I don't have a "bean-couter from hell" boss, but this is a topic I've spent a bit of time thinking about at my last job. How to you measure productivity of a sysadmin? The question at previous job was how do you determine which employee holds more value than another. This is not an easy thing.
Productivity at its most abstract is the rate at which an employee adds value to an organization. The tricky part is determining how to measure that rate and the value itself. In manufacturing, it is easier as 'widgets-per-hour' is generally OK. IBM and Microsoft attempted to do this to programming back in the development phase for OS/2, and the infamous "KLOC", or, "thousand lines of code."
System Administration is something that doesn't lend itself well to such quantification. A significant part of our job is quite literally, fire-watch; do nothing until something breaks and then spring into action to contain and correct the damage. While we're waiting for something to break, we're also working on projects to get new or upgraded systems online.
What I have seen done is to have to account for every minute of my day. Every moment of my day has to be chargable against something; a project, a department, or other time-tracking tool. It is also my experience that such managers take a dim view of entries such as these:
9:50-10:00 Bathroom
11:45-12:00 Time-sheet entry
15:45-16:00 Time-sheet entry
The questioner asked, "what is the basic unit of productivity for an *nix admin?"
I could come up with a funny name for this fictional unit, but in essence there isn't one. To fully quantify an admin's productivity requires fully quantified metrics for:
- The impact of server and service downtime.
- The value gained from meetings.
- The seasonal variations in business (in our case, when are classes in session? When are finals? When do grades need to be reported? When are parents on campus? Things like that.)
- Bureaucrat