Wednesday, May 07, 2008

DataProtector 6 has a problem

We're moving our BackupExec environment to HP DataProtector. Don't ask why, it made sense at the time.

One of the niiiice things about DP is what's called "Enhanced Incremental Backup". This is a de-duplication strategy that only backs up files that have changed, and only stores the changed blocks. From these incremental backups you can construct synthetic full backups, which are just pointer databases to the blocks for that specified point-in-time. In theory, you only need to do one full backup, keep that backup forever, do enhanced incrementals, then periodically construct synthetic full backups.

We've been using it for our BlackBoard content store. That's around... 250GB of file store. Rather than keep 5 full 275GB backup files for the duration of the backup rotation, I keep 2 and construct synthetic fulls for the other 3. In theory I could just go with 1, but I'm paranoid :). This greatly reduces the amount of disk-space the backups consume.

Unfortunately, there is a problem with how DP does this. The problem rests on the client side of it. In the "$InstallDir$\OmniBack\enhincrdb" directory it constructs a file hive. An extensive file hive. In this hive it keeps track of file state data for all the files backed up on that server. This hive is constructed as follows:
  • The first level is the mount point. Example: enhincrdb\F\
  • The 2nd level are directories named 00-FF which contain the file state data itself
On our BlackBoard content store, it had 2.7 million files in that hive, and consumed around 10.5GB of space. We noticed this behavior when C: ran out of space. Until this happened, we'd never had a problem installing backup agents to C:. Nor did we find any warnings in the documentation that this directory could get so big.
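
For anyone wanting to check how large their own hive has grown, something along these lines would tally it per mount point. This is only a sketch: it assumes a POSIX shell with find and du is available on the client (Cygwin on Windows, for instance), and hive_census and the example path are hypothetical names for illustration, not anything DataProtector ships.

```shell
#!/bin/sh
# Sketch: count state files and disk use under each mount-point
# directory of an enhincrdb hive. The function name and the path
# passed to it are illustrative assumptions, not part of DataProtector.

hive_census() {
    hive_root=$1
    for mount_dir in "$hive_root"/*/; do
        # Skip the unexpanded glob when the hive has no subdirectories
        [ -d "$mount_dir" ] || continue
        files=$(find "$mount_dir" -type f | wc -l | tr -d ' ')
        kbytes=$(du -sk "$mount_dir" | cut -f1)
        printf '%s\t%s files\t%s KB\n' "$(basename "$mount_dir")" "$files" "$kbytes"
    done
}

# Example (path is hypothetical):
#   hive_census "/cygdrive/c/Program Files/OmniBack/enhincrdb"
```

Run before and after a backup cycle, the file counts would show whether state data for deleted files is ever cleaned out.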

The last real full backup I took of the content store backed up just under 1.7 million objects (objects = directory entries in NetWare, or IIRC inodes in unix-land). Yet the enhincrdb hive had 2.7 million objects. Why the difference? I'm not sure, but I suspect it was keeping state data for 1 million objects that no longer were present in the backup. I have trouble believing that we managed to churn over 60% of the objects in the store in the time I have backups, so I further suspect that it isn't cleaning out state data from files that no longer have a presence in the backup system.

DataProtector doesn't support Enhanced Incrementals for NetWare servers, only Windows and possibly Linux. Due to how this is designed, were it to support NetWare it would create absolutely massive directory structures on my SYS: volumes. The FACSHARE volume has about 1.3TB of data in it, in about 3.3 million directory entries. The average FacStaff User volume (we have 3) has about 1.3 million, and the average Student User volume has about 2.4 million. Due to how our data works, our Student user volumes have a high churn rate due to students coming and going. If FACSHARE were to share a cluster node with one Student user volume and one FacStaff user volume, they would have a combined directory-entry count of 7.0 million. This would generate, at first, a \enhincrdb directory with 7.0 million files. Given our regular churn rate, within a year it could easily be over 9.0 million.
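
The combined figure is just the three per-volume counts added together. A quick sketch of the arithmetic, working in thousands because shell arithmetic is integer-only, and assuming the FACSHARE count is 3.3 million; the function names are made up for illustration:

```shell
#!/bin/sh
# Directory-entry counts in thousands, taken from the figures above.
# FACSHARE is assumed to be 3.3 million entries.
facshare=3300
facstaff=1300
student=2400

# Sum for one node hosting one volume of each type
combined_entries() {
    echo $(( facshare + facstaff + student ))
}

# Whole-number percentage growth from one count to another
growth_pct() {
    echo $(( ($2 - $1) * 100 / $1 ))
}

printf 'Combined: %s thousand directory entries\n' "$(combined_entries)"
printf 'Growth from 7.0 to 9.0 million: about %s%%\n' "$(growth_pct 7000 9000)"
```

So the "over 9.0 million within a year" guess works out to somewhere under 30% growth in state files on that node.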

When you move a volume to another cluster node, it will create a hive for that volume in the \enhincrdb directory tree. We're seeing this on the BlackBoard Content cluster. So given some volumes moving around, it is quite conceivable that each cluster node will have each cluster volume represented in its own \enhincrdb directory. That would mean over 15 million directory entries parked there on each SYS volume, steadily increasing as time goes on and taking who knows how much space.

And as anyone who has EVER had to do a consistency check of a volume that size knows (be it vrepair, chkdsk, fsck, or nss /poolrebuild), it takes a whopper of a long time when you get a lot of objects on a file-system. The old Traditional File System on NetWare could only support 16 million directory entries, and DP would push me right up to that limit. Thank heavens NSS can support w-a-y more than that. You better hope that the file-system that the \enhincrdb hive is on never has any problems.

But, Enhanced Incrementals only apply to Windows so I don't have to worry about that. However.... if they really do support Linux (and I think they do), then when I migrate the cluster to OES2 next year this could become a very real problem for me.

DataProtector's "Enhanced Incremental Backup" feature is not designed for the size of file-store we deal with. For backing up the C: drive of application servers or the inetpub directory of IIS servers, it would be just fine. But for file-servers? Good gravy, no! Unfortunately, those are the servers in most need of de-dup technology.



Tuesday, May 06, 2008

Being annoyed by rug?

Rug/zmd in SLES10-SP1 is still a headache maker. Novell knows this, but I strongly suspect that we'll have to wait until SLES11 before we get anything improved. openSUSE now has zypper, which works pretty well, and I think you can use it in SLES if you want, but I haven't tried.

One of the chief annoyances of rug is that the zmd.db file kept in /var/lib/zmd/zmd.db gets corrupted far too easily. And when that happens, rug can take HOURS to return anything. If it returns anything at all.

The fix for it is easy: stop zmd, delete the zmd.db file, restart zmd. Since I'm doing this fairly often, I've whipped up a bash script to do it for me.

nukezmd
#!/bin/bash
#
# For killing ZMD when it is clearly hung. An all too often occurrence.
#

declare PIDZMD

# First get the PID of ZMD

printf "Getting PID... "
PIDZMD=$(rczmd showpid)
printf '%s\n' "$PIDZMD"

# Then unconditionally kill it (skipped if no PID came back)

printf "Killing zmd hard... \n"
[ -n "$PIDZMD" ] && kill -9 "$PIDZMD"

# Remove the old, inconsistent database

printf "Nuking old database... \n"
rm -f /var/lib/zmd/zmd.db

# Restart ZMD, which will build a new, consistent database

printf "Restarting ZMD\n"
rczmd start
Simple, to the point. Works.
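
If deleting the database outright feels too final, a gentler variant is to move it aside with a timestamp so there is something left to look at afterwards. This is a sketch only: sideline_db is a hypothetical helper name, and it assumes zmd rebuilds the database when the file is missing, as described above.

```shell
#!/bin/sh
# Sketch: rename a database file out of the way instead of deleting it.
# sideline_db is a hypothetical helper, not part of zmd or rug.

sideline_db() {
    db=$1
    # Nothing to do if the file is already gone
    [ -f "$db" ] || return 0
    stamp=$(date +%Y%m%d-%H%M%S)
    # Print the new name so the caller can log or inspect it
    mv "$db" "$db.broken-$stamp" && printf '%s\n' "$db.broken-$stamp"
}

# In the script above, this would stand in for the rm line:
#   sideline_db /var/lib/zmd/zmd.db
```

The sidelined copies pile up over time, so they would need clearing out by hand now and then.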



Monday, May 05, 2008

Linux @ Home

My laptop at home dual-boots between openSUSE and WinXP. There are a few reasons why I don't boot the Linux side very often, some of them work related. And, what the heck, here are the two reasons.

1: Wireless driver problems
I have an intel 3945 WLAN card. It works just fine in linux, well supported. What throws it for a loop, however, are sleep and hibernate states. It can go one, two, four, maybe five cycles through sleep before it will require a reboot in order to find the home wireless again. If it doesn't lock the laptop up hard. Since my usage patterns are heavily dependent upon Sleep mode, this is a major, major disincentive to keep the Linux side booted.

I understand the 2.6.25 kernel is a lot better about this particular driver. Thus, I await with eager anticipation the release of openSUSE 11.0. This driver is currently the ipw3945 driver, and will eventually turn into the iwl3945 driver once it comes down the pipe. What little I've read about it suggests that the iwl driver is more stable through power states.

2: NetWare remote console
I use rconip for remote console to NetWare. Back when Novell first created the IP-based rconsole, they also released rconj alongside ConsoleOne to provide it. As this was written in Java, it was mind bogglingly slow. Rconip, a little .exe file, was vastly faster, and I've come to use it extensively. Unless I get Wine working, this tool will have to stay on my Windows XP partition. It works great, and I haven't found a good linux-based replacement yet.

Time has moved on. Hardware has gotten faster, and the 'java penalty' has reduced markedly. RconJ is actually usable, but I still don't use it. Plus, it would require me to install ConsoleOne onto my laptop. It's 32-bit, so that's actually possible, but I really don't want to do that.

The Remote Console through the Novell Remote Monitor (that service out on :8009) has a nice remote-console utility, but it also requires Java. I'm still biased against java, and java-on-linux still seems fairly unstable to me. I don't trust it yet. It also doesn't scale well. When I'm service-packing, it is a LOT nicer looking to have 6 rconip windows up than 6 browser-based NRM java-consoles open. Plus, rconip will allow me access to the server console if DS is locked, something that NRM can't do and is invaluable in an emergency.

Once the wireless driver problems are fixed, I'll boot the linux side much more often. Remote-X over SSH actually makes some of my remote management a touch easier than it is in WinXP. And if I really really need to use Windows, my work XP VM is accessible over RDesktop. There are a few other non-work reasons why I don't boot Linux very often, but I'll not go into those here.

So, oddly, NetWare is partly responsible for keeping me in Windows at home. But only partly.



Back-scatter spam

There was a recent slashdot post on this. We've had a fair amount of this sort of spam. And the victims are at pretty high levels of our organization, too. Last week the person who is responsible for us even having a Blackberry Enterprise Server asked us to figure out a way to prevent these emails from being forwarded to their blackberry. When a spam campaign is rolling, that person can get a bounce-message every 5-15 minutes for up to 8 hours, into the wee hours of the night. And that's just the mails that get PAST our anti-spam appliance. We set up some forwarding filters, but we haven't heard back about how effective they are.

This is a hard thing to guard against. You can't use the reputation of the sender IP address, since they're all legitimate mailers being abused by the spam campaign and are returning delivery status notifications (DSNs) per spec. So the spam filtering has to be by content, which is a bit less effective. In one case, of the 950-odd DSNs we received for a specific person during a specific spam campaign, only 15 made it to the inbox. But that 15 was enough above what they normally saw (about 3 a day) that they complained.

Backscatter is a problem. However, our affected users have so far been sophisticated enough users of email to realize that this was more likely forgery than something wrong with their computer. So, we haven't been asked to "track down those responsible." This is a relief for us, as we've been asked that in the past when forged spams have come to the attention of higher level executives.

If it becomes a more wide-spread problem, we will be told to Do Something by the powers that be. Unfortunately, there isn't a lot that can be done. Blocking these sorts of DSNs is doable, but that's an expensive thing to manage in terms of people time. In 6-12 months we can expect the big anti-spam vendors to include options to just block DSN's uniformly, but until that time comes (and we have the budget for the added expenses) we'd have to do it through dumb keyword filters. Not a good solution. And it would also cause legitimate bounce messages to fail to arrive.
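
To give a feel for what a crude DSN filter looks like, here is a minimal sketch that checks a saved message for two header-level signals: a null Return-Path, and the multipart/report content type with report-type=delivery-status that standard-format DSNs carry. The is_dsn name is hypothetical, and this is not how any particular anti-spam appliance does it; it also assumes messages are stored one per file with Unix line endings.

```shell
#!/bin/sh
# Sketch: decide whether a saved message looks like a bounce (DSN).
# Checks only two header-level signals; is_dsn is a hypothetical helper.

is_dsn() {
    msg=$1
    # Headers run from the top of the file to the first blank line
    headers=$(sed -n '1,/^$/p' "$msg")
    # Signal 1: null envelope sender recorded at delivery time
    printf '%s\n' "$headers" | grep -qi '^Return-Path:[[:space:]]*<>' && return 0
    # Signal 2: the multipart/report content type used by standard DSNs
    printf '%s\n' "$headers" | grep -qi 'report-type="\{0,1\}delivery-status' && return 0
    return 1
}

# Example (file name is hypothetical):
#   is_dsn message.eml && echo "looks like a bounce"
```

A check like this cannot tell backscatter from a genuine bounce, which is exactly the drawback described above: block on it and the legitimate ones vanish too.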



Wednesday, April 30, 2008

Legal processes

Yesterday we received a Litigation Hold request. For those of you who don't know, this is the order given as part of a lawsuit ordering us to take steps to preserve data that could be used as part of the Discovery process of the suit. This is something that is becoming more and more common these days.

Our department has been pretty lucky so far. Since I started here in late 2003 this is the first Litigation Hold request we've had to deal with. We've had a few "public records requests" come through which are handled similarly, but this is the first one involving data that may be introduced under sworn testimony.

This morning we had an article pointed out to us by the Office of Financial Management at the state. WWU is a State agency, so OFM is in our chain of bureaucracy.

Case Law/Rule Changes Thrust Electronic Document Discovery into the Spotlight.

It's an older PDF, but it does give a high level view of the sorts of things we should be doing when these requests come in. Two things we don't have any processes for are the sequestration of held data and chain of custody preservation. We are now building those.

Guideline #4 has the phrase, "Consultants are particularly useful in this role," referring to overseeing the holding process and standing up before a court to testify that the data was handled correctly. This is very true! Trained professionals are the kind of people to know the little nuances that hostile lawyers can use to invalidate gathered evidence. Someone who has done a lot of reading and been to a few SANS classes is not that person.

Just because it is possible to represent yourself in court doesn't make it a good idea. In fact, it generally is a very bad idea. Same thing applies to the above phrase. You want someone who knows what the heck they're doing when they climb up there onto the witness stand.

This is going to be an interesting learning experience.



Monday, April 28, 2008

The GPL in a software-as-a-service world

Just this last weekend I went to Linuxfest Northwest, which is held here in Bellingham. This is nice! It's just a short drive.

One of the talks I went to was held by Ted Haeger, currently of Bungee Labs. The topic of the talk was one he had just posted to his blog, "Sharing Source Code In The Cloud".

One point he brought up that I hadn't heard of before is that the GPL triggers when you 'convey' the software to someone else. And that the GPL specifically excludes where the software is hosted on a server and users just use the software there, so long as the software itself never leaves the company in question. This is exactly what Google did and still does. All of their search IP was built on an OSS platform, but is still held as the crown jewels of their company; all because they haven't given the software to anyone else.

Apparently, this 'loophole' is being exploited by a LOT of new companies trying to get in on the software-as-a-service market. Such as Bungee Labs, as it happens. What effect will this have on the state of GPLed software? Hard to say, the market is still in its early days.

It makes you think.



Thursday, April 17, 2008

And a gripe

2.5 hours is too freakin' long for "rug lu" to tell me which patches need application to this particular OES2 server. This needs fixing. I hope it's fixed in SLES10 SP2.



NetWare and Novell, changing a company

A couple days ago Richard Bliss had a long blog entry about, "Novell's Cash Cow - How NetWare almost killed the company". It had some very interesting points. Some we knew:
"We are all familiar with NetWare, the dominate Network Operating system of the 1980s and 1990s. We are all familiar with Microsoft's tactics of penetrating the NOS market with Windows NT by focusing on using Windows as an application platform."
Apparently Richard worked for Novell around 2001. I find that interesting since my first BrainShare was 2001, and that was when they announced the release of NetWare 6.0. While there he saw what seemed to be an outright denial that NetWare had been passed up by Windows and something new needed to be done.

In 2001 I knew that Windows had for all intents and purposes won. The only place you ever really saw NetWare servers were as file-servers, or running GroupWise or the small handful of apps that used NetWare as an application server. The stalwart loyalists among us saw this as annoying, but not a major problem.

It was also good for Novell's bottom line. NetWare still accounted for a large percentage of their revenues. Even though the writing was on the wall, they were still making real money on it so didn't see a need to change. This is why NetWare 6.0 introduced the AMP stack to NetWare, as a way to better make NetWare an application server and to slow the loss of customers. At BrainShare 2001 there was open speculation about "NetWare 7.0" and what it would look like.

And there still was until 2005 when Novell announced what the next version of NetWare would be. This being after the SUSE and Ximian purchases, it would be based on Linux. This move had been rumored, and alternately derided and lauded, for some time. There was a great wailing and gnashing of teeth on the part of the stalwart NetWare loyalists. It also started an exodus of customers, as Novell's financial reports at the time point out.

Fortunately for the company, they started actively promoting (for certain values of 'active' that are higher than they were previously, but still in the theme of Novell Stealth Marketing) and developing their other products, like GroupWise, Novell Identity Management, ZenWorks, and most especially their Linux business. It took them until last quarter to turn in a quarter in the black, and NetWare revenues are under 20% of total now. So, they've turned the corner and are no longer dependent on the NetWare cash cow. They have a couple of them in the field now, which is a MUCH healthier place to be.

It's a funny thing, but one of the reasons why NetWare is such a kick-butt file-server compared to everything else is also why it's a challenging environment to develop in. Had Novell seen the light earlier and bought SUSE (or rolled their own Linux distro) in... 1999 instead, right after the NW5.1 release, they still would have run into the fundamental architectural problems in 32-bit linux that make it an inferior file-serving platform for large environments. Even so, by 2008 their server could have been a LOT more mature, and perfectly poised to take advantage of 64-bit Linux.

Novell in the 1990s was not an example of a 'nimble' company. It is trying to get there now through diversification. Not many companies (especially tech companies) have survived the loss of their prime money earner; Apple has done it through OSX, which required a fanatically loyal fan base to survive the dark years. This is the prime reason people kept predicting the imminent demise or buyout of Novell. Now that they're earning profits again, and have diversified away from just the OS sector, they're not going to be going out of business any time soon.

Now if only they had better SMB packages and programs. I hear repeatedly from peers who support SMBs that Novell's packages and programs in that space are lacking or exploitative. Significant revenue, and more importantly mindshare, are in the SMB market. Plus, today's SMB is tomorrow's large or global enterprise.



Tuesday, April 15, 2008

Beta attitudes

One thing I've noticed while working on this beta is a change in attitude. Specifically, attitude regarding problems. I've run into problems so far that would have had me throwing things across the room by now. Yet, instead I get that 'ahah!' feeling and proceed to figure out how it went poink exactly like that. And then report it. That feels good.

All of my prior bug-hunting has been post-release, when we ran into issues in production. Now, it's in pre-release and the bugs and issues I find now will be fixed by release (or at least documented so people know to expect it to break that way).

It's an interesting change in attitude.


