Thursday, May 29, 2008
OES2 and SLES10-SP2
Updating OES2
OES2 systems should NOT be updated to SLES10 SP2 at this time!
Very true. And most especially true if you're running virtualized NetWare! The paravirtualization components in NW65SP7 are designed around the version of Xen that's in SLES10-SP1, and SP2 contains a much newer version of Xen (trying to play catch-up to VMware means a fast dev cycle, after all). So, expect problems if you do it.
Also, the OES2 install does contain some kernel packages, such as those relating to NSS.
OES2 systems need to wait until either Novell gives the all clear for SP2 deployments on OES2-fcs, or OES2-SP1 ships. OES2-SP1 is built around SLES10-SP2.
Friday, May 23, 2008
Problem with SLES10-SP2
Updates catalogs missing after updating libzypp
I've heard on the grapevine that this particular libzypp update was put into the SLES10-SP1 channel in order to prepare for SP2's release. Those fine folk out there who have turned on Auto Updating on their SLE[S|D] boxes have very probably already been bitten by it. I hope Novell gets this one fixed, and posts recovery steps, soon.
Thursday, May 22, 2008
A question of scale
Honda's 68MPG FCX Fuel-Cell Sedan to See Limited Service in '08
This is interesting in and of its own self. But in the main body of the article is this very interesting sentence:
Honda's FCX prototype uses a 95kW (127HP) electric motor which is powered by a 100kW Proton Exchange Membrane Fuel Cell (PEFC), 171 liter hydrogen fuel tank and a bank of lithium-ion batteries.
The UPS attached to our datacenter is 50kW. This one car will have to push out enough electricity to run TWO of our datacenters in order to have enough oomph to satisfy the normal American consumer. Interesting!
Labels: sysadmin
Wednesday, May 21, 2008
SLES10 SP2 shipped
This means that the ongoing OES2 SP1 beta I'm a part of will be done on released code for the SLES side of it. So any bugs we find there may end up as patches on the SP2 channel.
One nice thing in the new code?
"rug refresh --clean"
This will do what I posted about a few days ago. It'll nuke the zmd database and rebuild it fresh! Niiiice! Unfortunately, a truly better version of rug won't come until "Code 11".
Wednesday, May 14, 2008
NetWare and Xen
Guidelines for using NSS in a virtual environment
Towards the bottom of this document, you get this:
Configuring Write Barrier Behavior for NetWare in a Guest Environment
Write barriers are needed for controlling I/O behavior when writing to SATA and ATA/IDE devices and disk images via the Xen I/O drivers from a guest NetWare server. This is not an issue when NetWare is handling the I/O directly on a physical server.
The XenBlk Barriers parameter for the SET command controls the behavior of XenBlk Disk I/O when NetWare is running in a virtual environment. The setting appears in the Disk category when you issue the SET command in the NetWare server console.
Valid settings for the XenBlk Barriers parameter are integer values from 0 (turn off write barriers) to 255, with a default value of 16. A non-zero value specifies the depth of the driver queue, and also controls how often a write barrier is inserted into the I/O stream. A value of 0 turns off XenBlk Barriers.
A value of 0 (no barriers) is the best setting to use when the virtual disks assigned to the guest server’s virtual machine are based on physical SCSI, Fibre Channel, or iSCSI disks (or partitions on those physical disk types) on the host server. In this configuration, disk I/O is handled so that data is not exposed to corruption in the event of power failure or host crash, so the XenBlk Barriers are not needed. If the write barriers are set to zero, disk I/O performance is noticeably improved.
Other disk types such as SATA and ATA/IDE can leave disk I/O exposed to corruption in the event of power failure or a host crash, and should use a non-zero setting for the XenBlk Barriers parameter. Non-zero settings should also be used for XenBlk Barriers when writing to Xen LVM-backed disk images and Xen file-backed disk images, regardless of the physical disk type used to store the disk images.
Nice stuff there! The "xenblk barriers" can also have an impact on the performance of your virtualized NetWare server. If your I/O stream runs the server out of cache, performance can really suffer if barriers are non-zero. If it fits in cache, the server can reorder the I/O stream to the disks to the point that you don't notice the performance hit.
So, keep in mind where your disk files are! If you're using one huge XFS partition and hosting all the disks for your VM-NW systems on that, then you'll need barriers. If you're presenting a SAN LUN directly to the VM, then you'll need to "SET XENBLK BARRIERS = 0", as they're set to 16 by default. This'll give you better performance.
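As a rough aid for remembering which way to set it, here is a small sketch that maps the type of backing store to the value suggested by the quoted guidelines. The function name and the backing-type keywords are made up for illustration; only the 0 and 16 values come from the documentation above, and the actual change is still made with SET at the NetWare console.

```shell
#!/bin/sh
# Hypothetical helper: suggest a XenBlk Barriers value for a backing store.
# The keywords (scsi, fc, iscsi, sata, ide, lvm-image, file-image) are
# illustrative labels, not anything NetWare or Xen understands.
recommend_xenblk_barriers() {
    case "$1" in
        # Physical SCSI, Fibre Channel or iSCSI disks presented to the VM:
        # barriers are not needed, so turn them off.
        scsi|fc|iscsi)
            echo 0
            ;;
        # SATA, ATA/IDE, and LVM-backed or file-backed disk images:
        # keep a non-zero value (16 is the documented default).
        sata|ide|lvm-image|file-image)
            echo 16
            ;;
        *)
            echo "unknown backing type: $1" >&2
            return 1
            ;;
    esac
}
```

For example, `recommend_xenblk_barriers iscsi` prints 0, while `recommend_xenblk_barriers file-image` prints 16.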
Labels: benchmarking, netware, novell, NSS, OES, storage, virtualization
Monday, May 12, 2008
DataProtector 6 has a problem, continued

See? This is an in-progress count of one of these directories. 1.1 million files, 152MB of space consumed. That comes to an average file-size of 133 bytes. This is significantly under the 4kb block-size for this particular NTFS volume. On another server with a longer serving enhincrdb hive, the average file-size is 831 bytes. So it probably increases as the server gets older.
On the up side, these millions of weensy files won't actually consume more space for quite some time as they expand into the blocks the files are already assigned to. This means that fragmentation on this volume isn't going to be a problem for a while.
On the down side, it's going to park (in this case) 152MB of data on 4.56GB of disk space. It'll get better over time, but in the next 12 months or so it's still going to be horrendous.
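The arithmetic behind that is simple enough to sketch. This assumes every file in the hive is smaller than one block and therefore occupies exactly one block on disk, which is the model used above; the function name is made up, and the file count is the rounded 1.1 million from the in-progress count.

```shell
#!/bin/sh
# Hypothetical helper: bytes allocated when every file takes one whole block.
# $1 = number of files, $2 = block size in bytes
allocated_bytes() {
    echo $(( $1 * $2 ))
}

# 1.1 million tiny files on a volume with 4096-byte blocks
allocated_bytes 1100000 4096   # prints 4505600000 (about 4.5GB)

# The same files on a volume formatted with 1024-byte blocks
allocated_bytes 1100000 1024   # prints 1126400000 (about 1.1GB)
```

The 4096-byte result lands close to the 4.56GB figure above (the file count here is rounded), and the 1024-byte result shows why a smaller block size cuts the overhead to roughly a quarter.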
This tells me two things:
- When deciding where to host the enhincrdb hive on a Windows server, format that particular volume with a 1k block size.
- If HP supported NetWare as an Enhanced Incremental Backup client, the 4kb block size of NSS would cause this hive to grow beyond all reasonable proportions.
Since it is highly likely that I'll be using DataProtector for OES2 systems, this is something I need to keep in mind.
Wednesday, May 07, 2008
DataProtector 6 has a problem
One of the niiiice things about DP is what's called "Enhanced Incremental Backup". This is a de-duplication strategy that only backs up files that have changed, and only stores the changed blocks. From these incremental backups you can construct synthetic full backups, which are just pointer databases to the blocks for that specified point-in-time. In theory, you only need to do one full backup, keep that backup forever, do enhanced incrementals, then periodically construct synthetic full backups.
We've been using it for our BlackBoard content store. That's around... 250GB of file store. Rather than keep 5 full 275GB backup files for the duration of the backup rotation, I keep 2 and construct synthetic fulls for the other 3. In theory I could just go with 1, but I'm paranoid :). This greatly reduces the amount of disk-space the backups consume.
Unfortunately, there is a problem with how DP does this. The problem rests on the client side of it. In the "$InstallDir$\OmniBack\enhincrdb" directory it constructs a file hive. An extensive file hive. In this hive it keeps track of file state data for all the files backed up on that server. This hive is constructed as follows:
- The first level is the mount point. Example: enhincrdb\F\
- The 2nd level are directories named 00-FF which contain the file state data itself
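To get a feel for how big one of these hives has become, a count of the files under a mount-point directory is enough. The sketch below assumes a Unix-like shell with find available (on a Windows client that would mean something like Cygwin) and takes the hive path as an argument, since the real location depends on the install. The function name is made up.

```shell
#!/bin/sh
# Hypothetical helper: count the state files under one hive directory,
# e.g. the enhincrdb/F directory and its 00-FF subdirectories.
# Filenames containing newlines would be miscounted; fine for a rough tally.
count_hive_files() {
    find "$1" -type f | wc -l | tr -d ' '
}
```

Running it periodically against the same directory gives a cheap way to watch whether the hive keeps growing after files have been deleted from the backed-up volume.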
The last real full backup I took of the content store backed up just under 1.7 million objects (objects = directory entries in NetWare, or IIRC inodes in unix-land). Yet the enhincrdb hive had 2.7 million objects. Why the difference? I'm not sure, but I suspect it was keeping state data for 1 million objects that no longer were present in the backup. I have trouble believing that we managed to churn over 60% of the objects in the store in the time I have backups, so I further suspect that it isn't cleaning out state data from files that no longer have a presence in the backup system.
DataProtector doesn't support Enhanced Incrementals for NetWare servers, only Windows and possibly Linux. Due to how this is designed, were it to support NetWare it would create absolutely massive directory structures on my SYS: volumes. The FACSHARE volume has about 1.3TB of data in it, in about 3.3 million directory entries. The average FacStaff User volume (we have 3) has about 1.3 million, and the average Student User volume has about 2.4 million. Due to how our data works, our Student user volumes have a high churn rate due to students coming and going. If FACSHARE were to share a cluster node with one Student user volume and one FacStaff user volume, they would have a combined count of 7.0 million directory entries. This would generate, at first, a \enhincrdb directory with 7.0 million files. Given our regular churn rate, within a year it could easily be over 9.0 million.
When you move a volume to another cluster node, it will create a hive for that volume in the \enhincrdb directory tree. We're seeing this on the BlackBoard Content cluster. So given some volumes moving around, it is quite conceivable that each cluster node will have each cluster volume represented in its own \enhincrdb directory. That would mean over 15 million directory entries parked there on each SYS volume, steadily increasing as time goes on and taking who knows how much space.
And as anyone who has EVER had to do a consistency check of a volume that size knows (be it vrepair, chkdsk, fsck, or nss /poolrebuild), it takes a whopper of a long time when you get a lot of objects on a file-system. The old Traditional File System on NetWare could only support 16 million directory entries, and DP would push me right up to that limit. Thank heavens NSS can support w-a-y more than that. You better hope that the file-system that the \enhincrdb hive is on never has any problems.
But, Enhanced Incrementals only apply to Windows so I don't have to worry about that. However.... if they really do support Linux (and I think they do), then when I migrate the cluster to OES2 next year this could become a very real problem for me.
DataProtector's "Enhanced Incremental Backup" feature is not designed for the size of file-store we deal with. For backing up the C: drive of application servers or the inetpub directory of IIS servers, it would be just fine. But for file-servers? Good gravy, no! Unfortunately, those are the servers in most need of de-dup technology.
Tuesday, May 06, 2008
Being annoyed by rug?
One of the chief annoyances of rug is that the zmd.db file kept in /var/lib/zmd/zmd.db gets corrupted far too easily. And when that happens, rug can take HOURS to return anything. If it returns anything at all.
The fix for it is easy: stop zmd, delete the zmd.db file, restart zmd. Since I'm doing this fairly often, I've whipped up a bash script to do it for me.
nukezmd
#!/bin/bash
#
# For killing ZMD when it is clearly hung. An all too often occurrence.
#
# First get the PID of ZMD
printf "Getting PID... "
PIDZMD=$(rczmd showpid)
printf '%s\n' "$PIDZMD"
# Then unconditionally kill it
printf "Killing zmd hard... \n"
kill -9 "$PIDZMD"
# Remove the old, inconsistent database (-f: no error if it is already gone)
printf "Nuking old database... \n"
rm -f /var/lib/zmd/zmd.db
# Restart ZMD, which will build a new, consistent database
printf "Restarting ZMD\n"
rczmd start
Simple, to the point. Works.
Monday, May 05, 2008
Linux @ Home
1: Wireless driver problems
I have an Intel 3945 WLAN card. It works just fine in linux, well supported. What throws it for a loop, however, are sleep and hibernate states. It can go one, two, four, maybe five cycles through sleep before it will require a reboot in order to find the home wireless again. If it doesn't lock the laptop up hard. Since my usage patterns are heavily dependent upon Sleep mode, this is a major, major disincentive to keep the Linux side booted.
I understand the 2.6.25 kernel is a lot better about this particular driver. Thus, I await with eager anticipation the release of openSUSE 11.0. This driver is currently the ipw3945 driver, and will eventually turn into the iwl3945 driver once it comes down the pipe. What little I've read about it suggests that the iwl driver is more stable through power states.
2: NetWare remote console
I use rconip for remote console to NetWare. Back when Novell first created the IP-based rconsole, they also released rconj alongside ConsoleOne to provide it. As this was written in Java, it was mind-bogglingly slow. The little rconip .exe file was vastly faster, and I've come to use it extensively. Unless I get Wine working, this tool will have to stay on my Windows XP partition. It works great, and I haven't found a good linux-based replacement yet.
Time has moved on. Hardware has gotten faster, and the 'java penalty' has reduced markedly. RconJ is actually usable, but I still don't use it. Plus, it would require me to install ConsoleOne onto my laptop. It's 32-bit, so that's actually possible, but I really don't want to do that.
The Remote Console through the Novell Remote Monitor (that service out on :8009) has a nice remote-console utility, but it also requires Java. I'm still biased against java, and java-on-linux still seems fairly unstable to me. I don't trust it yet. It also doesn't scale well. When I'm service-packing, it is a LOT nicer looking to have 6 rconip windows up than 6 browser-based NRM java-consoles open. Plus, rconip will allow me access to the server console if DS is locked, something that NRM can't do and is invaluable in an emergency.
Once the wireless driver problems are fixed, I'll boot the linux side much more often. Remote-X over SSH actually makes some of my remote management a touch easier than it is in WinXP. And if I really really need to use Windows, my work XP VM is accessible over RDesktop. There are a few other non-work reasons why I don't boot Linux very often, but I'll not go into those here.
So, oddly, NetWare is partly responsible for keeping me in Windows at home. But only partly.
Labels: linux, netware, novell, opinion, virtualization
Back-scatter spam
This is a hard thing to guard against. You can't use the reputation of the sender IP address, since they're all legitimate mailers being abused by the spam campaign and are returning delivery status notifications (DSNs) per spec. So the spam filtering has to be by content, which is a bit less effective. In one case, of the 950-odd DSNs we received for a specific person during a specific spam campaign, only 15 made it to the inbox. But that 15 was enough above what they normally saw (about 3 a day) that they complained.
Backscatter is a problem. However, our affected users have so far been sophisticated enough users of email to realize that this was more likely forgery than something wrong with their computer. So, we haven't been asked to "track down those responsible." This is a relief for us, as we've been asked that in the past when forged spams have come to the attention of higher level executives.
If it becomes a more widespread problem, we will be told to Do Something by the powers that be. Unfortunately, there isn't a lot that can be done. Blocking these sorts of DSNs is doable, but that's an expensive thing to manage in terms of people time. In 6-12 months we can expect the big anti-spam vendors to include options to just block DSNs uniformly, but until that time comes (and we have the budget for the added expenses) we'd have to do it through dumb keyword filters. Not a good solution. And it would also cause legitimate bounce messages to fail to arrive.
