It starts out with an email from a friend saying that they are looking into acquiring a storage server for their science lab.
Date: Mon, 1 Mar 2010
From: A Friend In That Lab
Subject: Fwd: Fwd: server config pricing spreadsheet
To: Chris X Edwards
We're looking for recommendations for something you know a lot more about than
I do, so if you have time, I thought I'd ask what you think...
---------- Forwarded message ----------
Subject: Re: Fwd: server config pricing spreadsheet
To: PI
From: Someone The PI Knows Who Knows-About-Computers
Cc: A Friend In That Lab, Person Responsible For System Administration
Hi folks
I like the x4440 the best, it seems to be the best value. Nothing wrong with
the x4275, though the extra power would be good. If you did purchase the
x4275, I would recommend and even number of drives (4 or more) and configure
with a RAID 10. [Blah blah blah]
[A bunch of talk about various server options which is uninteresting.]
I offered to come to their lab meeting and give them my perspective on storage servers. The offer was accepted, and I met with the entire group (PI, PRFSA, et al.) on March 16, 2010. I explained the McNett Storage Server Architecture to them: build it yourself from easily replaceable components and, if in doubt, buy two and put one on the shelf. And, of course, use Linux.
In the previous couple of months I had just built such a storage server of similar capacity. I showed them my notes on that specific build. They were looking to buy a used Sun machine. With storage capacity and RAID configuration similar to my box, their prospect was, despite being used, about three times more expensive.
I told them clearly that the storage server they were looking at was a proprietary machine, which would limit their ability to respond to problems. I enumerated all the ways that storage servers fail and addressed how the McNett design accommodates those failures. By keeping easily obtainable cold spares of anything that could fail on hand and ready to go, you greatly reduce your dependence on outside parties you cannot control to keep the machine running.
They were very skeptical and seemed to prefer the apparent greater security of Sun’s very reassuring warranty (by which this machine would apparently be "covered"). I explained my experience with warranties and how big companies are more motivated to make the service seem good before the sale than after.
In March of 2010, I basically described the exact course of events that ultimately took place almost three years later, after they bought the Sun…
Date: Jan 2013
From: Person Responsible For System Administration
Subject: [admin-mailinglist] Spare hard drive for a Sun Fire x4540?
To: admin-mailinglist
Hi everyone.
I realize this is a long shot, but does anyone have any spare hard drives for a
Sun Fire X4540 ("thor") ? We currently have 2 failed drives and are waiting on
replacements from Oracle, but they are backordered with no ETA. If one more
drive fails, we could lose all of our data, so naturally I'm very concerned.
Thanks,
PRFSA
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Date: Jan 2013
From: Helpful Sys-Admin
Subject: [admin-mailinglist] Re: Spare hard drive for a Sun Fire x4540?
To: Person Responsible For System Administration
What size?
If you have a hardware support contract, Oracle is bound to specific
response times (2 hours onsite for "Premier").
http://www.oracle.com/us/support/premier/servers-storage/overview/index.html
Are they not honoring that?
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Date: Jan 2013
From: Person Responsible For System Administration
Subject: [admin-mailinglist] Re: Spare hard drive for a Sun Fire x4540?
To: Helpful Sys-Admin
It's a 1 TB drive. Oracle has responded, but only to tell me that the
drives are backordered.
Thanks,
PRFSA
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Date: Jan 2013
From: Helpful Sys-Admin
Subject: [admin-mailinglist] Re: Spare hard drive for a Sun Fire x4540?
To: Person Responsible For System Administration
Sorry, our thors have 500GB drives.
Oracle's response sounds unacceptable to me. They should have stock
on hand sufficient to cover their contracts. Obviously that's not
entirely feasible. What if a earthquake took out some major sites
with lots of drives? But barring any such disaster, they should have
a replacement drive on-site in 2 hours, as their contract obliges them
to.
Have you tried to escalate it?
My storage servers run their OS on Linux software RAID1 drives, with a minimal, lean, secure Gentoo install, custom kernels, and daily notifications that failure has not occurred. I owe a big debt of gratitude to Dr. McNett for showing me the wisdom of doing it right by doing it yourself.
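The "daily notification that failure has not occurred" idea can be illustrated with a small sketch. This is not the actual script running on those servers, which is not described here. It only assumes the status text that the Linux md driver exposes in /proc/mdstat. The function names `degraded_arrays` and `daily_report` and the sample text are made up for illustration, and actually delivering the message (cron, mail) is left out.

```python
import re

def degraded_arrays(mdstat_text):
    """Return the names of md arrays that report a missing member.

    Looks for status fields such as "[2/2] [UU]" (healthy) or
    "[2/1] [U_]" (one member missing) under each "mdN :" header.
    """
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\S+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if status and current:
            total = int(status.group(1))
            active = int(status.group(2))
            flags = status.group(3)
            if active < total or "_" in flags:
                degraded.append(current)
            current = None
    return degraded

def daily_report(mdstat_text):
    """Build a one-line message that is sent every day, good news or bad."""
    bad = degraded_arrays(mdstat_text)
    if bad:
        return "RAID DEGRADED: " + ", ".join(bad)
    return "RAID OK: no failed members detected"

# Illustrative sample only, not output captured from a real machine.
SAMPLE = """Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]

unused devices: <none>
"""

print(daily_report(SAMPLE))
```

On a real machine the text would come from reading /proc/mdstat instead of the sample. The point of reporting success every day, rather than only reporting failure, is that a missing message is itself a warning that the monitoring has broken.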
Here’s some more helpful feedback I received:
"Yes, that was similar to the experience we had with a storage server [someone] insisted we buy from a company called Procom. We ended up actually losing data due to a combination of a hang of their proprietary OS and a non-battery backed raid controller in their box.
Guess who later purchased Procom — Sun."
And this was posted to the university’s system administrators’ list (May 2011):
My name is …. and I am Director of UC Systems at SDSC. Over the last 18 months we have seen the quality of support from Oracle deteriorate beyond the point of simple frustration. Recently, on a simple failed disk drive incident, it required 16 system support touches by our local system specialist and periods exceeding 24 hours between responses from Oracle. The system in question is under 7x24x365 4 hour "Gold" support as it stores user data. Not only did Oracle fail to provide response in the spirit of a 4-hour support contract, but also we wasted valuable administrator time on a trivial support issue.
UPDATE 2020-05-14
I just wanted to note that the McNett server I built in 2009 is still running in 2020. It is not my responsibility any more and I have advised its owner to retire it ASAP. But, they tell me, it works fine.