iLAN Systems Resource Center
For
DEC Alpha Systems Running Tru64 and VMS
By Tom Reynolds
Summary
iLAN Systems Inc. provides support for DEC VAX/Alpha systems. Using a
centralized software engineer support model pioneered by DEC, we are able
to support customers via remote equipment access or by phone. With 27 parts
locations nationwide, we can provide next-day part installation support on
a month-to-month contract, or four-hour parts-in-hand support with an
annual contract. This translates into big dollar savings for the customer
without sacrificing quality of service and without the need to commit
to a lengthy service contract.
Background
Third-party support of Digital Equipment hardware
- Colorado Springs Resource Center - The technical expertise of Digital Equipment Corporation and its successors resides in its Colorado Springs Resource Center. In the early 1980s Digital Equipment recognized that it would be impossible to hire and train enough high-level technical individuals to support its expanding installed base. Instead, it concentrated its best support minds at the Resource Center and developed numerous automatic systems to assist in troubleshooting and proactive support. The result is that today the once excellent field service team is only legend, and the field technicians are little more than board swappers. Nevertheless, the DEC model of concentrating technical expertise and automatic tools in a single location has proved superior to the third-party model that relies on the field technician for fault isolation.
- Distributed Parts - As the OEM, Digital Equipment and its successors, currently Hewlett Packard Co., have a natural advantage in that their large installed base permits them to store spare parts in multiple locations worldwide. Typically, third-party support organizations source their parts nationwide and rely on FedEx to deliver the parts overnight. The result is that reliable “next business day” service is all that is available from third-party sources. The typical “four hours on site with part in hand” service that is necessary for mission-critical applications remained the purview of the OEM.
All this changed in 2003 when one of the largest DEC shops agreed to outsourced service for its 5000-odd Alpha computers running Tru64 and VMS. The size of the contract allowed the funding of an alternative Resource Center and the investment in distributed parts warehousing nationwide.
The development of an alternative Resource Center
Hewlett Packard, the successor to the Digital Equipment service business, retains many advantages. In addition to working closely with engineering and operating system development, it maintains an extensive knowledgebase of past problems that aids in fault isolation. However, as any past employee of Ken’s once-proud company knows, successive layoffs have decimated the ranks of top-quality engineers. The result is a significant reduction in human troubleshooting experience and an over-reliance on automated tools.
Challenged to develop an alternative Resource Center, iLAN used its personal contacts to locate and employ former DEC/Compaq employees who had demonstrated exceptional troubleshooting ability, and assembled a team of ex-DEC developers to build alternative fault identification and isolation tools and databases.
The result showed that human expertise trumps computer programs. The iLAN Resource Center solved several problems that Hewlett-Packard had left unresolved for over a year and was able to solve the customers’ problems virtually all the time [1]. In addition, a proprietary script exercises system commands and programs to collect system configuration and maintenance history to populate the iLAN Maintenance History and Knowledgebase databases.
Development of the “24 x 7, Four Hour, Part-in-Hand” service
It took development of the iLAN Resource Center proactive script and a $2 million investment to provide pre-sourced parts to the 27 nationwide locations that iLAN uses to support its 24 x 7 customers. Early in the contract it became clear that specific configuration information was required in order to stock the appropriate parts at the location closest to the customer site, and that this information was not available from the customer. One goal of the iLAN script is to obtain accurate configuration information directly from the system itself.
In the 24 x 7, Four Hour service, the parts and the technician take separate paths to the customer site. The closest available technician is dispatched to the site while the parts are couriered there separately. Both must arrive within four hours to satisfy the SLA.
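The four-hour rule can be illustrated with a minimal sketch. This is not iLAN's dispatch software: the function name, parameter names, and sample timestamps are hypothetical, and the SLA clock is assumed to start at the moment the part number is handed to Hardware Support, as the footnotes describe.

```python
from datetime import datetime, timedelta

# The four-hour window comes from the text; all names here are illustrative.
SLA_WINDOW = timedelta(hours=4)


def meets_four_hour_sla(sla_start, technician_arrived, part_arrived, window=SLA_WINDOW):
    """Return True only if BOTH the technician and the part arrive within the window."""
    # The later of the two arrivals decides the outcome, because the repair
    # cannot begin until technician and part are both on site.
    both_on_site = max(technician_arrived, part_arrived)
    return both_on_site - sla_start <= window


# Made-up timestamps for illustration only.
start = datetime(2004, 3, 1, 9, 0)
print(meets_four_hour_sla(start, datetime(2004, 3, 1, 11, 30), datetime(2004, 3, 1, 12, 45)))  # True
print(meets_four_hour_sla(start, datetime(2004, 3, 1, 11, 30), datetime(2004, 3, 1, 13, 15)))  # False
```

In the second call the technician is on time but the part arrives 4 hours 15 minutes after the clock starts, so the SLA is missed.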
iLAN Service Level Agreement (SLA)
iLAN Systems maintains two Service Level Agreements (SLAs) for hardware service on Alpha systems and one SLA for software service on the Tru64 UNIX and VMS operating systems. These SLAs are dictated by agreement with iLAN’s largest customer [2]. iLAN services over 5000 VMS and Tru64 machines at these levels.
- Software
  - Problem diagnosis within one hour 80 percent of the time [3].
- Hardware
  - 24 x 7: technician on site with part in 4 hours 80 percent of the time [4].
  - M-F 8 x 5: next business day.
The difference between the two hardware SLAs is part sourcing. For the 24 x 7 SLA, parts are pre-sourced by iLAN and warehoused in whichever of our 27 locations [5] is closest to the customer site. For next business day service, parts are sourced nationally from one of iLAN’s seven parts suppliers and are delivered by FedEx or equivalent.
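Each SLA is stated as a time limit that must be met a given percentage of the time (80 percent in the list above). As a rough sketch of how such a target could be checked against a call log, assuming elapsed times are recorded in minutes, consider the following. The function names and the sample figures are hypothetical and are not iLAN data.

```python
def calls_within_limit(elapsed_minutes, limit_minutes):
    """Count the calls that were completed within the time limit."""
    return sum(1 for minutes in elapsed_minutes if minutes <= limit_minutes)


def meets_sla_target(elapsed_minutes, limit_minutes, target_percent=80):
    """Return True if at least target_percent of the calls met the time limit."""
    if not elapsed_minutes:
        raise ValueError("no calls to evaluate")
    met = calls_within_limit(elapsed_minutes, limit_minutes)
    # Integer arithmetic avoids floating-point rounding at the threshold.
    return met * 100 >= target_percent * len(elapsed_minutes)


# Made-up sample: ten hardware calls against the four-hour (240-minute) limit.
sample = [180, 235, 250, 120, 200, 300, 90, 210, 240, 150]
print(calls_within_limit(sample, 240))  # 8
print(meets_sla_target(sample, 240))    # True
```

Eight of the ten sample calls finish within 240 minutes, which is exactly the 80 percent threshold, so the target is met.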
Software Support
During normal business hours, 8AM-8PM Eastern Time [5AM-5PM Pacific Time], iLAN maintains a dedicated Help Desk for Alpha platforms. Customers calling the dedicated number [x 712] will be immediately connected to a competent software technician for Tru64 and VMS [6].
In many cases the problem will be an obvious hardware failure. In this case the software technician will ascertain the appropriate part number and firmware revision number and dispatch hardware support [7].
When the problem is not obvious, the Software Support Engineer will perform one or more of the following tasks [8]:
- Evaluate Console Dump
- Evaluate Error Log
- Evaluate Core Dump
- Consult iLAN Maintenance History database for this machine
- Consult iLAN Knowledgebase
- Research patch and firmware levels and check for and install available patches.
For all systems under contract, iLAN requests that a script be run during the maintenance window to populate the Maintenance History Database for that machine [9]. This database contains both the current configuration and the maintenance history of the machine [10]. Systems under 24 x 7 contract coverage must have the script installed, and access to the machine must be granted.
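As noted in the footnotes, on Tru64 the script essentially runs sys_check -escalate and emails the result to the iLAN database server. The sketch below illustrates that collect-and-mail pattern only; it is not the iLAN script. The function names, subject line, and example.com addresses are all assumptions, and the demonstration uses stand-in text instead of running any system command.

```python
import subprocess
from email.message import EmailMessage


def collect_output(command):
    """Run a command (given as a list of arguments) and return its standard output."""
    # On Tru64 the command would be something like ["sys_check", "-escalate"];
    # it is deliberately not executed in the demonstration below.
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout


def build_report(hostname, report_text, sender, recipient):
    """Wrap collected configuration output in an email message."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    # Hypothetical subject format; a real database loader would define its own.
    message["Subject"] = f"Maintenance history report: {hostname}"
    message.set_content(report_text)
    return message


# Stand-in text replaces real command output; addresses are placeholders.
demo = build_report("alpha01", "sample configuration output",
                    "alpha01@example.com", "history-db@example.com")
print(demo["Subject"])  # Maintenance history report: alpha01
```

Sending the message would be a separate step, for example with the standard library's smtplib.SMTP.send_message against whatever mail relay the site uses.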
In rare cases the Software Specialist will not be able to diagnose the problem without an on-site technician. In this case the hardware technician will assist the Software Specialist with diagnostics when he arrives [11].
Software Specialist hardware support function
In a manner similar to the Digital/Compaq/HP model, the Software Specialist has the training and resources necessary to direct repair of the system. In many cases the problem is simply a failed hardware component, but in some cases the dispatched parts do not fix the problem or only partly solve it. For this reason the Software Specialist is available to assist the hardware technician in ascertaining that the problem is, in fact, solved. In some cases it will be necessary for the Software Specialist to “watch” the system in order to determine whether the problem is solved.
[1] The SLA for the iLAN Resource Center was 99% of solvable problems solved. The iLAN Resource Center has consistently exceeded this SLA. H-P is only able to outperform iLAN when an undiscovered bug requiring a new bug fix surfaces, a rare occurrence.
[2] Per contract, iLAN cannot reveal the name of the customer or the outsourcer prime contractor.
[3] This SLA requires remote access to the customer's server and the previous installation of the iLAN proactive script. The actual number for software service was 93% diagnosis within one hour. Average time to diagnose was 43 minutes.
[4] This SLA only applies when the customer site is within a reasonable 3-hour drive of one of the 27 parts depots. When the drive is longer, the SLA will be adjusted upward accordingly.
[5] See Parts Location List.
[6] This requirement is per agreement with iLAN's largest customer. A competent technician is defined as a technician who can read an error log and a core dump.
[7] The hardware SLA begins when Software Support supplies the appropriate part number to Hardware Support.
[8] This is not an exhaustive list.
[9] For Tru64 systems this script essentially executes sys_check -escalate and emails the result to the iLAN database server. For VMS, a series of commands is executed and the results are emailed (or sent by FTP).
[10] Executing this script is required for 24 x 7 service because it is an integral part of the part sourcing process.
[11] This includes console access to HSx devices.
NOTE: Digital Equipment Corporation, DEC, HP, Compaq, Tru64, and VMS are the intellectual properties or copyrights of Hewlett Packard Corporation.