ILAN Systems Resource Center
For
DEC Alpha Systems Running Tru64 and VMS

By Tom Reynolds, Emeritus-Member of iLAN Founders 

Summary
iLAN Systems Inc. provides support for DEC VAX/Alpha systems. Using the centralized software-engineer support model pioneered by DEC, we deliver support via remote equipment access or by phone. With 27 parts locations nationwide, we can provide next-day part installation on a month-to-month contract, or four-hour parts-in-hand service with an annual contract. This translates into significant savings for the customer without sacrificing quality of service and without the need to commit to a lengthy service contract.

Background
Third-party support of Digital Equipment hardware

  • Colorado Springs Resource Center - The technical expertise of Digital Equipment Corporation and its successors resides in its Colorado Springs Resource Center. In the early 1980s, Digital Equipment recognized that it would be impossible to hire and train enough high-level technical staff to support its expanding installed base. Instead, it concentrated its best support minds at the Resource Center and developed numerous automated systems to assist in troubleshooting and proactive support. The result is that today the once-excellent field service team is only legend, and the field technicians are little more than board swappers. Nevertheless, the DEC model of concentrating technical expertise and automated tools in a single location has proved superior to the third-party model, which relies on the field technician for fault isolation.
  • Distributed Parts - As the OEM, Digital Equipment and its successors, currently Hewlett Packard Co., have a natural advantage: their large installed base permits storage of spare parts in multiple locations worldwide. Third-party support organizations typically source their parts nationwide and rely on FedEx to deliver them overnight. As a result, “next business day” service is the best that third-party sources can reliably offer. The “four hours on site with part in hand” service that mission-critical applications require has remained the purview of the OEM.

All this changed in 2003, when one of the largest DEC shops agreed to outsource service for its 5000-odd Alpha computers running Tru64 and VMS. The size of the contract funded an alternative Resource Center and an investment in distributed parts warehousing nationwide.

The development of an alternative Resource Center
Hewlett Packard, the successor to the Digital Equipment service business, retains many advantages. In addition to working closely with engineering and operating system development engineering, it maintains an extensive knowledgebase of past problems that aids in fault isolation. However, as any past employee of Ken’s once-proud company knows, successive layoffs have decimated the ranks of top-quality engineers. The result is a significant reduction in human troubleshooting experience and an over-reliance on automated tools.

Challenged to develop an alternative Resource Center, iLAN used its personal contacts to locate and employ former DEC/Compaq employees who had demonstrated exceptional troubleshooting ability, and to assemble a team of ex-DEC developers to build alternative fault identification and isolation tools and databases.

The result showed that human expertise trumps computer programs. The iLAN Resource Center solved several problems that Hewlett-Packard had left unresolved for over a year, and it solved customers’ problems virtually all the time [1]. In addition, a proprietary script exercises system commands and programs to collect system configuration and maintenance history, which populate the iLAN Maintenance History and Knowledgebase databases.

Development of the “24 x 7, Four Hour, Part-in-Hand” service
It took development of the iLAN Resource Center proactive script and a $2 million investment to provide pre-sourced parts to the 27 nationwide locations that iLAN uses to support its 24 x 7 customers. Early in the contract it became clear that specific configuration information was required to stock the appropriate parts at the location closest to each customer site, and that this information was not available from the customer. One goal of the iLAN script is to provide accurate configuration information directly from the system itself.

In the 24 x 7, Four Hour service, the parts and the technician take separate paths to the customer site. The closest available technician is dispatched to the site while the parts are couriered there separately. Both must arrive within four hours to satisfy the SLA.
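
The two-path rule above can be illustrated with a minimal Python sketch. It is not iLAN's dispatch software; the function name, parameter names, and timestamps are hypothetical, and it assumes the four-hour clock starts at a single known timestamp (called clock_start here).

```python
from datetime import datetime, timedelta

# Four-hour window taken from the service description above.
SLA_WINDOW = timedelta(hours=4)

def meets_four_hour_sla(clock_start, technician_arrived, parts_arrived,
                        window=SLA_WINDOW):
    """Return True only if BOTH the technician and the parts are on site
    no later than clock_start + window (hypothetical helper)."""
    deadline = clock_start + window
    # Both arrivals are required, so the later one decides the outcome.
    return max(technician_arrived, parts_arrived) <= deadline

# Illustrative timestamps only, not real service data.
start = datetime(2004, 3, 1, 9, 0)
on_time = meets_four_hour_sla(
    start,
    technician_arrived=start + timedelta(hours=2, minutes=30),
    parts_arrived=start + timedelta(hours=3, minutes=45),
)
late_parts = meets_four_hour_sla(
    start,
    technician_arrived=start + timedelta(hours=2),
    parts_arrived=start + timedelta(hours=4, minutes=10),
)
print(on_time, late_parts)  # True False
```

Taking the later of the two arrival times reflects the point that an early technician does not compensate for a late part, or the reverse.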

ILAN Service Level Agreement (SLA)
ILAN Systems maintains two Service Level Agreements (SLAs) for hardware service on Alpha systems and one SLA for software service on the Tru64 Unix and VMS operating systems. These SLAs are dictated by agreement with iLAN’s largest customer [2]. ILAN services over 5000 VMS and Tru64 machines at these levels.

  • Software
    • Problem diagnosis within one hour 80 percent of the time [3].
  • Hardware
    • 24 x 7 technician on site with part in 4 hours 80 percent of the time [4].
    • M-F 8 x 5 next business day.
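
As a rough illustration of how an "80 percent of the time" target could be checked, the following Python sketch computes the share of calls that met a time limit. The function names and the sample durations are hypothetical and are not taken from iLAN's records.

```python
def calls_within_limit(durations_minutes, limit_minutes):
    """Count the calls whose duration is at or under the limit."""
    return sum(1 for d in durations_minutes if d <= limit_minutes)

def meets_target(durations_minutes, limit_minutes, target_percent=80):
    """Return True if at least target_percent of calls met the limit.

    Uses integer arithmetic so that a result of exactly 80 percent
    is not affected by floating-point rounding.
    """
    total = len(durations_minutes)
    if total == 0:
        raise ValueError("no calls to evaluate")
    met = calls_within_limit(durations_minutes, limit_minutes)
    return met * 100 >= target_percent * total

# Made-up diagnosis times in minutes, checked against the one-hour limit.
diagnosis_times = [20, 35, 43, 55, 75]
print(meets_target(diagnosis_times, limit_minutes=60))  # True: 4 of 5 calls

# Made-up on-site times in minutes, checked against the four-hour limit.
onsite_times = [180, 230, 250, 300, 210]
print(meets_target(onsite_times, limit_minutes=240))  # False: 3 of 5 calls
```

The same check applies to either SLA; only the time limit changes.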

The difference between the two hardware SLAs is part sourcing. For the 24 x 7 SLA, parts are pre-sourced by iLAN and warehoused in whichever of our 27 locations [5] is closest to the customer site. For next business day service, parts are sourced nationally from one of iLAN’s seven parts suppliers and are delivered by FedEx or equivalent.

Software Support
During normal business hours, 8AM-8PM Eastern Time Zone [5AM-5PM Pacific Time Zone], iLAN maintains a dedicated Help Desk for Alpha platforms. Customers calling the dedicated number [x 712] will be immediately connected to a competent software technician for Tru64 and VMS [6].

In many cases the problem will be an obvious hardware failure. In this case the software technician will ascertain the appropriate part number and firmware revision number and dispatch hardware support [7].

When the problem is not obvious, the Software Support Engineer will perform one or more of the following tasks [8]:

  • Evaluate Console Dump
  • Evaluate Error Log
  • Evaluate Core Dump
  • Consult iLAN Maintenance History database for this machine
  • Consult iLAN Knowledgebase
  • Research patch and firmware levels, then check for and install available patches.

For all systems under contract, iLAN requests that a script be run during the maintenance window to populate the Maintenance History Database for that machine [9]. This database contains both the current configuration and the maintenance history of the machine [10]. Systems under 24 x 7 contract coverage must have the script installed, and access to the machine must be granted.

In rare cases the Software Specialist will not be able to diagnose the problem without an on-site technician. In this case the hardware technician will assist the Software Specialist with diagnostics on arrival [11].

Software Specialist hardware support function
In a manner similar to the Digital/Compaq/HP model, the Software Specialist has the training and resources necessary to direct repair of the system. In many cases the problem is simply a failed hardware component, but in some cases the dispatched parts do not fix, or only partly fix, the problem. For this reason the Software Specialist is available to help the hardware technician confirm that the problem is, in fact, solved. In some cases the Software Specialist will need to “watch” the system to determine whether the problem is solved.


[1] The SLA for the iLAN Resource Center was 99% of solvable problems solved. The iLAN Resource Center has consistently exceeded this SLA. H-P is only able to outperform iLAN when an undiscovered bug requiring a new bug fix surfaces, a rare occurrence.

[2] Per contract, iLAN cannot reveal the name of the customer or the outsourcer prime contractor.

[3] This SLA requires remote access to the customer's server and the previous installation of the iLAN proactive script. The actual number for software service was 93% diagnosis within one hour. Average time to diagnose was 43 minutes.

[4] This SLA only applies when the customer site is within a reasonable 3-hour drive of one of the 27 parts depots. When the drive is longer, the SLA will be adjusted upward accordingly.

[5] See Parts Location List.

[6] This requirement is per agreement with iLAN's largest customer. A competent technician is defined as a technician who can read an error log and a core dump.

[7] The hardware SLA begins when Software Support supplies the appropriate part number to Hardware Support.

[8] This is not an exhaustive list.

[9] For Tru64 systems this script essentially executes sys_check -escalate and emails the result to the iLAN database server. For VMS, a series of commands is executed and the results are emailed (or sent by FTP).

[10] Executing this script is required for 24 x 7 service because it is an integral part of the part sourcing process.

[11] This includes console access to HSx devices.

NOTE: Digital Equipment Corporation, DEC, HP, Compaq, Tru64, and VMS are the intellectual properties or copyrights of Hewlett Packard Corporation.