In 1957, I think the month was May, I arrived at Maywood Air Force Station to begin my assignment with the Air Force Ballistic Missiles Logistics Office. A few months later we moved temporarily to Mira Loma Air Force Station, and then to Norton Air Force Base, our ultimate location.
The mission of the AFBMLO was to develop and operate a computer system for controlling the supply of all ICBMs and IRBMs worldwide. The AFBMLO was part of the Air Materiel Command, headquartered at Wright-Patterson AFB. To accomplish the mission the AF acquired a large IBM 705 computer whose main memory could hold 40,000 characters rather than the standard 20,000. The Air Force also gathered together a group of people (military, IBM representatives, consultants and general service civilian employees) whose assignment would be to develop and operate this computer-controlled supply control system. Of those gathered, only a few of us had any experience developing computer systems: a handful of the consultants from Sutherland Company (Peoria, Illinois), one general service civilian and one second lieutenant, me.
A supply control system — an inventory system — has to have a database. The database would contain the records of the millions of parts available for or installed on missiles. Today we would assume this database would be stored on a hard drive. In 1957 we did not have hard drives, although we did have magnetic drums (as long as they kept revolving without failing), and as best I can remember we used them for intermediate storage. Our “database” would be stored on a stack of large, long magnetic tapes; each reel of 7-track tape could store millions of records, and several reels were necessary to store the main inventory file. To process the files on tape our computer had 26 separate tape units.
The system contained many different files, each of which resided on one or more reels of magnetic tape. Examples were the massive inventory database, parts description database, location description database, incoming order file, shipping file and on and on. Think about the inventory database residing on several reels of magnetic tape. To get to a particular record on the database, all the previous records had to be passed by. To update the database, changes were read in from another tape file and matched against the inventory database, and a new inventory database was written onto a new set of magnetic tape reels. After an “update” finished, the new set of tapes became the latest inventory master; the tapes that were read to create the new master became the last previous master database, and so on back as far as history was kept.
Need for Backup
An update of the inventory database could take several hours. When one reel of tape (whether the input database, the output database or the changes, i.e. transactions) reached its end, we did not want to stop the computer while the operator rewound the completed tape and replaced it with the next reel, so we used a pair of tape units for each file. While one tape rewound and was then dismounted, the next reel, already mounted on the paired drive, would be used. By the time that reel reached its end, the following reel of the file would be waiting on the original tape unit.
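The paired-drive scheme is essentially what we would now call double buffering across reels. A minimal modern sketch of the idea, with all names invented for illustration (nothing here survives from the 705 system):

```python
# Hypothetical sketch: a multi-reel file is mounted alternately on two
# paired tape drives. While one reel is read, the operator rewinds and
# replaces the reel on the other drive, so the run never waits.

def read_file_on_paired_drives(reels):
    """Yield records from a multi-reel file, alternating between the
    two drives of a pair (drive 0 and drive 1)."""
    for reel_number, reel in enumerate(reels):
        drive = reel_number % 2  # even reels on drive 0, odd reels on drive 1
        # While this reel is read on `drive`, the next reel is being
        # mounted on the other drive of the pair, hiding the reel change.
        for record in reel:
            yield record

# Example: a three-reel "file", each reel a list of records.
reels = [["a1", "a2"], ["b1"], ["c1", "c2"]]
print(list(read_file_on_paired_drives(reels)))  # → ['a1', 'a2', 'b1', 'c1', 'c2']
```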
Magnetic tapes were not the most reliable medium for storing data; errors could and did occur. To ensure we were using the correct tape we wrote labels and reel sequence numbers at the beginnings of our reels; we also kept count of how many records were on a reel and wrote a record count at the end of the tape. When we wrote a tape we created a sum of all the (binary) digits we wrote on it, called a checksum. When we read the tape we recalculated the sum and checked it when the end of the tape was reached, to ensure no error had occurred in reading or writing. Our system maintained controls on whether the correct tape and correct reel number were processed. Discrepancies were sometimes discovered in record counts or checksums. Another problem with tapes was that they might break or become damaged and unusable.
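Those controls can be sketched in modern terms as a header label plus a trailer carrying the record count and checksum. The record formats and field names below are invented for illustration; the real 705 labels were of course different:

```python
# Sketch of the tape controls described above: a header label (file name,
# reel number) at the front of each reel, and a trailer holding the
# record count and a checksum, verified when the reel is read.

def checksum(records):
    # Sum of all byte values in all records, standing in for the
    # sum-of-digits check computed while writing the tape.
    return sum(sum(r.encode()) for r in records)

def write_reel(file_name, reel_number, records):
    header = {"file": file_name, "reel": reel_number}
    trailer = {"count": len(records), "checksum": checksum(records)}
    return {"header": header, "records": records, "trailer": trailer}

def read_reel(reel, expect_file, expect_reel):
    # First verify the correct reel was mounted, then re-check the
    # record count and checksum at end of tape before trusting the data.
    h, t = reel["header"], reel["trailer"]
    if h["file"] != expect_file or h["reel"] != expect_reel:
        raise ValueError("wrong tape mounted")
    records = reel["records"]
    if len(records) != t["count"] or checksum(records) != t["checksum"]:
        raise ValueError("record count or checksum discrepancy")
    return records

reel = write_reel("INVENTORY", 1, ["part-001", "part-002"])
print(read_reel(reel, "INVENTORY", 1))  # → ['part-001', 'part-002']
```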
Let’s talk about a possible problem that is all too similar to problems that did occur. The computer operator is running a job that updates the inventory database with shipments that have been made. The information on shipments is on one reel of tape. The inventory database to be updated is on 10 reels of 2,400-foot tape. Running the job takes 8 hours. The changes (transactions) to the inventory database are in the same sequence as the data on the database.
One by one the records on the inventory database are checked against the transactions. If there is no match, the input inventory database records are copied onto the output inventory database that is being created. If there is a match then the information from the transactions is used to update the corresponding inventory database record and the updated record is written onto the output inventory database.
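The matching process just described is the classic sequential master-file update: both files are sorted on the same key, so one pass over the old master produces the new master. A minimal sketch, with an invented record layout and an invented "apply shipment" rule:

```python
# Sketch of a sequential master-file update. Both the old master and the
# transaction file are sorted by part number; unmatched master records
# are copied through, matched ones are updated before being written out.

def update_master(old_master, transactions):
    """old_master: sorted list of (part, on_hand_qty).
    transactions: sorted list of (part, shipped_qty).
    Returns the new master, as written to the output tape."""
    new_master = []
    txn_iter = iter(transactions)
    txn = next(txn_iter, None)
    for part, qty in old_master:
        # Apply every transaction for this part (a part may ship twice).
        while txn is not None and txn[0] == part:
            qty -= txn[1]  # a shipment reduces the on-hand quantity
            txn = next(txn_iter, None)
        new_master.append((part, qty))  # written out, updated or not
    return new_master

old = [("A", 10), ("B", 5), ("C", 7)]
shipments = [("A", 3), ("C", 2), ("C", 1)]
print(update_master(old, shipments))  # → [('A', 7), ('B', 5), ('C', 4)]
```

Note that record "B", with no matching transaction, is copied unchanged, exactly as the text describes.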
In our example, midway through processing the eighth reel of the input inventory database, the computer gets a persistent write error on the current output reel of the inventory database, which happens to be the ninth output reel. More than 7 hours of processing have gone by. In most computer installations at the time, operations would have no choice but to scrap the 9 reels of output that had already been created and begin the job again from the beginning. The time and processing that had been done would be lost.
Checkpoint/Restart
The ideal solution would be to go back to a point in the processing just before the problem occurred and begin again from there, rather than running the entire job from the start. In 2009 terms: go back to the last backup and continue the run from there.
Today, whenever we install a new application on our personal computers, the operating system first makes a record of our settings — a restore point — and if we later discover a problem we can go back to the condition our computer was in when the restore point was created. In 1957 there was no such thing as a backup or a restore point (checkpoint), and to complicate matters further we were dealing with databases on continuous reels of tape, not on directly accessible hard drives. Some thinking was being done about the problem, but there was no generally available application package, or even an agreed approach to solving it; the envisioned solution came to be called a “checkpoint/restart system.”
The key questions to be solved for a checkpoint/restart system in an installation such as ours were:
- When to take a checkpoint
- Where to take a checkpoint
- What to take in a checkpoint
The answer to the first question was easy: often. A further answer was: every time you start a new reel of input or output tape. The answer to the last question was to take a picture of the exact state of the computer [memory], with the corollary that the position of all tapes in the run be noted as well. So we wanted to instruct the operator to take a checkpoint every 15 minutes or so, and we also wanted to take a checkpoint any time we mounted a new input reel or a blank (to-be-written) output reel.
The obvious answer to where a checkpoint arbitrarily taken by the operator, or one triggered by the loading of an input tape, should be written was on another tape: the checkpoint tape. It was also obvious that the ideal place to write a checkpoint taken when a new output reel was begun was on that output reel itself.
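The overall checkpoint/restart mechanism can be sketched in modern terms: periodically snapshot the run's state plus the position of every tape, and on failure resume from the last good checkpoint instead of the beginning. Everything below is a present-day stand-in (a dict in place of the 705's memory image, invented names throughout), not the original implementation:

```python
import copy

# Sketch of checkpoint/restart: every `checkpoint_every` records, save a
# copy of the run's state (the "memory image") and the current tape
# position. After a failure, the run restarts from the last checkpoint.

def run_with_checkpoints(records, checkpoint_every, fail_at=None, resume_from=None):
    state = {"processed": 0, "output": []}
    position = 0
    checkpoints = []
    if resume_from is not None:
        state = copy.deepcopy(resume_from["state"])  # restore the memory image
        position = resume_from["position"]           # reposition the tapes
    for i in range(position, len(records)):
        if fail_at is not None and i == fail_at:
            # Persistent tape error: abandon the run, keep the checkpoints.
            return None, checkpoints
        state["output"].append(records[i].upper())    # stand-in for real work
        state["processed"] += 1
        if (i + 1) % checkpoint_every == 0:
            checkpoints.append({"state": copy.deepcopy(state), "position": i + 1})
    return state, checkpoints

records = ["a", "b", "c", "d", "e"]
# First run fails at the fourth record; restart resumes from the last checkpoint.
_, cps = run_with_checkpoints(records, checkpoint_every=2, fail_at=3)
state, _ = run_with_checkpoints(records, checkpoint_every=2, resume_from=cps[-1])
print(state["output"])  # → ['A', 'B', 'C', 'D', 'E']
```

Only the records after the last checkpoint are reprocessed, which is the whole point: 7 hours of work survive a write error on the ninth output reel.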
Various technical problems had to be solved. One I particularly remember was that we could not write the contents of memory and the registers on tape in a format that would enable the system to restore them correctly. I remember solving that problem by using an unprintable character to fill the area where the data would be written, but I no longer recall how or why that solved the problem. We had already made it standard that each tape carried a label identifying the file, the reel number within the file and the date produced; that each run kept a record count of which record of each file was being processed at any time; and that the record count for each reel was written at the end of the tape along with the checksum. Whatever of this was not already standard practice now became standard, to make checkpoint/restart workable.
Establishing a standard and ensuring that our relatively inexperienced programmers would adhere to it were two different things. There was no high-level language available for the programmers to use in writing their programs; COBOL had not been invented yet. All that was available to them was Autocoder, which generally let them write one line of coding for each machine instruction to be generated, using easier-to-remember mnemonics for operators and symbolic labels for constant and variable fields.
Those of us developing the tape handling standards and the checkpoint/restart programs believed we were unlikely to get programs that handled tape files and checkpoint/restart in a consistent, standard manner if programmers had nothing but guidance to program in a certain way. To ensure that tape files and checkpoint/restart were handled consistently, we developed macro instructions, linked to subroutines such as the checkpoint program, that the programmers were to use for all tape file operations.
To enforce programming standards and ease the programming work required, we developed a set of macro instructions that were to be the only “tools” programmers could use in writing their programs. In effect, this set of macro instructions became a “high-level language” for programming the IBM 705, and, as such, was a precursor of COBOL and other high-level language developments. I was given the assignment to define the language, and a staff of 12 officers, enlisted men and civilian employees to develop the actual code that would be generated for each and every programmed macro.
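The essence of such a macro system is textual expansion: the programmer writes one macro call, and the assembler replaces it with the several low-level instructions that implement the standard behavior. The sketch below illustrates the idea only; the macro names, mnemonics and expansions are entirely invented, not the actual 705 macro set:

```python
# Illustrative macro expander. Each macro name maps to a function that
# generates the instruction lines it stands for; ordinary lines pass
# through unchanged. All mnemonics here are hypothetical.

MACROS = {
    "READTAPE": lambda unit, area: [
        f"RD   {unit}",    # read the next record from the tape unit
        f"TRA  {area}",    # transfer it to the program's working area
        "B    EOTCHK",     # branch to the standard end-of-tape/label check
    ],
    "CKPT": lambda unit, area: [
        f"STM  {area}",    # store the memory image in the checkpoint area
        f"WR   {unit}",    # write the checkpoint to the checkpoint tape
    ],
}

def expand(source):
    """Expand macro calls into their generated instructions."""
    out = []
    for line in source:
        op, *args = line.split()
        if op in MACROS:
            out.extend(MACROS[op](*args))
        else:
            out.append(line)          # plain Autocoder-style line, unchanged
    return out

program = ["READTAPE TU5 INREC", "ADD  QTY", "CKPT TU9 CKAREA"]
for instruction in expand(program):
    print(instruction)
```

Because every tape operation went through a macro, the generated code, and therefore the labeling, record counting and checkpointing behavior, was identical across all the programmers' programs.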