Progress beyond the state of the art
In our vision, every memory organisation, research institution and other organisation responsible for our digital knowledge and cultural heritage will benefit greatly from the proposed emulation access platform. This section identifies the aspects that are key to ensuring long-term access to digitally stored information. The following illustration shows a schematic decomposition of the steps in the emulation access platform process, starting from a digital data carrier as input A and ending with possible outputs B and C, which offer two different approaches to representing the information captured on the original carrier.
Media transfer
Without the original bits, authentic recreation of the digital object is impossible. Capturing the bits from the original data carrier is therefore crucial and requires sophisticated and reliable tools. Many such data extraction tools currently exist, but they are suitable only for newer carriers such as CD-ROMs or DVD-ROMs. These tools copy the information bit by bit from the source medium to a different target medium, thereby making the data independent of the medium (or carrier) on which it happens to be stored. Older carriers such as tapes and 8-inch floppy disks generally contain less data than newer media, but are at present far more vulnerable to media decay. Because almost all physical readers and software drivers for these carriers have been lost, the data is effectively trapped inside the medium. To release this information, new tools have to be created. KEEP will facilitate the creation of such tools by undertaking research into a wide variety of data carriers and their readers. This will result, for the first time, in a clear overview of data carriers that notes their vulnerabilities and proposes how to treat them. On this basis, the project will investigate the extent to which existing tools are suitable for secure and robust extraction of data from their carriers. Where suitable tools are not presently available, the project will play a leading role in the development of new tools and will offer a transfer tool framework capable of securely extracting bits from a wide variety of media.
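The bit-by-bit copying described above can be sketched as follows. This is a minimal illustration and not a KEEP tool: it assumes the carrier is already readable as a file or device path, and the function names, the block size and the choice of SHA-256 as a fixity check are assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 64 * 1024  # assumed read size; real carriers may need sector-sized reads


def copy_image(source_path, target_path, block_size=BLOCK_SIZE):
    """Copy source to target block by block; return the SHA-256 of the bytes read."""
    digest = hashlib.sha256()
    with open(source_path, "rb") as src, open(target_path, "wb") as dst:
        while True:
            block = src.read(block_size)
            if not block:
                break
            dst.write(block)
            digest.update(block)
    return digest.hexdigest()


def file_digest(path, block_size=BLOCK_SIZE):
    """Re-read a file from storage and return its SHA-256."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def transfer_and_verify(source_path, target_path):
    """Return True when the written image matches what was read from the source."""
    return copy_image(source_path, target_path) == file_digest(target_path)
```

Comparing the digest of the bytes read with the digest of the image written gives a simple check that the copy is independent of the carrier yet identical to it. It does not address read errors caused by media decay, which is exactly where specialised tools are needed.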
Pre-process
The biggest risk comes not from loss of data through hardware malfunction, but from hardware obsolescence rendering files inaccessible in practice. Therefore, once the bits have been recovered by the data extraction tools, they should ideally be stored in a common and logical data format. A wide variety of formats already exists, but as yet no standardised format has emerged. KEEP will contribute to the development of standards by carrying out research into which formats are best suited to particular data sources. The intention is not to create new formats but to determine the suitability of existing formats for a variety of data transfer tasks. To encourage the industry to create more robust and standardised formats in the future, a set of guidelines will be drawn up.
Emulation
Existing emulators can be classified by purpose as follows: entertainment (often gaming), cross-platform support (running OS-dependent software on a different platform) and business efficiency (reducing expenses by improving hardware and software utilisation). There is, however, one aspect that none of these emulators covers: durability.
Only one of the emulators presently available has been designed with long-term preservation in mind: Dioscuri, the modular emulator for digital preservation. This emulator was designed and developed by KB in cooperation with TSSP and the Dutch National Archives, and its distinguishing features are durability and flexibility. It is durable because it uses a virtual machine as a hardware-independent layer. It is flexible by virtue of its modular architecture: each hardware component is mimicked by a software equivalent called a module. Modules can be arranged in any configuration to permit the recreation of a wide range of hardware architectures.
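The modular idea can be illustrated with a small sketch in which an emulated machine is nothing more than a configuration of modules. This is not Dioscuri's actual design or API: the class names, the `requires` attribute and the dependency check are hypothetical and serve only to show how software equivalents of hardware components might be combined.

```python
class Module:
    """Hypothetical base class for the software equivalent of one hardware component."""
    name = "module"
    requires = ()  # names of other modules this component depends on

    def reset(self):
        """Return the component to its power-on state."""


class Memory(Module):
    name = "memory"

    def __init__(self, size):
        self.size = size
        self.cells = bytearray(size)

    def reset(self):
        self.cells = bytearray(self.size)


class CPU(Module):
    name = "cpu"
    requires = ("memory",)

    def __init__(self):
        self.program_counter = 0

    def reset(self):
        self.program_counter = 0


class Emulator:
    """An emulated machine defined purely by the modules it is configured with."""

    def __init__(self, modules):
        modules = list(modules)
        self.modules = {m.name: m for m in modules}
        needed = {dep for m in modules for dep in m.requires}
        missing = sorted(needed - set(self.modules))
        if missing:
            raise ValueError("missing modules: " + ", ".join(missing))

    def reset(self):
        for module in self.modules.values():
            module.reset()
```

Under these assumptions, `Emulator([CPU(), Memory(1024)])` describes one configuration, and a different architecture is obtained by passing a different list of modules rather than by changing the emulator itself.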
Modular emulation has proved necessary for emulating arcade video games, each of which is specific but which usually share a large number of standard, widely available chips. An early attempt at such an approach was Sparcade, released in 1996. It was eventually updated to include a record/playback feature available for all emulated arcade games. It was soon superseded by the Multiple Arcade Machine Emulator (MAME), which has been a very successful emulator since its first release in February 1997. Its modular structure has been used as the basis for the Multiple Emulator Super System (MESS) project, which aims to emulate all computer and gaming console platforms in a modular way within a unified software system. However, this project did not succeed in providing a coherent and user-friendly interface. The Fake Amiga/ST (FAST) project is another example of a multi-platform emulator built with a modular structure. The Computer History Simulation Project (SIMH) is a similar project aimed at early computers. However, none of the emulators mentioned above is designed to survive over time. This is exactly the issue KEEP will address.
There are, in addition, a number of useful open source emulators that reproduce the behaviour of hardware components very accurately. Rather than recreating everything from scratch, KEEP will re-use work already done in the emulation world and will extend it with the durability and flexibility required to make it endure over time.
KEEP will develop an emulation framework that will be capable of running various emulators in a sustainable environment. Furthermore, the current prototype of Dioscuri will be extended with more software modules to allow a wider range of hardware configurations.
Rendering
In general, emulators make very high demands on the processing power of the computer running them, often far in excess of what seems reasonable given the relatively simple environments being emulated.
Emulation software is typically difficult to use. The many different interfaces and options require users to be specially skilled in operating this software while the only thing they want is to access digital content. The process of configuring the emulator is a task that should be automated as much as possible.
KEEP aims to ease the setup and operation of emulated environments. It will achieve this by offering two approaches:
An online browsing system that enables a remote user to select a digital object, browse its metadata and start the emulation process. The configuration of the emulation environment will be transparent to the user, as it will be carried out automatically by the emulation framework.
A plug-and-play interface to the emulation framework for integrating this framework into existing electronic deposit systems, such as KB’s e-Depot, DNB’s KOPAL, BnF’s SPAR or TSSP’s Safety Deposit Box. It should also be possible to integrate the emulation framework with the interoperability framework of the European project Planets, which is currently under development. A digital object selected by the user from the repository will automatically be loaded into the emulation framework and configured.
Research will be carried out into the standards and metadata models used by electronic deposit systems. Based on this knowledge, a generic interface will be created and guidelines will be published for integrating emulation support into operational deposit systems. Through both approaches, KEEP will offer the user a remote interface via the internet browser, removing the need to download and install emulation software locally on the client. Furthermore, this remote user interface will be enriched with extra services such as stopping and resuming the emulation process and recording a movie of the process.
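The automatic configuration step can be pictured as a lookup from an object's metadata to a suitable emulation environment. The sketch below is purely illustrative: the `platform` metadata key, the `Environment` record and the placeholder entries in the table are assumptions and are not taken from any deposit system's metadata model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Environment:
    """Hypothetical description of what must be started to render an object."""
    emulator: str
    operating_system: str


# Placeholder table; a real framework would derive this from registry data.
ENVIRONMENTS = {
    "platform-a": Environment("emulator-a", "os-a"),
    "platform-b": Environment("emulator-b", "os-b"),
}


def select_environment(metadata):
    """Choose an environment from the object's metadata without user input."""
    platform = metadata.get("platform")
    if platform not in ENVIRONMENTS:
        raise LookupError(f"no emulation environment known for platform {platform!r}")
    return ENVIRONMENTS[platform]
```

With such a mapping, the user only selects the object; the choice of emulator and operating system follows from the stored metadata, and an object with unknown or missing platform information is reported rather than guessed at.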
Data transfer
Another common deficiency in the current generation of emulators is the lack of data exchange between the emulated environment and the host environment. Original digital objects often contain text, images, files, movies and more. During emulation, this information is trapped inside the emulation process and cannot easily be copied out of it, nor can new data easily be copied in. As data often serves as input for research, re-use of that information is crucial. Unfortunately, very few emulators support effective data transfer between the emulator and the host environment, and those that do rely on various non-standardised methods. KEEP will conduct research into the ways in which users would like to re-use existing information from digital objects under emulation. Based on the outcomes, it will develop services on top of the emulation framework that support effective re-use of original content.
Portability and durability
Portability is the first condition for durability. Although many solutions have been developed to emulate old machines, none has yet addressed the problem of long-term emulation. Commercial emulators and virtualization solutions mainly focus on business efficiency. The current trend is to virtualize computer architectures in order to reduce hardware expense and optimise storage and computing capacity. Although this may suggest that today’s software is becoming less dependent on current hardware, in fact it is not. The following figure shows schematically the difference between business virtualization and virtualization by KEEP. Because business virtualization is strongly focused on increasing performance, its virtualization layer becomes tightly coupled with the underlying hardware. This is not the case with the virtualization proposed in this project, which also has a virtual layer, but one that will be very easy to port to different underlying architectures.
Two approaches to virtualization with digital preservation in mind are known: the Universal Virtual Computer (UVC)-based preservation method from IBM and the Olonys virtual machine by Joguin. The UVC is currently being developed further within the Planets project and shares many aspects with Olonys. However, the UVC is less focused on peripheral input and output support, whereas most computers and other devices rely heavily on communication with external devices. It is well suited to long-term access to static digital objects such as images, text, spreadsheets and sound, but it offers less support for executable objects such as multimedia, user applications and games. KEEP will build on the Olonys virtual machine, which was designed with maximum portability and flexibility in mind. This virtual machine features a multi-layered software structure comprising several increasingly complex stacked virtual machines. The main advantage of this approach is that programmes meant to execute on the virtual machine, such as the emulation framework, are compiled for a mostly complete virtual system similar to present computers, while only a very simplified virtual machine architecture needs to be ported or migrated to a real machine to make it all work.
As a consequence, to be able to execute applications compatible with Olonys, only a very short programme has to be ported, reasonably ensuring that it will always be easy to implement this programme on any future computer system: with no help from any specific programming tool, the implementation process should always require less than a week for a single programmer.
Moreover, the Olonys peripheral management is self-adaptive, using matching algorithms between physically available peripherals and peripherals requested by programmes, so that programmes written for Olonys do not need to include any kind of mechanism to adapt to different or future hardware peripheral configurations.
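The kind of matching described above can be sketched as follows. This is not the Olonys algorithm, whose details are not given here; it is a simple greedy illustration in which the data layout (a kind plus a set of features), the tie-breaking rule and all names are hypothetical.

```python
def match_peripherals(requested, available):
    """Assign each requested peripheral to a free, compatible physical one.

    requested: dict of request name -> (kind, set of required features)
    available: dict of device name  -> (kind, set of offered features)
    Returns a dict of request name -> device name, or None when nothing fits.
    """
    free = dict(available)
    assignment = {}
    for request, (kind, needed) in requested.items():
        # A device is compatible if it is of the right kind and offers
        # at least the required features.
        candidates = [
            name for name, (dev_kind, offered) in free.items()
            if dev_kind == kind and needed <= offered
        ]
        if not candidates:
            assignment[request] = None
            continue
        # Prefer the tightest fit (fewest surplus features), then by name.
        best = min(candidates, key=lambda name: (len(free[name][1] - needed), name))
        assignment[request] = best
        del free[best]  # each physical device serves one request
    return assignment
```

In this sketch a programme states only what it needs, for example a screen that supports colour, and the matching step decides which physical device satisfies it, so the programme itself contains no hardware-specific adaptation code.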
These features, which do not exist in any current system, thus provide durability to software compatible with Olonys, and to the Emulation Access Platform in particular. From 2000 to 2003, a thorough specification of the lower-layer parts of this architecture was already defined and implementations based on the three lowest layers have been successfully prototyped. The specification of this virtual machine led to the filing of three French patents (publication numbers FR2833728, FR2833729 and FR2833731), as well as an international patent (WO03052542), which today have entered the public domain. Within the KEEP project, the higher-layer parts of the Olonys specification will be completed, and development of the virtual machine internals will be carried out based on this completed technical specification, as well as on the very base of the architecture, which, although already fully-functional, represents less than 10% of the total development effort for the complete virtual machine.
