The Smithsonian Institution is the largest museum complex in the world. Its 19 museums, 9 research centers and the National Zoo contain over 130 million artifacts, specimens and works of art. Hundreds of scientists, historians and other scholars publish the results of research at the Institution every year. Recently the Smithsonian Institution Libraries (SIL) has begun collecting the digital publications of Institution researchers to ensure both the long-term care and the public availability of these works through the Smithsonian Digital Repository.
Currently the Repository contains reprints of articles published in peer-reviewed scientific journals, including the Institution's Smithsonian Contributions series. A limited amount of additional (ephemeral) material produced by the Institution is also archived in the system.
The open source software DSpace was chosen because of its widespread adoption and user community. Among the specific requirements for the system were the use of persistent URLs and the ability to batch-ingest multiple items at once. Because of security concerns, the system was designed so that content is ingested on a server maintained inside the Institution's firewall. A second, mirrored server is available for public search and data harvest. It is updated nightly from the secure (internal-only) server via file synchronization and database dump/restore routines. The public-facing server does not permit user login, editing or upload.
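The nightly mirroring step described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the Repository's actual routine: it assumes a PostgreSQL database (the text does not name the database engine), uses the standard pg_dump, pg_restore, rsync and ssh command-line tools, and every host name, path and database name is a hypothetical placeholder. The sketch only builds and prints the commands; it does not run them.

```python
import shlex
from dataclasses import dataclass


@dataclass(frozen=True)
class MirrorConfig:
    # Every value here is a hypothetical placeholder, not taken from the source.
    db_name: str = "dspace"
    dump_path: str = "/var/backups/dspace.dump"
    assetstore: str = "/dspace/assetstore/"
    public_host: str = "public.example.org"


def nightly_mirror_commands(cfg: MirrorConfig) -> list[list[str]]:
    """Build the commands for one mirror run, pushed from the internal server.

    Assumes PostgreSQL; a different database engine would need its own
    dump and restore tools.
    """
    remote_files = f"{cfg.public_host}:{cfg.assetstore}"
    remote_dump = f"{cfg.public_host}:{cfg.dump_path}"
    return [
        # 1. Dump the internal database to a single archive file.
        ["pg_dump", "--format=custom", f"--file={cfg.dump_path}", cfg.db_name],
        # 2. Synchronize the stored files, deleting any that were removed internally.
        ["rsync", "--archive", "--delete", cfg.assetstore, remote_files],
        # 3. Copy the database dump to the public server.
        ["rsync", "--archive", cfg.dump_path, remote_dump],
        # 4. Restore the dump on the public server, replacing existing objects.
        ["ssh", cfg.public_host, "pg_restore", "--clean",
         f"--dbname={cfg.db_name}", cfg.dump_path],
    ]


if __name__ == "__main__":
    # Dry run: print each command instead of executing it.
    for command in nightly_mirror_commands(MirrorConfig()):
        print(shlex.join(command))
```

Because every step is initiated from the internal server, a layout like this would let the public server remain read-only, which matches the restriction on login, editing and upload noted above.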
Development of the Repository was somewhat challenging at first due to a shortage of IT staff and support. Because the program was undertaken by the SI Libraries, whose staff lacked the necessary technical knowledge, a contractor was required to install the initial hardware and software. A librarian then took computer-based training courses in UNIX, SQL and XML, among other modules, in order to develop the skills necessary to maintain and manage the data. Many universities that begin digital repositories have a ready supply of student workers who are studying, or have had training in, server management, network architecture and the Unix/Linux environment. The Smithsonian does not have a similar pool of talent to draw from.
For both the internal and public servers, the Smithsonian's Office of the Chief Information Officer maintains the operating system, updates and backups. All other applications (database, repository, utilities, etc.) are installed, maintained and updated by library staff. Because so little technical expertise is devoted to the project, it has not been possible for the Smithsonian Digital Repository to migrate to the latest version of DSpace or to employ many plug-ins, such as statistics enhancements, which might make the service more appealing.