MSFIO
File I/O for MS-DOS formatted diskettes

This subroutine package provides the ability to read and write MS-DOS formatted diskettes under other operating systems.

 * Copyright (c) 1991 Shal Farley
 * Cheshire Engineering Corporation
 * 650 Sierra Madre Villa Avenue, Suite 201
 * Pasadena, California 91107
 * (818) 351-5493
 * (818) 351-8645 FAX
 * shal@alumni.caltech.edu
 *
 * This software may be used and distributed for any purpose without license or
 * royalty payments so long as the above copyright notice is preserved. If you
 * have any comments, bug fixes, improvements, or new programs based upon this
 * software I'd like to hear from you.

My primary purpose in creating this package was to allow me to write a copy utility that would copy files to and from an MS-DOS formatted diskette. This provides me with "Sneaker-Net" access between my RT-11/TSX-Plus system and my MS-DOS systems. Normally I use Kermit to transfer files, but at 9600 baud that just takes too long for large files.

To keep this effort more organized, and to make it more generically useful, I've implemented these routines using the syntax and semantics of the ANSI C file I/O routines. I've only bothered to implement the functions I needed for my applications. See "The C Programming Language, Second Edition (ANSI C)", or nearly any text on ANSI C, for user documentation.

To avoid name conflicts I've prefixed my names with "ms". Hence you will find in this library:

    msfopen, msfclose
    msfread, msfwrite
    msgetc, msputc
    msfflush
    msperror

Not (yet) implemented:

    msfseek, msftell
    msfeof, msferror, msperror, msclearerr
    msremove, msrename
    msungetc (getc, putc)
    msfgets, msfputs
    msfprintf, msfscanf

I've also implemented a few other routines modelled upon Microsoft C routines:

    msffirst, msfnext, msfstat, msfilelength

Limitations
-----------

In order to keep this weekend project to no more than a year's worth of Sundays, I've accepted several significant limitations:

    No support for subdirectories.
    Only implement the file functions I need.
    Write for portability, not efficiency in size or speed.

None of these limitations stand in the way of my creating a cross-filestructure copy utility, but they would certainly be problems if someone expected to write and use many kinds of programs based on this package. I've included discussions below for the benefit of anyone who might wish to overcome some of these limitations or otherwise enhance this package.

Other Implementations
---------------------

I considered implementing this capability by modifying the C library sources at the lowest level, and implementing a new flag to fopen(), perhaps "-m", rather than providing parallel routines. The advantage would be that I'd have to write a lot less code, and existing programs could be made to use MS-DOS files with minimal source alterations. I didn't, mostly because the C library at hand (DECUS-C) was written in assembly and I didn't want to mess with it. Furthermore, such an approach would be non-portable in the worst way: it would break with each new revision of the DECUS-C library.

A better approach than modifying the existing library would be to write a set of replacement functions which the user would link in ahead of the library. With suitable attention to the modularity of the library it should be possible to simply supersede the low-level functions and let the library's upper-level functions use my new routines. In this best-of-all-possible-worlds I get to write only a few low-level functions and leverage all the existing high-level functions. Again, such an approach is altogether too non-portable.

Physical Device I/O
-------------------

I'm using an Andromeda ESDC disk controller attached to a 5.25" PC/AT compatible disk drive. This combination emulates an RX33 diskette drive. I'm told that I could add a 3.5" diskette drive and that would give me RX23 emulation as well.
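Because the drive emulation lets the host treat the MS-DOS diskette as a plain device, the MS-DOS structures can in principle be reached by sector number alone. The sketch below reads one logical sector from a diskette device or image file. It assumes that the host C library can open the device with fopen() in binary mode and position it with fseek(), and that the device presents sectors in logical order; the helper name read_sector() is hypothetical, and the geometry constants are the standard PC/AT 1.2 MB format (80 cylinders, 2 heads, 15 sectors per track), not values taken from MSFIO.

```c
#include <stdio.h>

/* Assumed geometry: standard PC/AT 1.2 MB 5.25" format (what RX33 emulates). */
#define SECTOR_SIZE      512
#define CYLINDERS        80
#define HEADS            2
#define SECTORS_PER_TRK  15
#define TOTAL_SECTORS    ((long)CYLINDERS * HEADS * SECTORS_PER_TRK)  /* 2400 */

/* Hypothetical helper (not part of MSFIO): read logical sector 'lsn'
 * (0-based) from a diskette device or image file opened in binary mode.
 * Logical sector n is assumed to start at byte offset n * 512.
 * Returns 0 on success, -1 on any error. */
static int read_sector(FILE *dev, long lsn, unsigned char buf[SECTOR_SIZE])
{
    if (lsn < 0 || lsn >= TOTAL_SECTORS)
        return -1;                                 /* outside the diskette */
    if (fseek(dev, lsn * SECTOR_SIZE, SEEK_SET) != 0)
        return -1;
    if (fread(buf, 1, SECTOR_SIZE, dev) != SECTOR_SIZE)
        return -1;                                 /* short read or I/O error */
    return 0;
}
```

For a 1.44 MB 3.5" diskette (the RX23 case) only the sectors-per-track value changes, to 18, giving 2880 sectors.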
Whether DEC or third party, RX33 diskettes are format compatible with PC/AT high density (1.2 MB) diskettes, and RX23 diskettes are format compatible with IBM PS/2 high density 3.5" (1.44 MB) diskettes. Format compatibility means not only that the media are physically and magnetically compatible, but also that the low-level sector formatting is compatible. A diskette purchased for and formatted by an MS-DOS system can be initialized and used on the DEC system, and vice versa. What's missing, of course, is the directory and file structure compatibility provided by this module. With my RX33 drive I can do physical I/O to the MS-DOS diskette as a device, but the MS-DOS file structure is gibberish to RT-11.

File Structure
--------------

Fortunately, MS-DOS and RT-11 agree upon the internal structure of files: files are merely ordered collections of bytes. Also fortunately, MS-DOS and RT-11 agree upon the use of ASCII to represent text, and both agree that lines of text are separated by the pair of characters <CR> and <LF>. This means that no further translation is required for text files, the only kind of files I'm interested in.

One difference between RT-11 files and MS-DOS files that needs to be accounted for is that RT-11 file sizes are recorded in whole blocks (of 512 bytes each), whereas MS-DOS file sizes are recorded to the byte. When copying from MS-DOS to RT-11, this means the copy utility will have to fill out the RT-11 file past the end of the MS-DOS data up to the next block boundary. Presumably null (zero) is the best byte value to fill with.

Conversely, when copying from RT-11 to MS-DOS the file size is only known in whole blocks. If the file is ASCII text it might make sense to truncate any trailing null bytes rather than include them in the MS-DOS file size. With a binary file such truncation would not be safe. The trouble is, there is no established way of knowing whether a file is binary or ASCII.
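The size bookkeeping described above comes down to rounding up on the way to RT-11 and, for text only, trimming trailing nulls on the way back. The following is a minimal sketch assuming the copy utility holds the file contents in memory; the function names rt11_blocks(), rt11_pad_bytes() and trim_trailing_nulls() are hypothetical and are not part of MSFIO.

```c
#include <stddef.h>

#define RT11_BLOCK 512  /* RT-11 file sizes are kept in 512-byte blocks */

/* Hypothetical helper: whole RT-11 blocks needed for 'bytes' bytes (rounded up). */
static size_t rt11_blocks(size_t bytes)
{
    return (bytes + RT11_BLOCK - 1) / RT11_BLOCK;
}

/* Hypothetical helper: null bytes to append so an MS-DOS file of 'bytes'
 * bytes exactly fills its last RT-11 block. */
static size_t rt11_pad_bytes(size_t bytes)
{
    return rt11_blocks(bytes) * RT11_BLOCK - bytes;
}

/* Hypothetical helper: length of an RT-11 file's data with trailing nulls
 * removed.  Only safe for ASCII text; binary data may legitimately end in
 * zero bytes. */
static size_t trim_trailing_nulls(const unsigned char *buf, size_t len)
{
    while (len > 0 && buf[len - 1] == '\0')
        len--;
    return len;
}
```

For example, a 1000-byte MS-DOS file needs two RT-11 blocks and therefore 24 bytes of null fill. Applying trim_trailing_nulls() to a binary file would silently shorten any file whose data happens to end in zero bytes, which is exactly the hazard described above.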
The RT-11 COPY command solves the binary-versus-ASCII dilemma by having "/ASCII" and "/BINARY" options so the user may specify the copy mode. ASCII mode removes all null and rubout bytes, masks off the eighth (parity) bit of each byte, and treats CTRL+Z (032 == 26 == 0x1A) as an end-of-file mark. Binary mode makes no changes to the file content and is the default mode.

Directory Caching
-----------------

The MS-DOS file structure is embodied in two objects: the Root Directory (with its subordinate directories) and the FAT (File Allocation Table). Unlike RT-11, MS-DOS has no concept of a "tentative" entry, that is, a file which has been opened for writing but not yet closed. Under MS-DOS, fopen() typically creates the new directory entry as a permanent file with zero length. Each fwrite() at the current end of the file extends its length. This means that each fwrite() must update the file's length in the directory entry, and potentially update the FAT to allocate more clusters of disk space to the file.

Rather than constantly reread these items, I have implemented a cache for the Root Directory and for the FAT. This may make some users unhappy because the cache occupies a fair amount of malloc()'d memory: about 11 KB for a high density (1.2 MB) diskette. If that's a problem for somebody, I guess they'll have to implement a smaller mechanism. It should be possible, although painfully slow, to eliminate the cache and just fseek() to the bytes you need. Perhaps if your C library, operating system, or hardware provides some data caching underneath fread() and fwrite() it wouldn't be too hideous.

I considered implementing this cache as a write-through cache, but that would defeat its benefit in the one case where it is most useful, in fwrite(). Instead, I hold all of the changes until fclose(). I'm not too concerned about having stale data around, nor about the application crashing before it ever gets to fclose(); after all, this is only for Sneaker-Net'ing files around.
If I lose a file or even screw up the whole diskette, it's no great loss. I'd be a lot more concerned if this were the basis of an actual runtime library or operating system. The squeamish should find it easy to make this a write-through cache.

Another problem for the squeamish to worry about is having MS-DOS files open on multiple devices (or device images). I made a simple decision that mitigates this problem: I only allow one MS-DOS file to be open at a time. Beware, though: nothing prevents two jobs from attempting to use the same MS-DOS file structure at the same time! That's a sure recipe for disaster if more than one of them opens a file for writing.

The inquisitive will note that I defined the data structures in an array, so that it would be trivial to support multiple MS-DOS file opens (as would be necessary, say, to copy an MS-DOS file within its directory). I've even made a simple-minded attempt to support that without corrupting my cache: I check the device and/or file name containing the MS-DOS directory and only create one cache per unique instance. Be warned: if two unique names point to the same physical device and/or file, you're set up for disaster. It's really easy to do this to yourself if you use logical assignments for your diskette drive.
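The caching policy described in the last two sections (hold FAT and root directory changes until close, and keep one cache per unique device or file name in a small array) might look something like the sketch below. It is a hypothetical illustration, not MSFIO's own code: the names dev_cache, cache_open() and cache_flush(), the four-entry table, and the plain string comparison of names are all assumptions. The layout constants assume the standard 1.2 MB diskette (one boot sector, two 7-sector FATs, a 224-entry root directory of 32-byte entries), for which one FAT copy plus the root directory comes to 10,752 bytes, consistent with the "about 11 KB" figure given earlier.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch, not MSFIO's actual code.
 * Assumed layout of a standard 1.2 MB (PC/AT high density) diskette. */
#define SECTOR_SIZE  512
#define RESERVED     1                            /* boot sector */
#define NUM_FATS     2
#define FAT_SECTORS  7
#define ROOT_ENTRIES 224                          /* 32 bytes per entry */
#define FAT_BYTES    (FAT_SECTORS * SECTOR_SIZE)  /* 3584 */
#define ROOT_BYTES   (ROOT_ENTRIES * 32)          /* 7168 */
#define FAT_OFFSET   ((long)RESERVED * SECTOR_SIZE)
#define ROOT_OFFSET  ((long)(RESERVED + NUM_FATS * FAT_SECTORS) * SECTOR_SIZE)
#define NAME_LEN     64
#define MAX_CACHES   4                            /* hypothetical table size */

struct dev_cache {
    char name[NAME_LEN];            /* device/file name exactly as given */
    unsigned char fat[FAT_BYTES];   /* one copy of the FAT */
    unsigned char root[ROOT_BYTES]; /* the root directory */
    int dirty;                      /* modified since last write-back? */
};

static struct dev_cache *caches[MAX_CACHES];

/* Return the cache for 'name', reading the FAT and root directory from 'dev'
 * the first time that name is seen.  Names are compared as plain strings, so
 * two different names for one physical drive get two conflicting caches. */
static struct dev_cache *cache_open(const char *name, FILE *dev)
{
    int i, slot = -1;
    struct dev_cache *c;

    if (strlen(name) >= NAME_LEN)
        return NULL;
    for (i = 0; i < MAX_CACHES; i++) {
        if (caches[i] != NULL && strcmp(caches[i]->name, name) == 0)
            return caches[i];                     /* name already cached */
        if (caches[i] == NULL && slot < 0)
            slot = i;                             /* remember a free slot */
    }
    if (slot < 0 || (c = malloc(sizeof *c)) == NULL)
        return NULL;
    if (fseek(dev, FAT_OFFSET, SEEK_SET) != 0 ||
        fread(c->fat, 1, FAT_BYTES, dev) != FAT_BYTES ||
        fseek(dev, ROOT_OFFSET, SEEK_SET) != 0 ||
        fread(c->root, 1, ROOT_BYTES, dev) != ROOT_BYTES) {
        free(c);
        return NULL;
    }
    strcpy(c->name, name);
    c->dirty = 0;
    caches[slot] = c;
    return c;
}

/* Write-back: intended to be called at close time; writes only if something
 * changed.  A write-through cache would call this after every modification. */
static int cache_flush(struct dev_cache *c, FILE *dev)
{
    int i;

    if (!c->dirty)
        return 0;                                 /* nothing to write back */
    for (i = 0; i < NUM_FATS; i++)                /* keep all FAT copies identical */
        if (fseek(dev, FAT_OFFSET + (long)i * FAT_BYTES, SEEK_SET) != 0 ||
            fwrite(c->fat, 1, FAT_BYTES, dev) != FAT_BYTES)
            return -1;
    if (fseek(dev, ROOT_OFFSET, SEEK_SET) != 0 ||
        fwrite(c->root, 1, ROOT_BYTES, dev) != ROOT_BYTES ||
        fflush(dev) != 0)
        return -1;
    c->dirty = 0;
    return 0;
}
```

The string comparison is exactly the weakness warned about above: opening the same drive once under its physical name and once under a logical assignment yields two independent caches, and whichever is flushed last overwrites the other's changes.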