I got interested in this as I was bored and enjoy an occasional puzzle.
I've done some previous reverse engineering work on the MsBackup format
used in Win9x. I've reverse engineered this format sufficiently to
parse sample backup files I created with the 1 Step Backup
version 5.3 installed on a couple of my Win9x systems equipped with Zip
drives. The following description is preliminary and has many holes in it,
as noted below, but the information provided is sufficient to parse the
files and recover the original data from them.
The file header appears to be 0x200 bytes long. Typically the raw file data
section immediately follows this header. Backups could span multiple
target disks using a set of removable Iomega media for a single backup.
Only one *.1-Step file was allowed on each of the media disks. Often
this file filled the disk, but there was an option to leave existing files
and just use the remaining space on the media. The files had names
which embedded the job number, disk number, and date in a string like:
"Backup Job 7, Disk 1, 16-10-27 19.36.39.1-Step"
ie "Backup Job #, Disk #, <date> <time>.1-Step"
After creation these files can obviously be renamed, but the name format
above is what the program initially creates. The Job # is maintained in
a database by the 1 Step Backup program on the target machine. Each new backup
increments the job # for the machine. The Disk # is the number of the disk in
the backup set: a one disk set has Disk # = 1, and a multiple disk set numbers
its disks from 1 up to the total number of disks used. The job and disk numbers
are also stored in each file's header as shown below.
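The default name format above can be picked apart with a simple pattern match.
This is a minimal sketch (the function name is my own); it returns None for
files that have been renamed:

```python
import re

# Matches the default 1 Step Backup name format:
# "Backup Job <job>, Disk <disk>, <yy-mm-dd> <hh.mm.ss>.1-Step"
NAME_RE = re.compile(
    r"Backup Job (\d+), Disk (\d+), "
    r"(\d{2}-\d{2}-\d{2}) (\d{2}\.\d{2}\.\d{2})\.1-Step$"
)

def parse_backup_name(name):
    """Return (job, disk, date, time) from a default-format file name,
    or None if the file has been renamed and no longer matches."""
    m = NAME_RE.match(name)
    if m is None:
        return None
    job, disk, date, time = m.groups()
    return int(job), int(disk), date, time
```

For example, parse_backup_name("Backup Job 7, Disk 1, 16-10-27 19.36.39.1-Step")
yields (7, 1, "16-10-27", "19.36.39").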
The data format in these files is Intel specific little endian.
The sample code I have written is targeted at 32 bit systems which
is what I believe this software ran on. The only 64 bit value I've
seen may be the date/time stamp used in the header. Oddly, I have not
been able to decipher the format used for this, but since this time stamp
is also displayed as an ASCII string in the catalog region there is no
compelling reason to understand it. It can be used as an identifier in
backups that span multiple disk media as it remains constant for all
file headers.
A dump of a typical header is shown below:
This is the file header at file offset 0
00000: CD AB CD AB 00 02 00 00 02 00 01 00 B3 C3 D4 25 |...............%
00010: DA D5 E4 40 00 00 00 00 07 00 02 00 04 93 EA 00 |...@............
00020: 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 |................
00030: 01 00 00 00 74 65 73 74 38 20 32 20 64 69 73 6B |....test8 2 disk
00040: 20 75 6E 63 6F 6D 70 72 65 73 73 65 64 00 00 00 | uncompressed...
offset  bytes    use
0x0     4        appears to be a signature, always the bytes shown above
0xc     8        appears to be a date/time stamp, 8 byte binary of unknown format
0x18    2        job number
0x1a    2        disk number
0x1c    4        offset to catalog section in file
0x34    0-0x1cc  optional descriptive text string entered by user
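The fixed offsets above can be read with a few little-endian extractions. This
is a minimal sketch; parse_header and the returned key names are my own, and
the 8-byte timestamp is kept as raw bytes since its format is undeciphered:

```python
import struct

HEADER_SIZE = 0x200
SIGNATURE = b"\xCD\xAB\xCD\xAB"

def parse_header(hdr):
    """Parse the 0x200 byte file header (little-endian fields).

    Offsets follow the table above; the value at 0xc is returned as
    raw bytes because its format is still unknown.
    """
    if len(hdr) < HEADER_SIZE or hdr[0:4] != SIGNATURE:
        raise ValueError("not a 1-Step backup header")
    timestamp = hdr[0xC:0x14]                       # unknown 8-byte format
    job, disk = struct.unpack_from("<HH", hdr, 0x18)
    catalog_off, = struct.unpack_from("<I", hdr, 0x1C)
    # Optional description: NUL-terminated ascii starting at 0x34.
    descr = hdr[0x34:HEADER_SIZE].split(b"\x00", 1)[0].decode("ascii", "replace")
    return {"job": job, "disk": disk, "catalog": catalog_off,
            "timestamp": timestamp, "descr": descr}
```

Applied to the dump above this gives job 7, disk 2, and a catalog offset of
0xEA9304.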
  My Name      # of fields  description
1 drives       13  generic backup data, drive names and letters
2 directories   4  maps directory names to numeric id
3 files         9  maps file names to drives and directories by #
4 comp         11  compression info; even in a compressed backup some
                   files may be stored uncompressed
5 job          11  target machine job # and whether it is a compressed
                   volume; apparently only one such record in the catalog
6 paths         3  maps file names to paths. Records appear to be in the
                   order the files above are listed, normally two records
                   per file: the 1st has a path string, the 2nd the
                   associated file name. In some of my samples the last
                   path has no file names after it, implying the
                   remaining files are all in that last path/directory.
7 session       6  one record for each media file in the data set
Following each binary structure definition header is an ascii region
consisting of 0x20 byte blocks which start with the name of each field.
Other than the name most bytes in this block are zero. Guessing at the
3 other non-zero values:
0x0   up to 11 ascii chars forming the field name; trailing bytes may be NUL
0xb   data type, 1 byte: 0x4E => Numeric, 0x43 => Character
0xc   binary offset from the start of the data for this entry to the start of
      the field value in the data block, monotonically increasing from 1 for
      each field
0x10  binary length of the field, ie the sum of the lengths is the byte length
      of each entry
This list of fields is terminated with an 0xd byte where the next
field name would start. The ascii data region follows. It is in turn
terminated with an 0x1a byte.
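Putting the guesses above together, the field descriptor list can be walked as
below. This is a sketch under the assumption that the offset and length values
are little-endian binary inside their otherwise-zero blocks; the function name
and tuple layout are my own:

```python
def parse_field_defs(buf, pos):
    """Walk the 0x20-byte field descriptor blocks that follow a structure
    definition header, stopping at the 0x0D terminator byte.

    Per-block offsets follow the guesses above: name at 0, type byte at
    0xb ('N' numeric / 'C' character), data offset at 0xc, field length
    at 0x10. Returns (fields, position just past the 0x0D byte).
    """
    fields = []
    while buf[pos] != 0x0D:
        blk = buf[pos:pos + 0x20]
        name = blk[0:0xB].split(b"\x00", 1)[0].decode("ascii")
        ftype = chr(blk[0xB])                        # 'N' or 'C'
        offset = int.from_bytes(blk[0xC:0x10], "little")
        length = int.from_bytes(blk[0x10:0x14], "little")
        fields.append((name, ftype, offset, length))
        pos += 0x20
    return fields, pos + 1
```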
Backup name: test1
Job 3 Backup Disk 1
Catalog at offset 0x44944
attempt to step through catalog and locate structure definition regions
;one data record for each drive traversed in the backup
start of Structure Definition 1: Disk at 0x45a14
1 SERIAL type N len 12 ; record #
2 NUMDIRS type N len 12 ; ? typically '0'
3 NUMFILES type N len 12 ; ? typically '0'
4 VLSRDW type N len 12 ; ?
5 USED_HI type N len 12 ; 2 fields for # of bytes used?
6 USED_LO type N len 12
7 FREE_HI type N len 12 ; 2 fields for # bytes available?
8 FREE_LO type N len 12
9 LABEL type C len 12 ; Drive Volume name
10 VLSRDT type C len 14 ; time stamp as a string (possibly previous backup date)
11 DATETIME type C len 14 ; time stamp as a string (looks like current date)
12 DRV_LTR type C len 2 ; drive letter followed by a colon
13 HASDSKST type N len 12 ; ? typically '1'
note: so far I have ignored fields 5 and 7 and used a long int for the _LO value
in my program. For the sample files I have looked at, the _HI value is always 0
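If the _HI fields ever turn out to be non-zero, the pair could be combined as
below. This is pure guesswork: I am assuming the split is at 32 bits, which
none of my samples can confirm since _HI has always been 0:

```python
def combine_hi_lo(hi, lo):
    """Combine a _HI/_LO numeric field pair into one value.

    Assumption: the split is at 32 bits. Every sample I have examined
    has _HI = 0, so this guess is untested.
    """
    return hi * 2**32 + lo
```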
;one data record for each directory traversed in the backup
start of Structure Definition 2: Dir at 0x47b04
1 SERIAL type N len 12 ; record #
2 DISKSER type N len 12 ; record # in Disks array, ie source disk for data
3 DIRSER type N len 12 ; record # of parent directory
4 NAME type C len 240 ; directory name
;one data record for each file in backup
start of Structure Definition 3: File at 0x497fa
1 SERIAL type N len 12 ; record #
2 DIRSER type N len 12 ; ndx of dir in Structure Definition 2: Dir
3 STATUS type N len 12
4 DISKSER type N len 12 ; ndx of disk in Structure Definition 1: Disk
5 ATTRIB type N len 12 ; file attribute, typically 32
6 SIZE_HI type N len 12 ; 2 field # of bytes in the file
7 SIZE_LO type N len 12
8 DATETIME type C len 14 ; file timestamp as an ascii string
9 NAME type C len 240 ; file name
at least one Comp data record exists per file in the backup, and records may be
continued as described below; the maximum ORGSIZE and COMPSIZE appears to be 65535
start of Structure Definition 4: Comp at 0x4be7a
1 SERIAL type N len 12 ; record #
2 ORGSER type N len 12 ; record # in files array
3 SEQUENCE type N len 12 ; sequence # in this record set >= 1
4 ORGSIZE type N len 12 ; original file size (if 65535 its part of a set)
5 COMPSIZE type N len 12 ; compressed file size
6 ARCDSKSE type N len 12 ; ?
7 CHK_SUM type N len 12 ; ? probably a check sum for compression used
8 COMP_LVL type N len 12 ; ? type of compression ? has been 4 when compressed
9 OFFS_HI type N len 12 ; 2 field cumulative offset into backup data to start of this file
10 OFFS_LO type N len 12
11 IS_LAST type N len 12 ; boolean 0 unless its the last record of this group
note: only the first record exists if the file is not compressed; if compressed
there is a minimum of 1 of these records for each file record, and there may be
multiple records for a file. Records for a given file continue until IS_LAST = 1
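The rule above can be sketched as a small helper that gathers the Comp records
belonging to one file. The dict representation of a record is my own; the keys
are the field names from Structure Definition 4:

```python
def collect_comp_chunks(comp_records, file_serial):
    """Gather the Comp records for one file, in SEQUENCE order, stopping
    at the record with IS_LAST = 1.

    comp_records: iterable of dicts keyed by the Structure Definition 4
    field names (the dict form is mine; values are plain ints).
    """
    chunks = [r for r in comp_records if r["ORGSER"] == file_serial]
    chunks.sort(key=lambda r: r["SEQUENCE"])
    out = []
    for r in chunks:
        out.append(r)
        if r["IS_LAST"] == 1:
            break
    return out
```

Summing COMPSIZE over the returned chunks then gives the total stored size of
that file's data.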
start of Structure Definition 5: Job at 0x4ddb3
1 SERIAL type N len 12 ; record #
2 JOBNUM type N len 12 ; Job # from machine backup was run on
3 NUMDISKS type N len 12 ; number of media disks used in this backup
4 TGDRV type N len 12 ; ? possibly drive # of target Iomega drive
5 TGDRVT type N len 12 ; size of target media, Zip drives use 100
6 HASPSW type N len 12 ; ?
7 ISCUST type N len 12 ; ? appears to be boolean, if 1 a customized backup selection
8 ISCOMP type N len 12 ; 0 if not compressed, 1 if compression is used
9 CUSTDATE type C len 14 ; ascii time stamp for backup
10 PASSWORD type C len 32 ; apparently a password could be used, has been all spaces in my samples
11 DESCR type C len 256 ; descriptive string input by user at time of backup
note: I believe there is only one Job data record related to the current backup
it is always record #1 following the initial configuration record # 0
This appears to be a list of the paths selected for backup; it does not include
the subdirectories which may have been included below each path. To cover all
possible directories accessed, use Structure Definition 2: Dir.
start of Structure Definition 6: Path at 0x4ee53
1 SERIAL type N len 12 ; record #
2 DATATYPE type N len 12 ; 1 if path name, 2 if file name
3 DATATEXT type C len 240 ; asciii path or file name
note: typically there are two Path data records for each file in the backup,
one for the path with datatype 1 and one for a file name with datatype 2. In
some of my samples the list ends with the last path name and no following file
name; in this case all additional files in the backup set go in this path.
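The pairing rule above, including the trailing path-only case, can be sketched
like this. The tuple form of a Path record (SERIAL, DATATYPE, DATATEXT) is my
own shorthand:

```python
def pair_paths(path_records):
    """Turn the ordered Path records into (path, filename) pairs.

    DATATYPE 1 starts a new current path; DATATYPE 2 is a file under the
    most recent path. A trailing path with no file names (seen in some
    of my samples) stays as a (path, None) entry, meaning the remaining
    files in the backup all live under that path.
    """
    pairs = []
    current = None
    for serial, dtype, text in path_records:
        if dtype == 1:
            current = text
            pairs.append((current, None))    # provisional: path with no file yet
        elif dtype == 2:
            if pairs and pairs[-1] == (current, None):
                pairs[-1] = (current, text)  # fill in the provisional entry
            else:
                pairs.append((current, text))
    return pairs
```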
start of Structure Definition 7: Session at 0x51480
1 SERIAL type N len 12 ; record #
2 SESSFROM type N len 12 ; starting buffer read count in this media file
3 SESSTO type N len 12 ; last buffer read count in this media file
4 JOBNUM type N len 12 ; Job # from machine backup was run on
5 DISKNUM type N len 12 ; # of backup file with catalog => # of disks in set
6 SESSDATE type C len 14 ; ascii date string
Note: there appears to be one Session record for each media disk (file) in the
backup set. It helps map the files to a specific media disk number. I believe
the program uses a fixed size buffer and advances the read count by 1 each time
it refreshes this buffer as it reads the cumulative data from the media disks.
It apparently fills the buffer each time until all the data is read, but writes
it out on a file by file basis using the file specific length required for the
write (based on the file length if not compressed, or otherwise the compressed
length). This means a file's data may well span more than one disk, ie there is
no guarantee the data area on any disk except the first begins at the start of
a file; it may well contain a continuation of a file's data from the previous
disk. My session data display via the -vs7 command line option shows the
session data and the total number of files in the backup as a comparison. I
currently only have two examples with two media files. This would be more
useful if I knew the buffer size used, but I am still trying to work this out!