- This article has been moved or is in the process of being moved to the Sinclair FAQ Wiki, under various articles. You may find more complete information there.
The comp.sys.sinclair FAQ has documentation for many file formats used by Spectrum emulators.
Disk image formats
+3 DSK format
The official +3 DSK specification can be found at Kevin Thacker's site.
+D / DISCiPLE DSK format
'DSK' as used for MGT (+D / DISCiPLE) disks is simply a raw disk image with 10 sectors per track.
HDF files store hard-disk data in image files for emulation purposes. They consist of a file header followed by a raw dump of the track data.
The following is the format of the HDF header. All numbers are hexadecimal and in little endian (Intel x86) byte order.
Offset  Len  Meaning
-----------------------------------------------------------------------------
00      06   "RS-IDE"
06      01   0x1A
07      01   Revision number (BCD): 0x10 (v1.0)
08      01   b0: halved sector data (only LSB of sector words is stored)
09      02   offset of hard-disk data (0x0080)
0B      0B   reserved (MUST be set to 0x00)
16      6A   first 53 words (106 bytes) of IDE/ATA identification data, as returned by ATA command 0xEC
word    ??   raw hard-disk data (C0 H0, C0 H1 ... C0 H15, C1 H0, C1 H1 ...)
Version 1.1 differs from version 1.0 in that the full 512-byte (256-word) identification data packet is included in the header.
Offset  Len  Meaning
-----------------------------------------------------------------------------
00      06   "RS-IDE"
06      01   0x1A
07      01   Revision number (BCD): 0x11 (v1.1)
08      01   b0: halved sector data (only LSB of sector words is stored)
09      02   offset of hard-disk data (0x0216)
0B      0B   reserved (MUST be set to 0x00)
16      200  IDE/ATA identification data, as returned by ATA command 0xEC
word    ??   raw hard-disk data (C0 H0, C0 H1 ... C0 H15, C1 H0, C1 H1 ...)
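As an illustration of the header layout, here is a minimal Python sketch that decodes the fixed fields of an HDF header. The function name `parse_hdf_header` and the returned dictionary keys are hypothetical, chosen only for this example. The sketch assumes the identification data runs from offset 0x16 up to the hard-disk data offset stored at 0x09.

```python
import struct

HDF_SIGNATURE = b"RS-IDE\x1a"  # "RS-IDE" followed by the 0x1A byte


def parse_hdf_header(data: bytes) -> dict:
    """Decode the fixed part of an HDF header (hypothetical helper)."""
    if len(data) < 0x16:
        raise ValueError("too short for an HDF header")
    if data[:7] != HDF_SIGNATURE:
        raise ValueError("missing RS-IDE signature")
    revision = data[7]  # BCD: 0x10 is v1.0, 0x11 is v1.1
    flags = data[8]
    (data_offset,) = struct.unpack_from("<H", data, 0x09)  # little endian
    return {
        "version": (revision >> 4, revision & 0x0F),
        "halved": bool(flags & 0x01),  # bit 0: halved sector data
        "data_offset": data_offset,
        # Assumption: identification data fills the rest of the header.
        "identify": data[0x16:data_offset],
    }


# Synthetic v1.1 header: signature, revision, flags, offset, reserved, ID data.
header = (HDF_SIGNATURE + bytes([0x11, 0x00]) + struct.pack("<H", 0x0216)
          + bytes(0x0B) + bytes(0x200))
info = parse_hdf_header(header)
print(info["version"], hex(info["data_offset"]), len(info["identify"]))
```

Reading the data offset from the header, rather than hard-coding 0x80 or 0x216, lets the same routine handle both revisions.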
Note: IDE devices transfer data in 16-bit words. Since the Z80 data bus is only 8 bits wide, some IDE adapters use additional logic to split the IDE word into two bytes so that the Z80 can fetch them. Other adapters, however, discard the most significant byte of the word completely in favour of simpler circuitry; in this case, only half of the nominal capacity of a disk sector is used. Bit 0 at offset 0x08 indicates this: when it is set, the sector size specified by the IDE identification data is actually halved in the HDF file. This reduces the HDF file size by storing only the usable data; for all the supported adapters, the least significant byte is stored.
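The effect of the halved-sector flag can be sketched in Python as follows. Both function names are hypothetical, and the filler value used for the discarded most significant byte is an assumption made for the example; the text above does not say what an emulator should present in its place.

```python
def stored_sector_size(nominal_size: int, halved: bool) -> int:
    """Bytes that one sector occupies inside the HDF file."""
    return nominal_size // 2 if halved else nominal_size


def expand_halved_sector(stored: bytes, filler: int = 0x00) -> bytes:
    """Rebuild 16-bit words from stored LSBs (filler MSB is an assumption)."""
    out = bytearray()
    for low in stored:
        out += bytes([low, filler])  # little-endian word: LSB first
    return bytes(out)


print(stored_sector_size(512, True))
print(expand_halved_sector(b"\x12\x34").hex())
```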
The IDE identification data format is described in any IDE/ATA technical reference. It contains information about the drive geometry (cylinders, heads, sectors, sector size), the device model name, the supported features and so on.
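For illustration, the sketch below pulls the geometry and model name out of an identification block. The word positions (1 for cylinders, 3 for heads, 6 for sectors per track, 27 to 46 for the model name with the first character of each pair in the high byte) come from the ATA IDENTIFY DEVICE layout, not from this page, and the sketch assumes the words are stored little endian in the HDF header. The name `identify_summary` is hypothetical.

```python
import struct


def identify_summary(identify: bytes) -> dict:
    """Extract geometry and model name from IDE identification data."""
    words = struct.unpack_from("<53H", identify)  # first 53 words suffice
    # Each model-name word holds two ASCII characters, first one in the MSB.
    raw_model = b"".join(struct.pack(">H", w) for w in words[27:47])
    return {
        "cylinders": words[1],
        "heads": words[3],
        "sectors": words[6],
        "model": raw_model.decode("ascii", "replace").strip(),
    }


# Synthetic identification block used only to exercise the function.
sample_words = [0] * 53
sample_words[1], sample_words[3], sample_words[6] = 1024, 16, 63
name = b"DEMO DISK".ljust(40)
for i in range(20):
    sample_words[27 + i] = (name[2 * i] << 8) | name[2 * i + 1]
summary = identify_summary(struct.pack("<53H", *sample_words))
print(summary)
```

Because only the first 53 words are read, the same function works on both the 106-byte block of v1.0 and the 512-byte block of v1.1.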
Tape image formats
TAP format (and variants)
- This section has been moved or is in the process of being moved to the Sinclair FAQ Wiki, under the "TAP format" article. You may find more complete information there.
The .TAP files contain blocks of tape-saved data. All blocks start with two bytes specifying how many bytes will follow (not counting the two length bytes). Then raw tape data follows, including the flag and checksum bytes. The checksum is the bitwise XOR of all bytes including the flag byte. For example, executing the line SAVE "ROM" CODE 0,2 produces the following:
13 00 00 03 52 4f 4d 7x20 02 00 00 00 00 80 f1 04 00 ff f3 af a3

13 00              first block is 19 bytes (17 bytes+flag+checksum)
00                 flag byte (A reg, 00 for headers, ff for data blocks)
03                 first byte of header, indicating a code block
52 4f 4d 7x20      file name
02 00 00 00 00 80  header info
f1                 checksum of header
04 00              length of second block
ff                 flag byte
f3 af              first two bytes of rom
a3                 checksum (checkbittoggle would be a better name!)

Everything except the two length words (13 00 and 04 00) is Spectrum-generated data. 7x20 stands for seven bytes of value 20.
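The block structure and checksum rule can be checked with a short Python sketch. The names `tap_blocks` and `xor_checksum` are hypothetical; the byte string is the SAVE "ROM" CODE 0,2 example with the seven padding bytes written out.

```python
import struct
from functools import reduce


def tap_blocks(data: bytes):
    """Yield each block (flag + data + checksum) from a .TAP image."""
    pos = 0
    while pos + 2 <= len(data):
        (length,) = struct.unpack_from("<H", data, pos)  # two length bytes
        block = data[pos + 2:pos + 2 + length]
        if len(block) != length:
            raise ValueError("truncated block")
        yield block
        pos += 2 + length


def xor_checksum(payload: bytes) -> int:
    """Bitwise XOR of all bytes, flag byte included."""
    return reduce(lambda a, b: a ^ b, payload, 0)


example = bytes.fromhex(
    "13 00 00 03 52 4f 4d" + " 20" * 7
    + " 02 00 00 00 00 80 f1 04 00 ff f3 af a3"
)
blocks = list(tap_blocks(example))
for block in blocks:
    # The last byte of each block should equal the XOR of the bytes before it.
    print(len(block), hex(block[0]), xor_checksum(block[:-1]) == block[-1])
```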
Note that it is possible to join .TAP files by simply stringing them together; for example, in DOS / Windows: COPY /B FILE1.TAP + FILE2.TAP ALL.TAP ; or in Unix/Linux: cp file1.tap all.tap && cat file2.tap >> all.tap
For completeness, I'll include the structure of a tape header. A header always consists of 17 bytes:
|Byte||Length||Description|
|0||1||Type (0,1,2 or 3)|
|1||10||Filename (padded with blanks)|
|11||2||Length of data block|
|13||2||Parameter 1|
|15||2||Parameter 2|
The type is 0,1,2 or 3 for a Program, Number array, Character array or Code file. A SCREEN$ file is regarded as a Code file with start address 16384 and length 6912 decimal. If the file is a Program file, parameter 1 holds the autostart line number (or a number >=32768 if no LINE parameter was given) and parameter 2 holds the start of the variable area relative to the start of the program. If it's a Code file, parameter 1 holds the start of the code block when saved, and parameter 2 holds 32768. For data files finally, the byte at position 14 decimal holds the variable name.
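A minimal Python sketch of decoding the 17-byte header follows. It assumes parameter 1 and parameter 2 are the two 16-bit little-endian words that follow the length field, and it decodes the filename as plain ASCII, which is a simplification of the Spectrum character set. The names `parse_tape_header` and `TYPE_NAMES` are hypothetical.

```python
import struct

TYPE_NAMES = {0: "Program", 1: "Number array", 2: "Character array", 3: "Code"}


def parse_tape_header(header: bytes) -> dict:
    """Decode a 17-byte tape header (flag and checksum already removed)."""
    if len(header) != 17:
        raise ValueError("a tape header is always 17 bytes")
    file_type, name, length, param1, param2 = struct.unpack("<B10sHHH", header)
    return {
        "type": TYPE_NAMES.get(file_type, "Unknown"),
        "name": name.decode("ascii", "replace").rstrip(" "),
        "length": length,
        "param1": param1,
        "param2": param2,
    }


# Header produced by SAVE "ROM" CODE 0,2
rom_header = bytes.fromhex("03 52 4f 4d" + " 20" * 7 + " 02 00 00 00 00 80")
parsed = parse_tape_header(rom_header)
print(parsed)
```

For this Code file, parameter 1 is the start address 0 and parameter 2 is 32768, matching the description above.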
(originally from TECHINFO.DOC supplied with Z80 by Gerton Lunter)
De-facto TZX format conventions
The TZX format specification says that text fields should exclusively use ASCII symbols. Over time, this has proved insufficient for Archive info blocks, which need at least the pound and Euro currency symbols as well as accented characters (for European names), and should probably also extend to at least Cyrillic text in the future to accommodate software from Eastern Europe.
While the 1.20 TZX format updates were being drafted, it was agreed that, as a practical first step, the string encoding would be redefined to be ISO Latin 1 (also known as ISO 8859-1). This would formalise the encoding of the pound sign already used by World of Spectrum and allow the use of many more accented characters at the same time. Unfortunately this text was accidentally omitted from the final document, leaving the change as an informal extension.
Since then there have been several ZX Spectrum software releases sold in Euros which has created a desire to support that symbol in the Archive info blocks as well. This is missing from the ISO Latin 1 character set so a new approach was required to achieve the goal.
A discussion on the World of Spectrum forums considered some options on how this could be accommodated without too much disruption to existing software and tools. The result of the discussion was that World of Spectrum has moved to using the Windows code page 1252 character mapping for the Euro symbol in its Archive info blocks from the Infoseek tool and that character mapping should be considered a de-facto standard for TZX files in any tool wanting to correctly interpret blocks from this source.
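In practice, a tool following this convention would decode Archive info text as Windows code page 1252. The Python sketch below shows the difference for the pound sign (0xA3, the same in ISO Latin 1) and the Euro sign (0x80, defined only in code page 1252). The function name and the sample bytes are made up for the example.

```python
def decode_archive_text(raw: bytes) -> str:
    """Decode TZX text using the de-facto Windows-1252 convention."""
    # A few byte values are undefined in cp1252; replace them rather than fail.
    return raw.decode("cp1252", errors="replace")


sample = b"Price: \xa39.99 / \x8012.00"
text = decode_archive_text(sample)
print(text)
# Decoding the same bytes as ISO Latin 1 yields a control character
# (U+0080) in place of the Euro sign.
print(repr(sample.decode("latin-1")))
```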
Hopefully a future revision of the TZX file format will formalise this usage and endorse UTF-8 to support the remaining characters missing from Windows code page 1252.
The PZX format is another file format for efficient storage of tape-saved data designed to be simpler to support in utilities while retaining the important features of the TZX format. The official PZX specification can be found on its own home page.
ZX-State (SZX) format
- This section has been moved or is in the process of being moved to the Sinclair FAQ Wiki, under the "ZX-State format" article. You may find more complete information there.
Moved to Sinclair FAQ Wiki by original author with no modifications by other parties.