The AMB Ambisonic File Format
This file format was developed during 1999 as a simple example of customisation of the then new WAVEFORMATEXTENSIBLE (WAVE_EX) file format. It was described briefly in an appendix to my paper Developments in Audio file Formats presented at ICMC 2000. Prior to this, I had published the Soundcard Attrition Page, identifying soundcards which variously supported true multi-channel streams, or were confined to multiple stereo devices.
While WAVE_EX had made a modest appearance in immediately prior versions of Windows, it was with Windows 2000 that it became fully integrated into the OS, including the introduction of a new audio device driver model. This dramatically advanced the scope for playing ("rendering") multi-channel files. This was of incalculable importance to electro-acoustic composers, many of whom had already been working with Ambisonics (as well as with generic multi-speaker diffusion) for decades. Equally important was the possibility for users to define arbitrary customisations of WAVE_EX (in effect, to create entirely custom application-specific file formats) without any need to register them with Microsoft.
The challenge posed by Ambisonic B-Format is simple to state. It is a multi-channel time-domain format which can seemingly be represented by any available file format, such as WAVE or AIFF. Yet it stands apart from them by being (possibly uniquely) a matrix-encoded format: in most cases, musicians do not wish merely to play it back as is, but to decode it directly to a chosen speaker layout. Previously, there was no choice but to create B-Format files using one of the existing standard WAVE or AIFF formats. Such files were therefore intrinsically ambiguous – only the author could know which were B-Format files and which were plain multi-channel files. Ad hoc strategies to resolve this included the use of structured file names, and/or custom but unofficial file name extensions (e.g. .wxyz).
The "composer's dilemma" can be summed up thus:
The AMB format fully resolves this dilemma by meeting the necessary criterion for an unambiguous file format - that only software that recognises the format (and can therefore decode it automatically) can open the file. For example, this criterion was supported from the outset by the soundfile play programs (playsfx, later paplay) in the CDP Multi-Channel Toolkit first published around the same time.
The descriptions that follow are intended for use by application developers wishing to support the AMB format. It is not necessary for users to be fully conversant with these relatively low-level details.
Being based on WAVE_EX, the AMB format is both a file format and a stream rendering format. The latter property is relevant primarily to Windows systems, where the format obtained from a file or audio input device can be passed directly to an output device or software plugin, if they support the format. The use of a custom GUID ensures that AMB files will not (and should not) be recognised as a soundfile by applications unaware of the format.
The AMB format supports up to 16 channels. This is sufficient to support Ambisonic streams up to and including full 3rd-Order periphonic (3D or "with-height"). For more information on Ambisonic B-Format, see e.g. http://en.wikipedia.org/wiki/Ambisonics.
The number of channels in the file is sufficient to identify unambiguously which combination of horizontal and vertical B-Format signals is represented. By convention these are identified by a single letter, commencing with the four comprising classic first-order: WXYZ,RSTUV,KLMNOPQ. In an AMB file, channels are always interleaved in this order - unused channels are simply omitted.
Channels in File | Order (Horizontal+Height) | Description | B-Format signals
1 | 1 | mono | W
2 | 1 | "Mid-Side" | WY
3 | 1 | first-order horizontal | WXY
4 | 1+1 | first-order 3D | WXYZ
5 | 2 | 2nd-order horizontal | WXY,UV
6 | 2+1 | mixed | WXYZ,UV
7 | 3 | 3rd-order horizontal | WXY,UV,PQ
8 | 3+1 | mixed | WXYZ,UV,PQ
9 | 2+2 | 2nd-order 3D | WXYZ,RSTUV
11 | 3+2 | mixed | WXYZ,RSTUV,PQ
16 | 3+3 | 3rd-order 3D | WXYZ,RSTUV,KLMNOPQ
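Because the channel count alone determines the signal set, an application can resolve it with a simple lookup. The following is a minimal sketch of that idea, not part of the format definition; the function name amb_signals_for_channels is hypothetical, and the strings simply list the interleaved signals in file order as given in the table.

```c
#include <stddef.h>

/* Map the channel count of an AMB file to its interleaved B-Format
   signals, in file order. Returns NULL for channel counts that the
   table does not define (e.g. 10, 12..15, or more than 16).
   amb_signals_for_channels is a hypothetical helper name. */
static const char *amb_signals_for_channels(unsigned channels)
{
    switch (channels) {
    case 1:  return "W";                 /* mono                   */
    case 2:  return "WY";                /* "Mid-Side"             */
    case 3:  return "WXY";               /* first-order horizontal */
    case 4:  return "WXYZ";              /* first-order 3D         */
    case 5:  return "WXYUV";             /* 2nd-order horizontal   */
    case 6:  return "WXYZUV";            /* mixed 2+1              */
    case 7:  return "WXYUVPQ";           /* 3rd-order horizontal   */
    case 8:  return "WXYZUVPQ";          /* mixed 3+1              */
    case 9:  return "WXYZRSTUV";         /* 2nd-order 3D           */
    case 11: return "WXYZRSTUVPQ";       /* mixed 3+2              */
    case 16: return "WXYZRSTUVKLMNOPQ";  /* 3rd-order 3D           */
    default: return NULL;                /* not a defined layout   */
    }
}
```

The position of a letter in the returned string is the index of that signal within each sample frame, so a reader can locate, say, the U channel of a 5-channel file at index 3.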
The mono and stereo examples are included only for the sake of completeness. There is no point in creating a mono AMB file, and "Mid-Side" is a form of stereo recording (cardioid, omni or other "central" microphone, plus figure-of-eight microphone) which is typically treated somewhat differently from a canonical B-Format stream – it offers what may be called an "interesting comparison". The AMB format does not support UHJ (stereo-compatible) encodings. A file format for this has been defined by Martin Leese, described here.
File formats for Higher Order Ambisonics supporting 4th-Order and beyond are under development by the surround sound community. These are intended in time to become the file format of choice for B-Format data, and will inevitably be much more elaborate. One recently published example is described here.
The format definition below should be read in conjunction with the Microsoft document detailing WAVE_EX, familiarity with which is assumed.
The WAVE_EX format allows for new 'Subtype' Globally Unique IDentifiers (GUIDs) to be defined by anyone for custom soundfile formats. It is appropriate to use this for B-Format since it is reasonable to send such data directly to a soundcard - e.g. to an external B-Format decoder, or to a software 'plugin' decoder. While not conventional speaker feeds, the B-Format channels are nevertheless normal audio signals, and can reasonably be processed in a real-time audio streaming environment.
There are two B-Format GUIDs, for integer and 32 bit floating-point sample types. Note that they follow Microsoft's own practice with the multimedia GUIDs, where the two GUIDs are identical except for the first field, which reflects the format flag values for the corresponding standard WAVE format.
SUBTYPE_AMBISONIC_B_FORMAT_PCM
{00000001-0721-11d3-8644-C8C1CA000000}
SUBTYPE_AMBISONIC_B_FORMAT_IEEE_FLOAT
{00000003-0721-11d3-8644-C8C1CA000000}
This "type-3" GUID enables the file to distinguish between 32 bit integer and 32 bit float samples.
The B-format signals are interleaved for each sample frame in the channel order given above.
The file should be given the extension .amb. Applications should recognise the file by inspecting its header data (containing the AMB GUIDs).
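As an illustration of header-based recognition, the sketch below classifies the body of a <fmt > chunk that has already been read into memory. It assumes the standard Microsoft WAVEFORMATEXTENSIBLE layout (format tag 0xFFFE at offset 0, dwChannelMask at offset 20, SubFormat GUID at offset 24, 40 bytes in all) and the on-disk GUID byte order used by WAVE files. The names amb_classify_fmt and the AMB_* constants are hypothetical; locating the chunk within the RIFF file is left to the caller.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result codes; the non-zero values mirror Data1 of each GUID. */
enum { AMB_NOT_AMB = 0, AMB_PCM = 1, AMB_FLOAT = 3 };

/* Classify a WAVE_EX <fmt > chunk body of 'len' bytes.
   Offsets assume the standard WAVEFORMATEXTENSIBLE layout. */
static int amb_classify_fmt(const uint8_t *fmt, size_t len)
{
    /* Bytes 4..15 of both AMB GUIDs as stored on disk:
       Data2 = 0x0721 and Data3 = 0x11d3 little-endian, then Data4 in order. */
    static const uint8_t tail[12] = {
        0x21, 0x07, 0xd3, 0x11,
        0x86, 0x44, 0xc8, 0xc1, 0xca, 0x00, 0x00, 0x00
    };
    uint16_t tag;
    uint32_t data1;

    if (fmt == NULL || len < 40)
        return AMB_NOT_AMB;              /* too short for WAVE_EX */

    tag = (uint16_t)(fmt[0] | (fmt[1] << 8));
    if (tag != 0xFFFE)                   /* WAVE_FORMAT_EXTENSIBLE */
        return AMB_NOT_AMB;

    if (memcmp(fmt + 28, tail, sizeof tail) != 0)
        return AMB_NOT_AMB;              /* some other subtype */

    data1 = (uint32_t)fmt[24] | ((uint32_t)fmt[25] << 8) |
            ((uint32_t)fmt[26] << 16) | ((uint32_t)fmt[27] << 24);
    if (data1 == 1) return AMB_PCM;      /* integer samples        */
    if (data1 == 3) return AMB_FLOAT;    /* 32 bit float samples   */
    return AMB_NOT_AMB;
}
```

Comparing raw bytes rather than overlaying a C structure avoids any dependence on host byte order or structure padding.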
B-Format signals may be recorded directly using a microphone such as the SoundField or the Tetramic.
Alternatively, an arbitrary mono source can be located or panned via B-Format encoding in 2D or 3D (periphonic) space.
Encoding follows the Furse-Malham (FuMa) scheme. This reflects the original form of the B-Format specification, not least as associated with the Soundfield microphone. Technical descriptions of Ambisonic theory, explaining the many engineering and mathematical reasons that lead to the various rules for encoding each B-Format signal, are widely available on the net, and are beyond the scope of this page. The references below provide the primary formulae for encoding source data in the FuMa scheme required by the AMB format.
Note that these formulae reflect the classic Ambisonic convention (associated particularly with the Soundfield microphone) that the W ("omni") signal is scaled by -3dB; this scaling is required by the AMB format. Encoding software will apply this automatically as part of the encoding process.
A decoder program must either degrade gracefully, or reject formats it cannot handle. Many options are available for decoding B-Format streams, ranging from classic formulae to decode to simple regular speaker layouts, to more specialised schemes to support irregular and (moderately) arbitrary layouts. The above references give several examples. Decoding may also incorporate shelf filtering and distance compensation, to optimise reproduction in small spaces.
For all B-format configurations, the WAVE_EX dwChannelMask field should be set to zero.
Though strictly speaking an optional chunk, it is recommended that the PEAK chunk be included in all B-Format files. It should always precede the <data> chunk. Apart from its general utility, it has a special virtue for B-Format: an application reading, for example, a first-order file or stream can determine from the peak value for the Z channel whether the file is indeed full periphonic B-Format (i.e. with height information), or horizontal-only (Z channel present but empty).
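The Z-channel test described above might look like the sketch below. It assumes the per-channel peak values have already been read from the PEAK chunk into an array ordered like the file's channels; parsing the chunk itself is not shown. The names AmbHeight and amb_height_status are hypothetical.

```c
#include <stddef.h>

/* Hypothetical classification of the height (Z) information in a file. */
typedef enum {
    AMB_HEIGHT_ABSENT,   /* layout has no Z channel            */
    AMB_HEIGHT_EMPTY,    /* Z channel present but silent       */
    AMB_HEIGHT_PRESENT   /* Z channel carries height signal    */
} AmbHeight;

/* 'peaks' holds one peak magnitude per channel, in file order,
   as obtained from the PEAK chunk (parsing not shown). */
static AmbHeight amb_height_status(const float *peaks, unsigned channels)
{
    /* Z exists only in the layouts that begin WXYZ; when present it
       is always the fourth interleaved signal (index 3). */
    int has_z = (channels == 4 || channels == 6 || channels == 8 ||
                 channels == 9 || channels == 11 || channels == 16);

    if (!has_z || peaks == NULL)
        return AMB_HEIGHT_ABSENT;
    return (peaks[3] != 0.0f) ? AMB_HEIGHT_PRESENT : AMB_HEIGHT_EMPTY;
}
```

A decoder could use the AMB_HEIGHT_EMPTY result to select a horizontal-only decode for a four-channel file without scanning the audio data.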
The GUID (as defined by Microsoft) is expressed in C code as a structure:
    typedef struct _GUID
    {
        unsigned long  Data1;
        unsigned short Data2;
        unsigned short Data3;
        unsigned char  Data4[8];
    } GUID;
Thus, the SUBTYPE_AMBISONIC_B_FORMAT_PCM GUID will be written as:
    {0x00000001, 0x0721, 0x11d3, {0x86,0x44, 0xc8,0xc1,0xca,0x00,0x00,0x00}}
Note that the three numeric elements of this structure are written to disk (as are all numeric values in WAVE files) in little-endian format (least significant bytes at the lower addresses). The remaining eight bytes are written in sequence as for any string.
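The byte ordering just described can be made explicit by writing the GUID out one byte at a time, which is independent of the host's own byte order. This is a minimal sketch: AmbGuid is a hypothetical fixed-width mirror of the GUID structure (using stdint.h types so the field sizes are the same on every platform), and amb_guid_to_bytes is likewise a hypothetical name.

```c
#include <stdint.h>

/* Fixed-width mirror of the GUID structure (hypothetical name). */
typedef struct {
    uint32_t data1;
    uint16_t data2;
    uint16_t data3;
    uint8_t  data4[8];
} AmbGuid;

/* Produce the 16 bytes of a GUID exactly as stored in a WAVE file:
   the three numeric fields little-endian, then Data4 in sequence. */
static void amb_guid_to_bytes(const AmbGuid *g, uint8_t out[16])
{
    int i;

    out[0] = (uint8_t)(g->data1);         /* least significant byte first */
    out[1] = (uint8_t)(g->data1 >> 8);
    out[2] = (uint8_t)(g->data1 >> 16);
    out[3] = (uint8_t)(g->data1 >> 24);
    out[4] = (uint8_t)(g->data2);
    out[5] = (uint8_t)(g->data2 >> 8);
    out[6] = (uint8_t)(g->data3);
    out[7] = (uint8_t)(g->data3 >> 8);
    for (i = 0; i < 8; i++)               /* copied as for any string */
        out[8 + i] = g->data4[i];
}
```

For the PCM GUID {00000001-0721-11d3-8644-C8C1CA000000}, this yields the byte sequence 01 00 00 00 21 07 d3 11 86 44 c8 c1 ca 00 00 00.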
The surround sound file repository ambisonia.com includes a large number of files in AMB format, including some 2nd-order examples. They range widely from classical music recordings to ambient and other soundscapes and electro-acoustic compositions.
Last updated: Oct 26 2012