Mohawk Sounds

From A look inside The Link @ wiki
Latest revision as of 22:04, 10 March 2010


Myst, Riven and other products share a largely common format for storing audio data (sounds and music), which we call the Mohawk sounds format. This page is complete enough to decode sounds correctly (at least "by ear"). However, the format still has a number of obscure details and unknown fields, especially those connected to data sizes. Thanks to Ron Hayter for providing details on the DVD version of Riven's tWAV resources.

The format is structured in chunks, much like the common WAV audio format. The audio data can be stored in one of three ways:

  • Raw unsigned PCM data, used in older games.
  • Intel DVI ADPCM, a lossy differential encoding that stores the difference between consecutive samples as 4-bit deltas. Compared with 16-bit PCM, this gives a 4:1 compression factor.
  • MPEG-2 Audio Layer II, used in the DVD version of Riven.

Note that in Riven the compressed data block contains more data than necessary. The extra bytes are garbage at the end of the block, and the Riven decoder skips them when the sound is played. Unfortunately, the header size fields seem to ignore this excess data. This makes those fields very hard to reverse-engineer, because any relation to the full resource size is lost. We therefore introduce an "effective resource size" that excludes the excess data.

The header

The data block begins with the following header:

4 bytes mhwk_magic
unsigned long size
4 bytes wave_magic
  • mhwk_magic is the string 'MHWK'. It's equal to the Mohawk format signature!
  • size is the effective resource size, minus the ADPC chunk size, minus 2.
  • wave_magic is the string 'WAVE'.
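The header check can be sketched as follows. It assumes big-endian byte order, which this page does not state. The function name parse_header is hypothetical.

```python
import struct


def parse_header(buf):
    """Validate the 12-byte Mohawk sound header (sketch).

    Assumes big-endian fields. Returns (size, offset_of_first_chunk),
    where size is the raw header value (effective resource size minus
    the ADPC chunk size minus 2, per this page).
    """
    if len(buf) < 12:
        raise ValueError("buffer too short for the header")
    mhwk_magic, size, wave_magic = struct.unpack(">4sI4s", buf[:12])
    if mhwk_magic != b"MHWK" or wave_magic != b"WAVE":
        raise ValueError("not a Mohawk sound resource")
    # Chunks start immediately after the 12-byte header.
    return size, 12
```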

The chunks follow this header. So far, three chunk types have been identified: 'ADPC', 'Cue#' and 'Data'.
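A chunk walker has to handle the two different chunk_size conventions described below. For 'ADPC' and 'Cue#', chunk_size is the chunk size minus 8. For 'Data', it is the full chunk size. The minimal sketch below assumes big-endian fields. It also assumes that 'Data' is always the last chunk, so the trailing garbage is never read as a chunk. Unknown chunk types are assumed to follow the "minus 8" convention. The name iter_chunks is hypothetical.

```python
import struct


def iter_chunks(buf, offset=12):
    """Yield (chunk_type, chunk_offset, total_length) for each chunk (sketch)."""
    while offset + 8 <= len(buf):
        chunk_type, chunk_size = struct.unpack_from(">4sI", buf, offset)
        if chunk_type == b"Data":
            total = chunk_size        # already includes type and size fields
        else:
            total = chunk_size + 8    # 'ADPC', 'Cue#' (and, by assumption, others)
        if total < 8 or offset + total > len(buf):
            raise ValueError(f"bad size for chunk {chunk_type!r} at {offset}")
        yield chunk_type, offset, total
        if chunk_type == b"Data":
            return                    # anything after 'Data' is treated as garbage
        offset += total
```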

The ADPC chunk

This chunk holds some information about the audio sample format. Its size is not constant: it depends on the number of channels and on whether a Cue# chunk is present.

4 bytes chunk_type
unsigned long chunk_size
unsigned short u0
unsigned short channels
unsigned long u1
unsigned long u2[channels]
  • chunk_type is the string 'ADPC'.
  • chunk_size is the chunk size minus 8.
  • channels is the number of audio channels.
  • u0 is 2 when a Cue# chunk is present, and 1 otherwise.
  • u1 is always 0.
  • u2 is always 0x00400000 for every channel.

If a Cue# chunk is present, the ADPC chunk contains additional data:

unsigned long u3
unsigned long u4[channels]
  • u3 seems to be in units of samples (maybe it's a position within the audio stream).
  • u4 looks more like a record than a single unsigned long value, but its values are obscure and I have no idea what it means.

Finally, the ADPC chunk is only present for ADPCM sounds.
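The ADPC layout above, including the optional fields that follow when u0 is 2, can be read as shown below. This is a sketch that assumes big-endian fields. It reads u4 as one unsigned long per channel, exactly as listed, even though its real structure is unclear. The name parse_adpc_chunk is hypothetical.

```python
import struct


def parse_adpc_chunk(buf, offset):
    """Decode an ADPC chunk starting at offset (sketch; fields mostly unknown)."""
    chunk_type, chunk_size, u0, channels, u1 = struct.unpack_from(">4sIHHI", buf, offset)
    if chunk_type != b"ADPC":
        raise ValueError("not an ADPC chunk")
    pos = offset + 16  # past chunk_type, chunk_size, u0, channels, u1
    info = {"chunk_size": chunk_size, "u0": u0, "channels": channels, "u1": u1}
    info["u2"] = list(struct.unpack_from(f">{channels}I", buf, pos))
    pos += 4 * channels
    if u0 == 2:
        # A Cue# chunk is present, so u3 and u4[channels] follow.
        (info["u3"],) = struct.unpack_from(">I", buf, pos)
        pos += 4
        info["u4"] = list(struct.unpack_from(f">{channels}I", buf, pos))
    return info
```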

The Cue# chunk

This chunk is rare in Riven. Only a few tWAV resources have it, and just one of them has an interesting one: tWAV 3 from p_Sounds.mhk. The chunk seems to contain "cue points", similar to the corresponding chunk of the WAVE format. It is much more common in other games, especially Myst.

4 bytes chunk_type
unsigned long chunk_size
unsigned short point_count
  • chunk_type is the string 'Cue#'.
  • chunk_size is the chunk size minus 8.
  • point_count is the number of cue points.

This fixed structure is followed by point_count records. Each record describes a cue point with a position inside the audio stream and an associated ASCII text string:

unsigned long position
unsigned char name_len
unsigned char name[name_len+1]
  • position is the cue point position within the audio stream, in units of samples.
  • name_len is the length of the associated string, excluding the terminating zero byte.
  • name is the associated string (zero-terminated).
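The Cue# layout guessed above can be parsed like this. Each record occupies 4 + 1 + name_len + 1 bytes. The sketch assumes big-endian fields and ASCII names, and the name parse_cue_chunk is hypothetical.

```python
import struct


def parse_cue_chunk(buf, offset):
    """Return a list of (position, name) cue points (sketch of the guessed layout)."""
    chunk_type, chunk_size, point_count = struct.unpack_from(">4sIH", buf, offset)
    if chunk_type != b"Cue#":
        raise ValueError("not a Cue# chunk")
    pos = offset + 10  # past chunk_type, chunk_size and point_count
    points = []
    for _ in range(point_count):
        position, name_len = struct.unpack_from(">IB", buf, pos)
        pos += 5
        # name holds name_len characters followed by a zero terminator.
        name = buf[pos:pos + name_len].decode("ascii")
        pos += name_len + 1
        points.append((position, name))
    return points
```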

Most Cue# chunks have point_count set to 0, so they contain nothing. Riven tWAV 3 from p_Sounds.mhk has two cue points, named Beg Loop and End Loop. Please note that the chunk structure has been guessed from this single case, so the statistics are very poor :-\

I don't know if the engine uses this chunk at all.

The Data chunk

This chunk is always present since it contains the actual audio samples.

4 bytes chunk_type
unsigned long chunk_size
unsigned short sample_rate
unsigned long sample_count
unsigned char bits_per_sample
unsigned char channels
unsigned short encoding
unsigned short loop
unsigned long loop_start
unsigned long loop_end
variable audio_data
  • chunk_type is the string 'Data'.
  • chunk_size is the full chunk size, including chunk_type and chunk_size itself.
  • sample_rate is the audio sampling rate (always 22050).
  • sample_count is the number of audio samples.
  • bits_per_sample is the number of bits per sample per channel.
  • channels is the number of audio channels.
  • encoding tells how the audio data is stored. It's 0 for raw unsigned PCM, 1 for ADPCM, 2 for MPEG-2 Audio Layer II.
  • loop: the sound loops if the value is 0xFFFF (not used in Riven).
  • loop_start is the starting point of the loop (not used in Riven).
  • loop_end is the ending point of the loop (not used in Riven).
  • audio_data is the audio data stream, encoded according to encoding. For mono ADPCM audio, each byte holds two compressed samples (the high and low 4 bits of the byte). For stereo ADPCM sounds, each byte stores one compressed sample for each channel (again the high and low 4 bits).
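The fixed part of the Data chunk is 28 bytes long, and audio_data runs up to offset + chunk_size, because chunk_size counts the whole chunk. The sketch below assumes big-endian fields. It also assumes that chunk_size, like the other header fields, does not count the trailing garbage. The names parse_data_chunk and ENCODINGS are hypothetical.

```python
import struct

# chunk_type, chunk_size, sample_rate, sample_count, bits_per_sample,
# channels, encoding, loop, loop_start, loop_end (big-endian assumed)
DATA_HEADER = struct.Struct(">4sIHIBBHHII")  # 28 bytes
ENCODINGS = {0: "raw unsigned PCM", 1: "ADPCM", 2: "MPEG-2 Audio Layer II"}


def parse_data_chunk(buf, offset):
    """Decode the Data chunk header and slice out audio_data (sketch)."""
    (chunk_type, chunk_size, sample_rate, sample_count, bits_per_sample,
     channels, encoding, loop, loop_start, loop_end) = DATA_HEADER.unpack_from(buf, offset)
    if chunk_type != b"Data":
        raise ValueError("not a Data chunk")
    start = offset + DATA_HEADER.size
    end = offset + chunk_size  # chunk_size includes chunk_type and chunk_size
    return {
        "sample_rate": sample_rate,
        "sample_count": sample_count,
        "bits_per_sample": bits_per_sample,
        "channels": channels,
        "encoding": ENCODINGS.get(encoding, f"unknown ({encoding})"),
        "loops": loop == 0xFFFF,
        "loop_start": loop_start,
        "loop_end": loop_end,
        "audio_data": buf[start:end],
    }
```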

A good reference for ADPCM decoding (which sounds right to me) can be found at http://wiki.multimedia.cx/index.php?title=IMA_ADPCM. Use the decoding method near the bottom of that page, the one involving if statements and bit-shifts, which sounds the best.
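Below is a minimal sketch of the standard IMA ADPCM shift-and-add decoder, applied to audio_data. It rests on several assumptions that this page does not confirm. Each channel starts with predictor 0 and step index 0. The high nibble is decoded first. In stereo data, the high nibble belongs to channel 0 and the low nibble to channel 1. The names ImaChannel and decode_audio_data are hypothetical.

```python
# Standard IMA ADPCM step-size table (89 entries).
STEP_TABLE = [
    7, 8, 9, 10, 11, 12, 13, 14, 16, 17,
    19, 21, 23, 25, 28, 31, 34, 37, 41, 45,
    50, 55, 60, 66, 73, 80, 88, 97, 107, 118,
    130, 143, 157, 173, 190, 209, 230, 253, 279, 307,
    337, 371, 408, 449, 494, 544, 598, 658, 724, 796,
    876, 963, 1060, 1166, 1282, 1411, 1552, 1707, 1878, 2066,
    2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871, 5358,
    5894, 6484, 7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899,
    15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794, 32767,
]
# Step-index adjustment for each 4-bit code.
INDEX_TABLE = [-1, -1, -1, -1, 2, 4, 6, 8] * 2


class ImaChannel:
    """Per-channel decoder state (initial values of 0 are an assumption)."""

    def __init__(self):
        self.predictor = 0
        self.index = 0

    def decode(self, nibble):
        step = STEP_TABLE[self.index]
        # Rebuild the delta with shifts and adds instead of a multiply.
        diff = step >> 3
        if nibble & 4:
            diff += step
        if nibble & 2:
            diff += step >> 1
        if nibble & 1:
            diff += step >> 2
        # Bit 3 is the sign of the delta.
        self.predictor += -diff if nibble & 8 else diff
        self.predictor = max(-32768, min(32767, self.predictor))
        self.index = max(0, min(88, self.index + INDEX_TABLE[nibble]))
        return self.predictor


def decode_audio_data(data, channels):
    """Decode ADPCM audio_data into one list of 16-bit samples per channel."""
    if channels not in (1, 2):
        raise ValueError("only mono and stereo are described")
    decoders = [ImaChannel() for _ in range(channels)]
    out = [[] for _ in range(channels)]
    for byte in data:
        hi, lo = byte >> 4, byte & 0x0F
        out[0].append(decoders[0].decode(hi))
        # Mono: low nibble is the next sample; stereo: it belongs to channel 1.
        ch = 0 if channels == 1 else 1
        out[ch].append(decoders[ch].decode(lo))
    return out
```

If the result sounds wrong, the first assumptions worth checking are the nibble order and the initial decoder state.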