A Standard MIDI File (SMF) is a type of IFF file, similar to a RIFF or WAVE file. Blocks of data are stored as chunks, where each chunk has a 4-byte ID followed by a 4-byte length value. Unlike a RIFF file, the SMF does not have a file header but starts directly with the first chunk. Note that the SMF stores binary values in big-endian byte order rather than the little-endian order of the WAVE file.

The first chunk in the SMF has an ID of "MThd" and a size of six bytes. The six data bytes of the header chunk represent the file format (2 bytes), the number of tracks in the file (2 bytes) and the time division value (2 bytes). Altogether, the "MThd" chunk occupies 14 bytes: 8 bytes for the ID and length plus 6 bytes of data.

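struct MTHDchunk {
   uint8  chnkID[4]; // "MThd"
   uint32 size;      // length of the data that follows (6)
   uint16 format;    // 0, 1 or 2
   uint16 numTrk;    // number of track chunks in the file
   int16  tmDiv;     // time division
};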
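
As a rough illustration of this layout, the following sketch decodes the 14 header bytes from a memory buffer instead of a file. The names SmfHeader and ParseSmfHeader are hypothetical and do not appear in the original code, and the sketch assumes the caller has already loaded the bytes.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the decoded header fields. */
typedef struct {
    uint32_t size;    /* chunk data length, expected to be 6 */
    uint16_t format;  /* 0, 1 or 2 */
    uint16_t numTrk;  /* number of track chunks that follow */
    int16_t  tmDiv;   /* time division */
} SmfHeader;

/* Decode a big-endian "MThd" chunk from buf (len bytes).
   Returns 0 on success, -1 if the buffer is too short or the ID is wrong. */
int ParseSmfHeader(const unsigned char *buf, size_t len, SmfHeader *hdr)
{
    if (len < 14 || memcmp(buf, "MThd", 4) != 0)
        return -1;
    hdr->size   = ((uint32_t)buf[4] << 24) | ((uint32_t)buf[5] << 16)
                | ((uint32_t)buf[6] << 8)  |  (uint32_t)buf[7];
    hdr->format = (uint16_t)((buf[8]  << 8) | buf[9]);
    hdr->numTrk = (uint16_t)((buf[10] << 8) | buf[11]);
    hdr->tmDiv  = (int16_t)(uint16_t)((buf[12] << 8) | buf[13]);
    return 0;
}
```

Working from a buffer keeps the byte-order handling separate from file I/O, which makes the decoding easy to check in isolation.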


The format value can be 0, 1 or 2. A type 0 file contains a single track with all channels merged into one stream. A type 1 file contains multiple tracks that should be played simultaneously. A type 2 file is used to store independent sequences. Type 2 files are typically used for patterns such as drum loops.

The numTrk value indicates the number of track chunks that follow. For a type 0 file, there is only one track. For type 1 files, track 0 is typically used to contain general information such as song title, copyright, program changes, tempo, etc. The remaining tracks store the events that are to be played. However, this is only a convention, not a requirement.

The tmDiv value is used to calculate timing information. If the high-order bit is clear, it represents the number of divisions (ticks) of a quarter note. The actual length of a quarter note is determined by the tempo event embedded in the track data. If no tempo event is found, the tempo is assumed to be 120 bpm, i.e. a duration of 1/2 second per quarter note. When the tmDiv value has the high-order bit set, it indicates SMPTE time. In that case the upper byte holds the negative of the frame rate (-24, -25, -29 or -30) in two's complement form, while the lower byte holds the number of "ticks" per frame.
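
The two interpretations of the division value can be sketched as follows. The TimeDivision type and the DecodeDivision function are hypothetical names introduced only for illustration, and the sketch assumes the SMF convention that the upper byte of an SMPTE division stores the frame rate as a negative two's complement number.

```c
#include <stdint.h>

/* Hypothetical result of decoding the time division field. */
typedef struct {
    int isSmpte;         /* nonzero when the high-order bit is set */
    int ticksPerQuarter; /* valid when isSmpte == 0 */
    int framesPerSec;    /* valid when isSmpte != 0: 24, 25, 29 or 30 */
    int ticksPerFrame;   /* valid when isSmpte != 0 */
} TimeDivision;

/* Split the raw 16-bit division value into its fields. */
TimeDivision DecodeDivision(uint16_t raw)
{
    TimeDivision d = {0, 0, 0, 0};
    if (raw & 0x8000) {
        d.isSmpte = 1;
        /* The upper byte is a negative frame rate, e.g. 0xE7 is -25. */
        d.framesPerSec = 256 - (raw >> 8);
        d.ticksPerFrame = raw & 0xFF;
    } else {
        d.ticksPerQuarter = raw;
    }
    return d;
}
```

For example, a division of 480 decodes to 480 ticks per quarter note, while 0xE728 decodes to 25 frames per second with 40 ticks per frame.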

The "MThd" chunk is followed by one or more track chunks. The track chunk has an ID of "MTrk" and a variable size indicated by the 4-byte chunk length value. The data content of the chunk is a stream of MIDI events and timing information. To load an SMF, we read and verify the header chunk, store the header values, then loop through the file looking for track chunks until the total number of tracks has been processed.

When reading binary values, such as the chunk size, we can read the values directly into the appropriate size variable, and then rearrange the byte order if needed. However, it is just as easy to read the value one byte at a time and build the value from the bytes. The same code can then be used on any processor architecture. For example, the following function reads a 32-bit quantity in a portable manner. Similar code is used to read a 16-bit quantity. Note that this code does not check for end of file. The feof function can be used for that purpose.

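uint32 ReadLong() {
   // Read each byte in its own statement. C does not define the order
   // in which the operands of + are evaluated, so a single expression
   // with several fgetc calls could assemble the bytes in the wrong order.
   uint32 value = (uint32)fgetc(fp) << 24;
   value |= (uint32)fgetc(fp) << 16;
   value |= (uint32)fgetc(fp) << 8;
   value |= (uint32)fgetc(fp);
   return value;
}

uint16 ReadShort() {
   uint16 value = (uint16)(fgetc(fp) << 8);
   value |= (uint16)fgetc(fp);
   return value;
}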
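
Since the functions above do not detect the end of the file, a checked variant may be useful. The following is only a sketch: ReadLongChecked is a hypothetical name, and it tests the return value of fgetc against EOF rather than calling feof, which is an alternative to the approach mentioned in the text.

```c
#include <stdio.h>
#include <stdint.h>

/* Read a big-endian 32-bit value from fp into *out.
   Returns 1 on success, 0 if the file ended (or a read error occurred)
   before four bytes were available. */
int ReadLongChecked(FILE *fp, uint32_t *out)
{
    uint32_t value = 0;
    for (int i = 0; i < 4; i++) {
        int c = fgetc(fp);
        if (c == EOF)
            return 0;
        value = (value << 8) | (uint32_t)c;
    }
    *out = value;
    return 1;
}
```

The caller can then stop loading as soon as a read fails instead of processing a truncated chunk.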


The track data stream consists of interspersed time and event information. Each event is preceded by a relative time that indicates how long the sequencer should wait before executing the event. The timing values are stored with a variable number of bytes, where all but the last byte have the high bit set. The lower seven bits of each byte contain the value. To construct the timing value, we process bytes until we reach the byte without the high bit set, accumulating the 7-bit quantities into the value.

uint32 GetVarLen() {
   uint32 value = 0;
   // Accumulate seven bits per byte; the last byte has the high bit clear.
   do {
      value = (value << 7) + (*inpPos & 0x7F);
   } while (*inpPos++ & 0x80);
   return value;
}
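
The same decoding can be written against an explicit buffer length so that a truncated or malformed track cannot run past the end of the data. This is a sketch under stated assumptions: DecodeVarLen is a hypothetical name, and the four-byte limit reflects the SMF rule that a variable-length quantity holds at most 28 bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one variable-length quantity from buf (at most len bytes).
   Stores the value in *out and returns the number of bytes consumed,
   or 0 if the data runs out or the quantity is longer than four bytes. */
size_t DecodeVarLen(const unsigned char *buf, size_t len, uint32_t *out)
{
    uint32_t value = 0;
    for (size_t i = 0; i < len && i < 4; i++) {
        value = (value << 7) | (uint32_t)(buf[i] & 0x7F);
        if ((buf[i] & 0x80) == 0) {
            *out = value;
            return i + 1;
        }
    }
    return 0;
}
```

For example, the two bytes 0x81 0x00 decode to 128, and the single byte 0x7F decodes to 127.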


MIDI timing values represent a somewhat arbitrary value called a delta time. The actual length of the delta time is controlled by two other values contained within the file: the tempo event and the division value from the header chunk. The tempo event is a META event containing three bytes of data. Unlike other MIDI data values, these are full eight-bit values. The tempo value indicates the number of microseconds per quarter note. Calculation of actual time for an event is discussed below.
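
As a small worked example of how the two values combine, the sketch below converts a delta time to microseconds. TicksToMicroseconds is a hypothetical helper, and it assumes the division value has the high-order bit clear (ticks per quarter note) and that the tempo stays constant over the interval.

```c
#include <stdint.h>

/* Convert a delta time in ticks to microseconds.
   tempo:    microseconds per quarter note (from the tempo META event)
   division: ticks per quarter note (header value, high bit clear) */
uint64_t TicksToMicroseconds(uint32_t ticks, uint32_t tempo, uint16_t division)
{
    if (division == 0)
        return 0; /* avoid dividing by zero on a malformed header */
    return ((uint64_t)ticks * tempo) / division;
}
```

With the default tempo of 500000 microseconds per quarter note (120 bpm) and a division of 480, a delta time of 240 ticks lasts 250000 microseconds, or one eighth note.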

The delta time value is followed by one of the MIDI events described above. To process the event stream, we first read the delta time and then examine the next byte to determine the message. Depending on the message, we then process the data bytes appropriately. However, we must watch for running status: if the high bit of the byte is not set, the byte is really the first data byte of the event, and we reuse the last channel message as the status. Note that the code below does not update the saved status when it sees a system or META event. Strictly speaking, the SMF specification says that sysex and META events cancel running status, so a well-formed file always supplies a new status byte after one of them.

Many of the MIDI events are real-time events that determine how the synthesizer responds to MIDI events. These events can be discarded if we only want to play all the notes in the file, but we have to check each event in order to skip the appropriate number of bytes. Likewise, we can ignore most META events since they only provide information about the sequence such as title, copyright, time signature, etc. The two META events that are essential are end of track and tempo. The events that are of most concern for sequencing are the channel messages. These include the note on, note off, control change, program change, and pitch bend events. These events need to be stored for playback.
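
To skip an event we are not interested in, we need to know how many data bytes follow its status byte. A minimal sketch of that lookup for channel messages is shown here; ChannelDataLength is a hypothetical helper name.

```c
/* Return the number of data bytes that follow a channel status byte,
   or -1 if status is not a channel message (0x80 through 0xEF). */
int ChannelDataLength(unsigned char status)
{
    switch (status & 0xF0) {
    case 0x80: /* note off */
    case 0x90: /* note on */
    case 0xA0: /* polyphonic aftertouch */
    case 0xB0: /* control change */
    case 0xE0: /* pitch bend */
        return 2;
    case 0xC0: /* program change */
    case 0xD0: /* channel pressure */
        return 1;
    default:   /* data byte or system message */
        return -1;
    }
}
```

A loader that only wants notes can advance the input position by this count for every channel message other than note on and note off.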

Note that we use unsigned char type for the input buffer. Otherwise, the messages will get sign-extended when stored in an integer value. In the following code, the AddEvent function is a place-holder for the code to create an appropriate sequencer event and add the event to the event list for the current track.

FILE *fp;
unsigned char *inpBuf;
unsigned char *inpPos;
unsigned char *inpEnd;
MTHDchunk hdr;
short trkNum;
short lastMsg;
long deltaTime;

void LoadFile(const char *file) {
   char chunkID[4];
   short msg;
   long trackSize;
   fp = fopen(file, "rb");
   if (fp == NULL)
      return;
   fread(hdr.chnkID, 4, 1, fp);
   hdr.size = ReadLong();
   if (memcmp(hdr.chnkID, "MThd", 4) == 0) {
      hdr.format = ReadShort();
      hdr.numTrk = ReadShort();
      hdr.tmDiv = ReadShort();
      trkNum = 0;
      while (trkNum < hdr.numTrk && !feof(fp)) {
         fread(chunkID, 4, 1, fp);
         trackSize = ReadLong();
         if (memcmp(chunkID, "MTrk", 4) == 0) {
            // load the whole track, then walk through it in memory
            inpBuf = (unsigned char *)malloc(trackSize);
            fread(inpBuf, 1, trackSize, fp);
            inpPos = inpBuf;
            inpEnd = inpBuf + trackSize;
            while (inpPos < inpEnd) {
               deltaTime = GetVarLen();
               msg = *inpPos;
               if ((msg & 0xF0) == 0xF0) {
                  inpPos++;
                  if (msg == 0xFF)
                     MetaEvent();
                  else
                     SysCommon(msg);
               } else {
                  if (msg & 0x80) {
                     // a new status byte: consume it and remember it
                     inpPos++;
                     lastMsg = msg;
                  } else
                     msg = lastMsg; // running status: reuse the last message
                  ChannelMessage(msg);
               }
            }
            free(inpBuf);
            trkNum++;
         } else
            fseek(fp, trackSize, SEEK_CUR); // skip unknown chunk types
      }
   }
   fclose(fp);
}

void MetaEvent() {
   long tempo;
   short meta = *inpPos++;
   long metalen = GetVarLen();
   switch (meta) {
   case 0x2F: // end of track (no data bytes)
      AddEvent(0xFF, meta, 0, 0);
      break;
   case 0x51: // tempo: three bytes, microseconds per quarter note
      tempo  = (*inpPos++ << 16);
      tempo += (*inpPos++ << 8);
      tempo += *inpPos++;
      AddEvent(0xFF, meta, tempo, 0);
      break;
   default: // skip the data of all other META events
      inpPos += metalen;
      break;
   }
}

void SysCommon(short msg) {
   long datalen;
   switch (msg) {
   case 0xF0: // SYSEX
   case 0xF7: // SYSEX continuation or escape, also length-prefixed in an SMF
      datalen = GetVarLen();
      inpPos += datalen;
      break;
   case 0xF1: // MIDI time code
   case 0xF3: // Song select
      inpPos += 1;
      break;
   case 0xF2: // Song position
      inpPos += 2;
      break;
   default: // remaining values have no data
      break;
   }
}

void ChannelMessage(short msg) {
   short val1, val2;
   short chnl = msg & 0x0F;
   switch (msg & 0xF0) {
   case 0x80: // Note Off
   case 0x90: // Note On
   case 0xA0: // After Touch
   case 0xB0: // Control change
      val1 = *inpPos++; // key/controller
      val2 = *inpPos++; // velocity/pressure/value
      AddEvent(msg, chnl, val1, val2);
      break;
   case 0xC0: // Program Change
   case 0xD0: // Channel Pressure (Aftertouch)
      val1 = *inpPos++;
      AddEvent(msg, chnl, val1, 0);
      break;
   case 0xE0: // Pitch bend: 14-bit value, low seven bits come first
      val1 = *inpPos++; // least significant 7 bits
      val2 = *inpPos++; // most significant 7 bits
      AddEvent(msg, chnl, (val2 << 7) + val1, 0);
      break;
   }
}
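
The AddEvent function is left as a place-holder in the code above. One possible shape for it is sketched below, purely as an assumption: the SeqEvent and Track types and the TrackAddEvent function are hypothetical, and a real sequencer would likely store events in whatever form its playback code needs.

```c
#include <stdlib.h>

/* Hypothetical sequencer event: one decoded message plus its delta time. */
typedef struct {
    long  deltaTime; /* ticks to wait before this event */
    short msg;       /* status byte, or 0xFF for a META event */
    short chnl;      /* channel number, or the META event type */
    long  val1;      /* first data value (key, controller, tempo, ...) */
    long  val2;      /* second data value (velocity, ...) */
} SeqEvent;

/* Hypothetical growable event list for one track. */
typedef struct {
    SeqEvent *events;
    size_t    count;
    size_t    capacity;
} Track;

/* Append an event to trk. Returns 1 on success, 0 if memory runs out. */
int TrackAddEvent(Track *trk, long deltaTime,
                  short msg, short chnl, long val1, long val2)
{
    if (trk->count == trk->capacity) {
        /* Double the array so that appends stay cheap on average. */
        size_t newCap = trk->capacity ? trk->capacity * 2 : 64;
        SeqEvent *grown = realloc(trk->events, newCap * sizeof *grown);
        if (grown == NULL)
            return 0;
        trk->events = grown;
        trk->capacity = newCap;
    }
    SeqEvent *ev = &trk->events[trk->count++];
    ev->deltaTime = deltaTime;
    ev->msg = msg;
    ev->chnl = chnl;
    ev->val1 = val1;
    ev->val2 = val2;
    return 1;
}
```

A Track initialized to all zeros starts out empty, and the caller releases the list with free(trk.events) when the track is no longer needed.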






Dan Mitchell's Personal Website