Coding In-Band Tracks for HTML DOM Media Element
DOM and HTML Embedded Content – Part 3
Foreword: In this part of the series, I talk about tracks that are inside the media resource file, and how to code them.
By: Chrysanthus Date Published: 22 Mar 2016
Introduction
Categories of Media Tracks
Audio Tracks
"translation"
Assume that you have an audio resource file. This audio resource can have three tracks in the file. The primary track can be a speech of the president in English. Another audio track in the file can be the same speech translated into French, and a third track can be the same speech in German. These three tracks are not meant to be played at the same time. The author (programmer) has a way of making the user choose one track at a time (see below).
So, translation is a category of tracks. When verifying what kind of track you have, using code, the category (e.g. translation) is returned as a string.
"descriptions"
Assume that you have a film (video). How can a blind person, or the driver of a car, follow the film? The content of the film can be described in writing (as text), and this text can be converted into audio by a speech synthesizer. Such sound is of the descriptions category. So "descriptions" is a category of audio track: an audio description of a video track.
"main-desc"
This is the primary audio track, mixed with audio descriptions. Assume that you have a play (theatre) that is recorded as audio. That play can be mixed with audio descriptions (for a blind man).
Video Tracks
"sign"
You might have watched a president making a speech on TV. In a rectangle at the bottom right corner of the screen, you may see a person waving his hands in a special way. That person is communicating what the president is saying to those who cannot hear, using what is called sign language. If the video of that person is kept as a separate track, those who cannot hear can choose to watch only that person, on the entire screen. That is video without sound, and it is a track of the sign category. So "sign" is a sign-language interpretation of an audio track (the speech of the president).
"captions"
Assume that you are watching a video in which a journalist is describing a town, and that the town has one main street. When the main street is being shown, you can see the text “Main Street” at the bottom of the video. When the journalist is talking about the outskirts of the town, you can see the text “Outskirts” at the bottom of the video.
Now the texts “Main Street” and “Outskirts” are called captions. All such captions would form one text track. So the video track kind "captions" is a version of the main video track with the captions burnt in, resulting in one overall (video) track.
"subtitles"
You can have a film (video) in English whose dialogue appears as French text at the bottom of the screen, for French speakers who do not understand English. When such a text track is burnt into the video, the resulting video track is of the subtitles category.
"alternative"
It is possible to have an audio song written by one artist, and then the same song but modified, written by another artist. These two songs can be in one audio resource file. The author (programmer) will make it possible for the user to choose between the two songs (see below).
It is possible to have a video, which is the story in a book. It is possible to have the same video, of the same story but directed by a different film director, with different actors. These two videos can be in the same video resource file. The author will make it possible for the user to choose between the two videos (see below).
So, the “alternative” category is a possible alternative to the main track, e.g. a different take of a song (audio), or a different angle (video).
"main"
You can have a media resource with more than one track. "main" is the primary audio track or the primary video track.
"commentary"
It is possible to have commentary alongside the primary audio or the primary video, e.g. a director's commentary on a film. That commentary is a track.
""
The empty string is also a category. This can be returned by code, to indicate no explicit kind (category); or the kind given by the track's metadata is not recognized by the user agent.
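In the HTML specification's table of track kinds, some of the categories above apply only to audio tracks or only to video tracks, while "alternative", "main", "commentary" and "" apply to both. For example, "translation", "descriptions" and "main-desc" are audio kinds, and "sign", "captions" and "subtitles" are video kinds. The following is a minimal JavaScript sketch of that table; the names AUDIO_TRACK_KINDS, VIDEO_TRACK_KINDS and isKnownKind are hypothetical helpers, not part of the DOM.

```javascript
// Kind strings for in-band audio and video tracks, following the table
// of track kinds in the HTML specification. "" means "no explicit kind".
// (Hypothetical helper constants, not part of the DOM.)
const AUDIO_TRACK_KINDS = new Set([
  "alternative", "descriptions", "main", "main-desc",
  "translation", "commentary", "",
]);
const VIDEO_TRACK_KINDS = new Set([
  "alternative", "captions", "main", "sign",
  "subtitles", "commentary", "",
]);

// Returns true if `kind` is a category defined for the given track type
// ("audio" or "video"); throws for any other track type.
function isKnownKind(kind, trackType) {
  if (trackType === "audio") return AUDIO_TRACK_KINDS.has(kind);
  if (trackType === "video") return VIDEO_TRACK_KINDS.has(kind);
  throw new RangeError(`unknown track type: ${trackType}`);
}
```

For example, isKnownKind(audioTrack.kind, "audio") could be used to check that the string returned by a track's kind attribute is one that the code knows how to handle.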
Text Tracks
A text track can be burnt into video. You can also have a text track in the video resource file that is kept separate from the video. Whether burnt in or separate, as long as it is in the resource file, it is in-band. Text tracks also come in categories (kinds):
subtitles
This is transcription (text) or translation (text) of the dialogue, suitable for when the sound is available but not understood (e.g. because the user does not understand the language of the media resource's audio track).
captions
This is a transcription or translation of the dialogue, sound effects, relevant musical cues, and other relevant audio information, suitable for when sound is unavailable or not clearly audible (e.g. because it is muted, drowned-out by ambient noise, or because the user is deaf).
descriptions
These are textual descriptions of the video component of the media resource, intended for audio synthesis when the visual component is obscured, unavailable, or not usable (e.g. because the user is interacting with the application without a screen while driving, or because the user is blind).
chapters
Among the first pages of a textbook, you see a Table of Contents, made up of chapter titles and section titles. In this programming topic, chapter and section mean the same thing. The teacher who wrote the textbook can reproduce the whole content as video. A chapters text track holds the chapter titles, together with the time at which each chapter starts and ends, and is intended for navigating the media resource: the user agent can show the titles as an interactive list, so that the user can jump to any chapter. Chapters can be nested, just as sections in a textbook can be nested: a section can contain sub (nested) sections, each with its own title and time span.
metadata
Metadata is data about data (information about information). You can have a text track that has text data about the video. Such a track is not displayed. Such a track is intended to be used by script (ECMAScript).
For a media resource (file), the media element may expose one or more of the following track lists: AudioTrackList, VideoTrackList and TextTrackList. A track list is an array-like object holding references to track objects.
AudioTrackList
The interface (object) for the audio track list is:
interface AudioTrackList : EventTarget {
readonly attribute unsigned long length;
getter AudioTrack (unsigned long index);
AudioTrack? getTrackById(DOMString id);
attribute EventHandler onchange;
attribute EventHandler onaddtrack;
attribute EventHandler onremovetrack;
};
The following shows how to return (and/or set) the values of the attributes:
media.audioTracks.length
A media element is an audio element or a video element, the reference of which is held by the variable, media. audioTracks is the name of a property (attribute) of the media object (interface). That is why you have a dot between media and audioTracks. The expression, media.audioTracks returns an audio track list (reference). As seen above, length is a property of the AudioTrackList interface (object). So, in the expression, media.audioTracks.length, there is a dot between audioTracks and length.
In the audio track list, length means number of audio tracks in the audio track list memory object.
There is no corresponding HTML element for the AudioTrackList interface (memory object).
audioTrack = media.audioTracks[index]
The different tracks in the audio track list are numbered (indexed) from 0, 1, 2, 3, etc. The expression, media.audioTracks[index] returns the reference of the track determined (identified) by the index, in square brackets (array). The returned value is held by audioTrack, a name of your choice. In the expression, media.audioTracks[2], audioTracks is not a name of your choice (note the s at the end). It is an attribute in the media interface.
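The length and index expressions above can be combined in a loop. The following is a minimal JavaScript sketch, assuming media holds a reference to an audio or video element; the function name listAudioTracks is hypothetical. Since support for the audioTracks attribute varies between browsers, the sketch treats a missing list as an empty one.

```javascript
// Sketch: collect the properties of every in-band audio track of a media
// element. `media` is assumed to be an <audio> or <video> element.
function listAudioTracks(media) {
  const tracks = media.audioTracks;
  // Support for audioTracks varies between browsers; treat a missing
  // list as "no tracks".
  if (!tracks) return [];
  const result = [];
  // AudioTrackList is array-like: index it from 0 to length - 1.
  for (let i = 0; i < tracks.length; i++) {
    const t = tracks[i];
    result.push({
      index: i,
      id: t.id,
      kind: t.kind,
      label: t.label,
      language: t.language,
      enabled: t.enabled,
    });
  }
  return result;
}
```

In a web page, it could be called as listAudioTracks(document.querySelector("video")), and the result printed with console.table().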
audioTrack = media.audioTracks.getTrackById(id)
In this statement, getTrackById(id) is a method (function) of the AudioTrackList interface. The argument is the id of the track. The expression media.audioTracks.getTrackById(id) returns the track with that id, or null if no track has that id. The id is provided by the manufacturer of the resource (file). In the absence of an id, you should use the index and the previous statement.
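Because getTrackById() returns null when no track has the given id, code that looks up a track by id should check for null. The following is a minimal sketch of the fallback described above (use the id if possible, otherwise an index); the function getAudioTrack and its fallbackIndex parameter are hypothetical.

```javascript
// Sketch: look up an audio track by its id. If no track has that id
// (getTrackById returns null), fall back to the track at `fallbackIndex`.
// Returns null if neither lookup finds a track.
function getAudioTrack(media, id, fallbackIndex = 0) {
  const tracks = media.audioTracks;
  const byId = tracks.getTrackById(id);
  if (byId !== null) return byId;
  if (fallbackIndex >= 0 && fallbackIndex < tracks.length) {
    return tracks[fallbackIndex];
  }
  return null;
}
```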
DOM has an audio track object, typed as, AudioTrack (it does not have an s). There is no corresponding HTML element for the AudioTrack interface (memory object). The interface is:
interface AudioTrack {
readonly attribute DOMString id;
readonly attribute DOMString kind;
readonly attribute DOMString label;
readonly attribute DOMString language;
attribute boolean enabled;
};
audioTrack.id
Returns the id of the audio track, which can be used in getTrackById(). If the track has no id, the empty string is returned.
audioTrack.kind
Where audioTrack is the name of your choice (obtained above), this expression returns the category of the track, as a string.
audioTrack.language
Returns the language of the given track, if known, or the empty string otherwise. It can return en for English, fr for French, de for German, etc.
audioTrack.label
Maybe your name is John. That identifies you to people. Likewise, a label is a human-readable identifier of a track, meant for the user. This expression returns the label of the given track, if known, or the empty string otherwise.
audioTrack.enabled [ = value ]
Returns true if the given track is active, and false otherwise. A track that is playing is enabled. This attribute can be set, to change whether the track is enabled or not. If multiple audio tracks are enabled simultaneously, they are mixed (you hear all the sounds).
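Setting enabled is how the author can let the user choose one version of the president's speech at a time. The following is a minimal sketch, assuming the tracks report their languages as codes such as "en", "fr" and "de"; the function name chooseAudioLanguage is hypothetical. It disables every other track, so that the tracks are not mixed.

```javascript
// Sketch: enable the first audio track whose language matches `language`
// (e.g. "fr") and disable all the others, so that only one version of the
// speech is heard instead of a mix. Returns the enabled track, or null
// (leaving every track unchanged) if no track has that language.
function chooseAudioLanguage(media, language) {
  const tracks = media.audioTracks;
  let chosen = null;
  for (let i = 0; i < tracks.length; i++) {
    if (tracks[i].language === language) {
      chosen = tracks[i];
      break;
    }
  }
  if (chosen === null) return null;
  for (let i = 0; i < tracks.length; i++) {
    tracks[i].enabled = tracks[i] === chosen;
  }
  return chosen;
}
```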
VideoTrackList
The interface (object) for the video track list is:
interface VideoTrackList : EventTarget {
readonly attribute unsigned long length;
getter VideoTrack (unsigned long index);
VideoTrack? getTrackById(DOMString id);
readonly attribute long selectedIndex;
attribute EventHandler onchange;
attribute EventHandler onaddtrack;
attribute EventHandler onremovetrack;
};
The following shows how to return (and/or set) the values of the attributes:
media.videoTracks.length
Returns the number of video tracks in the video track list.
videoTrack = media.videoTracks[index]
Returns the specified VideoTrack object.
videoTrack = media.videoTracks.getTrackById( id )
Returns the VideoTrack object with the given identifier, or null if no track has that identifier.
Video Track
DOM has a video track object, typed as, VideoTrack (it does not have an s). There is no corresponding HTML element for the VideoTrack interface (memory object). The interface is:
interface VideoTrack {
readonly attribute DOMString id;
readonly attribute DOMString kind;
readonly attribute DOMString label;
readonly attribute DOMString language;
attribute boolean selected;
};
videoTrack.id
Returns the id of the video track, which can be used in getTrackById(). If the track has no id, the empty string is returned.
videoTrack.kind
Where videoTrack is the name of your choice (obtained above), this expression returns the category of the video track, as a string.
videoTrack.language
Returns the language of the given track, if known, or the empty string otherwise.
videoTrack.selected [ = value ]
With audio, more than one audio track can be enabled; in that case you hear the tracks mixed together. This would not really make sense with video, since only one picture can be shown in the video area. So at most one video track is selected at a time; normally that is the primary track.
This expression returns true if the given track is active, and false otherwise. The selected attribute can be set, to change whether the track is selected or not. Either zero or one video track is selected; selecting a new track while a previous one is selected will unselect the previous one.
media.videoTracks.selectedIndex
Returns the index of the currently selected track, if any, or −1 otherwise.
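The following is a minimal sketch of switching to, for example, a sign-language video track and reading back selectedIndex. The function name selectVideoByKind is hypothetical, and the code assumes the browser exposes the videoTracks attribute.

```javascript
// Sketch: select the first video track whose kind matches `kind`
// (e.g. "sign"). The browser unselects the previously selected video
// track, so at most one stays selected. Returns the new selectedIndex,
// or null if no track has that kind (the selection is left unchanged).
function selectVideoByKind(media, kind) {
  const tracks = media.videoTracks;
  for (let i = 0; i < tracks.length; i++) {
    if (tracks[i].kind === kind) {
      tracks[i].selected = true;
      return tracks.selectedIndex;
    }
  }
  return null;
}
```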
Text Track API
In the Text Track API, the list of text tracks is represented by the TextTrackList interface (object). It is:
interface TextTrackList : EventTarget {
readonly attribute unsigned long length;
getter TextTrack (unsigned long index);
TextTrack? getTrackById(DOMString id);
attribute EventHandler onchange;
attribute EventHandler onaddtrack;
attribute EventHandler onremovetrack;
};
media.textTracks.length
This expression returns the number of text tracks associated with the media element (e.g. from track elements – see later). This is the number of text tracks in the media element's list of text tracks.
media.textTracks[ n ]
Returns the TextTrack object (reference) representing the nth text track in the media element's list of text tracks.
textTrack = media.textTracks.getTrackById( id )
Returns the TextTrack object with the given identifier, or null if no track has that identifier.
track.track
Where track refers to a track element, returns the TextTrack object representing that element's text track - see later.
The text track API is similar to the AudioTrackList or VideoTrackList interface.
Text Track
DOM has a text track object, typed as, TextTrack (it does not have an s). There is no corresponding HTML element for the TextTrack interface (memory object). The interface is:
enum TextTrackMode { "disabled", "hidden", "showing" };
enum TextTrackKind { "subtitles", "captions", "descriptions", "chapters", "metadata" };
interface TextTrack : EventTarget {
readonly attribute TextTrackKind kind;
readonly attribute DOMString label;
readonly attribute DOMString language;
readonly attribute DOMString id;
readonly attribute DOMString inBandMetadataTrackDispatchType;
attribute TextTrackMode mode;
readonly attribute TextTrackCueList? cues;
readonly attribute TextTrackCueList? activeCues;
void addCue(TextTrackCue cue);
void removeCue(TextTrackCue cue);
attribute EventHandler oncuechange;
};
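A metadata text track is meant to be used by script. The following is a minimal sketch of how script could listen to such tracks, using the mode, activeCues and oncuechange members of the interface above; watchMetadata and its onCue callback are hypothetical names. Setting the mode to "hidden" makes the track active without displaying it. What a metadata cue contains depends on the media resource, so the sketch passes the whole cue object to the callback.

```javascript
// Sketch: switch on every metadata text track of a media element without
// displaying it, and report each cue that becomes active.
// `onCue` is a hypothetical callback supplied by the page.
// Returns the number of metadata tracks found.
function watchMetadata(media, onCue) {
  const tracks = media.textTracks;
  let count = 0;
  for (let i = 0; i < tracks.length; i++) {
    const track = tracks[i];
    if (track.kind !== "metadata") continue;
    // "hidden": the track is active (cues are tracked and events fire),
    // but nothing is displayed.
    track.mode = "hidden";
    track.oncuechange = () => {
      const active = track.activeCues; // null while the track is disabled
      if (!active) return;
      for (let j = 0; j < active.length; j++) onCue(active[j]);
    };
    count++;
  }
  return count;
}
```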
textTrack = media.addTextTrack( kind [, label [, language ] ] )
Creates and returns a new TextTrack object, which is also added to the media element's list of text tracks – see details later.
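As a sketch of addTextTrack() together with addCue(), the function below builds a chapters track from a list of chapter titles and times. It assumes the browser's WebVTT cue constructor, new VTTCue(startTime, endTime, text), with times in seconds. The function name addChapterTrack, the { start, end, title } shape of each entry and the label "Chapters" are hypothetical. A track created with addTextTrack() starts in the "hidden" mode.

```javascript
// Sketch: create a chapters text track from script and fill it with cues.
// Each entry of `chapters` is a hypothetical { start, end, title } object,
// with times in seconds. VTTCue is the browser's WebVTT cue constructor.
function addChapterTrack(media, chapters, language = "en") {
  const track = media.addTextTrack("chapters", "Chapters", language);
  for (const { start, end, title } of chapters) {
    track.addCue(new VTTCue(start, end, title));
  }
  return track;
}
```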
textTrack.kind
Returns the text track kind string.
textTrack.label
Returns the text track label, if there is one, or the empty string otherwise (indicating that a custom label probably needs to be generated from the other attributes of the object if the object is exposed to the user).
textTrack.language
Returns the text track language string.
textTrack.id
Returns the ID of the given track.
textTrack.mode [ = value ]
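Returns the text track mode, represented by a string from the following list:
"disabled"
The track is not active: apart from exposing it in the DOM, the user agent ignores it; no cues are active and no cue events are fired.
"hidden"
The track is active (its cues are tracked and cue events are fired), but the user agent does not display the cues.
"showing"
The track is active and the user agent presents it; for example, the cues of subtitles and captions tracks are overlaid on the video.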
Can be set, to change the mode.
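Setting the mode is how a page can switch subtitles on and off. The minimal sketch below shows the subtitles or captions track in a chosen language and disables the other subtitles and captions tracks, leaving descriptions, chapters and metadata tracks alone; the function name showSubtitles is hypothetical.

```javascript
// Sketch: show the subtitles/captions track in `language` and disable the
// other subtitles/captions tracks. Tracks of other kinds (descriptions,
// chapters, metadata) are left alone. Returns the shown track, or null.
function showSubtitles(media, language) {
  const tracks = media.textTracks;
  let shown = null;
  for (let i = 0; i < tracks.length; i++) {
    const t = tracks[i];
    if (t.kind !== "subtitles" && t.kind !== "captions") continue;
    if (shown === null && t.language === language) {
      t.mode = "showing";
      shown = t;
    } else {
      t.mode = "disabled";
    }
  }
  return shown;
}
```

Calling showSubtitles(media, "") with a language that no track has turns all subtitles and captions off.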
Mozilla Firefox Browser
Today, the media files that would play in the HTML media elements (video and audio) are files with the extensions .wav and .wave. A file type with the extension .ogg should also work. For many other file types, you will need a plugin and the use of the HTML embed element.
Windows operating system has an application, called Windows Movie Maker. It may be possible to use this application to create a .wave file.
That is it for this part of the series. We stop here and continue in the next part.
Chrys