Browsing video content
Peter J. Macer
Hewlett Packard Research Laboratories
Filton Road
Stoke Gifford
Bristol BS12 6QZ
pejm@hplb.hpl.hp.com
Peter J. Thomas
Centre for Personal Information Management
Faculty of Computer Studies and Mathematics
University of the West of England
Bristol BS16 1QY
Peter.Thomas@uwe.ac.uk
Extended Abstract
One of the major problems experienced by Web users is the time needed to download data. As the speed and power of the typical desktop computer have increased, almost anyone with access to a PC can now produce Web content featuring large images, audio, and video. So although the available bandwidth of the Internet is increasing, the bandwidth demanded by the media available through it, and the number of users trying to access that media, are increasing too. Together these factors keep the effective bandwidth available per user low enough that downloading some Web content still takes a long time.
The browsing model on which the Web is based exacerbates this problem. Because hyperlinks are merely references to new information, the user often does not know exactly what that information is until it has been fully downloaded. This is not a problem if the new information arrives almost immediately and its relevance can be assessed quickly, since the user can simply step back to the previous page if the new one is not what was required. However, if the data takes many minutes, or even hours, to download and its content cannot be assessed quickly, the browsing model begins to break down, and finding the required information becomes a time-consuming, expensive, and frustrating task.
For one type of Web content, video, these problems are more pronounced. The large volume of raw data in a digital video sequence makes retrieving it from a remote server time-consuming and potentially expensive: even short sequences can take many minutes or even hours to download, particularly if the server and client are geographically distant.
It is therefore important to ensure that the video data the user retrieves is exactly what they require. Downloading more data than is needed, or the wrong data, is more costly for video than for virtually any other medium. For other media with large data sizes, such as images, a smaller version (a ‘thumbnail’) is often presented to the user before the full image is transferred, so that the user can confirm that the image really meets their requirements. For video, however, the overhead of generating and/or storing a reduced-frame-size version of each sequence is prohibitive. Moreover, even a ‘thumbnail’ video sequence will generally be far larger than a typical network can transmit in a few seconds, because shrinking each frame does nothing to reduce the number of frames. Simply reducing the spatial dimensions is therefore not sufficient.
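To see why shrinking the frames does not help enough, the rough arithmetic below compares uncompressed sizes and transfer times. This is a minimal sketch: the frame sizes, 24-bit colour depth, 25 fps frame rate, one-minute duration and 28.8 kbit/s link are hypothetical illustration values, not figures from this work, and real video would be compressed.

```python
# Back-of-envelope sizes for raw (uncompressed) video.
# All figures below are hypothetical illustration values, not measurements.

BYTES_PER_PIXEL = 3  # assumed 24-bit RGB, no compression


def raw_size(width, height, frames):
    """Bytes needed to store `frames` uncompressed frames of width x height."""
    return width * height * BYTES_PER_PIXEL * frames


def download_seconds(size_bytes, bits_per_second):
    """Transfer time in seconds at a given effective bandwidth."""
    return size_bytes * 8 / bits_per_second


fps, seconds = 25, 60                              # a hypothetical one-minute clip
full = raw_size(320, 240, fps * seconds)
thumb_video = raw_size(160, 120, fps * seconds)    # half width, half height
thumb_image = raw_size(160, 120, 1)                # a single still thumbnail

link = 28_800                                      # hypothetical 28.8 kbit/s modem
for name, size in [("full video", full),
                   ("thumbnail video", thumb_video),
                   ("thumbnail image", thumb_image)]:
    print(f"{name:16s} {size:>12,d} bytes  {download_seconds(size, link):>10,.0f} s")
```

Under these assumptions the full clip takes 96,000 s to transfer and the half-size thumbnail video still takes 24,000 s (over six hours), whereas a single still thumbnail takes 16 s. Compression lowers the absolute figures, but a reduced-size video still grows with its number of frames, which is why a handful of stills is a far cheaper preview.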
This paper describes an approach to accessing video over the Web using a ‘storyboard’ representation, in which the video frames that best represent the content of a sequence are selected automatically (see also Macer and Thomas, 1996a, 1996b; Macer, Thomas, Chalabi and Meech, 1996 for an early discussion of the approach). Because a storyboard is tiny compared with the video sequence itself, each storyboard frame can be stored together with the start and finish frame numbers of the shot it represents at negligible additional storage cost. Presenting the storyboard to the user as a preview of the sequence lets them quickly assess which parts of the sequence they need and download only those parts.
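The storyboard record described above, a representative frame plus the start and finish frames of its shot, can be sketched as a small data structure, together with a helper that turns the shots a user picks into the frame ranges to request. This is a hypothetical sketch, not the authors' storage format: the names `StoryboardEntry` and `frames_to_fetch` and the example frame numbers are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoryboardEntry:
    """One storyboard frame plus the shot it stands for (hypothetical layout)."""
    key_frame: int   # frame number of the representative still
    shot_start: int  # first frame of the shot (inclusive)
    shot_end: int    # last frame of the shot (inclusive)


def frames_to_fetch(storyboard, chosen):
    """Merge the frame ranges of the shots the user picked, in frame order.

    `chosen` holds indices into `storyboard`. Adjacent or overlapping shots
    are coalesced so each contiguous stretch of video is requested once.
    """
    ranges = sorted((storyboard[i].shot_start, storyboard[i].shot_end) for i in chosen)
    merged = []
    for start, end in ranges:
        if merged and start <= merged[-1][1] + 1:
            # Touches or overlaps the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


board = [
    StoryboardEntry(key_frame=12, shot_start=0, shot_end=40),
    StoryboardEntry(key_frame=70, shot_start=41, shot_end=120),
    StoryboardEntry(key_frame=150, shot_start=121, shot_end=300),
    StoryboardEntry(key_frame=330, shot_start=301, shot_end=420),
]
print(frames_to_fetch(board, [0, 1, 3]))  # [(0, 120), (301, 420)]
```

Each entry costs three integers on top of its still image, which is what makes storing shot boundaries alongside the storyboard essentially free. In the example the user skips the third shot, so 241 of the 421 frames are requested.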
The system described here, ‘Rosetta’, can automatically generate a Web-based presentation of any part of the storyboard in a number of formats, including formats offering arbitrarily complex user interaction implemented as Java applets. Once a browser has downloaded and displayed an HTML page showing the storyboard, the user can look through the sequence of still images and decide which additional representations of each shot, if any, to download: a larger version of the representative frame, the audio track associated with the shot, or the full-motion digital video clip itself.
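The simplest of these presentations, a static HTML page, can be sketched as one table cell per shot: the still thumbnail plus links to the larger frame, the audio track and the video clip. This is not Rosetta's implementation, which the abstract does not describe; the function name `storyboard_page`, the dictionary keys, the file names and the page layout are hypothetical, and the Java applet formats are not modelled.

```python
from html import escape


def storyboard_page(title, shots):
    """Build a simple static HTML storyboard page (hypothetical layout).

    `shots` is a list of dicts with hypothetical keys: 'label' (text) and
    'thumb', 'large', 'audio', 'clip' (URLs of the per-shot resources).
    """
    cells = []
    for shot in shots:
        cells.append(
            "<td>"
            f'<img src="{escape(shot["thumb"])}" alt="{escape(shot["label"])}"><br>'
            f'<a href="{escape(shot["large"])}">larger frame</a> | '
            f'<a href="{escape(shot["audio"])}">audio</a> | '
            f'<a href="{escape(shot["clip"])}">video clip</a>'
            "</td>"
        )
    return (
        f"<html><head><title>{escape(title)}</title></head><body>"
        f"<h1>{escape(title)}</h1><table><tr>{''.join(cells)}</tr></table>"
        "</body></html>"
    )


# Usage: one hypothetical shot; only the small thumbnail is fetched up front.
page = storyboard_page("Example sequence", [
    {"label": "Shot 1", "thumb": "shot1_small.jpg", "large": "shot1_large.jpg",
     "audio": "shot1.au", "clip": "shot1.mpg"},
])
print(page)
```

Because the heavier resources sit behind ordinary links, the browser transfers only the thumbnails until the user asks for more, matching the preview-then-select model described above.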
References
Macer, P. and Thomas, P. (1996a) From Video Sequence to Comic Strip: Summarising Video for Faster WWW Access. Proceedings of 3D Graphics and Multimedia on the Internet, WWW and Networks, British Computer Society Computer Graphics & Displays Group, 16-18 April 1996.
Macer, P. and Thomas, P. (1996b) Video Storyboards: Summarising Video Sequences for Indexing and Searching of Video Databases. IEE E4 Colloquium Digest on Intelligent Image Databases, Savoy Place, 22 May 1996.
Macer, P., Thomas, P., Chalabi, N. and Meech, J. (1996) Finding the Cut of the Wrong Trousers: Fast Video Search Using Automatic Storyboard Generation. Proceedings of CHI'96 Human Factors in Computing Conference, Vancouver, April 1996.