1.1 What is meant by Multimedia Data?
A number of data types can be characterized as multimedia data types. These data types are normally the essentials for the building blocks of core multimedia environments, platforms and integrating tools. The basic types can be described as text, images, audio, video and graphic objects. Following is a detailed explanation for the same.
Text can be stored in a variety of different forms. In addition to American Standard Code for Information Interchange (ASCII) based files, text is usually stored in spreadsheets, annotations, processor files, databases and common multimedia objects. The task of text storage is becoming more and more complex due to the easy availability and abundance of Graphical User Interfaces (GUIs) and text fonts, permitting unique effects such as text color, text shade etc.
Digitalized images are nothing but a string of pixels that signify an area in the user’s graphical exhibit. There is an immense variation in the quality and dimension of storage for motionless images. For motionless (still) images, the space overhead varies with respect to complexity, size, resolution and compression format used to store any given image. The frequently used and accepted image formats (file extensions) consist of bmp, jpeg, tiff and png.
Audio, being another frequently used data type is relatively space intensive. A minute of sound takes up to 3 Megabytes (MB) of space. Numerous methods can be deployed to compress an audio into suitable formats.
Another data type which consumes majority of space is categorized as the digitalized video data type. Videos are normally stored as a series of frames, the capacity of which depends on its resolution. A solo video frame can take up to 1 MB of space. Continuous transfer rate is needed to get a reasonable video playback with its proper transmission, compression, and decompression.
This data type consists of unique data structures that can define 2D and 3D shapes which further helps in defining multimedia objects. Today one can use different formats for image applications and video-editing applications. To list few examples Computer Aided Design (CAD) and Computer Aided Manufacturing (CAM) are graphic objects
1.2 How is Multimedia Data Different?
Theoretically multimedia data should be considered like any regular data based on the data types for instance numbers, dates and characters. Though, there are a few challenges that arise from multimedia as described in :
- Multimedia data is usually captured with various unreliable capturing techniques such as image processing. These multimedia processing techniques require capabilities for handling these various available methods of capturing content, this includes both automated and manual methods.
- In multimedia database, the queries created by the user rarely come back with textual answer. To a certain extent, the answer to user query is a compound multimedia presentation that the user can glance through at one’s leisure.
- The size of the multimedia data being large not only affects the storage, retrieval but also the transmission of data.
- Time to retrieve information may be vital while accessing video and audio databases, for example Video on Demand.
- Automatic feature extraction and Indexing: User explicitly submits the attribute values of objects inserted into the database in contrast to advanced tools with conservative databases, such as image processing and pattern recognition tools for images to extract the various features and content of multimedia objects. Special data structures for storage and indexing are needed due to the large size of data.
1.3 Basic Approaches for Data Retrieval
Data management is being implemented since long. Many approaches have also been invented for the same to manage and inquire various types of data in the computer systems. The commonly used approaches for data management comprise of conventional database system, information retrieval system, content based retrieval system and graph/ tree pattern matching. The details for the same are as follows:
Conventional database system
It is the most extensively used approach to manage as well as investigate structured data. Data in a database system must match to some predefined structures and limitations (schema’s). The user should specify the data objects to be retrieved and the tables from which data has to be extracted. The user also has to predicate on which the retrieval of data will be based to formulate a database query. SQL, a query language has a restricted syntax and vocabulary that can be used for such databases.
Information retrieval (IR) system
This system is prominently used to search enormous text collections; where in the content of the data (text) is illustrated with the help of an indexer using keywords or a textual summary. The query demands are expressed in terms of keywords or natural language. For instance, searching for an image or video, the user is required to describe using words and also need means to store large amount of metadata in textual form.
Content based retrieval (CBR) system
This approach facilitates in the retrieval of multimedia objects from an enormous collection. The retrieval is based on various features such as color, texture and shape which can be extracted automatically from the objects. Though keyword can be considered a feature for textual data, conventional retrieval of information has a higher performance as compared to content-based retrieval.
This is due to the fact that keyword has the demonstrated ability to characterize semantics while no other features have revealed convincing semantic describing capability. A key disadvantage of this particular approach is its lack of accuracy.
Graph or tree pattern matching
This particular approach seeks the retrieval of object sub-graphs from an object graph as per several designated patterns.
Data Structures for Multimedia Storage
Many modern database applications deal with large amounts of multidimensional data. Multimedia content-based retrieval is one of the examples. Access Methods are essential in order to deal with multidimensional data efficiently. They are used to access selective data from a big collection.
2.1 Importance of Access Methods
Efficient spatial selection support is the key purpose of access methods. These include range queries or nearest neighbour queries of spatial objects. The significance of these access methods and how they take into account both clustering techniques and spatial indexing is described by Peter Van Oosterom . In the absence of a spatial index, every object in the database needs to be checked if it meets the selection criteria. Clustering is required to group the objects that are often requested together. Or else, many different disk pages will have to be fetched, resulting in a very slow response.
For spatial selection, clustering implies storing objects that are not only close in reality but also close in computer memory instead of being scattered all over the whole memory.
In conventional database systems sorting the data is the basis for efficient searching. Higher dimensional data cannot be sorted in an obvious manner, as it is possible for text strings, numbers, or dates. Principally, computer memory is one-dimensional. However, spatial data is 2D, 3D or even higher and must be organized someway in the memory. An intuitive solution to organize the data is using a regular grid just as on a paper map. Each grid cell has a unique name e.g. ’A1’, ’C2’, or ’E5’. The cells are stored in some order in the memory and can each contain a fixed number of object references. In a grid cell, a reference is stored to an object whenever the object overlaps the cell. However, this will not be very efficient due to the irregular data distribution of spatial data because of which many cells will be empty while many others will be overfull. Therefore, more advanced techniques have been developed.
2.2 kd Trees
A kd-tree or a k-dimensional tree is a space-partitioning data structure used for organizing points in a k-dimensional space. kd-trees are a useful for several applications such as searches involving a multidimensional search key like range searches and nearest neighbour searches. Kd-trees are a special case of Binary Space Partitioning (BSP) trees.
A kd-tree only uses splitting planes that are perpendicular to one of the coordinate axes. This is different from BSP trees, in which arbitrary splitting planes can be used. In addition to this, every node of a kd-tree, from the root to the leaves, stores a point. Whereas in BSP trees, leaves are typically the only nodes that contain points. As a consequence, each splitting plane must go through one of the points in the kd-tree. 
2.2.1 Addition of elements to kd trees
A new point is added to a kd tree in the same way as one adds an element to any other tree. At first, traverse the tree, starting from the root and moving to either the left or the right child depending on whether the point to be inserted is on the left or right side of the splitting plane. Once you get to a leaf node, add the new point as either the left or right child of the leaf node, again depending on which side of the node’s splitting plane contains the new point.
2.2.2 Deleting from kd trees
Deletion is similar as in Binary Search Tree (BST) but slightly harder.
Step1 find node to be deleted.
Step2 two cases must be handled:
(a) No children – replace pointer to node by NULL
(b) Has children – replace node by minimum node in right subtree. If no right subtree exists then first move left subtree to become right subtree. 
Each node of a quad-tree is associated with a rectangular region of space. The top node is associated with the entire target space. Each non-leaf node divides its region into four equal sized quadrants, likewise, each such node has four child nodes corresponding to the four quadrants and so on. Leaf nodes have between zero and some fixed maximum number of points.
2.3.1 Simple definition of node structure of a point quad-tree
qtnodetype = record
NW, SW, NE, SE: *qtnodetype
Here, INFO is some additional information regarding that point .
XVAL, YVAL are coordinates of that point.
NW, SW, NE, SE are pointers to regions obtained by dividing given region. 
2.3.2 Common uses of Quad-trees
- Image Representation
- Spatial Indexing
- Efficient collision detection in two dimensions
- Storing sparse data, such as formatting information for a spreadsheet or for some matrix calculations.
2.3.3 Representing Image Using Quad-tree: 
Let us suppose we divide the picture area into 4 sections. Those 4 sections are then further divided into 4 subsections. We continue this process, repeatedly dividing a square region by 4. We must impose a limit to the levels of division otherwise we could go on dividing the picture forever. Generally, this limit is imposed due to storage considerations or to limit processing time or due to the resolution of the output device. A pixel is the smallest subsection of the quad tree.
To summarize, a square or quadrant in the picture is either :
- entirely one color
- composed of 4 smaller sub-squares
To represent a picture using a quad tree, each leaf must represent a uniform area of the picture. If the picture is black and white, we only need one bit to represent the colour in each leaf; for example, 0 could mean black and 1 could mean white. Now consider the following image : The definition of a picture is a two dimensional array, where the elements of the array are colored points.
Figure 2.3: First three levels of quad-tree
Figure 2.4: Given Image
This is how the above image could be stored in quad-tree.
Figure 2.5: 8×8 pixel picture represented in a quad-tree
Figure 2.6: The quad tree of the above example picture. The quadrants are shown in counterclockwise order from the top-right quadrant. The root is the top node. (The 2nd and 3rd quadrants are not shown.)
2.3.4 Advantages of Quad-trees:
- They can be manipulated and accessed much quicker than other models.
- Erasing an image takes only one step. All that is required is to set the root node to neutral.
- Zooming to a particular quadrant in the tree is also a one step operation.
- To reduce the complexity of the image, it suffices to remove the final level of nodes.
- Accessing particular regions of the image is a very fast operation. This is useful for updating certain regions of an image, perhaps for an environment with multiple windows.
The main disadvantage is that it takes up a lot of space.
R-trees are N-dimensional extension of Binary trees, but are used for spatial access methods i.e., for indexing multi-dimensional information. They are supported in many modern database systems, along with variants like R+ -trees and R*-trees. The data structure splits space with hierarchically nested, and possibly overlapping, minimum bounding rectangles.
A rectangular bounding box is associated with each tree node. 
- Bounding box of a leaf node is a minimum sized rectangle that contains all the rectangles/polygons associated with the leaf node.
- Bounding box associated with a non-leaf node contains the bounding box associated with all its children.
- Bounding box of a node serves as its key in its parent node (if any)
- Bounding boxes of children of a node are allowed to overlap.
2.4.1 Structure of an R-tree node
rtnodetype = record
Rec1, ….Reck : rectangle
P1, ….Pk : ∗rtnodetype
A polygon is stored in one node, and the bounding box of the node must contain the polygon. Since a polygon is stored only once, the storage efficiency of R-trees is better than that of k-d trees or quad-trees.
The insertion and deletion algorithms use the bounding boxes from the nodes to ensure that close by elements are placed in the same leaf node. Each entry within a leaf node stores two-pieces of information; a way of identifying the actual data element and the bounding box of the data element.
2.4.2 Inserting a node
1. Find a leaf to store it, and add it to the leaf.
- To find leaf, follow a child (if any) whose bounding box contains bounding box of data item, else child whose overlap with data item bounding box is maximum
2. Handle overflows by splits. We may need to divide entries of an overfull node into two sets such that the bounding boxes have minimum total area.
2.4.3 Deleting a node
1. Find the leaf and delete object; determine new MBR.
2. If the node is too empty:
- Delete the node recursively at its parent
- Insert all entries of the deleted node into the R-tree
2.4.4 Searching R-trees
Similarly, for searching algorithms, bounding boxes are used to decide whether or not to search inside a child node. Here we need to find minimal bounding rectangle. In this way, most of the nodes in the tree are never touched during a search.
- If the node is a leaf node, output the data items whose keys intersect the given query point/region
- Else, for each child of the current node whose bounding box overlaps the query point/region, recursively search the child.
2.5 Comparison of Different Data Structures 
- k-d trees are very easy to implement. However, in general a k-d tree consisting k nodes may have a height k causing complexity of both insertion and search in k-d trees to be high. In practice, path lengths (root to leaf) in k-d trees tend to be longer than those in point quad-trees because these trees are binary.
- R-trees have a large number of rectangles potentially stored in each node. They are appropriate for disk access by reducing the height of the tree, thus leading to fewer disk access.
- The disadvantage of R-trees is that the bounding rectangle associated with different nodes may overlap. Thus when searching an R-tree, instead of following one path (as in case of quad-tree), we might follow multiple path down the tree. This difference grows even more acute when range search and neighbour searches are considered.
- In case of point quad-trees, while performing search/insertion each case requires comparisons on two coordinates. Deletion in point quad-trees is difficult because finding a candidate replacement node for the node being deleted is not easy.
Metadata is data about data. Any data that is used to describe the content, condition, quality and other aspects of data for humans or machines to locate, access and understand the data is known as Metadata. Metadata helps the users to get an overview of the data.
3.1 Need of Metadata
The main functions of metadata can be listed as follows: 
To describe and identify data sources. These descriptions help create catalogs, index, etc., thereby improving access to them.
Formulation of queries.
To provide information to help manage and administrate a data source, such as when and how it was created, and who can legally access it.
To facilitate data archival and preservation like data refreshing and migration, etc.
To indicate how a system functions or metadata behaves, such as data formats, compression ratios, scaling routines, encryption key, and security, etc.
To indicate the level and type of use of data sources like multiversion, user tracking, etc.
3.2 Metadata in the Life Cycle of Multimedia Objects
A multimedia object undergoes a life cycle consisting of production, organization, searching, utilization, preservation, and disposition. Metadata passes through similar stages as an integral part of these multimedia objects :
Objects of different media types are created often generating data of how they were produced (e.g., the EXIF files produced by digital cameras) and stored in an information retrieval system. Associated metadata is generated accordingly for administrating and describing the objects.
Multimedia objects may be composed of several components. Metadata is created to specify how these compound objects are put together.
Searching and retrieval
Created and stored multimedia objects are subject to search and retrieval by users. Metadata provides aids through catalog and index to enable efficient query formulation and resource localization.
Retrieved multimedia objects can be further utilized, reproduced, and modified. Metadata related to digital rights management and version control, etc. may be created.
Preservation and disposition
Multimedia objects may undergo modification, refreshing, and migration to ensure their availability. Objects that are out-of-date or corrupted may be discarded. Such preservation and disposition activities can be documented by the associated metadata.
3.3 Classification of Metadata
Metadata directly affects the way in which objects of different media types are used. Classifying metadata can facilitate the handling of different media types in a multimedia information retrieval system. Based on its (in)dependence on media contents, metadata can be classified into two kinds, namely content independent and content-dependent metadata :
- Content-independent metadata provides information which is derived independently from the content of the original data. Examples of content independent metadata are date of creation and location of a text document, type-of-camera used to record a video fragment, and so on. These metadata are called descriptive data.
- Content-dependent metadata depends on the content of the original data. A special case of content-dependent metadata is content-dependent descriptive metadata , which cannot be extracted automatically from the content but is created manually: annotation is a well-known example. In contrast, content-dependent non-descriptive metadata is based directly on the contents of data.
3.4 Image metadata
Some of the image files containing metadata include Exchangeable image file format (EXIF) and Tagged Image File Format (TIFF).
Having metadata about images embedded in TIFF or EXIF files is one way of acquiring additional data about an image. Image metadata are attained through tags. Tagging pictures with subjects, related emotions, and other descriptive phrases helps Internet users find pictures easily rather than having to search through entire image collections.
A prime example of an image tagging service is Flickr, where users upload images and then describe the contents. Other patrons of the site can then search for those tags. Flickr uses a folksonomy: a free-text keyword system in which the community defines the vocabulary through use rather than through a controlled vocabulary.
Digital photography is increasingly making use of metadata tags. Photographers shooting Camera RAW file formats can use applications such as Adobe Bridge or Apple Computer’s Aperture to work with camera metadata for post-processing. Users can also tag photos for organization purposes using Adobe’s Extensible Metadata Platform (XMP) language, for example. 
3.5 Document metadata
Most programs that create documents, including Microsoft PowerPoint, Microsoft Word and other Microsoft Office products, save metadata with the document files. These metadata can contain the name of the person who created the file, the name of the person who last edited the file, how many times the file has been printed, and even how many revisions have been made on the file. Other saved material, such as document comments are also referred to as metadata.
Document Metadata is particularly important in legal environments where litigation can request this sensitive information which can include many elements of private detrimental data. This data has been linked to multiple lawsuits that have got corporations into legal complications. 
3.6 Digital library metadata
There are three variants of metadata that are commonly used to describe objects in a digital library:
- descriptive – Information describing the intellectual content of the object, such as cataloguing records, finding aids or similar schemes. It is typically used for bibliographic purposes and for search and retrieval.
- structural – Information that ties each object to others to make up logical units e.g., information that relates individual images of pages from a book to the others that make up the book.
- administrative – Information used to manage the object or control access to it. This may include information on how it was scanned, its storage format, copyright and licensing information, and information necessary for the long-term preservation of the digital objects. 
Basic text comprises of alphanumeric characters. Optical character recognition (OCR) practices are deployed to translate analog text to digital text. The most common digital representation of characters is the ASCII code. For this, seven bits are required (eight bits might be used, where in the eighth bit is reserved for a special purpose) for each character. Storage space for a text document that is required is equivalent to the number of characters. For instance, a 15 page text document consisting of about 4000 characters generally consumes 60 kilobytes.
Now days, structured text documents have become extremely popular. They comprise titles, chapters, sections, paragraphs, and so forth. A title can be presented to the user in a different format than a paragraph or a sentence. Different standards are used to encode structured information such as HTML and XML (hyper text markup language and extensible markup language)
There are different approaches like Huffman and Arithmetic Coding, which can be used for text compression, but as the storage requirements are not too high, these approaches are not as important for text as they are for multimedia data. 
4.1 Text Documents
A text document consists of identification and is considered to be a list of words. Likewise, a book is considered to be a document, and so is a paper in the events of a conference or a Web page. The key identification used for a book may be an ISBN number or the title of the paper together with the ISBN number of the conference event or a URL for a Web page.
Retrieval of text documents does not normally entail the presentation of the entire document, as it consumes a large amount of space as well as time. Instead, the system presents the identifications of the chosen documents mainly along with a brief description and/or rankings of the document.
Indexing refers to the derivation of metadata from their documents and storage in an index. In a way, the index describes the content of the documents. The content can be described by terms like social or political for text documents. Also, the system utilizes the index to determine the output during retrieval.
The index can be filled up in two ways, manually as well as automatically. Assigned terms can be added to documents as a kind of annotation by professional users such as librarians. These terms can be selected often from a prescribed set of terms, the catalog. A catalog describes a certain scientific field and is composed by specialists. One of the main advantages of this technique is that the professional users are aware of the acceptable terms that can be used in query formulation. A major drawback of this technique is the amount of work that has to be performed for the manual indexing process.
Document content description can also be facilitated automatically resulting in what are termed as derived terms. One of the many steps required for this can be a step in which words in English text are identified by an algorithm and then put to lower case. Basic tools are used in other steps such as stop word removal and stemming. Stop words are words in the document which have a little meaning and most of the times include words like the and it. These stop words are erased from the document. Words are conflated to their stem in the document through stemming. As an example, the stemmer can conflate the words computer, compute and computation to the stem comput.
4.3 Query Formulation
Query formulation refers to the method of representing the information need. The resultant formal representation of information is the query. In a wider perspective, query formulation denotes the comprehensive interactive dialogue between the system and the user, leading to both a suitable query and also a better understanding by the user of the information need. It also denotes the query formulation when there are no previously retrieved documents to direct the search, thus, the formulation of the preliminary query.
It is essential to differentiate between the expert searcher and the relaxed end user. The expert searcher is aware of the document collection and the assigned terms. He/ she will use Boolean operators to create the query and will be able to adequately rephrase the same as per the output of the system. In case the result is too small, the expert searcher must expand the query, and in case if the result is too large, he/she must be able to make the query more restrictive.
The communication of the need for information to the system in natural language interests the end user. Such a statement of the need for information is termed as a request. Automatic query formulation comprises of receiving the request and generating a preliminary query by the application of algorithms that were also used for the derivation of terms. In general, the query consists of a list of query terms. This list is accepted by the system and it composes a result set. The system can formulate a successive query based on this relevant feedback.
The matching algorithm is mainly the most important part of an information retrieval system. This algorithm makes a comparison of the query against the document representations in the index. In the exact matching algorithm, a Boolean query, which is formulated by an expert searcher, defines precisely the set of documents that satisfy the query. The system generates a yes or a no decision for each document.
In the case of an inexact matching algorithm, the system delivers a ranked list of documents. Users can traverse this document list to search for the information they need. Ranked retrieval puts the documents that are relevant in the top of the ranked list, thus, saving the time the user has to invest on reading those documents. Simple but effective ranking algorithms make use of the frequency allocation of terms over documents. Ranking algorithms that are based on statistical approaches, halve the time the user has to spend on reading those documents.
Digital images can be defined as an electronic snapshot scanned from documents or taken of a scene, for example printed texts, photographs, manuscripts, and various artworks.
Digital image is modeled and mapped as a grid of dots, pixels or commonly known picture elements. A tonal value is allocated to each of these pixels, which can be black, white, and shades of gray or color. Pixel itself is symbolized in binary code of zeros and ones. Computer stores these binary digits or bits corresponding to each pixel in a sequence and are later reduced to mathematical representation by compressing them. After compression these bits are interpreted and read to generate an analog output by the computer for display or printing purposes.
Figure 5.1: As shown in this bitonal image, each pixel is assigned a tonal value, in this example 0 for black and 1 for white.
To further describe the grayscale of a pixel one needs to say that one byte is of eight bits. For a color pixel one needs three colors of one bye each, these colors are red, green and blue. So, for a rectangular screen one can compute the amount of data required for the image using the formula:
A = xyb
Where A is the number of bytes needed,
x is the number of pixels per horizontal line,
y is the number of horizontal lines, and
b is the number of bytes per pixel.
Using this formulae for a screen with value of x being 800, y being 600, and for b being 3; A=xyb thus A = 1.44 Mbyte.
Compression is required for this significant amount of data. Image compression is based on exploiting redundancy in images and properties of the human perception. Pixels in specific areas appear to be similar; this concept of similarity is called Spatial Redundancy. Human’s views of images are tolerant r