Mobile Multimedia Application for the Deaf with Facial Frame Animations and an Enhancement of Hand Gestures


Abstract—There have been many advancements in the field of mobile applications. These advancements help overcome the drawbacks of the modes of communication that deaf users and people with auditory processing disorders (APDs) relied on in the past. This paper outlines a mobile multimedia application for deaf users that converts audio into facial animations built from frames, which are used to maintain animation quality on a 2-D platform. Polygon topology is used to create the animated sequences. The research in this paper also examines the efficiency of hand gestures over lip reading.

Keywords—speech recognition; facial animation; feature points; face mapping; SUSAN operator; MPEG-4 animation; hand gestures; APD; HMM; HGR

1. INTRODUCTION

Mobile applications, now an integral part of society, have created intelligent interaction between hearing users and machines but have failed to offer the same to the disabled, the most important factor being audio-visual interaction. Mobile applications for voice calling and video calling, such as "FaceTime" [2], mark major milestones in the history of technology, yet disabled users have not been able to benefit much from this evolution. In the past, the deaf and people with Auditory Processing Disorder (APD) [11] relied largely on the Short Message Service (SMS), the most common mode of communication between deaf or APD users and the hearing. Other services such as the telephone typewriter (TTY), fax, email and relay services also played a key role [1], although each of these had its own issues and limitations [1]. Face-to-face communication has also been difficult for the deaf and APD users, since it requires the hearing party to know sign language. As this is not practical in general, many technological advancements have been proposed to replace the existing modes of communication. In this paper, we outline the tools that support the development of a multimedia application, presented as a prototype, for converting audio into facial animations. We consider the case of deaf people who can understand speech through lip reading only.

Multimedia applications have recently been developed specifically for the hard of hearing, and they benefit further from being implemented in the mobile environment, with its limited memory and computing power, as stand-alone operations. Since the memory capacity of today's handsets is far higher than in the past, they can host many such applications. This report studies a mobile multimedia application for deaf users that converts audio into animations. The application is a well-developed aid for deaf people that converts spoken audio signals into an animated speaking face. Its basic operation is to convert the speech audio into frame animations and to combine these frames on a mobile platform.

Polygon topology is used to derive parameter values that turn the frames into animated sequences. The first phase is the incoming audio. The application performs poorly in noisy environments; for example, 'b' and 'p' form one of the confusable pairs in speech signals when there is background noise such as a car horn. The basic audio-to-animation conversion is carried out by an audio speech recognition system that uses the relevant phonetic details together with tongue and teeth modelling. According to our research, the speech conversion uses Automatic Speech Recognition (ASR) [17] for mobile platforms; such systems synthesize the speech and incorporate it into mobile devices, and the durations of the phonemes in the speech are taken as an input string [9]. Acapela captures the voice for individuals with speech disorders [10]. Unprocessed speech waveforms are not sufficient to predict which facial motions are produced, hence the importance of phonetic perception [6]. Hidden Markov models (HMMs) are used as a tool for speech recognition [3]. The modelling of facial expressions and their motions has been studied since 1994 [18][19][20] in order to produce accurate synthesized video animation using 3-D polygonal points. Strategies are needed to combine the information in a way that improves performance, which requires integrating the audio and visual information and combining it with the extraction of visual features. There are image-based models (2-D or 3-D), and there are many traditional methods for producing facial animation from speech, including phonetic language with rule-based methods, non-linear methods and HMMs [4], together with the MPEG-4 standard [5]. The 2-D face model is animated so that it speaks the audio given as input to the mobile application. The SUSAN operator is used to extract the edges and corners of a feature area derived from the speech phonetics [21]. The key-face interpolation is selected from recordings of professional lip-speakers [12][13][14].
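To make this pipeline concrete, the sketch below shows how a recognized phoneme string with per-phoneme durations could be mapped to viseme key frames and expanded into a per-frame sequence for playback. The phoneme-to-viseme table and frame names are hypothetical placeholders, not data from the application.

```python
# Illustrative sketch of the speech-to-animation pipeline described above.
# The phoneme-to-viseme table and the durations are hypothetical examples,
# not the application's actual data.

PHONEME_TO_VISEME = {          # many phonemes share one mouth shape (viseme)
    "p": "closed_lips", "b": "closed_lips", "m": "closed_lips",
    "f": "lip_teeth",   "v": "lip_teeth",
    "a": "open_wide",   "o": "rounded",     "e": "spread",
}

def phonemes_to_keyframes(phonemes, durations_ms):
    """Map each recognized phoneme to a viseme key frame with its duration."""
    return [(PHONEME_TO_VISEME.get(p, "neutral"), d)
            for p, d in zip(phonemes, durations_ms)]

def expand_to_frames(keyframes, fps=25):
    """Expand (viseme, duration) pairs into a flat per-frame sequence."""
    frames = []
    for viseme, dur_ms in keyframes:
        n = max(1, round(dur_ms * fps / 1000))
        frames.extend([viseme] * n)
    return frames

if __name__ == "__main__":
    # e.g. ASR output for "mama": phonemes with per-phoneme durations in ms
    keyframes = phonemes_to_keyframes(["m", "a", "m", "a"], [90, 140, 90, 160])
    print(expand_to_frames(keyframes))
```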

Figure 1. Working of the speech conversion [19]

However, since this paper deals with mouth shapes modelled from converted audio, communication remained difficult: humans mainly look at the face and the accompanying gestures during natural language use. The work presented here therefore focuses on advancements in both facial animation and sign-language recognition.

2. PROBLEM IDENTIFICATION

This section gives the background needed to specify the problem through the architecture and development used to address it, considering the issues connected to the current implementation of this application. Background: models for facial animation have been developed since 1982 [7] using feature extraction points marked on the facial image. These feature points are located from the front view: on a 2-D image, the corners of the mouth are located along the X axis and the points along the line of the chin along the Y axis. The problem identified is that the feature points are provided only from a front view, on the X and Y axes of a 2-D facial image. This issue appears in the multimedia application developed for deaf users, where 2-D frames are used to construct the animated video sequence [15]. The application presented in this paper is implemented with 2-D animations [16]; the authors tested whether to use 2-D or 3-D by collecting frames in 2-D from the front view and, in parallel, performing 3-D tests in which frames were collected with motion cameras from the front and side views and the points were interpolated in three coordinates x, y and z.

That work concluded that using 2-D was a simpler process than 3-D, because 3-D was expensive and the images collected from multiple cameras lacked consistency, leading to inaccuracy of the points on the x and y coordinates [15]. The simplest solution implemented for the mouth movement is a polygon defined by a matrix [3]. Lip reading is used to design the 2-D motion, which makes the mouth polygon the most important element. The MPEG-4 standard describes a moving mouth polygon by representing its parameters on the X and Y coordinates. Enhancements can be made by morphing the individual points.
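As a sketch of the morphing idea mentioned above, linear interpolation of the individual polygon vertices between two key frames produces the in-between frames. The mouth polygons below are made-up coordinates, not MPEG-4 data.

```python
import numpy as np

# Minimal sketch of morphing a 2-D mouth polygon between two key frames by
# linearly interpolating each vertex; the polygons below are made-up examples.

mouth_closed = np.array([[10, 50], [30, 48], [50, 50], [30, 52]], dtype=float)
mouth_open   = np.array([[10, 50], [30, 40], [50, 50], [30, 60]], dtype=float)

def morph(poly_a, poly_b, n_frames):
    """Return n_frames polygons blending poly_a into poly_b point by point."""
    return [poly_a + (poly_b - poly_a) * (i / (n_frames - 1))
            for i in range(n_frames)]

for frame in morph(mouth_closed, mouth_open, n_frames=5):
    print(frame.round(1).tolist())
```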

Figure 2. Facial animation parameters on the lips [19]

The fundamental obstacle is performing this task in 2-D: the system requires high-quality visual feature extraction and efficient video animation to achieve a high success rate, but in 2-D the feature points are represented only as P(x1, y1), which can degrade overall performance, as can the efficiency of the algorithm used to create the facial animations, so the system may not function as ideally as possible.

Facial animation on its own is not enough to carry the communication, so enhancements are needed to improve understanding. Facial animation provides only partial communication and is not efficient enough to form a complete medium between the device application and the deaf user, because mouth-shape modelling is difficult to recognize in some situations. Still, in some cases mouth modelling offers a simple way to provide additional information that is not redundant with the hand-gesture information and that does not require the viewer to understand signing.

3. RELATED WORK

So far our research has concentrated on how deaf people communicated in the past and how they adapted to emerging technologies. Even though they adapted, they were not very comfortable using these technologies, and there remained some friction between the applications and the deaf. To overcome this, many researchers proposed ideas suited to the deaf, starting from lip reading and facial animation and now sign language, which plays the key role in new systems, as stated by research on American Sign Language [35]. Gesture recognition systems play a vital role in the lives of the disabled, helping them cope with evolving technologies. Related work on gestures uses several algorithms, such as surface EMG features for gesture recognition in American Sign Language [34]. Research from 2007 implemented posture recognition with real-time performance and high recognition accuracy based on context-free grammar (CFG) synthesis with Haar-like features and the AdaBoost algorithm [36], as referred to by Somayeh. Hand gesture recognition has also been modelled using dynamic Bayesian networks, and that application achieved 90% gesture recognition accuracy [37] using the SMP algorithm.

The first task is to explain the mapping between spoken phonemes and their corresponding visemes [3], i.e. the corresponding mouth shapes, and how well phonemes, lip reading and their visemes are now established in recognition systems, since all of these relate to audio-visual speech recognition [19]. Facial recognition is another related field; Tian et al. (2001) discussed algorithms that recognize mouth-related action units in different states, for instance whether the mouth is open, partially closed or closed [33].

In our research we first highlight the importance of the HMM algorithm [7], in which the spoken audio is modelled by its corresponding phonemes. The audio is represented as acoustic vectors, which are then assigned to phonemes. This algorithm is used when statistical data is available: the sequence of movements is considered, and every single movement is treated as a hidden state. The state itself is not visible to the observer, but the output, which depends on the state, is visible. HMMs are mainly used in speech recognition, handwriting recognition and gesture recognition, where the actions evolve quickly.
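As a minimal illustration of the HMM idea (hidden states with observable outputs), the sketch below runs standard Viterbi decoding on a toy two-state model. The states, observation symbols and probabilities are invented for the example and are not the application's trained model.

```python
import numpy as np

# Toy Viterbi decoding for a discrete-observation HMM; all numbers are
# invented for illustration, not a trained speech model.

states = ["silence", "speech"]
obs_symbols = {"low_energy": 0, "high_energy": 1}

start = np.array([0.8, 0.2])                    # P(first state)
trans = np.array([[0.7, 0.3],                   # P(next state | state)
                  [0.2, 0.8]])
emit  = np.array([[0.9, 0.1],                   # P(observation | state)
                  [0.3, 0.7]])

def viterbi(observations):
    """Return the most likely hidden-state sequence for the observations."""
    obs = [obs_symbols[o] for o in observations]
    T, N = len(obs), len(states)
    delta = np.zeros((T, N))                    # best path probabilities
    backp = np.zeros((T, N), dtype=int)         # back-pointers
    delta[0] = start * emit[:, obs[0]]
    for t in range(1, T):
        for j in range(N):
            scores = delta[t - 1] * trans[:, j]
            backp[t, j] = np.argmax(scores)
            delta[t, j] = scores[backp[t, j]] * emit[j, obs[t]]
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):               # follow back-pointers
        path.append(int(backp[t, path[-1]]))
    return [states[i] for i in reversed(path)]

print(viterbi(["low_energy", "high_energy", "high_energy", "low_energy"]))
```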

Next, the phoneme vectors are aligned with their corresponding visemes [3]. Here we use a three-state HMM, where each HMM state represents a probability modelled by a single Gaussian density with a diagonal covariance matrix, and the states follow a strict left-to-right structure. The visemes are represented in vector form and mapped to their phonemes using k-means clustering, on a one-to-many basis, identifying each cluster as a feature [6]. The Smallest Univalue Segment Assimilating Nucleus (SUSAN) operator is chosen to remove noise and to help filter the signal: it covers image finding, edge finding and corner finding, and its non-linear filtering defines which parts of the image are closely related to individual pixels. Since we must extract edges and corners as points of a local feature area, the operator's principle is to mask a circular area around a single point with radius r and then to check, for every point in the image, the consistency of that point with all other points contained in the masked area. Compared to operators such as Sobel and Canny, SUSAN is better suited to extracting the features of the mouth and eyes on the face, especially for locating the corner points of the mouth and eyes automatically.
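The following sketch illustrates the SUSAN principle described above: a circular mask is placed around each pixel (the nucleus) and the pixels whose brightness is similar to the nucleus are counted (the USAN area); a small USAN area indicates a corner or edge. The threshold values are illustrative and this is not an optimized implementation.

```python
import numpy as np

# Minimal sketch of the SUSAN principle: for each pixel (the "nucleus"),
# count mask pixels with similar brightness (the USAN area); small USAN
# areas indicate corners/edges. Thresholds below are illustrative only.

def susan_response(image, radius=3, t=27.0):
    h, w = image.shape
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    mask = (xs ** 2 + ys ** 2) <= radius ** 2          # circular mask
    offsets = np.argwhere(mask) - radius               # (dy, dx) pairs
    g = 0.75 * mask.sum()                              # geometric threshold
    response = np.zeros_like(image, dtype=float)
    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            nucleus = image[y, x]
            usan = sum(
                np.exp(-((image[y + dy, x + dx] - nucleus) / t) ** 6)
                for dy, dx in offsets)
            response[y, x] = max(0.0, g - usan)        # large value = corner/edge
    return response

img = np.zeros((20, 20)); img[8:, 8:] = 255.0          # synthetic corner
resp = susan_response(img)
print(np.unravel_index(np.argmax(resp), resp.shape))   # strongest response location
```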

Feature extraction is performed in two stages: first manually, by experimentally tracking the full data set provided, and second using a region of interest (ROI). The ROI technique is automatic: the speaker's nostrils are tracked and their positions are used to detect the mouth and the eyes, following Cappelletta and Harte. The feature extraction required in this system is based on acoustic modelling, dealing with the viseme and phoneme vectors that have been mapped to each other and reduced to a suitable dimensionality of feature vectors. One of the most popular methods for this feature extraction, based on the time domain and the distances among the clusters that define a feature, is obtained using Baum-Welch dynamic Bayesian network inversion.
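As a sketch of the ROI idea, tracked nostril positions can be used to place boxes around the mouth and eye regions. The geometric offset ratios below are assumptions for illustration only, not the values used by Cappelletta and Harte.

```python
# Sketch of deriving mouth and eye regions of interest (ROIs) from tracked
# nostril positions; the offset ratios below are assumptions for illustration.

def rois_from_nostrils(left_nostril, right_nostril):
    (lx, ly), (rx, ry) = left_nostril, right_nostril
    cx, cy = (lx + rx) / 2.0, (ly + ry) / 2.0      # nostril midpoint
    d = max(1.0, rx - lx)                          # inter-nostril distance
    mouth_roi = (cx - 1.5 * d, cy + 0.5 * d, cx + 1.5 * d, cy + 2.5 * d)
    eye_roi   = (cx - 2.5 * d, cy - 3.0 * d, cx + 2.5 * d, cy - 1.0 * d)
    return {"mouth": mouth_roi, "eyes": eye_roi}   # (x1, y1, x2, y2) boxes

print(rois_from_nostrils((100, 120), (130, 120)))
```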

The next focus is Baum-Welch dynamic Bayesian network inversion, in which the HMM parameters used for decoding noisy data from a data stream are estimated. Decoding information without knowing all of the stream's parameters can be regarded as a form of reverse engineering, and it has many applications in solving HMMs for speech analysis. The process is performed step-wise: a set of correlated observations of the possible variables is converted into a set of uncorrelated variables, represented as the principal components.
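The last step described above, turning correlated observations into uncorrelated principal components, can be sketched with a small PCA based on the eigen-decomposition of the covariance matrix. The random data here is only a placeholder for the real audio-visual feature vectors.

```python
import numpy as np

# Sketch of decorrelating observation vectors into principal components
# (PCA); random data stands in for the real audio-visual feature vectors.

rng = np.random.default_rng(0)
observations = rng.normal(size=(200, 6))            # 200 frames, 6 features
observations[:, 1] += 0.8 * observations[:, 0]      # introduce correlation

centered = observations - observations.mean(axis=0)
cov = np.cov(centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)              # ascending eigenvalues
order = np.argsort(eigvals)[::-1]
components = eigvecs[:, order]                      # principal directions
projected = centered @ components                   # uncorrelated coordinates

print(np.round(np.cov(projected, rowvar=False), 2)) # ~diagonal covariance
```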

This paper presents the facial animation framework implemented on web and mobile platforms. The system generates lip movements on a facial animation using MPEG-4 (Moving Picture Experts Group) frame interpolation. This standard offers many tools for representing such animations as rich multimedia content, using animation parameters: the facial animation is controlled by 68 FAPs specified on the human face. We now move on to the next enhancement, the use of hand gestures, as given in the proposed solution.

4. PROPOSED SOLUTION & MATHEMATICAL MODEL

4.1 PROPOSED SOLUTION

As discussed in the problem identification, the multimedia application for deaf users has been criticized for its low computational sophistication: it maps 2-D and 3-D data using a minimal set of acoustic vectors as feature points, leading to a lower recognition rate, longer processing time and poor classification performance, as well as a high memory requirement, since further processing of the audio and video data sets is needed to create a new feature data set as input to the animation interpolation.

Hand gestures, facial expressions and spoken language are the natural forms most used for interpersonal communication. Because of this, conventional methods have been used in the application to create an interface between the deaf user and the device through facial animations. However, sign language has proved to be a valuable interaction tool for deaf people, and our main goal in this project is to research the development of a vision system that implements animated gestures from sequences acquired by multiple cameras. The application chooses phonemes from the sign-language data set, recognizes the hand gestures and represents them as patterns. The pattern-matching algorithms are implemented using NI LabVIEW, which offers a large collection of tools for filtering, segmentation and image-acquisition testing. Because the previous application lacked robustness in its patterns and feasibility in 2-D and 3-D detection, proving inefficient, an appearance-based approach, the gesture implementation system, is introduced.

The process focuses on converting text or speech into sign language. The aim of this application is a system of integrated services that efficiently fills the communication gap between hearing and deaf users. This is done by fixing small problems in the facial-animation application and by adding gestures from the sign-language dictionary as pictures, explained letter by letter in finger sign language. Many such services with smaller goals have been implemented to build a better integrated life for the deaf; they fall into peripheral services and central services. Peripheral services have no effect on the conversion process itself but still help deaf and hearing people get to know each other, for example Deaf Tube, a list of channels provided on YouTube; Deaf News, which gathers useful news from numerous websites; and text-to-speech services, applications in which typed text can be understood by the deaf. The part of the system that does the real work for the deaf is mostly built on sign language, on the grammar with its set of rules for using the language, and on its structures; it relies on natural language processing and many text and speech data-mining algorithms. Sign language has two components, the sign-language dictionary and finger sign language. The sign-language dictionary covers more than 7,000 words for daily use by the deaf, and each word's sign differs from country to country. In finger sign language, the fingers are used within deaf culture to spell out names, locations, things and so on.
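A minimal sketch of the dictionary-plus-finger-spelling idea described above is given below; the word list and file names are hypothetical placeholders, not the actual 7,000-word dictionary.

```python
# Sketch of converting recognized text into a sequence of sign clips:
# known words come from a sign-language dictionary, unknown words fall back
# to letter-by-letter finger spelling. All entries below are placeholders.

SIGN_DICTIONARY = {"hello": "signs/hello.png", "thank": "signs/thank.png",
                   "you": "signs/you.png"}
FINGER_ALPHABET = {c: f"fingers/{c}.png" for c in "abcdefghijklmnopqrstuvwxyz"}

def text_to_signs(sentence):
    clips = []
    for word in sentence.lower().split():
        if word in SIGN_DICTIONARY:
            clips.append(SIGN_DICTIONARY[word])
        else:                                   # finger-spell unknown words
            clips.extend(FINGER_ALPHABET[c] for c in word if c in FINGER_ALPHABET)
    return clips

print(text_to_signs("Hello Maria thank you"))
```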

Next, we discuss the other two algorithms in our solution: the hand gesture recognition (HGR) algorithm with a SOM-Hebb classifier, and the sequential cluster detection algorithm. The SOM-Hebb classifier deals with hand gestures, whose inputs are generally either glove-based or vision-based. This work therefore uses the SOM-Hebb classifier covering image acquisition, image processing, feature extraction, feature vectors and hand-sign identification. The sequential cluster detection algorithm finds the clusters in the assembled features using a sequential clustering scheme; the basic morphological operators have already been applied, and here the Cr channel of the frontal view is used.
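To illustrate the self-organizing-map part of a SOM-Hebb classifier, the generic sketch below trains a small SOM: each input feature vector pulls its best-matching unit and that unit's neighbours toward it. This is not the authors' classifier; the map size, learning rates and the random placeholder feature vectors are arbitrary choices.

```python
import numpy as np

# Generic self-organizing map (SOM) training sketch to illustrate the SOM
# part of a SOM-Hebb classifier; random vectors stand in for hand-gesture
# feature vectors, and the map size / rates are arbitrary choices.

rng = np.random.default_rng(1)
grid_w, grid_h, dim = 6, 6, 8
weights = rng.random((grid_h, grid_w, dim))
features = rng.random((500, dim))                    # placeholder feature vectors

def train_som(weights, data, epochs=10, lr=0.3, radius=2.0):
    coords = np.stack(np.mgrid[0:grid_h, 0:grid_w], axis=-1)  # unit positions
    for _ in range(epochs):
        for x in data:
            dists = np.linalg.norm(weights - x, axis=-1)
            bmu = np.unravel_index(np.argmin(dists), dists.shape)  # best unit
            grid_d2 = np.sum((coords - np.array(bmu)) ** 2, axis=-1)
            influence = np.exp(-grid_d2 / (2 * radius ** 2))[..., None]
            weights += lr * influence * (x - weights)   # pull neighbourhood
        lr *= 0.9
        radius = max(0.5, radius * 0.9)
    return weights

weights = train_som(weights, features)
print(weights.shape)
```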

The steps to measure and recognize the data are as follows. The identification process converts speech into hand signals from the stored words, detecting their flow into the database so that it can be matched to the direction of body and hand movement. The next step is capturing a real-time display through the graphical user interface (GUI) that has been built. After this, the data is segmented and retrieved based on the colour, movement and identified object, giving the object detection and pre-processing needed for the pattern recognition methods. The following step separates the data for the extracted features: different frames are processed by calculating the positions of each obtained point on the x, y and z coordinates. The processed data is stored in a folder as the training data and as the comparison values for recognizing the sign-language gestures. The experiments are performed using the Euclidean distance, where the output is the set of processed x, y and t positions as the main features; the classification then uses the Euclidean distance on the detected X and Y positions taken as input. All of the algorithms discussed contribute to the implementation of the mobile multimedia application.
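A sketch of this classification step: an observed (x, y) trajectory over time is compared with stored training trajectories by Euclidean distance and the nearest word is returned. The stored templates and the naive length alignment are invented placeholders, not the paper's data.

```python
import numpy as np

# Sketch of nearest-template classification of a gesture trajectory using
# Euclidean distance on (x, y) positions over time; templates are placeholders.

templates = {                               # word -> trajectory of shape (T, 2)
    "hello": np.array([[0, 0], [1, 2], [2, 4], [3, 6]], dtype=float),
    "bye":   np.array([[0, 0], [2, 1], [4, 1], [6, 0]], dtype=float),
}

def classify(trajectory):
    """Return the template word whose trajectory is closest in Euclidean distance."""
    best_word, best_dist = None, float("inf")
    for word, ref in templates.items():
        n = min(len(ref), len(trajectory))              # naive length alignment
        dist = np.linalg.norm(trajectory[:n] - ref[:n])
        if dist < best_dist:
            best_word, best_dist = word, dist
    return best_word, best_dist

observed = np.array([[0, 0], [1.1, 1.8], [2.2, 3.9], [3.1, 6.2]])
print(classify(observed))
```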

In these figures we see that every word obtained has a different pattern; similar patterns between words are not visible in the X and Y charts, and different movements take different amounts of time.

Sign language, the language most used by the deaf, does not rely on sound, i.e. on human speech or writing, but is coordinated through movements produced by gestures of the hands and body, combining shapes and orientations into a language that expresses thoughts [26]. The sign language used as a reference consists of Indonesian vocabulary, originally taken together with American Sign Language [26][27].

Object tracking with a representative image is a way to automatically follow the elements to be localized in a moving video data set. Tracking plays a vital role in video processing, where feature properties are extracted to build the objects shown in the video. This is followed by the concept of optical flow, which forecasts the movements that took place in the previous phase of object tracking. It can be carried out in two kinds of space: 2-D, which uses two sequential frame images, and 3-D, which uses pixels of successive volumes. The algorithm used here is the Lucas-Kanade algorithm [28][29].

Figure 3. Position of X chart [25]

Figure 4. Position Y chart [25]

Figure 5. Position Z chart [25]

Figure 6. Value of the distance represented for each word [25]

The pyramidal Lucas-Kanade method is a differential method for estimating optical flow; it assumes that the flow is essentially constant in a local neighbourhood of a pixel and solves the optical-flow equations for all pixels in that neighbourhood. This method is used here, among the available methods, because it mitigates the inherent ambiguity problem and is less sensitive to image noise than purely point-wise methods.
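In practice, pyramidal Lucas-Kanade tracking is available in common vision libraries; the sketch below uses OpenCV's implementation on two synthetic frames (a bright square shifted by a few pixels). The parameter values are typical defaults, not the tuned values of the cited work.

```python
import cv2
import numpy as np

# Sketch of pyramidal Lucas-Kanade tracking with OpenCV on two synthetic
# frames; parameters are typical defaults, not values from the cited work.

prev_frame = np.zeros((120, 160), dtype=np.uint8)
prev_frame[40:70, 60:90] = 255                                 # bright square
next_frame = np.roll(prev_frame, shift=(3, 5), axis=(0, 1))    # moved square

# corners to track: maxCorners=20, qualityLevel=0.01, minDistance=5
prev_pts = cv2.goodFeaturesToTrack(prev_frame, 20, 0.01, 5)

if prev_pts is not None:
    next_pts, status, err = cv2.calcOpticalFlowPyrLK(
        prev_frame, next_frame, prev_pts, None,
        winSize=(15, 15), maxLevel=2,
        criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 0.03))

    for p0, p1, ok in zip(prev_pts.reshape(-1, 2),
                          next_pts.reshape(-1, 2), status.ravel()):
        if ok:
            print("tracked", p0, "->", p1)    # expected offset ~ (5, 3) in x, y
```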

We then use the Euclidean distance, a pattern-recognition method applied to the charts of two or three coordinates to find pattern similarity for words that are not visually distinct. It is computed with the Pythagorean formula on Cartesian coordinates X = (x1, x2, x3, …, xn) and Y = (y1, y2, y3, …, yn). Once the words have been acquired, the destination image that best corresponds to the template is found by comparing the template with the acquired image from the chosen coordinate system (x0, y0). The geometric pattern is derived from the gesture data set corresponding to the extracted words using the geometric pattern matching algorithm, a method based on mapping spatial patterns with edge-based, geometric matching.

Figure 7. Lucas-Kanade algorithm [30]

The gesture implementation process is based on the parts presented above. At the end we use the graphical user interface (GUI), which presents a template of all acquired patterns in three main sections. The first is the gesture template indication, where each gesture is matched against the possible patterns; a cluster indicator reports the status, red meaning the gesture was not detected and green meaning it was detected. The second section is the processing of raw images for display: the unprocessed images come from the eye camera, while the processed images, ready for presentation, are placed in the so-called "pattern matching box". The third section is acquisition control, with an acquisition tab that lets the user select, present, ignore or stop the acquisition of gestures to be presented along with the animations; it is the user's choice to use or ignore the acquired information. For this, NI LabVIEW is used, a graphical programming environment providing the basic tools for the gesture-acquisition software [32].

Figure 8. Lucas-Kanade pyramid extension [31]

Figure 9. Algorithm code for image processing and geometric pattern matching

4.2 MATHEMATICAL MODEL

Matching of point sets is an important part of many pattern recognition problems. The model assumes that the number of points to be matched is denoted p. We use an O(p (log p)^(3/2)) algorithm for the point pattern matching problem and it has been demonstrated to be faster than the existing algorithms.

We consider two point patterns S and R in two dimensions, with S = {s1, s2, …, sn} and R = {r1, r2, …, rm}, where the points s and r lie in K². We assume that S and R have an equal number of points. An affine transformation AT(f, θ, tx, ty) is to be found such that AT(S) maps onto R precisely. In AT(f, θ, tx, ty), f denotes the scaling factor applied to each point (x, y) ∈ K².

The translations in x and y are given by tx and ty, and the rotation angle is θ. We consider two further parameters to define a match: the matching probability mp ∈ [0, 1] and the matching size t ∈ K⁺. AT(f, θ, tx, ty)(S) matches R if there is a subset S' of S with at least mp·n elements such that for each s ∈ S',

|AT(f, θ, tx, ty)(s) − r| < t for some r ∈ R.

Assume that the range of the points in R is r. If the points are distributed uniformly, the mean distance between points in R is r/√n, and it is appropriate to choose t as a constant fraction of this mean distance. Let λ be that fraction, called the matching factor.

This gives t = λ(r/√n).

Here it is assumed that R is a locally affinely distorted transformation of S, possibly with points missing.

The idea of the model is that, with high probability, one of the first few random points chosen in S will correspond to some point in R. We therefore take a random point in S and look for a matching point in R; in this way an affine transformation is obtained that maps the close neighbours of the point in S to those in R. It is then important to check whether the match is global.

Point locator in R: after precomputation, checking whether there is a point of R within a small distance t0 of any coordinate r0 = (x, y) is carried out in O(1) time, where t0 is taken to be less than r/√n. The square (j, k) of the lookup array is determined by scaling down the coordinates, and all points r' contained in the 8 squares around (j, k), together with (j, k) itself, are examined; a match is found if |r0 − r'| < t0 for one of them, as depicted in Figure 10.
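A sketch of this point-locator idea: the points of R are bucketed into grid squares of side t0, and a query is answered by scanning only the 3×3 block of squares around the query's square. The point set below is a random placeholder.

```python
import math
import random

# Sketch of the constant-time point locator: bucket the points of R into a
# grid of squares with side t0, then answer "is any point of R within t0 of
# q?" by scanning only the 3x3 block of squares around q's square.

def build_grid(points, t0):
    grid = {}
    for (x, y) in points:
        grid.setdefault((int(x // t0), int(y // t0)), []).append((x, y))
    return grid

def near_match(grid, q, t0):
    qj, qk = int(q[0] // t0), int(q[1] // t0)
    for j in range(qj - 1, qj + 2):          # the square itself plus its 8 neighbours
        for k in range(qk - 1, qk + 2):
            for (x, y) in grid.get((j, k), []):
                if math.hypot(x - q[0], y - q[1]) < t0:
                    return (x, y)
    return None

random.seed(0)
R = [(random.uniform(0, 100), random.uniform(0, 100)) for _ in range(50)]
grid = build_grid(R, t0=3.0)
print(near_match(grid, (R[0][0] + 1.0, R[0][1] + 1.0), t0=3.0))
```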

Figure 10. Point locator in R [40]

Analysis of closest neighbours: this depends on two parameters l2 and l3. The closest neighbours of s ∈ S are found to be similar to those of r ∈ R, under the assumption that the two closest neighbours are not too far from s and r; the transformations for which these closest neighbours agree are then computed. Let s = (sx, sy) and b = (bx, by) be two distinct points in S, and r = (rx, ry) and c = (cx, cy) two distinct points in R. The affine transformation AT is evaluated uniquely from AT(s) = r and AT(b) = c. The determination of f, θ, tx and ty is as follows:

f = |r − c| / |s − b| (the scaling factor is the ratio of the distances between the two point pairs), θ = atan2(cy − ry, cx − rx) − atan2(by − sy, bx − sx) (the rotation is the difference of the segment orientations), tx = rx − f(sx cos θ − sy sin θ), and ty = ry − f(sx sin θ + sy cos θ).
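A small sketch computing this transformation from two point correspondences and verifying it by applying the recovered transformation; the coordinates are arbitrary test values.

```python
import math

# Sketch: recover the similarity transformation AT (scale f, rotation theta,
# translation tx, ty) from two correspondences AT(s) = r and AT(b) = c.
# The coordinates below are arbitrary test values.

def transform_from_pairs(s, b, r, c):
    ds = (b[0] - s[0], b[1] - s[1])
    dr = (c[0] - r[0], c[1] - r[1])
    f = math.hypot(*dr) / math.hypot(*ds)                        # scaling factor
    theta = math.atan2(dr[1], dr[0]) - math.atan2(ds[1], ds[0])  # rotation
    tx = r[0] - f * (s[0] * math.cos(theta) - s[1] * math.sin(theta))
    ty = r[1] - f * (s[0] * math.sin(theta) + s[1] * math.cos(theta))
    return f, theta, tx, ty

def apply(f, theta, tx, ty, p):
    return (f * (p[0] * math.cos(theta) - p[1] * math.sin(theta)) + tx,
            f * (p[0] * math.sin(theta) + p[1] * math.cos(theta)) + ty)

f, theta, tx, ty = transform_from_pairs((1, 1), (4, 2), (5, 3), (9, 9))
print(apply(f, theta, tx, ty, (1, 1)))   # should reproduce (5, 3)
print(apply(f, theta, tx, ty, (4, 2)))   # should reproduce (9, 9)
```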

REFERENCES

1. A. Bonafonte and J. L. Landabaso, IEEE International Conference on Acoustics, Speech, and Signal Processing, 2002, vol. 4, pp. IV-3624 – IV-3627, DOI: 10.1109/ICASSP.2002.5745440.
2. "Chapter 11: Matching in 2D," https://courses.cs.washington.edu/courses/cse576/book/ch11.pdf
3. C. Benoit and B. Le Goff, "Audio-visual speech synthesis from French text: Eight years of models, designs, and evaluation at the ICP," Speech Commun., vol. 26, pp. 117-129, 1998.
4. D. H. Klatt, "Prediction of perceived phonetic distance from critical band spectra: A first step," in Proc. Int. Conf. Acoustics, Speech, Signal Processing, 1982, pp. 1278-1281.
5. "Parameterized models for facial animation," IEEE Comput. Graph. Applicat., vol. 2, no. 9, pp. 61-68, 1982.
6. Intelligent Computation in Big Data Era: International Conference of Young Computer Scientists, Engineers and Educators, ICYCSEE 2015, Harbin, China, January 10-12, 2015, Proceedings, p. 223.
7. JSR 113: Java Speech API 2.0, http://jcp.org/aboutJava/communityprocess/final/jsr113/index.html
8. Acapela Group, http://www.acapela-group.com/index.html
9. APD, http://www.ncapd.org/What_is_APD_.html
10. G. Takács, A. Tihanyi, T. Bárdi, G. Feldhoffer, B. Srancsik, "Speech to facial animation conversion for deaf applications," 14th European Signal Processing Conf., Florence, Italy, September 2006.
11. G. Takács, A. Tihanyi, T. Bárdi, G. Feldhoffer, B. Srancsik, "Database construction for speech to lip-readable animation conversion," 48th Int. Symp. ELMAR-2006 on Multimedia Signal Processing and Communications, Zadar, Croatia, June 2006.
12. G. Takács, A. Tihanyi, T. Bárdi, G. Feldhoffer, B. Srancsik, "Signal conversion from natural audio speech to synthetic visible speech," Int. Conf. on Signals and Electronic Systems, Lodz, Poland, September 2006.
13. A. Tihanyi, "Mobile multimedia application for deaf users," ELMAR, 2007.
14. Computer Vision, 2003, Chapter 11 (Matching in 2D), https://courses.cs.washington.edu/courses/cse576/book/ch11.pdf
15. P. Niyogi, E. Petajan, J. Zhong, "Feature based representation for audio-visual speech recognition," Bell Labs – Lucent Technologies, Murray Hill, NJ 07974, USA (Figure 8: speech signal; Section 2.2, visual feature representation; Figure 3: the FDP points around the lips and their corresponding FAPs).
16. T. Darrell, A. Pentland, "Tracking facial motion," Proc. IEEE Workshop on Motion of Non-rigid and Articulated Objects, 1994, pp. 36-42.
17. P. Kalra, N. Magnenat-Thalmann, "Modelling of vascular expressions in facial animation," Proc. IEEE Computer Animation, 1994.
18. L. Moubaraki, J. Ohya, F. Kishino, "Realistic 3D facial animation in virtual space teleconferencing," Proc. 4th IEEE International Workshop on Robot and Human Communication, 1995, pp. 253-258.
19. Hua Gu, Guangda Su, Cheng Du, "Feature points extraction from faces," Research Institute of Image and Graphics, Department of Electronic Engineering, Tsinghua University, Beijing, China.
20. L. Cappelletta, N. Harte, "Nostril detection for robust mouth tracking," IET Irish Signals and Systems Conference (ISSC 2010), 2010, pp. 239-244, DOI: 10.1049/cp.2010.0519.
21. K. Saleem, I. AlAgha, "System for people with hearing impairment to solve their social integration," 2015 5th International Conference on Information & Communication Technology and Accessibility (ICTA), 2015, pp. 1-6, DOI: 10.1109/ICTA.2015.7426913 (Figures 6-9).
22. Pusat Bahasa (1988), Kamus Besar Bahasa Indonesia, Jakarta: Balai Pustaka.
23. Imas A. R. Gunawan (1996), Kamus Umum Bahasa Isyarat Indonesia, Jakarta: Lembaga Komunikasi Total Indonesia.
24. A. Mahtarami, M. Hariadi, "Tracking Gerak Tangan Berbasis Pyramidal Lucas-Kanade," Jurusan Teknik Elektro, Fakultas Teknologi Industri, ITS, Surabaya.
25. U. Umar, R. Soelistijorini, H. A. Darwito, "Tracking Arah Gerakan Telunjuk Jari Berbasis Webcam Menggunakan Metode Optical Flow," Jurusan Teknik Telekomunikasi, Politeknik Elektronika Negeri Surabaya, Surabaya.
26. Figure 7: Lucas-Kanade algorithm, http://www.ri.cmu.edu/research_project_detail.html?project_id=515&menu_id=261
27. "Pyramid optical flow procedure outline figure," https://www.codeproject.com/articles/840823/object-feature-tracking-in-csharp?fid=1873032&df=90&mpp=25&noise=3&prof=true&sort=position&view=expanded&spc=relaxed
28. P. Muzyka, M. Frydrysiak, E. Roszkowska, "Real-time detection of hand gestures," 2016 21st International Conference on Methods and Models in Automation and Robotics (MMAR), 2016, pp. 168-173, DOI: 10.1109/MMAR.2016.7575127 (Figure 6: image processing and geometric pattern matching algorithm code).
29. Y.-L. Tian, T. Kanade, J. Cohn, "Recognizing action units for facial expression analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 2, pp. 97-115, Feb. 2001.
30. V. E. Kosmidou, L. J. Hadjileontiadis, S. M. Panas, "Evaluation of surface EMG features for the recognition of American Sign Language gestures," 2006 International Conference of the IEEE Engineering in Medicine and Biology Society, 2006, pp. 6197-6200, DOI: 10.1109/IEMBS.2006.259428.
31. J. R. Pansare, M. Ingle, "Real-time static hand gesture recognition for American Sign Language (ASL) in complex background," Journal of Signal and Information Processing, vol. 3, pp. 364-367, 2012.
32. Q. Chen et al., "Real-time vision-based hand gesture recognition using Haar-like features," in Proc. IEEE IMTC, 2007, pp. 1-7.
33. S. Shiravandi, M. Rahmati, "Hand gestures recognition using dynamic Bayesian networks," IEEE, pp. 1-6, 2013.
34. Md. M. Islam, S. Siddiqua, J. Afnan, "Real time hand gesture recognition using different algorithms based on American Sign Language," IEEE International Conference on Imaging, Vision & Pattern Recognition (icIVPR), 2017, pp. 1-6.
35. D. Gehrig, H. Kuehne, A. Woerner, T. Schultz, "HMM-based human motion recognition with optical flow data," 9th IEEE-RAS International Conference on Humanoid Robots, 2009, pp. 425-430.
36. P. B. van Wamelen, Z. Li, S. Iyengar, "A fast algorithm for the point pattern matching problem."

PRESENTATION OUTLINE

Mobile Multimedia Application for the Deaf with Facial Frame Animations and an Enhancement of Hand Gestures. Instructor: Dr. Abdelrahman Elleithy, Mobile Communications.

Agenda
  1. Review
  2. Introduction
  3. Related Work
  4. Proposed Solution
     4.1 Mathematical Model
     4.2 Detection of Finger Tips
  5. Simulation Results
     5.1 Matlab Simulations
     5.2 Comparison Results & Analysis
  6. Conclusion and Future Work

1. Review
  • Problems identified in audio-visual facial animations using 2-D and 3-D space [1].
  • Proposal of speech to hand signals.
  • Hand gesture enhancements.
  • Matching algorithms: Lucas-Kanade, optical flow, object tracking.

2. Introduction
  • Identification process based on object tracking.
  • Calculation of positions on coordinates using Euclidean distance.
  • Tracking and extracting features in gradients.
  • Geometric pattern from the gesture data set.

3. Related Work
  • Implementation of an approach for posture recognition: real-time performance based on CFG; Haar-like features and the AdaBoost algorithm.
  • Implementation of an emotion recognition system.
  • Blob analysis for hand gesture recognition.

4. Proposed Solution
  • Critique of 2-D and 3-D facial animations.
  • Processing of the audio and video data set into gestures.
  • Natural language processing and data-mining algorithms.
  • Pattern similarity by Lucas-Kanade with object tracking.
  • Hand gesture mapping using Matlab.

4.1 Mathematical Model
  • O(p (log p)^(3/2)) algorithm used for matching point patterns.
  • Affine transformation: AT(S) mapping to R; AT(f, θ, tx, ty).
  • Point locator in R.
  • Analysis of closest neighbours.
  • Affine transformation computation.
  • Computing a global match.

4.2 Detection of Finger Tips
  • The q looping method is used in the clockwise direction.
  • q is initialized to p, and p is assigned a pixel.
  • The number of detected points exceeding the maximum value is found.
  • The character outputs are obtained as shown further.
  • Structure of recognition.

5. Simulation Results
  • ASL as the backbone of the simulation: input layer, hidden layer, output layer.
  • Four phases: preprocessing, extraction of features, recognition of features, image acquisition.
  • RGB conversion.

5.1 Matlab Simulation
  • Selection of an image from the input database.
  • HGR function.
  • Pixel conversion.
  • K-means clustering.
  • Contour tracing phases: scanning of the pixels, assignment of s to p, initialization of q.

5.2 Comparison Results & Analysis
  • Analysis based on 10 people.
  • Set of words p = {hello, cake, fruit, typing, punch, bye, ok}.
  • Experiment 1: lip reading. Experiment 2: gestures. Experiment 3: combination of 1 and 2.
  • Experiment 3 has better efficiency than experiments 1 and 2.

6. Conclusion
  • Inefficiency of various lip-reading applications, resulting in the gesture implementation.
  • Performance of the Matlab simulations.
  • Overcoming the difficulties of lip reading.
  • Creation of a better interface for the deaf.
  • Improving accuracy and efficiency.

6.1 Future Work
  • Utilization of both hands for gestures.
  • Interpolating gestures when framing sentences.