ABSTRACT
Handwriting recognition is a problem at which computers are still not efficient, largely because of the wide variation in writing styles. Today, efficient handwriting recognition is limited to online systems using hardware such as light pens, where the strokes are detected directly and the character is recognized as it is written. To convert a handwritten document to digital text, however, the characters must first be extracted and then recognized, and few algorithms can reliably extract individual characters from a sentence. Software-based character recognition therefore remains less efficient than it could be.
In this paper we present the design of software that translates handwritten text to digital text, propose a new approach to solving the recognition problem, and provide implementation details. For the implementation of our idea, we propose a new neural network architecture, a modified version of the conventional back-propagation network.
The technique adopted by the lexical analyzers used in compilers guided and motivated this design. Details of the automatic preprocessing that would be essential, and of the remaining shortcomings, are also included in the paper.
THE SCENARIO
The scenario in any developing country is much the same today. Western countries, with their superior infrastructure, are making the most of technology, while in countries like ours technology has been limited to the metropolitan areas where the infrastructure has been developed. If we are to equal the developed countries in exploiting these technologies, we need to extend our infrastructure into the rest of the country, including the villages.
But for this we should first implement a completely computerized way of doing everything: everything from rations, electricity, and groceries to taxes should be kept track of by computers. This would drastically reduce the time and manpower needed to process documents. It has to happen at some point, if not immediately, if we are to compete with the developed countries by any means.
Obviously, establishing such an infrastructure in a country as huge as ours would require huge resources; the required number of computers, networks, and so on would be massive. This is one of the main areas of research in which the Indian Department of Information Technology is involved: it is constantly working on producing cheaper computers, components, and software.
But even if we do build such an infrastructure by the year 2020, as expected, the switch to a system where everything is computerized would still take a massive amount of manual work.
This is because for over half a century we have been recording every detail manually, and if we are to switch to digital means, all of these records need to be converted to digital media too. Feeding all these details in manually would take mammoth manpower and time, which is not practically feasible. A solution is to computerize this transformation as well, by developing software that recognizes handwritten documents and transforms them into digital files or databases.
OUR MODEL
The basic design of our software includes a buffer of the same size as the input bitmap file. The whole file is first scanned into this buffer, and all of the processing involved in recognition is performed on the data it holds. The innovation, or difference, in the recognition process is the way these pixels are handled.
As explained before, existing software first performs a raster scan on the bitmap and recognizes each segmented character in turn, which makes it unsuitable for recognizing running letters. To avoid this, we prefer not to segment the characters before recognizing them.
We accomplish the recognition as the scanning is done, using two simultaneous scans, one vertical and one horizontal, fed to two different neural networks.
The key is that during the horizontal scan, the pixels are fed into a neural network trained to identify the bases of the various letters. Once the base of a letter is identified, the characters, in the order in which they appear in the sentence, are transferred to a second buffer where the sentences are placed. This is where the vertical scan comes into play.
The sentence buffer is scanned vertically and the pixel columns are fed into the second neural network, which identifies the characters based on the pixel pattern. As each column of pixels is read, it is fed into the network, which recognizes it in real time; but no output is produced until the character has been decided for sure.
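As a rough illustration of the two scans, here is a minimal Python sketch; the bitmap layout, the baseline heuristic, and all names are our own illustrative assumptions, not the paper's actual implementation.

```python
def horizontal_scan(bitmap):
    """Scan rows top-to-bottom and return the index of the bottom-most
    row containing ink, a crude stand-in for base detection."""
    baseline = None
    for y, row in enumerate(bitmap):
        if any(row):            # row contains at least one set pixel
            baseline = y
    return baseline

def vertical_columns(bitmap):
    """Yield pixel columns left-to-right, as the vertical scan that
    feeds the second (recognition) network would see them."""
    height, width = len(bitmap), len(bitmap[0])
    for x in range(width):
        yield [bitmap[y][x] for y in range(height)]

# A tiny 4x5 bitmap: ink occupies rows 1-3, so the base is row 3.
bmp = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 0, 0],
]
print(horizontal_scan(bmp))         # -> 3
print(next(vertical_columns(bmp)))  # -> [0, 0, 0, 0]
```

In the real system the two scans would feed trained networks rather than these fixed rules; the sketch only shows the data flow.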
THE INSPIRATION
The solution we propose is inspired by the lexical analyzers used in compilers. A lexical analyzer scans the input string character by character, making a state transition on each input until a predefined final state is reached; the final state reached decides what the lexical unit is. This is exactly what we do with the image file containing the handwritten text: we scan each pixel column, and from the pattern scanned so far plus the current pixel column we calculate a match percentage for each character. When a match is found to be good enough, the recognition is made.
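The analogy can be sketched in Python as a tiny "template automaton" that consumes one pixel column at a time and updates a match fraction per character; the column tuples and templates below are invented purely for illustration.

```python
# Each template character is a sequence of expected columns; the
# "state" is how far into each template the scan has matched.
TEMPLATES = {
    "l": [(1, 1, 1)],    # one tall column (assumed shape)
    "i": [(0, 1, 1)],    # gap then stroke (assumed shape)
}

def match_fraction(scanned, template):
    """Fraction of the template matched so far by the scanned columns."""
    n = min(len(scanned), len(template))
    hits = sum(1 for k in range(n) if scanned[k] == template[k])
    return hits / len(template)

scanned = [(1, 1, 1)]    # one column consumed so far
scores = {c: match_fraction(scanned, t) for c, t in TEMPLATES.items()}
print(scores)            # 'l' matches fully, 'i' not at all
```

A real lexer reaches a discrete final state; here the "state" is continuous, which is exactly why a threshold on the match percentage is needed.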
THE PROBLEM
The problem is that during such real-time operation, letters like ‘h’ and ‘b’ could be recognized as ‘l’ before the complete character has been scanned. We therefore need an extra constraint to make sure this kind of real-time processing does not end in wrong results. The constraint ties recognition to both the character recognized up to the previous column of pixels and the current column. That is, if a character (c1) has been recognized up to the previous column of pixels, and the next few columns do not bring the pattern closer, in terms of Hamming distance, to any other character, then c1 is recognized. If the next few columns do bring it closer to some other character, the column up to which c1 was recognized is kept track of; should the pattern then start to deviate from that other character, it is still recognized as c1, and the next character starts from the column after c1. If, on the contrary, the look-ahead takes the recognition extremely close to some other character (c2), then c2 is recognized and not c1. For example, ‘l’ could be c1 and ‘b’ could be c2; in that case, c2 is recognized. If the following pixels initially take the pattern closer to ‘b’ but then start to deviate, it is recognized as ‘l’ followed by some other character. Recognition can thus be performed in real time, and therefore extended to scripts written in cursive form.
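A toy numeric example of the Hamming-distance criterion, with made-up flattened patterns for ‘l’ and ‘b’:

```python
def hamming(a, b):
    """Number of positions at which two equal-length patterns differ."""
    return sum(x != y for x, y in zip(a, b))

# Invented patterns: 'l' is a bare stroke, 'b' a stroke plus a bowl.
L_PATTERN = [1, 1, 1, 1, 0, 0]
B_PATTERN = [1, 1, 1, 1, 1, 1]

# After four columns the scan matched 'l' exactly; the look-ahead
# columns fill the bowl, moving the pattern closer to 'b'.
scanned = [1, 1, 1, 1, 1, 1]
print(hamming(scanned, L_PATTERN))   # -> 2
print(hamming(scanned, B_PATTERN))   # -> 0, so c2 = 'b' wins over c1 = 'l'
```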
IMPLEMENTATION
USING BACK PROPAGATION NETWORKS
The idea we propose can easily be implemented using a conventional back-propagation network with, as usual, one or two hidden layers. The number of output units equals the number of characters in the language being recognized.
The number of units in the input layer equals the number of units in the output layer plus the maximum height of a character, because the input to the network consists of two different sets of data.
One set represents the pattern of the column of pixels currently being analyzed. The other set is the output of the network fed back into the input layer. The output represents the percentage match of the pattern scanned so far against the various characters, so when it is fed back as part of the input alongside the current pattern, the new output depends on both the previously recognized patterns and the current one.
From this it is decided whether the pattern matches a character: when the percentage match exceeds a threshold, the character is recognized.
The problem with this implementation is that we would have to create an explicit way of linking the percentage-match values with the patterns and the characters they stand for. Instead, we refined the network into a new architecture that perfectly suits the problem at hand.
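One recurrent step of this back-propagation variant might look as follows; the sizes, the random untrained weights, and the single-layer simplification are all assumptions made for the sketch.

```python
import math
import random

random.seed(0)

N_CHARS, HEIGHT = 4, 6        # assumed alphabet size and column height
N_IN = N_CHARS + HEIGHT       # input size = output units + max height

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Placeholder weights, one row per output (character) unit.
W = [[random.uniform(-1, 1) for _ in range(N_IN)] for _ in range(N_CHARS)]

def step(column, prev_output):
    """Feed one pixel column plus the fed-back previous output."""
    x = list(prev_output) + list(column)
    return [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W]

out = [0.0] * N_CHARS         # initial per-character match scores
for col in ([1, 1, 1, 1, 0, 0], [1, 0, 0, 1, 0, 0]):
    out = step(col, out)
print(len(out))               # -> 4, one match score per character
```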
THE CUSTOM-MADE NEURAL NETWORK ARCHITECTURE
The proposed idea for recognizing handwritten documents demands a network in which the state transition during the analysis of each column of pixels can be represented, so we propose a new neural network architecture to suit this need. The network accepts inputs at each layer, and has more hidden layers than the maximum number of pixels forming the width of a character:
number of hidden layers = maximum width of character + look-ahead length
These hidden layers do not behave exactly like conventional hidden layers, in that they take inputs too: the output of each layer depends on the input it receives from the previous layer as well as the input representing the pixel pattern.
The architecture of the proposed network is given below.
Here P_mn is the array corresponding to the pixels of the sentence, where m is the number of layers and n is the maximum height of a character.
The first layer of the network takes the pattern of pixels in the first column as its input, and its output is fed into the second layer as usual. The difference is that the pattern in the second column is also fed as input to the second layer.
Thus every layer except the first accepts two inputs: the output of the previous layer and the pixel pattern of the corresponding column. The output of any of these layers converges to the value corresponding to a character when the pattern recognized so far corresponds to that character.
In that case, a marker is set at the column of pixels before which the character was recognized, and the look-ahead is performed. If the look-ahead yields a better match, the marker is updated and the process is repeated. If it does not, the previously recognized character is accepted and scanning resumes from where the marker points. The network can be trained using the delta rule in much the same way as a back-propagation network, except that the per-layer input (the pixel patterns) is taken into account only during the feed-forward.
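A minimal sketch of the per-layer feed, assuming an illustrative combine rule in place of the trained weights: each layer mixes the previous layer's output with its own pixel column.

```python
def layer_step(prev_out, column, weights):
    """One layer: each unit combines one previous activation with one
    pixel of this layer's own column (illustrative rule, not trained)."""
    return [w * (p + c) for w, p, c in zip(weights, prev_out, column)]

MAX_WIDTH, LOOK_AHEAD = 5, 3
n_layers = MAX_WIDTH + LOOK_AHEAD   # as defined in the text

columns = [[1, 0, 1], [1, 1, 0]]    # two pixel columns, height 3
acts = [0.0, 0.0, 0.0]              # layer 0 output starts at zero
weights = [0.5, 0.5, 0.5]
for col in columns:                 # column i drives layer i+1
    acts = layer_step(acts, col, weights)
print(acts)                         # -> [0.75, 0.5, 0.25]
```

The point of the sketch is only the wiring: unlike a standard multilayer network, every layer here receives external input alongside the previous layer's output.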
THE TRAINING
Our neural network is trained differently from existing models. Since each layer is literally treated as both an input and an output layer, the training must handle the varying widths of the characters; we solve this using the approach described before.
The training is supervised. Each example in the training set pairs the bitmap of the pattern corresponding to a character with the value of the character that should be produced during recognition. The training itself is controlled by a program.
The program's trainer module splits the input pattern of a character into columns of pixels and feeds each column to the corresponding layer, synchronized with the feed-forward. It is also the trainer's job to identify the height and width of the pattern; based on this information, the number of units and layers in the network is chosen. The training is done using the delta rule:
w_i(t+1) = w_i(t) + 2 μ ε_k x_ki
where μ is the learning rate, ε_k is the error for training pattern k, and x_ki is the i-th component of input pattern k.
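A single numeric application of this rule, with assumed values for the learning rate, the error, and the inputs:

```python
def delta_update(w, mu, eps, x):
    """One delta-rule step: w_i <- w_i + 2*mu*eps*x_i for each weight."""
    return [wi + 2 * mu * eps * xi for wi, xi in zip(w, x)]

w = [0.2, -0.1, 0.4]
new_w = delta_update(w, mu=0.1, eps=0.5, x=[1, 0, 1])
print([round(v, 3) for v in new_w])   # -> [0.3, -0.1, 0.5]
```

Weights whose input component is zero are untouched, which is exactly the source of the training-order problem discussed next.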
The problem with our architecture is that when training narrow characters such as ‘I’ or ‘l’, only part of the network is trained and the remaining weights are left as they are. The network would therefore not perform as expected if the characters were trained in random order, so this order too is determined by the training module. The effect can be explained using the following diagrams.
The first network shows the nodes involved in recognizing the character ‘w’: all of the weights are adjusted during training. The diagram below it is for the character ‘l’, where only the first layer is involved. So if we train ‘l’ after training the network for ‘w’, some weights of the network are changed independently of the others that help recognize the wider characters.
We overcome this problem simply by training the characters of shortest width first and the longest at the end, thereby eliminating such independent changes. The other option was to maintain an individual personality file for each character and run sequential processing on these files as each column of pixels is added, but that method would be extremely time-consuming.
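The width-ordered training schedule amounts to a simple sort; the width figures below are invented for illustration.

```python
# Hypothetical character widths in pixel columns.
training_set = {"w": 9, "l": 2, "i": 2, "m": 10, "b": 5}

# Narrowest characters first, widest last, as the text prescribes.
order = sorted(training_set, key=lambda c: training_set[c])
print(order)   # narrow characters come before wide ones
```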
OUR ALGORITHM (RECOGNITION)
repeat for j = 0, 1, 2, …, height of character
    layer[0].output[j] = 0
layer_index = 1
repeat for pix_index = 0, 1, 2, …, length of sentence
• read pix_pattern_buffer[pix_index][] into pix_pattern_vector[]
• repeat for j = 0, 1, 2, …, height of character
o layer[layer_index].input[j] = func(layer[layer_index - 1].output[j], pix_pattern_vector[j])
• propagate the output to the logical layer given by layer_index
• if check(layer_index) == TRUE
o look_ahead_ptr[char_match_arr_index] = pix_index
o char_match[char_match_arr_index] = character recognized
o match_found = TRUE
o char_match_arr_index = char_match_arr_index + 1
o repeat while (LookAhead() == TRUE and look_ahead_cnt < 3)
    o look_ahead_ptr[char_match_arr_index] = pix_index
    o char_match[char_match_arr_index] = character recognized
    o char_match_arr_index = char_match_arr_index + 1
o best_match_index = findmax(char_match)
o string_concat(string_output, char_match[best_match_index])
o pix_index = look_ahead_ptr[best_match_index]
o layer_index = 1
• else
o layer_index = layer_index + 1
o pix_index = pix_index + 1
write string_output to output file
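For concreteness, here is a heavily simplified, runnable rendering of the recognition loop above, where check() becomes exact template matching and the look-ahead becomes "prefer the longest matching template"; the column symbols and templates are invented for illustration.

```python
# char -> its sequence of pixel columns, encoded as symbols.
TEMPLATES = {"l": ["|"], "b": ["|", ")"]}

def recognize(columns):
    """Scan columns left to right, emitting the best character at each
    match point and resuming after its last column."""
    output, start = [], 0
    while start < len(columns):
        best, best_end = None, start
        for ch, tmpl in TEMPLATES.items():
            end = start + len(tmpl)
            # A longer matching template wins over a shorter one,
            # mirroring the look-ahead preferring 'b' over 'l'.
            if columns[start:end] == tmpl and end > best_end:
                best, best_end = ch, end
        if best is None:          # no match: skip this column
            start += 1
        else:
            output.append(best)
            start = best_end      # resume where the marker points
    return "".join(output)

print(recognize(["|", ")", "|"]))   # -> "bl"
```

The real algorithm compares continuous match percentages rather than exact templates, but the control flow — match, look ahead, commit, resume from the marker — is the same.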
THE LOOK AHEAD
As already discussed, when recognition is based only on the current and previous columns of pixels, chances are that letters like ‘b’, ‘P’, and ‘K’ would be recognized as ‘l’ plus some other character. To avoid this we need to look ahead of the column of pixels up to which a character is recognized. That is, if the Hamming distance of the pattern scanned so far is close enough to some character, and from the next pixel column it starts deviating, we keep track of the column of pixels (p1) where the distance was minimum. From there on we look ahead to see whether the closeness to some other character increases. If it does, the second character is given as the output, and the column mark p1 is moved to the current column. If not, the first character is given as the output and the next pattern is considered from p1.
THE PREPROCESSING
The preprocessing required for the recognition is fairly simple: the whole bitmap is resized by stretching it as necessary, so that the characters to be recognized are of a standard size. Only then will the number of states per character match the ones given during the training phase.
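A minimal stand-in for the stretch, assuming nearest-neighbour duplication of rows to reach a standard height:

```python
def stretch_height(bitmap, target_h):
    """Nearest-neighbour vertical stretch: each output row copies the
    source row at the proportional position."""
    src_h = len(bitmap)
    return [bitmap[y * src_h // target_h] for y in range(target_h)]

bmp = [[1, 0], [0, 1]]
print(stretch_height(bmp, 4))   # each source row duplicated once
```

A production system would likely also stretch horizontally and use a smoother filter, but any scheme that normalizes character size keeps the per-character column count consistent with training.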
CHALLENGES
A few challenges remain that we hope to overcome in the next couple of months. The major one is setting the thresholds for optimized handwriting recognition, including the number of pixel columns that must be looked ahead before the network can deliver its output. Another concern with this algorithm, when the proposed neural network is used, is running time; but with faster processors coming up, this should not be much of a concern, at least not the major one.
Provided these challenges are taken care of, we believe that our algorithm could prove to be a highly effective means of recognizing characters with a high degree of precision.