Approximate Pattern Matching
Defining The Problem
This is a new project. I think (hope) readers will appreciate the opportunity to see an Engineers' approach to a technical question. I am a System Engineer. A generalist. Fundamentally, I apply scientific and mathematical principles to practical ends. I rely heavily on science and mathematics when working on a problem. Consequently, my notes and papers will reflect this dependency. I have included a list of references below as an aid to better understanding of the subject at hand and the methodology of my approach.
Approximate Pattern Matching (APM) is a method of searching a relatively large data source to find occurrences of a relatively small given pattern. A match is found if a data source sequence is identical or close to the given pattern. Just how close is close enough is part of the problem. If the data source is a text file of surnames and the pattern is "O'Keefe", we might want to count the sequence "Okeefe" as a match but if the sequence is "Okeafe", we might not want to count that as a match. The point here is that if the match criteria is too loose, we get too many matches (like Web search sites). If the criteria is too tight, we lose the advantage sought by using APM.
EBNF 4.0 is a tool for specifying linear procedural grammars. It produces parser/compilers that employ top-down successive alternation with back-tracking and conditional compilation. A pattern, in EBNF terms, is called a token and tokens must be identical to the source sequence to be counted as a match.
The question (i.e. problem space) is can I develop a reasonable model and specification for APM that, preferably, makes use of EBNF 4.0 in the implementation? The implication, of course, is that the APM model can be expressed as an EBNF 4.0 linear procedural grammar.
My thinking process about this is that every specific area of human endeavor has its own distinct language. There is, I believe, some inherent advantage in developing specialized languages for specialized subject areas. The consequence, however, is that when two areas of specialization are working the same problem in parallel, neither may be intuitively aware of that fact. The result can be duplication of effort and missed opportunities for cooperation. I suspect that APM is closely related to compiler theory so that one may benefit from the other.
Methodology and APM Sources
Note: I will place document links in the pane on the left as they become available.
This Web Page Created with PageBreeze Free Website Builder