Sunday, March 23, 2014

String Matching Approaches

The text entry box at the beginning of our user experience (UX) is going to be one of the next aspects of Mechanapp to be tackled by the group, and we have some choices as to how to go about it. I am going to outline some of those choices and my thoughts on them below.

Naive String Matching:
We could chose to do a very simple naive string matching, looking for year, make, model, and one of our described symptom. Any parts that are exactly matched are populated and the rest is discarded and the user is prompted to fill in the missing pieces.
This is simple and would be easy to implement but lacks enough functionality in my opinion. It may however, be a good place to start.

"Hint" matching:
One other idea that has been proposed is to have a database object that represents something called "Hints". What this would do is map multiple strings to single symptoms. For instance, the symptom "engine will not start" is a formal symptom, but that sentiment could be expressed in many ways. So using a database object we would have many examples of strings that all result in a reference to the formal symptom. For example strings like "won't start", "isn't turning over", and "engine seems broken" would all map to the formal symptom "engine will not start". We would then do string matching on all the hints.
This is functionally just an extension to the naive string matching approach. I believe it has the same advantages and disadvantages and is just a natural extension of the first. As such, I think it would be a good second step.

More complicated things like String Edit Distance:
One interesting idea is to make it more complicated but smarter and have the strings not need to match exactly. This would allow for smart handling of typos, and potentially phrasing that we haven't anticipated but can be mapped to symptoms we have described. One way to do this would be to use an algorithm like string edit distance.
I think this is a great idea. Additionally, I think I would be well suited personally to implement it. While I've never implemented string edit distance, I have as part of my research implemented a graph edit distance algorithm that I designed which is very similar.
String edit distance would allow our search to be much smarter, but still be relatively easy to implement (as well as give me another task that I am actively excited about implementing).

No comments:

Post a Comment