Easy AI with Microsoft.Recognizers.Text
I recently discovered that there's an XMPP client library (NuGet) for .NET that I overlooked a few months ago, so I promptly set about building a bot!
The bot itself needs some polishing before I post about it here, but while writing it I stumbled across a perfectly brilliant library (released by Microsoft, of all companies) that can automatically extract common data types from a natural-language sentence.
While the library underpins the Azure Bot Framework, it's actually free and open-source, so I decided to experiment with it and ended up writing this blog post.
Data types include (but are not limited to) dates and times (and ranges thereof), numbers, percentages, monetary amounts, email addresses, phone numbers, simple binary choices, and more!
While it also lands you with a terrific number of DLL dependencies in your build output folder, the result is totally worth it! How about pulling a DateTime from this:
in 5 minutes
or this:
the first Monday of January
or even this:
next Monday at half past six
Pretty cool, right? You can even pull multiple things out of the same sentence. For example, from the following:
The host 1.2.3.4 has been down 3 times over the last month - the last of which was from 5pm and lasted 30 minutes
It can extract an IP address (1.2.3.4), a number (3), and a few dates, times, and durations (last month, 5pm, 30 minutes).
I've written a test program that shows it in action. Here's a demo of it working:
(Can't see the asciicast above? View it on asciinema.org)
The source code is, of course, available on my personal Git server: Demos/TextRecogniserDemo
If you can't check out the repo, here's the basic gist. First, install the Microsoft.Recognizers.Text package(s) for the types of data that you'd like to recognise (e.g. Microsoft.Recognizers.Text.DateTime for dates and times). Then, to recognise a date or time, do this:
List<ModelResult> result = DateTimeRecognizer.RecognizeDateTime(nextLine, Culture.English);
The awkward bit is unwinding the ModelResult to get at the actual data. The matched text itself is in ModelResult.Text, but the recognised data is stored in the ModelResult.Resolution property, which is a SortedDictionary<string, object>. The interesting entry inside it is value (or values), but depending on the data type you're recognising, that can hold a list too! The best way I've found to decipher the data types is to print the value of ModelResult.Resolution as a string to the console:
Console.WriteLine(result[0].Resolution.ToString());
The .NET runtime will helpfully convert this into something like this:
System.Collections.Generic.SortedDictionary`2[System.String,System.Object]
Very helpful. Then we can continue to drill down:
Console.WriteLine(result[0].Resolution["values"]);
This produces:
System.Collections.Generic.List`1[System.Collections.Generic.Dictionary`2[System.String,System.String]]
Quite a mouthful, right? By cross-referencing this against a JSON dump of the result (thanks, Newtonsoft.Json!), we can figure out how to drill the rest of the way. I ended up writing myself a pair of little utility methods for dates and times:
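using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;
using Microsoft.Recognizers.Text;
using Microsoft.Recognizers.Text.DateTime;

// Returns the first date/time found in source; rawString receives the text that matched.
public static DateTime RecogniseDateTime(string source, out string rawString)
{
    List<ModelResult> aiResults = DateTimeRecognizer.RecognizeDateTime(source, Culture.English);
    if (aiResults.Count == 0)
        throw new Exception("Error: Couldn't recognise any dates or times in that source string.");

    /* Example contents of the below dictionary:
    [0]: {[timex, 2018-11-11T06:15]}
    [1]: {[type, datetime]}
    [2]: {[value, 2018-11-11 06:15:00]}
    */
    rawString = aiResults[0].Text;
    Dictionary<string, string> aiResult = unwindResult(aiResults[0]);
    string type = aiResult["type"];
    if (!(new string[] { "datetime", "date", "time", "datetimerange", "daterange", "timerange" }).Contains(type))
        throw new Exception($"Error: An invalid type of {type} was encountered (a date, time, or range was expected).");

    // Range types carry "start" and "end" keys instead of a single "value"
    string result = Regex.IsMatch(type, @"range$") ? aiResult["start"] : aiResult["value"];
    // The resolved strings are culture-neutral (e.g. 2018-11-11 06:15:00), so parse them invariantly
    return DateTime.Parse(result, CultureInfo.InvariantCulture);
}

// Digs the first resolution dictionary out of a date/time ModelResult
private static Dictionary<string, string> unwindResult(ModelResult modelResult)
{
    // Resolution["values"] is a List<Dictionary<string, string>>; take the first entry
    return (modelResult.Resolution["values"] as List<Dictionary<string, string>>)[0];
}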
Of course, it depends on your use-case as to precisely how you unwind it, but the above should be a good starting point.
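If you'd rather not hard-code a single layout the way unwindResult does, here's a minimal sketch of a more defensive unwinder. It's a sketch under assumptions rather than anything from the library: ResolutionUnwinder, Unwind, and PickPrimaryValue are hypothetical names of my own, and it only copes with two layouts: a "values" entry holding a list of string dictionaries (what the date/time recogniser returns, as above) and a flat "value" entry (which I'm assuming for the other recognisers). It takes a plain IDictionary<string, object>, so you'd hand it result[0].Resolution, and it returns every candidate rather than just the first, since "values" is a list and can hold more than one.

```csharp
using System.Collections.Generic;

// Hypothetical helper (not part of Microsoft.Recognizers.Text): a defensive
// alternative to unwindResult that assumes only two Resolution layouts.
public static class ResolutionUnwinder
{
    // Flattens a Resolution dictionary into a list of string dictionaries,
    // one per candidate. Returns an empty list if neither layout matches.
    public static List<Dictionary<string, string>> Unwind(IDictionary<string, object> resolution)
    {
        var candidates = new List<Dictionary<string, string>>();
        if (resolution == null)
            return candidates;

        // Layout 1 (dates and times): "values" holds a list of string dictionaries.
        // IEnumerable<T> is covariant, so a List<Dictionary<string, string>> matches here.
        if (resolution.TryGetValue("values", out object values)
            && values is IEnumerable<IDictionary<string, string>> list)
        {
            foreach (IDictionary<string, string> entry in list)
            {
                if (entry != null)
                    candidates.Add(new Dictionary<string, string>(entry));
            }
        }
        // Layout 2 (assumed for other recognisers): a flat "value" entry,
        // copied along with any other string-valued entries next to it.
        else if (resolution.ContainsKey("value"))
        {
            var flat = new Dictionary<string, string>();
            foreach (KeyValuePair<string, object> pair in resolution)
            {
                if (pair.Value is string text)
                    flat[pair.Key] = text;
            }
            candidates.Add(flat);
        }
        return candidates;
    }

    // Picks the most useful string from one candidate: "value" for single
    // points, otherwise a range's "start", falling back to its "end".
    // Returns null if none of those keys is present.
    public static string PickPrimaryValue(IDictionary<string, string> candidate)
    {
        if (candidate.TryGetValue("value", out string value))
            return value;
        if (candidate.TryGetValue("start", out string start))
            return start;
        if (candidate.TryGetValue("end", out string end))
            return end;
        return null;
    }
}
```

Fed a hand-built dictionary shaped like the example comment in RecogniseDateTime, PickPrimaryValue hands back "2018-11-11 06:15:00"; fed a flat dictionary whose value is "3", it returns "3". Looking at keys rather than the type name means there's no list of range types to keep in sync, at the cost of trusting that the keys mean what they appear to.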
Once I've polished my bot a bit, I might post about it here too.
Found this interesting? Run into an issue? Got a neat use for it? Comment below!