Training Your AI Platform
We’re continuing with our series about the basic steps in making your own AI platform. This time we’re looking at training your AI platform to ensure the algorithms process data correctly.
You need the algorithms in your Artificial Intelligence platform to find relationships in the data and come to decisions. To do this, you need to train the algorithms using data you have available.
The training data must be of the same type as the data the platform will process once it goes live. How much training data you need is not an exact science. More is generally better, and the amount matters more as the task becomes more complex. If the algorithms are only making very simple choices, such as sorting data into two groups or classes, then you don’t need huge amounts of information.
Is Accuracy Important?
You might read advice that data accuracy is not so important if the platform is not undertaking critical tasks, for example if it is not processing health-related data. While it is true that you can afford an accuracy threshold that allows for the occasional error, we always recommend you still strive to use the highest quality data possible.
The reason behind our standpoint is this. Even if you don’t mind the occasional error, errors are still a pain, and the output has to be monitored to catch them. Ultimately, you are creating an AI platform to streamline tasks. If errors are being thrown up, any time saved is spent checking for incorrect output instead. High quality training data will help you reduce the chance of this happening.
As an aside here, try to make sure your data has a reasonably even spread of the categories you are looking to identify. If one or more categories are poorly represented, the platform has too few examples to learn them from, which can affect its ability to produce consistently correct results.
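To make the point about category spread concrete, here is a minimal sketch of how you might check it before training. The labels, the function names and the 20% cut-off are all made up for illustration; pick a cut-off that suits your own task.

```python
from collections import Counter


def class_shares(labels):
    """Return each category's share of the labelled data as a fraction."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def under_represented(labels, min_share=0.2):
    """List categories whose share is below min_share (an assumed cut-off)."""
    shares = class_shares(labels)
    return sorted(label for label, share in shares.items() if share < min_share)


# Made-up example: 90 records in one category, 10 in the other.
labels = ["spam"] * 90 + ["not_spam"] * 10

print(class_shares(labels))
print(under_represented(labels))
```

If a category shows up in the second list, it is worth collecting more examples of it before you begin training.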
So, you now have a good chunk of data available, all labelled and ready to go. But before you start there is one more step. You need to split the data in two, with one portion for training and the other for testing. That way you hold back data the algorithms have not seen, which can be used to validate their accuracy after the training has been completed.
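The split itself can be sketched in a few lines. This is one simple way to do it, not the only one: the function name, the 80/20 ratio and the seed value are our own assumptions, and many machine learning libraries provide an equivalent helper.

```python
import random


def split_data(records, test_fraction=0.2, seed=42):
    """Shuffle a copy of the records and split it into (train, test).

    test_fraction and seed are illustrative defaults, not recommendations.
    A fixed seed means the same records land in each portion on every run.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


records = list(range(100))  # stand-in for 100 labelled records
train, test = split_data(records)
print(len(train), len(test))
```

Shuffling before splitting matters if your data is stored in some order, for example with all of one category first.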
Get Ready to Rumble
Now you have your data sorted and ready to go, you can make a start with the training phase, allowing the algorithms to process data and learn their roles. With luck, the testing data will then confirm they perform as expected and you can move on. If your platform produces results that fall outside what you expect, you will need to make adjustments; this could mean changing weightings or reviewing the labelling used, for example. Once done, you can return to the start of the training and begin again.
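As a rough illustration of the train, test and adjust cycle, the sketch below uses a deliberately tiny stand-in for a real algorithm: a single threshold on one number. The data, the function names and the 90% target are all invented for the example; your own platform will use a proper model and a target that fits the task.

```python
def train_threshold(train_rows):
    """'Train' a toy classifier: the threshold is the midpoint of the two class means."""
    positives = [x for x, label in train_rows if label == 1]
    negatives = [x for x, label in train_rows if label == 0]
    return (sum(positives) / len(positives) + sum(negatives) / len(negatives)) / 2


def accuracy(threshold, rows):
    """Fraction of rows where 'x above threshold' matches the label."""
    correct = sum(1 for x, label in rows if int(x > threshold) == label)
    return correct / len(rows)


# Made-up (value, label) pairs, already split into training and testing portions.
train_rows = [(1.0, 0), (2.0, 0), (3.0, 0), (7.0, 1), (8.0, 1), (9.0, 1)]
test_rows = [(2.5, 0), (4.0, 0), (6.5, 1), (8.5, 1)]

TARGET_ACCURACY = 0.9  # assumed threshold for "performs as expected"

threshold = train_threshold(train_rows)
test_accuracy = accuracy(threshold, test_rows)

if test_accuracy >= TARGET_ACCURACY:
    print("Meets the target, move on")
else:
    print("Below target: review weighting or labelling, then retrain")
```

The important part is the shape of the loop: train on one portion, measure on the other, and only go back to adjust when the measurement falls short.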
One thing to note here is that you should use exactly the same base data, split in the same way, for the training and testing steps each time you repeat the cycle. This allows you to accurately compare the performance of the algorithms and the results you receive from one run to the next.
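To show why keeping the data fixed helps, here is a small sketch that scores two candidate settings against the same testing portion. The candidates are toy one-number thresholds and the data is made up; the point is only that both scores come from identical test records, so any difference is down to the setting and not the data.

```python
def accuracy(predict, rows):
    """Fraction of (value, label) rows that the predict function gets right."""
    return sum(1 for x, label in rows if predict(x) == label) / len(rows)


# The testing portion stays the same for every candidate (made-up data).
test_rows = [(2.5, 0), (4.0, 0), (4.5, 1), (6.5, 1), (8.5, 1)]

# Two hypothetical settings to compare, for example before and after an adjustment.
candidates = {
    "threshold=5.0": lambda x: int(x > 5.0),
    "threshold=4.2": lambda x: int(x > 4.2),
}

scores = {name: accuracy(predict, test_rows) for name, predict in candidates.items()}
best = max(scores, key=scores.get)

print(scores)
print("Best on this test set:", best)
```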