Project failures are common in IT, and the risk is higher when you adopt a technology that is unfamiliar to your organization. Machine learning is not new, but development and awareness have now reached a point at which its benefits are becoming attractive to businesses. Machine learning has huge potential to reduce costs and uncover new revenue, but if it is not implemented properly there are many pitfalls.
There is a lot for developers to do in machine learning, as it offers the promise of applying business-critical analytics to any application. Machine learning-enabled applications can accomplish everything from improving customer experience to providing product recommendations to serving up hyper-personalized content.
Developers must be aware of certain key points for building successful machine learning-enabled applications:
1. Choosing Machine Learning Method Wisely
Consider Gradient Boosting Trees (GBT), a popular supervised learning algorithm widely used by industry practitioners because of its accuracy. Its popularity does not make it the right solution to every problem; always use the algorithm that best fits the problem and gives the most accurate results. As an illustration, compare the accuracy of GBT with that of a linear Support Vector Machine (SVM) on rcv1, a popular text categorization dataset. On this dataset, linear SVM has been observed to achieve a lower error rate than GBT. A linear SVM is a linear classifier, and the simpler the model, the less problematic it is to learn. GBT, on the other hand, is highly nonlinear: it is more powerful but also more difficult to learn, and on a problem like this one it can end up with inferior accuracy.
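The practical takeaway is to measure candidate algorithms on the same held-out data rather than assume the popular one wins. Below is a minimal sketch of such a comparison harness in plain Python. The two trainers are small stand-ins chosen for illustration (a majority-class baseline and a nearest-centroid linear classifier), not GBT or SVM; in a real project they would be replaced by library implementations of the algorithms being compared. All function names here are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]
Predictor = Callable[[Vector], int]
Trainer = Callable[[List[Vector], List[int]], Predictor]


def error_rate(predict: Predictor, X: List[Vector], y: List[int]) -> float:
    """Fraction of examples the predictor labels incorrectly."""
    wrong = sum(1 for xi, yi in zip(X, y) if predict(xi) != yi)
    return wrong / len(y)


def compare_models(trainers: Dict[str, Trainer],
                   X_train: List[Vector], y_train: List[int],
                   X_test: List[Vector], y_test: List[int]) -> Dict[str, float]:
    """Train every candidate on the same split and report held-out error."""
    return {name: error_rate(train(X_train, y_train), X_test, y_test)
            for name, train in trainers.items()}


def train_majority(X: List[Vector], y: List[int]) -> Predictor:
    """Stand-in baseline: always predict the most common training label."""
    majority = max(set(y), key=y.count)
    return lambda x: majority


def train_nearest_centroid(X: List[Vector], y: List[int]) -> Predictor:
    """Stand-in linear classifier: predict the label of the closest class mean."""
    centroids = {}
    for label in set(y):
        rows = [xi for xi, yi in zip(X, y) if yi == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x: Vector) -> int:
        # Squared Euclidean distance to each class centroid.
        return min(centroids,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

    return predict
```

Calling `compare_models({"baseline": train_majority, "centroid": train_nearest_centroid}, ...)` returns one error rate per candidate, which makes it easy to see when the simpler model is the better choice for a given dataset.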
2. Avoid Subsampling
The more data an algorithm has, the more accurate it tends to become, so it is advisable to avoid subsampling. This follows from how prediction error behaves in machine learning. The gap in prediction error between a machine learning model and the optimal predictor can be broken into three parts:
- 1. Error from not having the right functional form for the model
- 2. Error from not finding the optimal parameters for the model
- 3. Error from not feeding enough data to the model
Subsampling increases the third kind of error, so the best practice recommended by researchers is to use all the data rather than a subsample.
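The three-part breakdown listed in this section can be written as a single identity. The notation below is an assumption introduced here for illustration and does not come from the original discussion.

```latex
% Assumed notation (not from the article):
%   R(f)               expected prediction error of predictor f
%   f^*                the optimal predictor
%   f_{\mathcal{F}}    the best predictor within the chosen model class \mathcal{F}
%   \hat{f}_n          the best predictor in \mathcal{F} on the n available training examples
%   \tilde{f}_n        the predictor actually returned by the training procedure
\[
R(\tilde{f}_n) - R(f^*)
  = \underbrace{R(f_{\mathcal{F}}) - R(f^*)}_{\text{1. wrong functional form}}
  + \underbrace{R(\tilde{f}_n) - R(\hat{f}_n)}_{\text{2. parameters not optimal}}
  + \underbrace{R(\hat{f}_n) - R(f_{\mathcal{F}})}_{\text{3. not enough data}}
\]
```

The terms on the right telescope, so the identity holds by construction. The third term is the one that typically shrinks as n grows, which is why discarding data through subsampling tends to widen the gap.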
3. Choosing The Method And Parameters Aptly
To obtain a great model, it is important to choose the method and its parameters appropriately. Machine learning algorithms have a variety of knobs to tweak. The GBT algorithm alone has a dozen or so parameter settings, including the learning rate, tree size controls, the sampling methodology for rows or columns, the loss function and more. A project may require finding the best values for each of those parameters in order to get the highest possible accuracy for a given data set. This is not an easy task. In some cases intuition and experience help, but for better and more accurate results a data scientist has to train a large number of models, look at their cross-validated scores and decide which parameters to try next.
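The loop of trying parameter combinations and comparing cross-validated scores can be sketched as an exhaustive grid search. This is a minimal illustration using only the standard library; `grid_search` and `toy_cv_score` are hypothetical names, and the toy scoring function is a stand-in for actually training a model and cross-validating it.

```python
from itertools import product
from typing import Any, Callable, Dict, List, Tuple

Params = Dict[str, Any]


def grid_search(param_grid: Dict[str, List[Any]],
                cv_score: Callable[[Params], float]
                ) -> Tuple[Params, float, List[Tuple[Params, float]]]:
    """Score every combination in param_grid; return the best one and all results."""
    names = list(param_grid)
    results = []
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((params, cv_score(params)))
    # Higher score is better; ties go to the first combination tried.
    best_params, best_score = max(results, key=lambda item: item[1])
    return best_params, best_score, results


def toy_cv_score(params: Params) -> float:
    """Stand-in for a real cross-validated score (assumption for illustration).

    Peaks at learning_rate=0.1 and max_depth=3.
    """
    return (1.0
            - abs(params["learning_rate"] - 0.1)
            - 0.05 * abs(params["max_depth"] - 3))
```

The number of combinations grows multiplicatively with every parameter added to the grid, which is one reason this tuning work is expensive and why experience in narrowing the ranges still matters.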
4. Selection Of Suitable Business Objective
Machine learning algorithms are formulated as optimization problems. For machine learning to succeed, the objective function being optimized must be adjusted to reflect the nature of the business problem. SVM, for example, optimizes the generalization error for a binary classification problem by assuming that all types of errors are equally weighted. For cost-sensitive problems such as failure detection, where certain types of errors weigh more than others, the standard SVM is a poor fit. The suggested remedy is to adjust the SVM loss function by adding heavier penalties on certain types of errors to account for their weights.
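One common way to add such penalties is to weight the hinge loss by class, so that a mistake on one class costs more than a mistake on the other. The sketch below assumes labels of +1 (for example, "failure") and -1, and the parameter names `cost_pos` and `cost_neg` are hypothetical. It shows only the loss computation, not a full SVM trainer.

```python
from typing import Sequence


def weighted_hinge_loss(scores: Sequence[float],
                        labels: Sequence[int],
                        cost_pos: float = 1.0,
                        cost_neg: float = 1.0) -> float:
    """Mean hinge loss where each class carries its own misclassification cost.

    scores: raw decision values (w.x + b) from a linear classifier
    labels: +1 or -1 for each example
    cost_pos: penalty multiplier for errors on +1 examples (missed failures)
    cost_neg: penalty multiplier for errors on -1 examples (false alarms)
    """
    total = 0.0
    for score, label in zip(scores, labels):
        weight = cost_pos if label == 1 else cost_neg
        # Standard hinge term: zero once the example is beyond the margin.
        total += weight * max(0.0, 1.0 - label * score)
    return total / len(labels)
```

With equal costs this reduces to the ordinary hinge loss. Raising `cost_pos` makes a missed failure more expensive than a false alarm, which pushes the learned boundary toward catching more failures. Libraries often expose the same idea directly; scikit-learn's SVM estimators, for example, accept a `class_weight` argument.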
5. Understanding Of Generalization Error
Generalization error signifies how well the model performs on unseen data. A model that performs well on training data will not necessarily perform well on unseen data. Before actual deployment, a carefully designed model evaluation process that mirrors real deployment usage is a prerequisite for estimating the generalization error of the model. It is all too easy to violate the rules of cross-validation, and there are many cases in which cross-validation is performed incorrectly. This generally happens when someone attempts to take computational shortcuts. It is therefore imperative to pay attention and perform cross-validation diligently before deploying any model.
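A frequent shortcut is to fit preprocessing steps (scaling, feature selection and so on) once on the full dataset before splitting, which leaks information from the held-out data into training. The sketch below shows one way to avoid that: the cross-validation loop hands only the training rows to a single `fit_pipeline` function that performs both preprocessing and model fitting. All names are hypothetical, and `fit_threshold_pipeline` is a deliberately trivial stand-in model for a one-dimensional feature.

```python
import random
from typing import Callable, List

Predictor = Callable[[float], int]
Pipeline = Callable[[List[float], List[int]], Predictor]


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle the indices 0..n-1 once and deal them into k disjoint folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_error(X: List[float], y: List[int],
                          fit_pipeline: Pipeline,
                          k: int = 5, seed: int = 0) -> float:
    """Average held-out error rate over k folds (assumes len(y) >= k)."""
    fold_errors = []
    for held_out in k_fold_indices(len(y), k, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        # The pipeline sees training rows only, so any statistic it computes
        # (means, scalers, selected features) cannot leak from the held-out fold.
        predict = fit_pipeline([X[i] for i in train], [y[i] for i in train])
        wrong = sum(1 for i in held_out if predict(X[i]) != y[i])
        fold_errors.append(wrong / len(held_out))
    return sum(fold_errors) / len(fold_errors)


def fit_threshold_pipeline(X_train: List[float], y_train: List[int]) -> Predictor:
    """Stand-in model: center on the training mean, then threshold at zero.

    y_train is unused by this toy model but kept so the signature matches
    what a real pipeline would need.
    """
    center = sum(X_train) / len(X_train)  # preprocessing fitted on training rows only
    return lambda x: 1 if x - center > 0 else 0
```

Keeping preprocessing inside the function that the loop calls is a design choice that makes the leak hard to introduce by accident, at the price of recomputing the preprocessing for every fold.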
Machine learning is all about identifying the output we expect from the system and then implementing each part step by step. This cannot be achieved overnight; a great amount of effort and time is required for an artificial intelligence system to work. By using models built with algorithms, organizations can make better decisions without human intervention.
Always be wary of the false promise of simple-to-use machine learning functionality that can be applied without a thought for correctness, usability or scalability. Used that way, it will not yield the high predictive accuracy and high business value that machine learning has to offer. Moreover, delivering poor models may actually backfire and build distrust in the product or service among users. In conclusion, developers building machine learning-enabled applications need to follow best practices for machine learning application development.