Rules of Thumb for Building Recommendation Systems
Here are my rules of thumb from a few years of experience building recommendation systems:
- Own the input data - Duplicate the data if needed, and decouple it from the persistence service or any other related service.
- Avoid schedulers; use events and a job queue. Every task in the system should pass through the job queue.
- Independence - If possible, build a system that is decoupled from any other service in terms of deployment, storage, downtime, logic, performance, analytics, etc.
- Randomness - Adding randomness here and there rarely hurts, but you should always keep control of it. Strive to understand, and be able to explain, the reason behind each recommendation. Don’t forget the rich-get-richer problem: items that are already popular get recommended more, which makes them even more popular.
- Simple models - Start with a simple model and work in iterations. You don’t need to be a Machine Learning expert to make things work. Don’t jump to the latest, coolest ML libraries if you have no idea how they work.
- Scale & Performance - Don’t over-design, don’t under-design too much. Just do perfect-design :)
- Tests, Tools, Dashboards - Test every bit of logic in the system, write debugging and analysis tools as you develop the system, and use dashboards to see how your system is functioning.
- A/B Testing - A critical component, but don’t get too dependent on it. Get early feedback and use your intuition, but in the end, the users are always right.
- UI - The text color of the recommended item probably has more impact on the CTR than the recommendation itself. Try to have as much influence as possible on the UI. If you can, code it yourself.
- Post Processing - Filtering is the last step of any good recommendation algorithm.
- Caching, Indexing - These are key components in any recommendation system; they should be clearly defined in the specification phase.
- Explanations - The explanation of a recommendation is at least as important as the recommendation itself.
- The quality of a recommendation algorithm is highly dependent on its domain. Gather as much domain knowledge as possible and use it. Make different algorithms for different domains.
- Variety - Recommending a single item is completely different from recommending a list of items. Watch out for fixation when recommending items on a daily basis. Time is an important feature when building a relevant recommendation system.
- Content - Data always requires cleaning; don’t be afraid to get your hands dirty.
- Control your own system - Be an expert on your own system: hardware, OS, network traffic, algorithms, etc.
- Analytics - It always comes last, but it should be the first thing you start with.
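
To make a few of these rules concrete (controlled randomness, explanations, and filtering as the last step), here is a minimal Python sketch. Everything in it is hypothetical: the names `recommend`, `Recommendation`, `explore_rate`, and `blocked`, as well as the idea of a plain score dictionary, are assumptions for illustration and not taken from any particular system.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Recommendation:
    item: str
    reason: str  # human-readable explanation for why this item was picked


def recommend(scores, blocked, k, explore_rate=0.2, seed=0):
    """Order items by score with seeded exploration, then filter last.

    scores:       dict mapping item id -> model score (hypothetical input)
    blocked:      set of item ids that must never be shown
    explore_rate: probability that a slot goes to a random remaining item
    seed:         fixes the random stream so every pick can be reproduced
    """
    rng = random.Random(seed)  # local, seeded generator: randomness under control
    remaining = sorted(scores, key=scores.get, reverse=True)

    picks = []
    while remaining:
        if rng.random() < explore_rate:
            # Exploration: give a random item a chance, which softens the
            # rich-get-richer effect of always showing the top scorers.
            item = remaining.pop(rng.randrange(len(remaining)))
            reason = "exploration: seeded random pick"
        else:
            item = remaining.pop(0)
            reason = f"highest remaining score ({scores[item]:.2f})"
        picks.append(Recommendation(item, reason))

    # Post-processing: filtering is the last step, then cut to k items.
    return [rec for rec in picks if rec.item not in blocked][:k]
```

Calling `recommend({"a": 0.9, "b": 0.8, "c": 0.5}, blocked={"b"}, k=2)` returns up to two `Recommendation` objects, each carrying its own `reason`. Because the generator is seeded, the same inputs always give the same list, so any surprising recommendation can be replayed and explained. Setting `explore_rate=0` turns the randomness off entirely.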