Company News

Watson Score: How we’re using Data Science and Machine Learning

Diamonds and Machine Learning

Certain features make one diamond more appealing to the human eye than another, and those same features drive its price. A good machine learning algorithm, when trained on enough data, should be able to say, given the features of a diamond, what the price “should be”.

And we were able to do it very well!

The Lowdown on Machine Learning

Machine learning and recommendation engines have entered all facets of life. As I was reviewing the slides of a recent presentation on data architecture from a major US entertainment company, I came across a small nugget of information - 80% of all views come from their recommendations. In their own words, “recommendation is everything”!

If you followed the recent controversy over Facebook’s “trending topics”, you will know that machine learning does not take away from human subjectivity or creativity. It simply summarizes massive amounts of data and discovers hidden patterns that may not be apparent to the human eye.

Machines are good at summarizing data and detecting patterns that humans often can’t, and hence can make better predictions. The data point or metric being predicted is called the “target” variable, and the data points used to make that prediction are called “feature” variables. Traditionally these were called dependent and independent variables, but those terms are not accurate in the current context.
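To make the terminology concrete, here is a minimal Python sketch of separating feature variables from the target variable. The listings, field names, and the `split_features_target` helper are invented for illustration; they are not Rare Carat data or code.

```python
# Hypothetical diamond listings; every value here is made up for illustration.
listings = [
    {"carat": 0.50, "cut": "Ideal", "color": "G", "clarity": "VS2", "price": 1500},
    {"carat": 1.00, "cut": "Very Good", "color": "H", "clarity": "SI1", "price": 4800},
    {"carat": 1.50, "cut": "Ideal", "color": "F", "clarity": "VS1", "price": 12500},
]

TARGET = "price"  # the "target" variable: the value we want to predict


def split_features_target(rows, target):
    """Return (features, targets): one feature dict and one target value per row."""
    features = [{k: v for k, v in row.items() if k != target} for row in rows]
    targets = [row[target] for row in rows]
    return features, targets


X, y = split_features_target(listings, TARGET)
print(X[0])  # the "feature" variables of the first listing
print(y)     # the target values, one per listing
```

Everything except `price` is treated as a feature here, which mirrors the idea of handing the machine many data points and letting it decide which ones matter.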

Because machines are good at big calculations, we throw a lot of data points at them and let them decide what is useful in making the prediction. In the process, we can end up sending feature variables to the machines that have no dependency on the target variable. On the other hand, the feature variables may have dependencies among themselves. This brings us to the concept of “interleaving” features. It is not just the number or cardinality of the features that makes a prediction difficult for humans; it can also be the complex “rules” involving several features, which can be difficult to “deduce”.
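One simple way to see both effects is to measure how strongly each column moves with the others. The sketch below uses the Pearson correlation coefficient on made-up numbers; the columns (`carat`, `length_mm`, `listing_day`, `price`) are assumptions chosen for illustration, and correlation only captures linear relationships, so it is a rough first check rather than a substitute for a trained model.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up columns for six hypothetical diamonds.
carat = [0.3, 0.5, 0.7, 1.0, 1.2, 1.5]
length_mm = [4.3, 5.1, 5.7, 6.4, 6.8, 7.3]  # grows with carat: two features that depend on each other
listing_day = [4, 2, 6, 1, 5, 3]            # day the diamond was listed: no real link to price
price = [600, 1500, 2600, 4800, 6500, 10500]

carat_vs_price = pearson(carat, price)          # strong: a useful feature
carat_vs_length = pearson(carat, length_mm)     # strong: features depend on each other
day_vs_price = pearson(listing_day, price)      # weak: a feature with no dependency on the target

print(f"carat vs price:     {carat_vs_price:.2f}")
print(f"carat vs length_mm: {carat_vs_length:.2f}")
print(f"listing_day vs price: {day_vs_price:.2f}")
```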

Additionally, even if we are able to capture the feature variables and their interdependencies, some problems may simply be difficult to predict. For example, if the outcome is truly random and we cannot find features that cause or drive the outcome, then we’d be helpless no matter how much data we throw at the machines.

IBM Watson and Rare Carat

For proprietary reasons, we are not able to reveal all the features that our algorithm uses, or how exactly these features work together to drive the price of diamonds. But you can probably guess that the famous “Four C’s of diamonds” - carat, cut, color and clarity - are surely predictive. In addition, we use data on shape, fluorescence, symmetry, polish, culet, table, depth, cut angles, and length/width ratios.

IBM Watson Analytics Predict uses decision trees to understand how these variables influence the target variable, which in our case is cost. IBM Watson Tradeoff Analytics helps by using a mathematical filtering technique called “Pareto Optimization”, which enables exploration of tradeoffs when considering multiple criteria for a single decision.
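The idea behind Pareto filtering can be sketched in a few lines of Python. This is a generic illustration, not the Watson Tradeoff Analytics implementation: the two criteria (lower price, higher carat), the `dominates` and `pareto_front` helpers, and the sample diamonds are all assumptions made for the example.

```python
def dominates(a, b):
    """True if a is at least as good as b on both criteria and strictly better on one.

    Criteria in this sketch: lower price is better, higher carat is better.
    """
    no_worse = a["price"] <= b["price"] and a["carat"] >= b["carat"]
    strictly_better = a["price"] < b["price"] or a["carat"] > b["carat"]
    return no_worse and strictly_better


def pareto_front(options):
    """Keep only the options that no other option dominates."""
    return [
        o for o in options
        if not any(dominates(other, o) for other in options)
    ]


# Made-up candidates for illustration.
diamonds = [
    {"id": "A", "price": 3000, "carat": 0.90},
    {"id": "B", "price": 3500, "carat": 1.00},
    {"id": "C", "price": 3600, "carat": 0.95},  # dominated by B: pricier and smaller
    {"id": "D", "price": 2500, "carat": 0.70},
    {"id": "E", "price": 3200, "carat": 0.85},  # dominated by A: pricier and smaller
]

front = pareto_front(diamonds)
print([d["id"] for d in front])  # → ['A', 'B', 'D']
```

The surviving options are the genuine tradeoffs: each one is either cheaper or larger than every other survivor, so choosing among them comes down to personal preference.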

We collect more than 10 million data points from diamond retailers across the internet, then clean the data and store it on the IBM cloud in a structured manner. This data store is updated regularly with the latest inventory data, and a fast feedback loop between the data store and new data models allows us to present you with the best recommendations by comparing the sale price of a diamond with what the price “should be” - in other words, we are quickly able to identify good deals. No human mind is capable of this - not even diamond experts.
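Comparing a listed price with a predicted price can be sketched as follows. The scoring formula, the 10% threshold, and the sample listings are assumptions for illustration only; the actual Watson Score calculation is not described in this post.

```python
def deal_score(listed_price, predicted_price):
    """Fraction by which a listing undercuts its predicted price (negative if overpriced)."""
    return (predicted_price - listed_price) / predicted_price


def find_good_deals(listings, threshold=0.10):
    """Return ids of listings priced at least `threshold` below the prediction, best first."""
    scored = [
        (deal_score(item["listed"], item["predicted"]), item)
        for item in listings
    ]
    good = [(score, item) for score, item in scored if score >= threshold]
    good.sort(key=lambda pair: pair[0], reverse=True)
    return [item["id"] for _, item in good]


# Made-up listings: "predicted" stands in for the model's "should be" price.
listings = [
    {"id": "D1", "listed": 4400, "predicted": 5000},  # 12% below prediction
    {"id": "D2", "listed": 5200, "predicted": 5000},  # overpriced
    {"id": "D3", "listed": 3000, "predicted": 4000},  # 25% below prediction
    {"id": "D4", "listed": 4800, "predicted": 5000},  # only 4% below prediction
]

print(find_good_deals(listings))  # → ['D3', 'D1']
```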

We also use an ensemble algorithm that slices up the data into several random overlapping segments and tries to deduce the rules from each segment. The outcome of this process is many rule-based “trees”, which are finally combined. Then, to ensure we are doing a good job, these rules are tested on data that the algorithm has not seen before.
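A minimal sketch of this kind of ensemble is shown below, assuming the “random overlapping segments” are bootstrap samples (sampling with replacement, often called bagging) and that the trees are combined by averaging. Both are assumptions, as are the synthetic data, the price formula, and every function name; this is not Rare Carat’s production model.

```python
import random


def sse(rows):
    """Sum of squared errors of the prices around their mean."""
    prices = [price for _, price in rows]
    mean = sum(prices) / len(prices)
    return sum((p - mean) ** 2 for p in prices)


def build_tree(rows, depth=0, max_depth=3, min_size=4):
    """Fit a small regression tree; each row is (features_tuple, price)."""
    mean = sum(price for _, price in rows) / len(rows)
    if depth >= max_depth or len(rows) < min_size:
        return mean  # leaf: the average price of this group
    best = None
    for f in range(len(rows[0][0])):
        for threshold in sorted({feats[f] for feats, _ in rows}):
            left = [r for r in rows if r[0][f] < threshold]
            right = [r for r in rows if r[0][f] >= threshold]
            if left and right:
                error = sse(left) + sse(right)
                if best is None or error < best[0]:
                    best = (error, f, threshold, left, right)
    if best is None:
        return mean  # no threshold separates the rows
    _, f, threshold, left, right = best
    return (f, threshold,
            build_tree(left, depth + 1, max_depth, min_size),
            build_tree(right, depth + 1, max_depth, min_size))


def predict_tree(tree, feats):
    """Follow the rules from the root down to a leaf value."""
    while isinstance(tree, tuple):
        f, threshold, left, right = tree
        tree = left if feats[f] < threshold else right
    return tree


def build_forest(rows, n_trees, rng):
    """Train each tree on a bootstrap sample: random rows drawn with replacement."""
    return [build_tree([rng.choice(rows) for _ in rows]) for _ in range(n_trees)]


def predict_forest(forest, feats):
    """Combine the trees by averaging their predictions."""
    return sum(predict_tree(tree, feats) for tree in forest) / len(forest)


def mean_abs_error(predict, rows):
    return sum(abs(predict(feats) - price) for feats, price in rows) / len(rows)


# Synthetic data: the price formula below is invented purely for this example.
rng = random.Random(42)
data = []
for _ in range(120):
    carat = rng.uniform(0.3, 2.0)
    clarity_grade = rng.randint(1, 5)
    price = 3000 * carat ** 2 * (1 + 0.1 * clarity_grade) + rng.gauss(0, 200)
    data.append(((carat, clarity_grade), price))

train, test = data[:90], data[90:]  # the test rows are never shown to the trees
forest = build_forest(train, n_trees=20, rng=rng)

train_mean = sum(price for _, price in train) / len(train)
forest_mae = mean_abs_error(lambda feats: predict_forest(forest, feats), test)
baseline_mae = mean_abs_error(lambda feats: train_mean, test)
print(f"forest MAE: {forest_mae:.0f}, baseline MAE: {baseline_mae:.0f}")
```

Comparing the error on the held-out rows against a naive baseline (always predicting the average training price) is a quick sanity check that the trees learned rules that carry over to unseen data.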

We hope the end results are useful to you!

Rare Carat’s Watson Score FAQs

What is the Watson Score and how does Rare Carat use it to price diamonds?

The Watson Score is Rare Carat’s predictive model, powered by IBM Watson, that estimates what a diamond should cost by analyzing features like the 4 Cs (carat, cut, color, clarity), plus other details like proportions, symmetry, and even fluorescence. The algorithm digests millions of data points to find hidden patterns and gives you a fair market value for any diamond.

Which diamond features does the Watson model take into account?

Besides the standard 4 Cs, Watson also looks at secondary characteristics: shape, polish, symmetry, culet size, depth/table ratios, and more. By combining all these factors, it builds a nuanced prediction — not just based on size or clarity, but on how all the qualities work together.

How accurate is the Watson Score’s prediction for diamond prices?

Rare Carat says that Watson does very well: they’ve fed it a massive amount of data (over 10 million retailer data points) and used techniques like ensemble decision trees to train the model. They then test the model on data it hasn’t seen before to make sure it generalizes correctly and gives realistic “should-be” price predictions.

Can I rely on the Watson Score alone to make a decision — or should I combine it with other tools?

While the Watson Score is a powerful tool for price prediction, Rare Carat suggests it works best alongside your own preferences and expert advice. The model helps you spot good deals, but it doesn’t replace human judgment—especially when you’re thinking about style, personal taste, or long-term value.
Saurav Pandit