Evaluating Similarity Search Without Ground Truth?

Hey everyone! :waving_hand:

I’m building a similarity search for recommendations. I don’t have ground truth labels, just a handful of expected test cases that suggest I’m on the right track. Qualitatively, the similarity search looks good and is clearly better than what we had before, but I can’t quantify that yet.

Traditional metrics like precision/recall need ground truth labels, which rules them out in my case. But I still want evidence that the system is performing well.

Has anyone dealt with this? Looking for:

  • Alternative evaluation approaches without ground truth

  • Proxy metrics or empirical validation methods

  • Expert evaluation / human judgment approaches

  • A/B testing strategies you’ve found useful
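For the proxy-metric bullet, one ground-truth-free check I've seen suggested is to treat exact brute-force search as pseudo ground truth and measure how well the fast index agrees with it (recall@k overlap). This is a minimal sketch with made-up data; `approx_topk` here is a toy random-projection search standing in for whatever real index you'd use, and all names are placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
corpus = rng.normal(size=(1000, 64))   # toy item embeddings
queries = rng.normal(size=(50, 64))    # toy query embeddings

def exact_topk(q, X, k=10):
    # brute-force cosine similarity: slow but exact, so it can
    # serve as pseudo ground truth for the approximate index
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    qn = q / np.linalg.norm(q)
    sims = Xn @ qn
    return np.argsort(-sims)[:k]

def approx_topk(q, X, k=10, dim=16, seed=1):
    # stand-in for a real ANN index: project to a smaller space,
    # then search exactly there (replace with your actual index)
    P = np.random.default_rng(seed).normal(size=(X.shape[1], dim))
    return exact_topk(q @ P, X @ P, k)

def recall_at_k(k=10):
    # average overlap between approximate and exact top-k lists
    hits = 0.0
    for q in queries:
        exact = set(exact_topk(q, corpus, k))
        approx = set(approx_topk(q, corpus, k))
        hits += len(exact & approx) / k
    return hits / len(queries)

print(round(recall_at_k(), 3))
```

This only validates the index against the embedding space, not whether the embeddings themselves are any good, so it complements rather than replaces human judgment or A/B tests.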

Any insights appreciated! :folded_hands:


There's quite a bit of precedent for this; several of the approaches you list have been used in practice and would be worth drawing on.
