I've recently been trying to build a semantic search system with my own data, and I came across your blog post. I found quite a few papers using “Recall@K” as an evaluation metric (e.g. Semantic Product Search by Amazon, Embedding-based Retrieval in Facebook Search by Facebook, Embedding-based Product Retrieval in Taobao Search), but it is unclear how they obtain the total number of relevant documents (or items) for each query.
While it is totally possible to hire a lot of annotators to figure out which documents are relevant to a search query, I don’t think that is economically feasible at all. Do you have any idea how engineers in industry figure out the total number of relevant documents (or items) for their query-document pairs? Many thanks!
If I had to build a search engine from scratch, I would:
I think using human annotators can work, but probably only for defects or edge cases, given how costly it is.
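To make the metric in the question concrete, here is a minimal sketch of Recall@K in Python. The function name `recall_at_k` and the document IDs are made up for illustration, and the set of relevant items is assumed to be already labeled by some means. Note that the denominator is only the relevant items you know about, which is exactly the part the question is asking how to obtain.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the known-relevant items that appear in the top-k results.

    retrieved: ranked list of item IDs returned by the search system.
    relevant:  collection of item IDs labeled relevant for the query
               (assumed given; how these labels are obtained is the hard part).
    k:         cutoff rank.
    """
    relevant = set(relevant)
    if not relevant:
        raise ValueError("No relevant items labeled; recall is undefined.")
    top_k = set(retrieved[:k])
    return len(top_k & relevant) / len(relevant)


# Hypothetical example: one query, five ranked results, three labeled relevant items.
retrieved = ["d3", "d7", "d1", "d9", "d4"]
relevant = {"d1", "d4", "d8"}

recall_at_3 = recall_at_k(retrieved, relevant, k=3)  # only d1 is in the top 3
recall_at_5 = recall_at_k(retrieved, relevant, k=5)  # d1 and d4 are in the top 5
```

If the labeled set misses relevant items, the computed recall is inflated relative to the true value, so the number is best read as recall against known positives.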