LLMs vs Search as of 2024

Semi-abstract painting of an LLM model fighting a search algorithm: the LLM is organic, wild, and creative, while search feels more systematic and organized. Generated with Midjourney

What's a good use case for LLMs vs Search when it comes to knowledge retrieval?

Read along for some speculation on my end, based on current AI capabilities and my own thinking and experience.

Please feel free to add any thoughts!

Quick context

The moment ChatGPT came out, quite a few people declared Google Search dead.

Google recently started featuring AI summaries prominently in search results, sometimes with annoying results, I might add.

On my side, I only occasionally find myself asking ChatGPT specific questions rather than using search.

This usually happens when I'm working on tasks that are likely to require multiple iterations to complete, or tasks that require some degree of specialized knowledge that isn't easily available in one place but that, all in all, should exist somewhere in individual bits.

Interestingly, when building a knowledge retrieval system using a pre-trained LLM, the first step is usually for the system to translate the user query into a search, e.g. using RAG to query a vector database of knowledge.

The goal of this first step is to collect relevant bits of information which can then be passed to an LLM to interpret, summarize, etc.
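
To make that first step concrete, here is a minimal sketch of the retrieval half of such a pipeline. The `embed()` function below is a toy stand-in (a hashed bag-of-words) for whatever embedding model the system would actually use; the rest is a plain cosine-similarity lookup over pre-chunked text.

```python
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy stand-in for a real embedding model (hashed bag-of-words).
    A real system would call a sentence-transformer or an embeddings API."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    return vec

def retrieve(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Return the k chunks most similar to the query by cosine similarity."""
    vectors = np.array([embed(c) for c in chunks])
    vectors /= np.linalg.norm(vectors, axis=1, keepdims=True) + 1e-9
    q = embed(query)
    q /= np.linalg.norm(q) + 1e-9
    scores = vectors @ q
    top = np.argsort(scores)[::-1][:k]
    return [chunks[i] for i in top]

# The retrieved chunks are then pasted into the LLM prompt, roughly:
# prompt = "Answer using only this context:\n" + "\n".join(top_chunks) + "\n\nQuestion: " + query
```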

My experience building a system like this has been very much hit or miss: if the text you retrieve explicitly contains the information you are after, the LLM can be fantastic.

Otherwise, what you often get back is a vague blurb that either doesn't answer the question or gives you the wrong answer.

A quick aside: over the years of using Google search, I learned how to search for things optimally. In the early days especially, my admittedly anecdotal experience was that it was optimal to frame a search query to be literally as similar as possible to the answer I was after.

So if I wanted to work out how old my dog was in human years, rather than searching "how old is my dog", I would search "convert dog age to human age" or "how to measure the age of a dog".

What is optimal has evolved over time, I reckon: search algorithms changed, but people also started framing web content to match the way people search - e.g. nowadays you can find a dog age calculator titled "How old is my dog" rather than "Calculate a dog's age".
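
(As a throwaway aside to the aside: one such calculator could be built on the formula reported in a 2020 epigenetic-clock study, human age ≈ 16·ln(dog age) + 31; the specific formula isn't the point here, it's just an illustration.)

```python
import math

def dog_to_human_years(dog_age: float) -> float:
    """Convert a dog's age to human years using the 2020 epigenetic-clock
    formula (human_age = 16 * ln(dog_age) + 31); meant for dogs over ~1 year."""
    return 16 * math.log(dog_age) + 31

print(round(dog_to_human_years(2)))   # ~42
print(round(dog_to_human_years(10)))  # ~68
```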

This brings me to my hypothesis when it comes to current LLM capabilities: what is optimal depends on what information you are after.

If you are after specific information that needs significant collating and reframing, your options are the following, in increasing order of building effort and decreasing order of user effort:

  • Search the web, go through the results and manually come up with an answer
  • Fine-tune an LLM: you can then use it to get pointers to answers (a sketch of the data preparation follows this list)
  • Improve the structure of the data (e.g. by making explicit statements in the written text). You can then use search plus a pre-trained LLM, or further fine-tune a model on it
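
To illustrate the second option, here is a rough sketch of what preparing fine-tuning data can look like, using the JSONL chat format accepted by OpenAI's fine-tuning endpoint (other providers use similar formats). The domain, the "X100 sensor", and the Q&A pairs are made-up placeholders.

```python
import json

# Hypothetical Q&A pairs, collated by hand from scattered sources and
# rewritten as explicit statements (this is where the manual effort goes).
examples = [
    {"question": "What voltage does the X100 sensor expect?",
     "answer": "The X100 sensor expects a 3.3 V supply."},
    {"question": "How often should the X100 be recalibrated?",
     "answer": "The X100 should be recalibrated every 6 months."},
]

# One JSON object per line, in the chat-message format used for fine-tuning.
with open("finetune_data.jsonl", "w") as f:
    for ex in examples:
        record = {"messages": [
            {"role": "system", "content": "You answer questions about the X100 sensor."},
            {"role": "user", "content": ex["question"]},
            {"role": "assistant", "content": ex["answer"]},
        ]}
        f.write(json.dumps(record) + "\n")
```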

If you are after a piece of information that is likely to be explicitly written somewhere, in bits that need little reframing, as is the case with summaries or even with writing some code, pre-trained LLMs can be amazing.
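
As a sketch of that little-reframing case: the whole "system" can be as simple as wrapping the already-explicit text in a summarization instruction. `call_llm` below is a placeholder for whichever chat API you use.

```python
def build_summary_prompt(source_text: str, max_words: int = 100) -> str:
    """Wrap explicitly stated source text in a summarization instruction."""
    return (
        f"Summarize the following text in at most {max_words} words, "
        "using only information stated in it:\n\n"
        f"{source_text}"
    )

# summary = call_llm(build_summary_prompt(article_text))  # call_llm = your chat API of choice
```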

If you are after specific facts, a specific person's opinion, or perhaps very up-to-date information, search can be a winner.

Underlying rationale: LLMs are just starting to reason

I would argue that LLMs are, amazingly, capable of reasoning. Their reasoning is not always the same as our reasoning, and it's not always right. But by many definitions of reasoning, they do have this capability.

I would also argue that the reason they have this capability is that language itself is such a powerful representation tool that it can represent both knowledge and reasoning.

However, LLMs' reasoning capabilities are currently quite limited: extracting information that needs significant reframing requires more advanced reasoning and some world model, which is precisely why they are not that good at it.

Andrea