I’ve been thinking about user experiences for LLMs. Currently, most demos interact with LLMs via chat. This is a good start, but I think we can do better. I’m not convinced most users want to interact via text. Also, I think we can provide LLMs with the context of the user, without the user having to type or voice it out.
Will chat be the UI for most (LLM) apps?
I'm not sure. If I'm shopping for electronics or clothes, I want to look at specs & images.
Also, there's context (clicks, purchases) that shouldn't have to be chatted.
Maybe agents should work on context first and chat input second.
Eugene Yan (@eugeneyan), April 18, 2023
Most UIs today are based on clicks: When we surf the web, we’re mostly clicking on images, links, and buttons. And for some apps, the user’s context, such as location (e.g., Google Maps), persona (e.g., Netflix), and past behavior (e.g., Amazon), is taken into account.
Why not do the same for LLMs? Here’s how we can interact with LLMs with minimal chat. We start by clicking on books of interest, then filter them by vibes, before asking an LLM librarian for help. And even though the question entered is simple (e.g., “more books by female authors”), the LLM has our past context and can give us a personalized answer.
The prototype is a blend of recommendation systems, NLP, and LLMs.
When an item is selected, similar items are retrieved via approximate nearest neighbors on item embeddings. We can learn item embeddings by building a product graph from e-commerce data, generating random walk sequences, and applying representation learning. If multiple items are viewed in a session, more recently viewed items are given heavier weight. The retrieved candidates are then ranked via a learning-to-rank (LTR) model or heuristics.
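Here's a minimal sketch of the retrieval step, under a few assumptions: the embeddings are made-up 2-d toy vectors, the exponential `decay` weighting is just one way to favor recent views, and an exact cosine scan stands in for a real approximate nearest neighbor index. The function names are hypothetical and not taken from the prototype.

```python
import math

def recency_weighted_query(session_vectors, decay=0.5):
    """Blend a session's item vectors, weighting more recent views more heavily."""
    n = len(session_vectors)
    # Most recent item gets weight 1, the one before it `decay`, then decay**2, ...
    weights = [decay ** (n - 1 - i) for i in range(n)]
    total = sum(weights)
    dim = len(session_vectors[0])
    return [
        sum(w * vec[d] for w, vec in zip(weights, session_vectors)) / total
        for d in range(dim)
    ]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve_similar(query, item_embeddings, exclude=(), k=3):
    """Exact top-k by cosine similarity; an ANN index would replace this scan."""
    scored = [
        (cosine(query, vec), item_id)
        for item_id, vec in item_embeddings.items()
        if item_id not in exclude
    ]
    scored.sort(reverse=True)
    return [item_id for _, item_id in scored[:k]]

# Toy 2-d embeddings (made-up values for illustration only).
ITEM_EMBEDDINGS = {
    "dune": [0.9, 0.1],
    "foundation": [0.8, 0.2],
    "emma": [0.1, 0.9],
    "persuasion": [0.2, 0.8],
    "hyperion": [0.85, 0.15],
}

session = ["emma", "dune"]  # "dune" was viewed most recently
query = recency_weighted_query([ITEM_EMBEDDINGS[i] for i in session])
candidates = retrieve_similar(query, ITEM_EMBEDDINGS, exclude=session, k=2)
```

Because "dune" was viewed last, the blended query leans toward its neighborhood, so the candidates skew toward items near it rather than near "emma". A ranking step (LTR model or heuristics) would then reorder these candidates.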
The vibe keywords that help us filter items are pre-cached. (Someone had the impression they were dynamically extracted via an LLM. That isn’t feasible yet given the latency, but it may be in the future.) We either extract these keywords from book descriptions or source them from datasets such as the UCSD Book Graph.
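To make the pre-caching idea concrete, here's a small sketch that assumes the keywords were already computed offline and loaded as a dictionary. `BOOK_VIBES`, the keyword values, and the function names are all hypothetical. An inverted index is built once at startup, so filtering at request time is just set lookups, with no LLM call.

```python
from collections import defaultdict

# Hypothetical keywords per book, assumed to be computed offline.
BOOK_VIBES = {
    "dune": {"epic", "political", "desert"},
    "emma": {"witty", "romantic"},
    "persuasion": {"romantic", "melancholic"},
    "foundation": {"epic", "cerebral"},
}

def build_vibe_index(book_vibes):
    """Invert book -> vibes into vibe -> books for fast lookups."""
    index = defaultdict(set)
    for book, vibes in book_vibes.items():
        for vibe in vibes:
            index[vibe].add(book)
    return dict(index)

VIBE_INDEX = build_vibe_index(BOOK_VIBES)  # built once, e.g., at app startup

def filter_by_vibes(candidates, selected_vibes, index=VIBE_INDEX):
    """Keep candidates that match every selected vibe, preserving rank order."""
    allowed = None
    for vibe in selected_vibes:
        books = index.get(vibe, set())
        allowed = books if allowed is None else allowed & books
    if allowed is None:  # no vibe selected, so nothing to filter on
        return list(candidates)
    return [c for c in candidates if c in allowed]
```

For example, `filter_by_vibes(["dune", "emma", "foundation"], ["epic"])` keeps "dune" and "foundation" in their original order. Requiring all selected vibes to match is a design choice here; matching any of them would work too.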
Finally, chatting with the librarian is a call to an LLM, with basic storage of the user’s historical actions and chat. The app is served via FastAPI and Jinja templates. To minimize the time a user has to wait after chat input, I used the streaming API with Python async and aiohttp in the backend, with a bit of JavaScript in the frontend.
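The shape of that flow can be sketched with only the standard library. This is an illustration under stated assumptions, not the prototype's code: `fake_llm_stream` is a stand-in for a real streaming LLM API, and the prompt format, `build_prompt`, and `answer` are hypothetical. The point is that the user's clicks and earlier chat turns are folded into the prompt, and each chunk is forwarded as soon as it arrives instead of waiting for the full reply.

```python
import asyncio

def build_prompt(viewed_titles, chat_history, question):
    """Fold the user's clicks and earlier chat turns into the prompt."""
    lines = ["You are a helpful librarian."]
    if viewed_titles:
        lines.append("Books the user viewed (oldest first): " + ", ".join(viewed_titles))
    for role, text in chat_history:
        lines.append(f"{role}: {text}")
    lines.append(f"user: {question}")
    return "\n".join(lines)

async def fake_llm_stream(prompt):
    """Stand-in for a streaming LLM API: yields a canned reply chunk by chunk."""
    for chunk in ["Here ", "are ", "some ", "ideas."]:
        await asyncio.sleep(0)  # yield control, as a network read would
        yield chunk

async def answer(viewed_titles, chat_history, question, on_chunk):
    """Stream a reply to `on_chunk`, then store the turn in the chat history."""
    prompt = build_prompt(viewed_titles, chat_history, question)
    parts = []
    async for chunk in fake_llm_stream(prompt):
        on_chunk(chunk)  # forward each chunk to the client immediately
        parts.append(chunk)
    reply = "".join(parts)
    chat_history.append(("user", question))
    chat_history.append(("assistant", reply))
    return reply
```

In a FastAPI app, an async generator like this could be wrapped in a `StreamingResponse` so the frontend JavaScript can render text as it arrives. Here, `asyncio.run(answer(["Dune"], [], "more books by female authors", print))` is enough to see the chunks come through one at a time.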
What do you think? Would you prefer to chat more, or less? Do you know of other UXes for interacting with LLMs? Please share! 🙏
OG image prompt on MidJourney: “finding books in a magical digital library with a close up of books, in the style of contrasting tones, artifacts of online culture, innovative page design, complexity theory, bold black and whites, bold color scheme --ar 2:1”
If you found this useful, please cite this write-up as:
Yan, Ziyou. (Apr 2023). Interacting with LLMs with Minimal Chat. eugeneyan.com. https://eugeneyan.com/writing/llm-ux/.
or
@article{yan2023ux,
title = {Interacting with LLMs with Minimal Chat},
author = {Yan, Ziyou},
journal = {eugeneyan.com},
year = {2023},
month = {Apr},
url = {https://eugeneyan.com/writing/llm-ux/}
}