Strategies for Prompting Large Language Models and Incorporating Contextual Data
Large language models (LLMs) are a powerful new primitive for building software. Because they are so new and behave so differently from conventional computing resources, they invite a fresh perspective on how to use them. There are many ways to build with LLMs, such as training models from scratch, fine-tuning open-source models, or using hosted APIs. Among these, in-context learning has emerged as a prevalent choice with the rise of foundation models.
Design Pattern: In-Context Learning
At its core, in-context learning uses LLMs off the shelf, without any fine-tuning; their behavior is instead controlled through clever prompting and conditioning on private "contextual" data.
Consider a scenario where a chatbot is designed to answer questions about a set of legal documents. A naive approach is to paste all the documents into a prompt for a model like ChatGPT or GPT-4 and ask the question at the end. This may work for very small datasets, but it doesn't scale: the largest GPT-4 model can only process around 50 pages of input text, and performance, measured by inference time and accuracy, degrades badly as we approach this context window limit.
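To make the naive approach concrete, here is a minimal sketch, assuming the openai Python package's pre-1.0 interface; the document contents and question are placeholders:

```python
# A naive "context stuffing" sketch, assuming the openai Python package's
# pre-1.0 interface. The document contents and question are placeholders.
import openai

documents = [
    "<full text of contract_1.txt>",
    "<full text of contract_2.txt>",
]
question = "Which party bears liability for late delivery?"

# Concatenate every document into a single prompt and ask at the end.
context = "\n\n---\n\n".join(documents)
prompt = (
    "Answer the question using only the documents below.\n\n"
    f"{context}\n\n"
    f"Question: {question}"
)

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
)
print(response["choices"][0]["message"]["content"])
```

Anything beyond a few dozen pages of documents will overflow the context window, which is what motivates the embedding-based pipeline described next.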
Utilizing Embeddings and Vector Databases
For embeddings, many developers turn to the OpenAI API's text-embedding-ada-002 model. It's easy to use, gives reasonably good results, and is becoming increasingly inexpensive. Some larger enterprises, seeking better performance in specific scenarios, are also evaluating Cohere. For developers who prefer open source, the Sentence Transformers library from Hugging Face is a popular choice. Crafting different types of embeddings tailored to specific use cases is also an active area of research.
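As a hedged illustration, the sketch below generates embeddings via both the hosted and the open-source route; it assumes the openai (pre-1.0) and sentence-transformers packages, and the specific checkpoint named is just one commonly used example, not a prescribed choice:

```python
# A minimal sketch of generating embeddings two ways. Assumes the openai
# (pre-1.0) and sentence-transformers packages; "all-MiniLM-L6-v2" is just
# one commonly used open-source checkpoint, not a prescribed choice.
import openai
from sentence_transformers import SentenceTransformer

texts = ["Term sheet for Series A financing", "Master services agreement"]

# Hosted option: OpenAI's text-embedding-ada-002 (1536-dimensional vectors).
response = openai.Embedding.create(model="text-embedding-ada-002", input=texts)
hosted_vectors = [item["embedding"] for item in response["data"]]

# Open-source option: a Sentence Transformers model from Hugging Face
# (this checkpoint produces 384-dimensional vectors).
model = SentenceTransformer("all-MiniLM-L6-v2")
local_vectors = model.encode(texts)
```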
From a systems standpoint, the vector database is the most important component of the preprocessing pipeline. It is responsible for efficiently storing, comparing, and retrieving potentially billions of embeddings (i.e., vectors). Pinecone, which is fully cloud-hosted, is a common choice: it's easy to get started with and provides the features larger enterprises need in production, such as good performance at scale, single sign-on (SSO), and uptime SLAs.
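To show how the pieces fit together, here is a sketch of the store-and-retrieve loop, assuming Pinecone's pre-3.0 Python client; the index name, API key, and environment are placeholders, and the index is assumed to already exist with a dimension matching ada-002's 1536:

```python
# A minimal sketch of a store-and-retrieve loop with Pinecone, assuming the
# pinecone-client package's pre-3.0 interface and a pre-created index named
# "legal-docs" (a placeholder) with dimension 1536 to match ada-002.
import openai
import pinecone

pinecone.init(api_key="YOUR_PINECONE_KEY", environment="us-west1-gcp")
index = pinecone.Index("legal-docs")

def embed(text: str) -> list:
    # Embed a single string with OpenAI's text-embedding-ada-002.
    result = openai.Embedding.create(
        model="text-embedding-ada-002", input=[text]
    )
    return result["data"][0]["embedding"]

# Preprocessing: index each document chunk, keeping the raw text as metadata.
chunks = [
    "Clause 4.2: The supplier bears liability for late delivery...",
    "Clause 9.1: Either party may terminate with 30 days' notice...",
]
index.upsert(vectors=[
    (f"chunk-{i}", embed(chunk), {"text": chunk})
    for i, chunk in enumerate(chunks)
])

# Query time: embed the question and retrieve the most similar chunks.
question = "Who bears liability for late delivery?"
results = index.query(vector=embed(question), top_k=3, include_metadata=True)
for match in results["matches"]:
    print(match["score"], match["metadata"]["text"])
```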
Prompting Strategies and Orchestration Frameworks
Prompting strategies for LLMs and techniques for incorporating contextual data are growing increasingly complex, and increasingly important as a source of product differentiation. This is where orchestration frameworks like LangChain and LlamaIndex come in. They simplify many of the details of prompt chaining, interfacing with external APIs, retrieving contextual data from vector databases, and maintaining memory across multiple LLM calls. These frameworks are popular among hobbyists and startups alike, with LangChain leading the pack.
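As an illustration of what these frameworks abstract away, here is a sketch of the same retrieval flow expressed through LangChain, assuming its early (0.0.x-era) API and the placeholder Pinecone index from the example above:

```python
# A sketch of retrieval-augmented Q&A via LangChain, assuming its early
# (0.0.x-era) API and the placeholder "legal-docs" Pinecone index from above.
import pinecone
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Pinecone

pinecone.init(api_key="YOUR_PINECONE_KEY", environment="us-west1-gcp")

# Wrap the existing index as a LangChain vector store.
vectorstore = Pinecone.from_existing_index(
    index_name="legal-docs", embedding=OpenAIEmbeddings()
)

# The chain bundles retrieval, prompt construction, and the LLM call.
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model_name="gpt-4"),
    retriever=vectorstore.as_retriever(search_kwargs={"k": 3}),
)
print(qa.run("Who bears liability for late delivery?"))
```

Compared with the manual version above, the chain replaces the hand-written embed, query, and prompt-assembly steps with a few lines of configuration.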
An interesting entrant in the orchestration space is ChatGPT itself. Although it's primarily an application, it can also be accessed as an API, and it performs some of the same tasks as other orchestration frameworks. While not a direct competitor, it may emerge as a simple alternative for prompt construction.
Hosting and Inference Options
The OpenAI API, typically with the GPT-4 or GPT-4-32k model, is the starting point for nearly every developer we spoke with. This is the sweet spot for app performance, and it's easy to use: it works on a wide range of input domains and usually requires no fine-tuning or self-hosting.
The static parts of LLM apps (i.e., everything other than the model) also need hosting. Vercel and the major cloud providers are standard choices, but new categories are emerging: startups like Steamship offer end-to-end hosting for LLM apps, while companies like Anyscale and Modal let developers host models and Python code in one place.
Looking ahead, it's clear that while software development with LLMs is still maturing, the possibilities are vast and exciting. We, like many developers, are keenly anticipating what comes next.