Embeddings are dense numerical vectors (arrays of floating-point numbers) that represent the semantic meaning of text, images, audio, or other data in a high-dimensional space. Similar items are placed close together in this space, allowing AI systems to measure similarity, perform semantic search, cluster related content, and build recommendation systems without relying on exact keyword matching.
An embedding model (such as OpenAI's text-embedding-3-small or one of Cohere's Embed v3 models) takes an input (a sentence, paragraph, or document) and outputs a fixed-length vector, typically with 256 to 3072 dimensions. During training, the model learns to map semantically similar inputs to vectors that are close together, as measured by cosine similarity or Euclidean distance. For example, "How do I reset my password" and "I forgot my login credentials" would produce very similar vectors, even though they share almost no words.
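To make the idea of "close together" concrete, here is a minimal sketch of cosine similarity in plain Python. The 4-dimensional vectors are made up for illustration and do not come from any real embedding model (real embeddings have hundreds or thousands of dimensions); the variable names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Hypothetical toy vectors standing in for real model output.
reset_password = [0.9, 0.1, 0.3, 0.0]   # "How do I reset my password"
forgot_login = [0.8, 0.2, 0.4, 0.1]     # "I forgot my login credentials"
pizza_recipe = [0.0, 0.9, 0.1, 0.8]     # an unrelated sentence

sim_related = cosine_similarity(reset_password, forgot_login)
sim_unrelated = cosine_similarity(reset_password, pizza_recipe)

print(f"related:   {sim_related:.3f}")
print(f"unrelated: {sim_unrelated:.3f}")
```

With these toy values the two password-related vectors score much higher than the unrelated pair. A score near 1 means the vectors point in nearly the same direction; a score near 0 means they are nearly orthogonal.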
These vectors are stored in vector databases such as Pinecone, Weaviate, and Qdrant, or in database extensions such as pgvector for PostgreSQL, all of which are optimized for similarity search. When a user submits a query, it is embedded with the same model into the same vector space, and the database returns the query's nearest neighbors, often using approximate nearest-neighbor indexes for speed. This is the foundation of retrieval-augmented generation (RAG), in which an LLM's answers are grounded in relevant context retrieved via embeddings.
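The retrieval step can be sketched as a brute-force nearest-neighbor search over an in-memory dictionary. This is only an illustration of the concept, not how any of the databases named above work internally: production systems use specialized indexes. The document IDs, the toy vectors, and the `nearest_neighbors` helper are all hypothetical, and the query vector is assumed to have already been produced by the same embedding model as the stored vectors.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_neighbors(query_vec, index, k=2):
    """Score every stored vector against the query and return the top k.

    `index` maps a document ID to its embedding. This is an exhaustive
    scan, so cost grows linearly with the number of stored vectors.
    """
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in index.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical stored embeddings (toy 4-dimensional values).
index = {
    "reset-password-guide": [0.9, 0.1, 0.3, 0.0],
    "billing-faq": [0.1, 0.8, 0.2, 0.3],
    "login-troubleshooting": [0.8, 0.2, 0.4, 0.1],
}

# Assumed to be the embedding of the user's query.
query_vec = [0.85, 0.15, 0.35, 0.05]

results = nearest_neighbors(query_vec, index, k=2)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```

In a RAG pipeline, the text of the returned documents would then be added to the LLM prompt as context.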
Embeddings transformed search from keyword matching to semantic understanding. They power recommendation engines ("users who liked this also liked..."), content deduplication, anomaly detection, and the retrieval layer of modern AI applications. For developers building with large language models, embeddings are the bridge between unstructured data and structured search, enabling LLMs to work with knowledge bases far larger than their context windows.
The AI Agents article on the TerminalFeed blog discusses how agents use embeddings for memory and retrieval. The AI Agent Tracker lists agents and frameworks that rely on embedding-based search for their knowledge systems.