Type or paste any text and click "generate embedding" to convert it into a 384-dimensional vector. The vector is automatically saved to the inventory below, making it available to all tools. Try the sample button for ideas.
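The inventory described above can be sketched as a simple name-to-vector store. This is a minimal illustration, assuming the inventory just maps labels to unit-normalized vectors; `save_embedding` and the random stand-in vector are hypothetical (a real sentence-encoder model would supply the 384 numbers).

```python
import numpy as np

# Hypothetical sketch: the "inventory" as a name -> unit-vector store.
# The random vector stands in for a real 384-dimensional model output.
DIM = 384
inventory = {}

def save_embedding(name, vec):
    """Normalize and store a vector so every tool below can reuse it."""
    v = np.asarray(vec, dtype=np.float32)
    inventory[name] = v / np.linalg.norm(v)

rng = np.random.default_rng(0)
save_embedding("my text", rng.normal(size=DIM))
```

Normalizing on save means later comparisons reduce to plain dot products.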
Paste a raw embedding (384-element JSON array) from an external source to decode it. If you're working with vectors from the encode panel, use the vocab decode tab below instead — it pulls directly from the inventory.
Compares your vector against pre-embedded concepts to find the closest matches. Use the vocab size selector (S/M/L/XL) above to control how many reference concepts are used — larger vocabularies provide finer-grained decoding. The ranked list shows where your embedding sits in semantic space. An LLM then reads the neighbors and interprets what they collectively mean.
Try this: encode "The immune system fights off infections" above, then "The army defends the country from invasion" — similar structure, different domain. Compare the neighbor lists to see how the model separates "biological defense" from "military defense".

Look for score gaps in the results. A tight cluster of high-scoring terms followed by a sharp drop means the embedding clearly represents that concept. An even spread of moderate scores suggests the text blends multiple themes.
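The ranking step behind this panel is just cosine similarity against every vocabulary vector. A minimal sketch, assuming all vectors are unit-normalized (the concepts and query here are random stand-ins, not real embeddings):

```python
import numpy as np

rng = np.random.default_rng(1)

def unit(v):
    return v / np.linalg.norm(v)

# Stand-in "pre-embedded" vocabulary; a real panel would load these
# from the model.
vocab = {c: unit(rng.normal(size=384))
         for c in ["immune system", "army", "infection", "invasion"]}
query = unit(rng.normal(size=384))

# On unit vectors, cosine similarity is just a dot product.
ranked = sorted(vocab, key=lambda c: float(query @ vocab[c]), reverse=True)
print(ranked[0])  # closest concept to the query
```

Larger vocabularies (the S/M/L/XL selector) simply grow the `vocab` dict, giving the ranking finer resolution.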
An LLM plays a guessing game against the embedding model. It proposes candidate phrases, we embed them and measure cosine similarity to the target vector, then feed the scores back. Over 5 rounds, the LLM converges on what the original text said. This reveals how much information an embedding actually preserves about the original text.
Try this: encode "She ran until her lungs burned and her legs gave out". Then try "light" — the search will struggle because the embedding is a blend of multiple meanings (illumination, weight, mood). This shows how ambiguity diffuses the signal.

The search reveals the information bottleneck of embeddings. Specific, concrete sentences can often be reconstructed closely. Abstract or ambiguous inputs produce embeddings that many different texts could match, making recovery harder.
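The guess-and-feedback loop can be sketched as below. Here `propose` stands in for the LLM and `embed` for the encoder — both are hypothetical placeholders you would wire up to real services:

```python
import numpy as np

def unit(v):
    return v / np.linalg.norm(v)

def cosine(a, b):
    return float(unit(a) @ unit(b))

def guess_text(target_vec, propose, embed, rounds=5):
    """Run the guessing game: propose, score, feed scores back."""
    history = []  # (candidate, score) pairs the proposer sees each round
    for _ in range(rounds):
        candidate = propose(history)
        score = cosine(embed(candidate), target_vec)
        history.append((candidate, score))
    return max(history, key=lambda cs: cs[1])  # best guess so far
```

The key design point is that the proposer never sees the target vector, only similarity scores — exactly the information an attacker with query access to the encoder would have.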
Vector arithmetic lets you manipulate meaning algebraically. Subtraction removes a concept; addition adds one. This model is optimized for sentences, not single words — so sentence-level analogies work best. Single-word arithmetic (the famous king - man + woman) tends to produce scattered results because individual words create diffuse embeddings that blend multiple senses.
Try this: encode "The king ruled the country", "The man walked home", and "The woman walked home", then compute king - man + woman at the sentence level. More experiments to try:
- "The chef prepared a French meal" - "France is in Europe" + "Japan is in Asia" — does the cuisine shift?
- "I am overjoyed" + "I am devastated" — what do you get?
- king - man + woman, to see how it doesn't work with this model — compare the noisy results to the sentence version above.

Not every analogy works. When it fails, that tells you something too — the model doesn't encode that particular relationship as a clean linear direction. The structure of what works and what doesn't reveals the geometry the model has learned.
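The arithmetic itself is elementwise vector math followed by re-normalization. A minimal sketch with random stand-in vectors (a real encoder would supply the embeddings):

```python
import numpy as np

rng = np.random.default_rng(3)

def unit(v):
    return v / np.linalg.norm(v)

# Stand-in sentence embeddings; not real model outputs.
emb = {s: unit(rng.normal(size=384)) for s in [
    "The king ruled the country",
    "The man walked home",
    "The woman walked home",
]}

# Subtract a concept, add another, then re-normalize before
# comparing against the vocabulary.
result = unit(emb["The king ruled the country"]
              - emb["The man walked home"]
              + emb["The woman walked home"])
```

Re-normalizing matters: the raw sum is not unit length, and cosine-based decoding assumes unit vectors.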
Shows what semantic dimension separates two embeddings. For each vocabulary concept, it computes how much closer that concept is to vector A versus vector B. The top movers in each direction reveal what distinguishes the two inputs.
Try this: compare "The scientist conducted the experiment carefully" with "The artist painted the canvas passionately". Then try "I am happy" vs "I am excited" — the differential reveals what the model thinks distinguishes these emotions.

The cosine similarity score at the top tells you the overall relationship. Two near-synonyms might score 0.8+; two texts from different domains might score around 0.1. The word lists show where they differ, not just how much.
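The per-concept computation described above can be sketched as follows — for each vocabulary concept, take its cosine similarity to A minus its similarity to B, then sort. Vectors are stand-ins and `differential` is a hypothetical name:

```python
import numpy as np

def unit(v):
    return v / np.linalg.norm(v)

def differential(vec_a, vec_b, vocab):
    """For each concept: how much closer is it to A than to B?"""
    a, b = unit(vec_a), unit(vec_b)
    deltas = {c: float(unit(v) @ a - unit(v) @ b) for c, v in vocab.items()}
    # Top of the list pulls toward A, bottom toward B.
    return sorted(deltas.items(), key=lambda cv: cv[1], reverse=True)
```

Because both similarities are computed against the same concept vector, shared meaning cancels out and only the axis separating A from B survives in the ranking.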
Projects the 384-dimensional embedding space down to 2D or 3D. Each dot is a vocabulary term or one of your encoded vectors. Nearby dots have similar embeddings. Clusters show how the model organizes meaning.
PCA finds the directions of maximum variance — fast and deterministic, but it tends to produce overlapping clouds. UMAP preserves local neighborhood structure, producing tighter, more distinct clusters at the cost of a few seconds of computation. Both methods lose information when squashing 384 dimensions down to 2 or 3. Use the other tools for precise similarity measurements.
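The PCA half of this panel can be sketched with plain SVD: center the vectors, then project onto the top directions of variance. The points here are random stand-ins for real embeddings:

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project rows of X onto their top principal components."""
    Xc = X - X.mean(axis=0)                        # center each dimension
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T                # 2D coordinates

rng = np.random.default_rng(4)
points = rng.normal(size=(50, 384))                # 50 stand-in embeddings
coords = pca_project(points)
print(coords.shape)  # (50, 2)
```

Because SVD returns singular values in descending order, the first output axis always carries at least as much variance as the second — which is why PCA plots are deterministic while UMAP's stochastic optimization can differ run to run.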