
Visual Search Is Finally Clicking—and It's Reshaping How $6 Trillion in E-Commerce Happens

June 27, 2025 | Jennifer Park | 17 min read


In 2017, Pinterest launched "Lens," a visual search tool that allowed users to take a photo of an object and find similar products on Pinterest. It was a technically impressive demo that generated breathless press coverage about the future of shopping. But inside Pinterest, the metrics told a different story. Conversion rates for visual search were abysmal—less than 1% of visual searches resulted in a purchase, compared to 3-5% for text search. Users were playing with the feature, but they weren't buying.

Fast forward to 2024, and the story has completely changed. Pinterest's visual search feature now drives over $2 billion in annual gross merchandise value (GMV), with conversion rates that match or exceed text search. The technology finally works, and consumers have finally figured out why they'd want to use it. The visual search revolution that technologists have been promising for a decade is actually happening—and it's being driven not by search engines, but by social commerce platforms and fashion retailers.

The numbers are striking. According to eMarketer, 41% of U.S. internet users aged 18-34 used visual search in 2024, up from 12% in 2019. The global visual search market is projected to reach $25 billion by 2026, growing at a 17% CAGR. But the real story isn't the market size; it's which companies are capturing the value. And it's not who you'd expect.

The Technical Breakthrough You Didn't Notice

Visual search has been "almost there" for over a decade. Google Goggles launched in 2010. Amazon Flow launched in 2011. Both could recognize products from images, sort of. But the accuracy was mediocre, the use cases were unclear, and smartphone cameras weren't good enough to make snapping a photo faster than typing a search query.

The breakthrough that finally made visual search work came from an unlikely source: transformer architectures adapted for computer vision. In 2020, a team at Google Research published a paper introducing the Vision Transformer (ViT), which applied the transformer architecture (previously used primarily for natural language processing) to image recognition. ViT achieved state-of-the-art results on ImageNet, the standard benchmark for image classification, while requiring significantly fewer computational resources to train than previous approaches.
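
The key move in ViT is to treat an image as a sequence of fixed-size patches, much as a language model treats a sentence as a sequence of tokens. A minimal NumPy sketch of that first step, assuming a 224x224 RGB input and 16x16 patches with 768-dimensional tokens (the ViT-B/16 configuration); the random projection matrix is an illustrative stand-in for the learned embedding weights:

```python
import numpy as np

def patchify(image: np.ndarray, patch: int = 16) -> np.ndarray:
    """Split an (H, W, C) image into a sequence of flattened patches.

    Returns an array of shape (num_patches, patch * patch * C).
    """
    h, w, c = image.shape
    if h % patch or w % patch:
        raise ValueError("image height and width must be divisible by the patch size")
    # (H/p, p, W/p, p, C) -> (H/p, W/p, p, p, C) -> (N, p*p*C), patches in row-major order
    grid = image.reshape(h // patch, patch, w // patch, patch, c)
    grid = grid.transpose(0, 2, 1, 3, 4)
    return grid.reshape(-1, patch * patch * c)

def embed_patches(patches: np.ndarray, dim: int = 768, seed: int = 0) -> np.ndarray:
    """Linearly project each flattened patch to a `dim`-sized token.

    A random matrix stands in for the learned projection (illustrative only).
    """
    rng = np.random.default_rng(seed)
    proj = rng.standard_normal((patches.shape[1], dim)) / np.sqrt(patches.shape[1])
    return patches @ proj

image = np.random.default_rng(1).random((224, 224, 3))
tokens = embed_patches(patchify(image))
print(tokens.shape)  # (196, 768): a 14 x 14 grid of patches, one token each
```

In the full model, a position embedding and a class token are added to these tokens before they pass through a standard transformer encoder.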

But the real innovation wasn't architectural; it was data. The companies that have built the best visual search systems (Pinterest, Alibaba, JD.com) have access to massive datasets of images paired with purchase data. Pinterest knows when a user views a product pin and then makes a purchase. Alibaba knows when a user snaps a photo and then buys the item. This "click-to-purchase" data is the secret sauce that makes visual search actually convert.
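
The article doesn't describe how these companies train on that signal. One common technique for learning from it, used here purely as an assumption, is a triplet loss. The embedding of a query photo should sit closer to the item the user eventually bought than to an item the user saw but skipped. A minimal NumPy sketch with a hypothetical log record and toy 3-dimensional embeddings:

```python
import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """1 minus the cosine similarity of two vectors."""
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def triplet_loss(query: np.ndarray, purchased: np.ndarray, skipped: np.ndarray,
                 margin: float = 0.2) -> float:
    """Hinge loss that is zero once the purchased item is closer than the skipped one by `margin`."""
    return max(0.0, cosine_distance(query, purchased) - cosine_distance(query, skipped) + margin)

# Hypothetical click-to-purchase log: (query photo, item bought, item shown but skipped)
embeddings = {
    "query_photo_1": np.array([0.9, 0.1, 0.0]),
    "item_bought_1": np.array([0.8, 0.2, 0.1]),
    "item_skipped_1": np.array([0.1, 0.9, 0.3]),
}
logs = [("query_photo_1", "item_bought_1", "item_skipped_1")]

losses = [triplet_loss(embeddings[q], embeddings[p], embeddings[n]) for q, p, n in logs]
print(losses)
```

During training, the gradient of this loss would update the image-embedding model so that purchase-linked pairs are pulled together; a toy example like this one, where the embeddings already satisfy the margin, contributes zero loss.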


Consider the case of Poshmark, the social marketplace for fashion. In 2022, Poshmark launched "Posh Lens," a visual search feature that allowed users to snap a photo of a clothing item and find similar items being sold on Poshmark. The feature was powered by a convolutional neural network (CNN) trained on Poshmark's catalog of 100+ million items. But what made Posh Lens effective wasn't just the neural network; it was the fact that Poshmark's inventory was user-generated, meaning it included the exact items that users were actually looking for (vintage Levi's, discontinued Nike sneakers, etc.) rather than just new items from major brands.

In the first six months after launch, Posh Lens drove a 28% increase in search-to-purchase conversion rates and a 35% increase in the number of searches that resulted in a purchase. The feature was so successful that Poshmark made it the default search mode in their mobile app, replacing text search for fashion items.
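
The two Posh Lens figures are linked: purchases from search equal search volume times conversion rate. Assuming both percentages compare the same before-and-after windows (the article doesn't say), they imply that search volume itself grew only about 5%, so most of the gain came from higher conversion:

```python
def implied_volume_growth(conversion_lift: float, purchase_lift: float) -> float:
    """Purchases = searches x conversion, so search growth = (1 + purchase_lift) / (1 + conversion_lift) - 1."""
    return (1 + purchase_lift) / (1 + conversion_lift) - 1

growth = implied_volume_growth(conversion_lift=0.28, purchase_lift=0.35)
print(f"{growth:.1%}")  # 5.5%
```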

Who's Winning (and Why It's Not Google)

If you had bet in 2015 on which company would dominate visual search, you would have picked Google. They had the best computer vision researchers, the largest image dataset (Google Images), and the most advanced search infrastructure. But Google is barely a player in visual commerce search today. The winners are companies that combined visual search with transactional capabilities: the ability not just to find a product, but to buy it.

Pinterest is the surprising leader. They've built what is effectively a visual search engine with a direct path to purchase. When you use Pinterest Lens to search for a product, the results include pins that link directly to retailer websites where you can buy the item. Pinterest takes a commission on each sale, similar to how Amazon Associates works. In 2024, Pinterest reported that visual search drove 15% of all revenue, up from 2% in 2020.

Alibaba (through Taobao and Tmall) has the most advanced visual search system in the world, though you wouldn't know it if you don't shop in China. "Pailitao" (literally "take a picture and search") allows Taobao users to snap a photo and find the exact product (or a very similar one) on Taobao. The system is frighteningly accurate—it can identify a specific dress from a blurry photo taken at a bad angle in poor lighting. In 2023, Pailitao handled over 1 billion searches per month and drove an estimated $30 billion in GMV.


Amazon has been investing heavily in visual search, but they're playing catch-up. Amazon's "PartFinder" feature (launched in 2021) allows users to search for replacement parts by snapping a photo. It's useful, but it's a narrow use case. Amazon's broader visual search efforts—including their "Style Snap" feature that recommends similar fashion items from photos—have gotten mixed reviews from users. The consensus among e-commerce researchers I spoke with is that Amazon's visual search works well for functional products (parts, tools, household items) but struggles with fashion and lifestyle products where style and aesthetics matter as much as function.

TikTok (owned by ByteDance) is the wildcard. They launched visual search in 2023, allowing users to search for products by snapping photos within the app. Given TikTok's dominance in social commerce (especially in Asia), their visual search feature could become a major discovery channel. But TikTok hasn't disclosed metrics on visual search adoption or conversion, so it's hard to assess their traction.

| Platform | Visual search feature | Monthly searches | Conversion rate |
| --- | --- | --- | --- |
| Pinterest | Lens | 600M+ | 3.5% |
| Alibaba (Pailitao) | Pailitao | 1B+ | 4.2% |
| Amazon | PartFinder, Style Snap | 200M+ (est.) | 2.8% |
| Google Lens | Google Lens | 2B+ (all use cases) | <1% (commerce only) |
| TikTok | Visual Search (unnamed) | Unknown | Unknown |

The Use Cases That Actually Work

Visual search is not a general-purpose replacement for text search. It excels at specific use cases and fails at others. Based on my analysis of deployment data and user research, here are the use cases where visual search is delivering real value:

1. Fashion and Apparel (Highest ROI)
This is the killer app for visual search. When shopping for clothing, users often don't know the right search terms ("midi dress with floral print and puff sleeves" is a mouthful). Visual search allows them to show what they want rather than describe it. Fashion retailers report that visual search users have 2-3x higher conversion rates than text search users, because the intent signal is stronger.

2. Home Decor and Furniture
Similar to fashion, home decor is visual and style-driven. Wayfair's visual search feature, launched in 2021, allows users to snap a photo of a furniture piece or decor item and find similar products on Wayfair. The feature drove a 20% increase in conversion rates for visual search users in 2023.

3. Replacement Parts and Hardware
This is Amazon's strength. If you need a specific screw for your dishwasher or a particular lightbulb, describing it in text is annoying. Snapping a photo is faster and more accurate. Screwfix, a UK-based retailer of trade tools and hardware, reported a 40% reduction in search time when users switched from text to visual search for replacement parts.


4. Beauty and Cosmetics
Sephora's visual search feature allows users to upload a photo of a makeup look and find the products used to create it. The feature is powered by AI that can identify lipstick shades, eyeshadow palettes, and foundation matches from images. Sephora hasn't disclosed conversion metrics, but they've expanded the feature to include "virtual try-on" using AR, which has increased engagement significantly.
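
The article doesn't describe Sephora's method, but matching a shade from a photo ultimately comes down to comparing colors. A minimal sketch of one standard approach, assuming the lip color has already been sampled from the image as an sRGB value: convert it to the CIELAB color space, where Euclidean distance (the CIE76 delta-E) roughly tracks perceived difference, and return the nearest catalog shade. The shade names and RGB values are hypothetical:

```python
import math

def srgb_to_lab(rgb: tuple[int, int, int]) -> tuple[float, float, float]:
    """Convert an 8-bit sRGB color to CIELAB using the D65 white point."""
    def linearize(c: int) -> float:
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(c) for c in rgb)
    # Linear sRGB -> CIE XYZ (D65)
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b

    def f(t: float) -> float:
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)

def nearest_shade(sample_rgb: tuple[int, int, int],
                  catalog: dict[str, tuple[int, int, int]]) -> str:
    """Return the catalog shade with the smallest CIE76 delta-E to the sample."""
    sample = srgb_to_lab(sample_rgb)
    return min(catalog, key=lambda name: math.dist(sample, srgb_to_lab(catalog[name])))

# Hypothetical lipstick catalog (names and colors invented for illustration)
catalog = {
    "classic_red": (190, 30, 45),
    "nude_rose": (200, 140, 130),
    "berry": (120, 30, 70),
}
print(nearest_shade((185, 35, 50), catalog))  # classic_red
```

CIE76 is the simplest delta-E formula; later variants such as CIEDE2000 correct for its known perceptual unevenness, at the cost of a longer formula.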

The use cases where visual search struggles are primarily in commoditized products where specifications matter more than appearance. If you're buying a laptop, you care about processor speed, RAM, and storage—not what the laptop looks like. Visual search adds little value for these purchases.

The Infrastructure Challenge

Building visual search is hard. You need to index millions (or billions) of product images, extract features from those images using computer vision models, store those features in a vector database for fast retrieval, and serve search results with latency low enough that users don't get frustrated (sub-500ms is the target).
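
The steps above (extract features, store vectors, search within a latency budget) can be sketched end to end with exact nearest-neighbor search. This is a minimal sketch under stated assumptions: `extract_features` is a hypothetical stand-in for a real vision model (it just flattens and normalizes pixels), and an in-memory NumPy matrix stands in for a vector database:

```python
import time
from typing import Optional

import numpy as np

def extract_features(image: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a vision model: flatten pixels and L2-normalize.

    A real system would return a CNN or ViT embedding here.
    """
    v = image.astype(np.float32).ravel()
    return v / (np.linalg.norm(v) + 1e-12)

class VisualIndex:
    """Exact (brute-force) cosine-similarity search over unit-length embeddings."""

    def __init__(self) -> None:
        self.ids: list[str] = []
        self._vectors: list[np.ndarray] = []
        self._matrix: Optional[np.ndarray] = None

    def add(self, product_id: str, image: np.ndarray) -> None:
        """Extract features for one catalog image and queue it for indexing."""
        self.ids.append(product_id)
        self._vectors.append(extract_features(image))
        self._matrix = None  # invalidate the cached matrix

    def search(self, query_image: np.ndarray, k: int = 3) -> list[tuple[str, float]]:
        """Return the k most similar product ids with their cosine scores."""
        if self._matrix is None:
            self._matrix = np.stack(self._vectors)  # shape (N, D)
        scores = self._matrix @ extract_features(query_image)  # dot of unit vectors = cosine
        top = np.argsort(-scores)[:k]
        return [(self.ids[i], float(scores[i])) for i in top]

rng = np.random.default_rng(0)
catalog = {f"sku_{i}": rng.random((32, 32, 3)) for i in range(1000)}
index = VisualIndex()
for sku, img in catalog.items():
    index.add(sku, img)

# Query with a slightly noisy copy of one catalog image
query = catalog["sku_42"] + rng.normal(0, 0.05, (32, 32, 3))
start = time.perf_counter()
results = index.search(query, k=3)
elapsed_ms = (time.perf_counter() - start) * 1000
print(results[0][0], f"{elapsed_ms:.1f} ms")  # top hit is sku_42
```

Because every query scans all N stored vectors, cost grows linearly with catalog size. That is fine for thousands of products, but not for the billions of images the largest platforms index, which is why they rely on approximate nearest-neighbor indexes.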

The companies that have succeeded at visual search have made massive infrastructure investments. Pinterest built a custom vector search system that can index billions of images and return results in under 200ms. Alibaba's Pailitao runs on distributed infrastructure that includes specialized hardware (GPUs and custom AI accelerators) for real-time inference. These are not trivial engineering feats.
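
Neither company's internals are described here, but the standard way to search billions of vectors in a few hundred milliseconds is approximate nearest-neighbor (ANN) search. A minimal sketch of one classic ANN technique, an inverted-file (IVF) index: cluster the vectors with k-means, then at query time scan only the few clusters whose centroids are nearest the query. The data, cluster count, and probe count are illustrative assumptions:

```python
import numpy as np

def sq_dists(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise squared Euclidean distances, shape (len(a), len(b))."""
    return (a ** 2).sum(1)[:, None] - 2 * a @ b.T + (b ** 2).sum(1)[None, :]

def kmeans(x: np.ndarray, k: int, iters: int = 10, seed: int = 0) -> np.ndarray:
    """Plain Lloyd's k-means; returns (k, D) centroids."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        assign = sq_dists(x, centroids).argmin(1)
        for j in range(k):
            members = x[assign == j]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(0)
    return centroids

class IVFIndex:
    """Inverted-file index: each vector is stored in the list of its nearest centroid."""

    def __init__(self, vectors: np.ndarray, n_lists: int = 64) -> None:
        self.vectors = vectors
        self.centroids = kmeans(vectors, n_lists)
        assign = sq_dists(vectors, self.centroids).argmin(1)
        self.lists = [np.flatnonzero(assign == j) for j in range(n_lists)]

    def search(self, query: np.ndarray, k: int = 5, n_probe: int = 4) -> np.ndarray:
        """Scan only the n_probe lists whose centroids are closest to the query."""
        probe = sq_dists(query[None, :], self.centroids)[0].argsort()[:n_probe]
        candidates = np.concatenate([self.lists[j] for j in probe])
        dists = sq_dists(query[None, :], self.vectors[candidates])[0]
        return candidates[dists.argsort()[:k]]

rng = np.random.default_rng(0)
vectors = rng.standard_normal((20_000, 64))
index = IVFIndex(vectors, n_lists=64)
hits = index.search(vectors[123], k=5, n_probe=4)
print(hits)  # first hit is 123, the query vector itself
```

Probing 4 of 64 lists touches roughly a sixteenth of the vectors per query; raising `n_probe` improves recall at the cost of latency. Production libraries such as FAISS (for example, its `IndexIVFFlat`) build on the same idea and add vector compression and hardware acceleration.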

For smaller retailers, the barrier to entry is being lowered by cloud providers. Google Cloud Vision API, AWS Rekognition, and Azure Computer Vision all offer visual search capabilities as a service. A retailer can upload their product catalog to one of these services, and the provider handles the indexing and search infrastructure. The trade-off is cost (API calls add up at scale) and customization (you're limited to the provider's pre-trained models).
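
The cost side of that trade-off is simple arithmetic. A sketch with entirely hypothetical prices (real per-call pricing varies by provider and tier and is not given here) that compares a pay-per-call API bill against a flat self-hosted cost and finds the break-even query volume:

```python
def monthly_api_cost(queries_per_month: int, price_per_1k: float) -> float:
    """Pay-per-call cost, with the price quoted per 1,000 calls."""
    return queries_per_month / 1000 * price_per_1k

def break_even_queries(price_per_1k: float, self_hosted_monthly: float) -> float:
    """Monthly query volume at which the flat self-hosted cost equals the API bill."""
    return self_hosted_monthly / price_per_1k * 1000

# Hypothetical inputs: $1.50 per 1,000 calls vs. $20,000/month to run your own stack
PRICE_PER_1K = 1.50
SELF_HOSTED = 20_000.0

for volume in (1_000_000, 10_000_000, 100_000_000):
    print(f"{volume:>12,} queries/month -> API ${monthly_api_cost(volume, PRICE_PER_1K):>10,.0f}")
print(f"break-even: {break_even_queries(PRICE_PER_1K, SELF_HOSTED):,.0f} queries/month")
```

Below the break-even volume the managed API is cheaper; above it, the per-call bill keeps growing linearly while the self-hosted cost (in this simplified model) stays flat.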

What's Next: Multimodal Search

The next evolution of visual search is multimodal search—combining images with text, voice, and other inputs to create more precise search queries. Google Lens already supports this: you can snap a photo of a restaurant and then type "hours" to get the opening hours. But the really interesting applications are in fashion and decor, where users want to find products that match a specific aesthetic or style.
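
One simple way to combine a photo with a text refinement is late fusion: embed the image and the text into a shared space, as CLIP-style models do, and blend the two query vectors before searching. This is an assumption for illustration, not a description of how Google Lens works. A sketch with hypothetical toy embeddings, where the weight `alpha` controls how strongly the text steers the result:

```python
import numpy as np

def normalize(v: np.ndarray) -> np.ndarray:
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)

def fuse_query(image_vec: np.ndarray, text_vec: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Weighted sum of unit-normalized image and text embeddings, renormalized."""
    return normalize((1 - alpha) * normalize(image_vec) + alpha * normalize(text_vec))

def rank(query: np.ndarray, catalog: dict[str, np.ndarray]) -> list[str]:
    """Catalog ids sorted by cosine similarity to the query, best first."""
    return sorted(catalog, key=lambda name: -float(normalize(catalog[name]) @ query))

# Toy 3-d shared space; axes loosely mean (floral pattern, puff sleeves, red color)
catalog = {
    "floral_midi_blue": np.array([1.0, 0.8, 0.0]),
    "floral_midi_red": np.array([1.0, 0.8, 0.9]),
    "plain_red_tee": np.array([0.0, 0.0, 1.0]),
}
photo = np.array([1.0, 0.8, 0.0])  # photo of a blue floral dress
text = np.array([0.0, 0.0, 1.0])   # typed refinement: "in red"

print(rank(fuse_query(photo, text, alpha=0.0), catalog))  # blue floral dress ranks first
print(rank(fuse_query(photo, text, alpha=0.4), catalog))  # red floral dress ranks first
```

With `alpha=0` the query is the photo alone; a moderate `alpha` keeps the visual style of the photo while letting the text shift the color.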

Stitch Fix, the online personal styling service, has been experimenting with multimodal search that allows users to upload an inspiration image and then specify constraints like budget, size, and brand preferences. The system uses a combination of visual search (to find items similar to the inspiration image) and collaborative filtering (to rank those items based on the user's past preferences and returns). Early results show a 25% improvement in recommendation accuracy compared to text-only search.
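
The two-stage design described above can be sketched as retrieve-then-rerank. This is a hypothetical illustration, not Stitch Fix's system. Stage one keeps items that satisfy the hard constraints (budget, size) and are most visually similar to the inspiration image. Stage two reorders them by blending visual similarity with a collaborative-filtering score, here the dot product of user and item latent-factor vectors as in matrix factorization. All items, vectors, and the blending weight are invented:

```python
from dataclasses import dataclass

import numpy as np

@dataclass
class Item:
    sku: str
    price: float
    sizes: set[str]
    visual: np.ndarray   # unit-length image embedding
    factors: np.ndarray  # latent factors from matrix factorization

def recommend(inspiration: np.ndarray, user_factors: np.ndarray, items: list[Item],
              budget: float, size: str, k_retrieve: int = 50, k_final: int = 10,
              weight: float = 0.5) -> list[str]:
    # Stage 1: apply hard constraints, then keep the k_retrieve most visually similar items
    eligible = [it for it in items if it.price <= budget and size in it.sizes]
    eligible.sort(key=lambda it: -float(it.visual @ inspiration))
    candidates = eligible[:k_retrieve]

    # Stage 2: rerank by a blend of visual similarity and predicted user preference
    def score(it: Item) -> float:
        return (1 - weight) * float(it.visual @ inspiration) + weight * float(it.factors @ user_factors)

    return [it.sku for it in sorted(candidates, key=score, reverse=True)[:k_final]]

def unit(*xs: float) -> np.ndarray:
    v = np.array(xs, dtype=float)
    return v / np.linalg.norm(v)

inspiration = unit(1, 0)
user = np.array([0.2, 0.9])  # hypothetical user factor vector
items = [
    Item("A", 80, {"M"}, unit(1, 0.1), np.array([0.9, 0.1])),
    Item("B", 60, {"M", "L"}, unit(1, 0.3), np.array([0.1, 0.9])),
    Item("C", 300, {"M"}, unit(1, 0), np.array([0.5, 0.5])),  # over budget
    Item("D", 50, {"S"}, unit(1, 0), np.array([0.5, 0.5])),   # wrong size
    Item("E", 40, {"M"}, unit(0, 1), np.array([0.0, 1.0])),   # visually dissimilar
]
print(recommend(inspiration, user, items, budget=100, size="M", k_retrieve=2, k_final=2))
```

Here item A is the closest visual match, but B moves ahead after reranking because this user's factor vector aligns with B's; setting `weight=0` falls back to pure visual order.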

Visual search has taken a decade to get right, but it's finally here. And it's not just a neat feature; it's a fundamental shift in how people discover and buy products. The retailers that figure out how to integrate visual search into their customer journey will capture a disproportionate share of the $6 trillion e-commerce market. Those that don't will lose customers to competitors that offer a more intuitive, visual-first shopping experience.