Product Lakehouse
The Product Lakehouse is your staging area for raw product data. Upload documents, review extracted data, and promote verified products to your main catalog.
A tour of the Product Lakehouse and its key features.
What is the Product Lakehouse?
Raw product data lands in the Product Lakehouse before it is promoted to your main catalog. Think of it as an inbox for product data, where you can:
- Upload documents in various formats (PDF, Excel, CSV)
- Review AI-extracted data for accuracy
- Verify column mappings from source files
- Clean up any parsing issues
- Promote approved products to the Catalog
This two-stage approach keeps unverified data out of your main product catalog: nothing enters the Catalog until you have reviewed and promoted it.
The Lakehouse Workflow
Getting products from raw files to your enriched catalog follows these steps:
- Upload: Add product documents (PDFs, spreadsheets) through the Document Uploads interface
- Extract: MerchantOps AI processes your documents and extracts product data automatically
- Map: Review the column mappings to ensure source columns map correctly to product fields
- Review: Check extracted Lakehouse Products for accuracy before promotion
- Promote: Move verified products to your main Catalog for enrichment and export
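The steps above form a strict sequence, which can be pictured as a small state model. The following is only an illustrative sketch: the `Stage` enum and the `next_stage` and `can_promote` helpers are hypothetical names invented for this example and are not part of any MerchantOps API.

```python
from enum import Enum


class Stage(Enum):
    """Stages named after the workflow steps (hypothetical model)."""
    UPLOADED = "upload"
    EXTRACTED = "extract"
    MAPPED = "map"
    REVIEWED = "review"
    PROMOTED = "promote"


# Enum members iterate in definition order, which matches the workflow order.
ORDER = list(Stage)


def next_stage(current: Stage) -> Stage:
    """Return the stage that follows `current`; PROMOTED is terminal."""
    index = ORDER.index(current)
    if index == len(ORDER) - 1:
        raise ValueError("promoted products have already left the Lakehouse")
    return ORDER[index + 1]


def can_promote(current: Stage) -> bool:
    """Only products that have been reviewed are eligible for promotion."""
    return current is Stage.REVIEWED
```

Modeling the stages as an ordered sequence mirrors the point of the staging area: a product cannot skip review on its way to the Catalog.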
Lakehouse Features
- Document Uploads: Upload PDFs, Excel files, and CSVs for AI-powered data extraction.
- Mapping Reviews: Review and confirm how source columns map to product fields.
- Lakehouse Products: View extracted products and promote them to your catalog.
- Brand Technologies: Manage brand sources and configure web scraping for enrichment.
Brand Technologies
In addition to document processing, the Lakehouse includes a Technologies section for managing brand sources. Each technology represents a vendor or brand source and can be used for:
- Web scraping product information from brand websites
- Linking products to their brand for enrichment
- Storing brand-specific metadata
Best Practices
Data Quality
- Review before promoting: Always check extracted data for accuracy before moving to the Catalog
- Use consistent file formats: Standardize your vendor file formats when possible
- Include product identifiers: Ensure files include SKUs or unique product keys
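Identifier problems are easy to catch before a file is uploaded. The sketch below checks a CSV for a missing identifier column, empty identifiers, and duplicates. It is a local pre-check only, not a Lakehouse feature: the function name `check_identifiers` and the default column name `sku` are assumptions made for this example, so adjust them to your vendor's file layout.

```python
import csv
import io
from collections import Counter


def check_identifiers(csv_text: str, id_column: str = "sku") -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if id_column not in (reader.fieldnames or []):
        return [f"missing identifier column: {id_column}"]

    problems = []
    seen = Counter()
    # Records are counted from 1, excluding the header row.
    for record_number, row in enumerate(reader, start=1):
        # Short rows yield None for missing cells, so fall back to "".
        value = (row[id_column] or "").strip()
        if not value:
            problems.append(f"record {record_number}: empty {id_column}")
        else:
            seen[value] += 1
    for value, count in seen.items():
        if count > 1:
            problems.append(f"duplicate {id_column} {value!r} appears {count} times")
    return problems


sample = "sku,name\nA1,Lamp\n,Chair\nA1,Desk\n"
for problem in check_identifiers(sample):
    print(problem)
```

Running a check like this on each vendor file before upload leaves less to clean up during review.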
Workflow Efficiency
- Batch similar files: Upload files from the same source together for consistent mapping
- Save column mappings: Reuse mappings for recurring file formats
- Set up technologies first: Configure a brand's technology before uploading product data for that brand
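To see why reusing a column mapping pays off for recurring file formats, it helps to picture a mapping as a lookup from source column headers to product fields. The sketch below is conceptual only: `ACME_MAPPING`, its column names, and both helper functions are hypothetical and do not reflect how the Lakehouse stores mappings internally.

```python
# Hypothetical saved mapping for one vendor's recurring file layout:
# source column header -> product field name.
ACME_MAPPING = {
    "Item No.": "sku",
    "Description": "name",
    "List Price": "price",
}


def apply_mapping(row: dict[str, str], mapping: dict[str, str]) -> dict[str, str]:
    """Rename mapped source columns to product fields; drop unmapped columns."""
    return {mapping[column]: value for column, value in row.items() if column in mapping}


def unmapped_columns(headers: list[str], mapping: dict[str, str]) -> list[str]:
    """List source columns the saved mapping does not cover, for manual review."""
    return [header for header in headers if header not in mapping]


row = {"Item No.": "A1", "Description": "Lamp", "Notes": "fragile"}
print(apply_mapping(row, ACME_MAPPING))
print(unmapped_columns(list(row), ACME_MAPPING))
```

When every file from a source shares the same headers, one saved mapping covers them all, and only new or renamed columns need attention.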
Next Steps
Start using the Product Lakehouse:
- Upload your first document to see AI extraction in action
- Learn about mapping reviews to see how source columns are matched to product fields
- Review core concepts to understand the data model