Multimodal LLMs bridge vision and language, enabling powerful business applications such as image-based insights, semantic search, and visual question answering.
In this talk, you’ll explore real-world use cases, such as analyzing retail store images to assess product availability, implementing multimodal search, and more. We’ll cover practical implementation strategies, key challenges businesses face when deploying multimodal AI, and emerging trends, equipping you to leverage multimodal AI in your own products.
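To make the retail use case concrete, here is a minimal sketch of visual question answering over a shelf photo with a hosted multimodal LLM. The model name, prompt, and file name are illustrative assumptions, not the specific setup presented in the talk.

```python
# Illustrative sketch: asking a multimodal LLM about product availability
# in a retail shelf photo. Model, prompt, and file name are assumptions.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("shelf_photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model; any vision-capable LLM works similarly
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Which products on this shelf are out of stock or running low?"},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```

The same pattern extends to batches of store images, where the model's answers are aggregated into availability reports.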
Nothing in particular; familiarity with terms such as LLMs and multimodal AI (combining text and image data) will be helpful but is not required.
This presentation explores how multimodal LLMs are transforming retail analytics by extracting insights from images and enhancing semantic search. The session provides a deep dive into practical applications and challenges businesses face when implementing these models.
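Multimodal semantic search of the kind mentioned above is typically built on joint text–image embeddings. The following is a minimal sketch using a CLIP-style embedding model via sentence-transformers; the model choice and example files are assumptions for illustration only.

```python
# Illustrative sketch: text-to-image semantic search over product photos
# using a joint text-image embedding model (CLIP). Model and file names
# are assumptions, not the talk's actual pipeline.
from PIL import Image
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("clip-ViT-B-32")  # assumed embedding model

image_paths = ["shelf_cereal.jpg", "shelf_beverages.jpg", "shelf_snacks.jpg"]
image_embeddings = model.encode([Image.open(p) for p in image_paths])

# Embed a natural-language query into the same vector space as the images
query_embedding = model.encode("empty shelf space in the cereal aisle")

# Rank images by cosine similarity to the text query
scores = util.cos_sim(query_embedding, image_embeddings)[0].tolist()
for path, score in sorted(zip(image_paths, scores), key=lambda x: -x[1]):
    print(f"{path}: {score:.3f}")
```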
Introduction
Use Case: Retail Image Analysis
How Multimodal LLMs Drive These Insights
Challenges and Trends
Takeaways
This talk provides a comprehensive understanding of multimodal LLMs, offering valuable insights into their applications, challenges, and future potential.
Alexander Kowsik is the Lead for AI & Science at Predict42, where he drives innovation in business analytics using AI. He holds a Master’s degree in Data Engineering and Analytics from the Technical University of Munich (TUM). He is actively involved in research on Large Language Models, Multimodal AI, and Embeddings, advancing the application of these technologies in business contexts.