Susanna Wong - DevFest Venezia 2024

WWWAI: Constructing a truly open GenAI app with WebAssembly, WebGPU, and WebAI

While AI is a hot topic, building with AI for the web still feels distant for many developers due to limited control over LLM APIs and high costs. What if we could democratize AI development for the web? Enter Wasm, WebGPU, and WebAI: together, these technologies make it possible to run open AI models, including locally hosted LLMs, directly in the browser and to build sophisticated web applications around them.

As a seasoned web developer working in the MLOps space, I will introduce and explore these latest web technologies, demonstrating how they allow developers to maintain control over their AI infrastructure and costs while staying within the JavaScript ecosystem. By leveraging open-source models rather than proprietary cloud services, we can achieve greater customization, data privacy, and cost management.

The talk will delve into system design and architectural strategies to efficiently run AI models in the browser using WebAssembly and WebGPU, reducing the need for expensive server-side processing. Through practical use cases and demos, I will illustrate how this approach enables the productionization of AI-powered web apps with lower operational costs, reduced latency, and enhanced user privacy.

Additionally, I will explore strategies to optimize web performance when integrating these open models. This includes:

WebNN API: I will do a deep dive into this cutting-edge web API, which lets us run our LLMs with the best hardware and software optimizations available on each platform and device.

Parallel Processing with WebGPU: Leveraging the parallel processing capabilities of WebGPU to accelerate model inference and other computationally intensive tasks.

Lazy Loading and Code Splitting: Implementing lazy loading and code splitting to load only necessary components and models, reducing initial load times and improving user experience.

Performance Profiling and Optimization: Using tools and methods to profile and optimize the performance of web applications, identifying bottlenecks and implementing solutions to enhance responsiveness and speed.

Progressive Enhancement: Ensuring that AI functionalities enhance the user experience progressively, maintaining usability even on less powerful devices or under suboptimal network conditions.
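
As a rough illustration of how progressive enhancement and lazy loading might combine, here is a minimal TypeScript sketch. It is not the implementation shown in the talk. All names (Backend, Capabilities, detectCapabilities, pickBackend, lazyOnce) are hypothetical. It assumes that WebNN is exposed as navigator.ml and WebGPU as navigator.gpu, and that a Wasm build is always available as the fallback. Note that the presence of these properties does not guarantee that a usable device or adapter exists, so a real app would also need to handle failures at initialization time.

```typescript
// Hypothetical sketch: prefer the most capable backend and fall back gracefully.
type Backend = "webnn" | "webgpu" | "wasm";

interface Capabilities {
  hasWebNN: boolean;  // assumption: WebNN is exposed as navigator.ml
  hasWebGPU: boolean; // assumption: WebGPU is exposed as navigator.gpu
}

// Feature-detect on any navigator-like object (passed in so it can be tested).
function detectCapabilities(nav: object): Capabilities {
  return { hasWebNN: "ml" in nav, hasWebGPU: "gpu" in nav };
}

// Progressive enhancement: WebNN first, then WebGPU, then plain Wasm on the CPU.
function pickBackend(caps: Capabilities): Backend {
  if (caps.hasWebNN) return "webnn";
  if (caps.hasWebGPU) return "webgpu";
  return "wasm";
}

// Lazy loading: run the loader on first use only and reuse the same promise after that.
function lazyOnce<T>(load: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (cached === undefined) {
      cached = load();
    }
    return cached;
  };
}
```

In a browser, this could be wired up as pickBackend(detectCapabilities(navigator)), with lazyOnce wrapping a dynamic import() of a backend-specific module so that model code is fetched only when the user first triggers an AI feature. Bundlers that support code splitting typically emit such dynamically imported modules as separate chunks.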

Through these strategies, I will show how we can build robust, high-performance web applications that leverage the power of open AI models. Join me to discover how Wasm and open models can revolutionize AI-powered web development, enabling us to take full control of our GenAI applications and elevate our capabilities in the MLOps space.

Speaker Bio:

Susanna is a Staff Engineer at QuantumBlack, working in the realms of GenAI and MLOps tooling.
She is a software developer with extensive experience in software architecture across multiple industries (CNN, Toyota Connected, McKinsey).
She is a Google Developer Expert in web technologies and has spoken at multiple conferences worldwide, including Code Mesh 2017, Full Stack Fest 2018, Full Stack Europe 2019, and React Miami 2022. She currently sits on the Google Developer Advisory Board.