How to 10x Throughput When Serving Hugging Face Models Without a GPU
By optimising how a model is served, we reach over 100 predictions per second with a simple Python API using CPU inference