Jun Wu Former Contributor
COGNITIVE WORLD Contributor Group
About 20 months ago, Manu Sharma and a couple of friends started a company called Labelbox and launched it on Reddit. Yeah, Reddit. Today, they’re announcing a $25 million Series B investment led by a16z. Yeah, Andreessen Horowitz.
That’s quite a sprint.
Sharma and his friends, Daniel Rasmuson and Brian Rieger, all saw the same problem in their respective jobs: companies adopting AI were wasting time building the tools to build solutions, rather than just building solutions. Boring, maybe, but frustrating for anyone who has faced it.
So, they began working nights and weekends to create a simple collaboration tool for teams managing data to train machine-learning models. To their delight, it was an instant hit, and they were soon adding features that users requested.
And then it was like, “We’ve got customers and we’ve got to go scale-out infrastructure for a billion labeled assets because we have a production pipeline for data from 500,000 trucks,” Sharma recalls.
That simple tool quickly evolved into the comprehensive training data platform that it is today.
But something else has happened. As machine-learning adoption accelerated across the economy, the center of gravity shifted toward Labelbox’s end of the AI production process. Algorithms are cheap or free, and the cloud has made computing power ubiquitous and affordable. Suddenly, labeled data became the true intellectual property of the AI age.
The current “AI revolution” is a supervised learning revolution. Supervised learning is a way of teaching computers to recognize patterns and make decisions based on those patterns, often faster and more accurately than humans.
But to teach computers to recognize patterns, data scientists first have to train them with thousands, sometimes millions, of examples. Those examples need to be labeled: this is a dog, this is a tumor, this is a weed. High-quality labeled data is the key to successful training, accurate models and reliable decisions. As supervised learning sweeps through the global economy, the demand for labeled data is going supernova.
There are two parts to creating that IP for the AI age: the human labeling service and the platform.
Labelbox has third-party labeling services attached to the platform, but the company is adamant that it is a software company, not a service company. It built a platform in which form follows function, and its focus is on making that platform as innovative and user-friendly as possible.
The platform facilitates not only collaboration but also rework, quality assurance, model evaluation, audit trails and model-assisted labeling. While Labelbox started with computer vision, the platform can handle all forms of data, and the company is expanding its video labeling capabilities. The platform also helps with billing and time management.
“Labelbox is an integrated solution for data science teams to not only create the training data but also to manage it in one place. It’s the foundational infrastructure for customers to build their machine learning pipeline,” says Sharma.
All of this frees data science teams to concentrate on what they should be concentrating on: building and deploying models.
Other companies focus on the service: you turn your data over to them, they process it on their own platform and they return the data labeled. But you are, in effect, allowing an outsourcing vendor to build your IP, your most valuable AI asset, and that vendor retains a copy of it.
With Labelbox, you work with a labeling services company on one side and your data engineers on the other, so you maintain visibility and control while building that IP. The difference is more than a matter of emphasis; it’s a matter of philosophy.
By controlling the labeling process, data science teams gain an intimate understanding of their data’s strengths and can identify and fix its weaknesses in real time, rather than enduring the inevitable, time-consuming back-and-forth of working with a black-box service.
Now, many people see data-centered programming as the future of computing: data shapes applications rather than the other way around, and data differentiates systems rather than systems determining the data. The world needs to adapt to this new software development life cycle. Sharma says Labelbox’s goal is to be at the center of that transformation: “Over time, as models improve, our customers use Labelbox as a platform to understand how the models are operating.”
Labeled data is already having an impact on the world, for example by identifying a patient with a punctured lung six hours earlier so that they can survive, or by letting farms use 90% less herbicide because tractors can see weeds and spray precisely.
The way Sharma sees it, value is created the moment a precision sprayer sprays a weed, a tumor is detected or an excited shopper opens a box of algorithmically chosen clothing delivered to their door. Labelbox created the infrastructure to get to that moment of value creation. It wouldn’t happen without a data science team or a platform like Labelbox.
Sharma and his friends raised $3.9 million in 2018 to help grow the business and another $10 million last year. Then, late last year, legendary Andreessen Horowitz investor Peter Levine contacted them. That’s how the $25 million Series B round came together, with some of the greatest investors in Silicon Valley: a16z, Gradient Ventures, Kleiner Perkins, First Round Capital and Sumon Sadhu. Not bad for a bunch of guys who only wanted to solve a problem.