Pinecone, a serverless vector database for machine learning, leaves stealth with $10M funding


Pinecone, a new startup from the people who helped launch Amazon SageMaker, has built a vector database that generates data in a specialized format to help build faster machine learning applications, something previously accessible only to the largest organizations. Today, the company came out of stealth with a new product and announced an initial investment of $10 million led by Wing Venture Capital.

Key Facts

  1. The vector engine contains the data structures and algorithms that allow Pinecone to index large amounts of high-dimensional vector data

  2. The vector engine converts data into a format that machine learning models can ingest

  3. Pinecone was created to make this technology available to any business

  4. Vectors are ubiquitous in machine learning

Edo Liberty, the company's co-founder, says he started the company out of a fundamental belief that the industry was being held back by the lack of broader access to this type of database.

"The data that a machine learning model expects is not a JSON record, it is a high-dimensional vector that is a list of characteristics or what is called an embedding, which is a numerical representation of the elements or objects of the world. This format is much more semantically rich and actionable for machine learning," he explained.
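
To make the contrast between a JSON record and an embedding concrete, here is a minimal Python sketch. The product names and vector values are invented for illustration; in practice a trained model produces embeddings, typically with hundreds of dimensions. The sketch only shows that similar items end up with vectors that point in similar directions, which a plain record cannot express.

```python
import math

# A JSON-style record: convenient for applications, but not what a model consumes.
record = {"title": "Trail running shoe", "category": "footwear", "price": 89.0}

# Hypothetical embeddings (made-up values, three dimensions for readability).
embeddings = {
    "trail running shoe": [0.9, 0.1, 0.3],
    "road running shoe": [0.8, 0.2, 0.4],
    "coffee grinder": [0.1, 0.9, 0.2],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


shoe_vs_shoe = cosine_similarity(
    embeddings["trail running shoe"], embeddings["road running shoe"]
)
shoe_vs_grinder = cosine_similarity(
    embeddings["trail running shoe"], embeddings["coffee grinder"]
)

# The two shoes score as more alike than the shoe and the grinder.
print(shoe_vs_shoe > shoe_vs_grinder)
```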

He says this is a concept widely understood by data scientists and supported by research. Still, until now, only the largest and most technically sophisticated companies, like Google or Pinterest, could take advantage of this approach.

Liberty and his team created Pinecone to make this kind of technology available to any business.

The startup spent the last few years building the solution, which consists of three main components. The main piece is a vector engine that converts the data into this ingestible machine learning format.

Liberty says that this is the piece of technology that contains all the data structures and algorithms that allow them to index substantial amounts of high-dimensional vector data and search through it efficiently and accurately.
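
The article does not describe how Pinecone's engine works internally, so the following is only a sketch of the general idea of indexing vectors and searching them by similarity. The class name `TinyVectorIndex` and its methods are hypothetical. It scans every stored vector, which is exact but slow; production engines generally rely on approximate nearest-neighbor structures to stay fast at scale.

```python
import heapq
import math


class TinyVectorIndex:
    """Hypothetical in-memory index; exact search by scanning every vector."""

    def __init__(self, dimension):
        self.dimension = dimension
        self._vectors = {}  # item id -> unit-length vector

    @staticmethod
    def _normalize(vector):
        norm = math.sqrt(sum(x * x for x in vector))
        if norm == 0:
            raise ValueError("zero vectors have no direction")
        return [x / norm for x in vector]

    def upsert(self, item_id, vector):
        """Insert a vector, or overwrite the one stored under item_id."""
        if len(vector) != self.dimension:
            raise ValueError("vector has the wrong dimension")
        # Normalize once so that a dot product equals cosine similarity.
        self._vectors[item_id] = self._normalize(vector)

    def delete(self, item_id):
        self._vectors.pop(item_id, None)

    def query(self, vector, top_k=3):
        """Return the top_k (item_id, score) pairs, most similar first."""
        unit = self._normalize(vector)
        scored = (
            (sum(a * b for a, b in zip(unit, stored)), item_id)
            for item_id, stored in self._vectors.items()
        )
        best = heapq.nlargest(top_k, scored)
        return [(item_id, score) for score, item_id in best]


index = TinyVectorIndex(dimension=3)
index.upsert("a", [1.0, 0.0, 0.0])
index.upsert("b", [0.9, 0.1, 0.0])
index.upsert("c", [0.0, 0.0, 1.0])

results = index.query([1.0, 0.02, 0.0], top_k=2)
print([item_id for item_id, _ in results])
```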

The second is a cloud-hosted system that applies all of that converted data to the machine learning model while handling things like index lookups and pre- and post-processing: everything a data science team needs to run a machine learning project at scale, with very high workloads and throughput.

The third is a management layer that tracks all of this and manages data transfer between source locations.

A classic example Liberty uses is an e-commerce recommendation engine. While this has been a standard part of online sales for years, he believes that a vectorized data approach will give much more accurate recommendations. He says data science research confirms this.
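
As a rough illustration of the recommendation example, the sketch below averages the embeddings of items a shopper has bought and ranks the remaining catalog by similarity to that average. The catalog, the vector values, and the `recommend` function are all assumptions made for this example; the article does not say how Pinecone or its customers build recommendations.

```python
import math

# Hypothetical item embeddings; a real system would learn these from data.
ITEM_VECTORS = {
    "running shoes": [0.9, 0.1, 0.0],
    "running socks": [0.8, 0.2, 0.1],
    "trail backpack": [0.6, 0.1, 0.5],
    "espresso machine": [0.0, 0.9, 0.2],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def recommend(purchased, top_k=2):
    """Rank unpurchased items by similarity to the shopper's average item vector."""
    vectors = [ITEM_VECTORS[name] for name in purchased]
    profile = [sum(column) / len(vectors) for column in zip(*vectors)]
    candidates = [name for name in ITEM_VECTORS if name not in purchased]
    ranked = sorted(
        candidates,
        key=lambda name: cosine(profile, ITEM_VECTORS[name]),
        reverse=True,
    )
    return ranked[:top_k]


print(recommend(["running shoes"]))
```

With these made-up vectors, a shopper who bought running shoes is shown other running gear ahead of the espresso machine, because those items sit closer to the shopper's profile vector.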

"It used to be that implementing something like a recommendation engine was actually incredibly complex and if you have access to a production-grade database, 90% of the difficulty and heavy lifting in creating those solutions disappear, and that is why we are building this. We believe it is the new standard," he said.

Pinecone also has its own language and supports the CRUD operations typical of databases.

However, it does not use the SQL-like language typical of other databases. How, then, do you get documents that were created after a particular date and contain a particular keyword?
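
The article does not show Pinecone's own syntax for such a query, so the following is not its API. It is a generic sketch of one common approach: keep ordinary metadata next to each vector, filter on that metadata, then rank whatever remains by similarity. The document set, field names, and `search` function are hypothetical.

```python
import math
from datetime import date

# Hypothetical documents: each has an embedding plus ordinary metadata.
DOCUMENTS = [
    {"id": "doc-1", "created": date(2020, 11, 3), "keywords": {"database"}, "vector": [0.9, 0.1]},
    {"id": "doc-2", "created": date(2021, 1, 20), "keywords": {"database", "ml"}, "vector": [0.8, 0.3]},
    {"id": "doc-3", "created": date(2021, 2, 5), "keywords": {"funding"}, "vector": [0.95, 0.05]},
    {"id": "doc-4", "created": date(2021, 2, 9), "keywords": {"database"}, "vector": [0.2, 0.9]},
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def search(query_vector, created_after, keyword, top_k=5):
    """Filter on metadata first, then rank the survivors by similarity."""
    matches = [
        doc
        for doc in DOCUMENTS
        if doc["created"] > created_after and keyword in doc["keywords"]
    ]
    matches.sort(key=lambda doc: cosine(query_vector, doc["vector"]), reverse=True)
    return [doc["id"] for doc in matches[:top_k]]


print(search([1.0, 0.0], created_after=date(2021, 1, 1), keyword="database"))
```

Filtering before ranking keeps the similarity step cheap in this toy version; whether a real engine filters before, during, or after the vector search is a design decision the article does not cover.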
