OpenActionData


Help teach AIs to use computers

OpenActionData is a Chrome extension that asks for 2-5 screen recordings per day. Before being uploaded, recordings are fully anonymized, and we'll never save anything without a review from you.


Why crowdsource?

OPEN SOURCE

Fight centralization

As AI becomes more powerful and valuable, it's also becoming increasingly centralized.

OpenActionData is creating a dataset that anyone can use, putting smaller players on equal footing.

USEFUL AI

Make AI truly useful

Current AI systems can help writers write or artists draw. But to be generally useful, they need to be able to use multiple apps and perform complex tasks.

DATA ENGINE

A data layer for tool use

Giant, varied datasets of natural language, code, and images have driven recent progress in AI.

For AI to be truly useful, it needs to use the same tools that humans do. No big datasets exist for this. We're aiming to create one for future AI systems.


Numbers

USERS

38,432 people have installed the extension

SESSIONS

1,000,000+ sessions have been recorded

RAW SIZE

32 GB of data has been uploaded


OpenActionData

What is it?

OpenActionData is a Chrome extension that identifies tasks that are hard for AIs to do. When it sees you doing one, it starts recording your session and then asks for your permission to upload it to a public dataset.

Every browsing session you upload will be consensual & manually approved by you.


How it works:

#1: The extension identifies interesting browsing sessions

The extension will identify when you are doing something that is hard for AIs to do, such as:

  • Using enterprise software
  • Doing research & making conclusions
  • Using a new website

#2: It records your screen and interactions in the background

You get to review whether the recording contains any sensitive information before we upload it to the dataset.

Nothing is uploaded without your explicit consent.

#3: You review the recording and upload it to an open source dataset

The dataset is open source and available for download, so anyone, including you, can use it to train their own AI.
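The record, review, upload flow described above can be sketched as a small consent gate. This is only an illustrative sketch, not the extension's actual code: the `RecordedSession` type, the `ReviewStatus` values, and the functions `recordSession`, `review`, and `selectForUpload` are all hypothetical names made up for this example.

```typescript
// Hypothetical sketch: none of these names come from the OpenActionData codebase.
type ReviewStatus = "pending" | "approved" | "rejected";

interface RecordedSession {
  id: string;
  events: string[]; // simplified stand-in for recorded screen and interaction data
  status: ReviewStatus;
}

// Every recording starts as "pending", so it is never uploadable by default.
function recordSession(id: string, events: string[]): RecordedSession {
  return { id, events: [...events], status: "pending" };
}

// The user's review decision is the only thing that changes the status.
function review(session: RecordedSession, approve: boolean): RecordedSession {
  return { ...session, status: approve ? "approved" : "rejected" };
}

// Only sessions the user explicitly approved are selected for upload.
function selectForUpload(sessions: RecordedSession[]): RecordedSession[] {
  return sessions.filter((s) => s.status === "approved");
}
```

The design choice in this sketch is that approval is opt-in: a session that was never reviewed stays `pending` and is filtered out, mirroring the rule that nothing is uploaded without explicit consent.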


The road ahead

Step 1: Train an LLM ✅

Large Language Models (LLMs) serve as a foundation, learning most human concepts & basic reasoning skills by reading the web.

There are several open-source LLMs, such as LLaMA, BLOOM, OPT, T5, and GPT-J.

Step 2: Fine-tune on humans performing tasks ⌛️

This is where you come in! Big AI companies hire teams of contractors to create proprietary datasets, used to teach their models to use computers.

We're trying to democratize this process. With the help of the community, we can create a dataset with more scale and diversity, and for a fraction of the cost.

Step 3: Iteratively improve model with user feedback ♻️

Once the model is trained, you'll use it to perform tasks. Each time it makes a mistake, you correct it, and the model improves iteratively.

This is how systems such as ChatGPT get as good as they are.


Team

Louis Arge

Louis is a person.