Smol-Data: Your Autonomous Data Scientist
Jun 24, 2023
A project by Michael Equi, Yi Ding, and myself, presented at the Agents Hackathon organized by AGI House.
This project presents an AI data scientist that can perform generic data analysis tasks automatically. After loading a dataset, one can send a series of prompts for data manipulation, including data visualization. In this article we show a few examples of tasks performed by our assistant.
The approach resolves each task through a multi-step, hierarchical process. The agent first identifies the task, then reads the dataset (columns and some rows) and its metadata, if available. It then generates and executes Python code to accomplish the task. If the code raises an error, the agent uses the error message to correct its output.
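The generate, execute, and correct cycle described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual implementation: `run_with_self_correction` and `stub_generator` are hypothetical names, and the stub stands in for the language model call, which is not shown here.

```python
import traceback
from typing import Callable, Optional


def run_with_self_correction(
    task: str,
    generate_code: Callable[[str, Optional[str]], str],
    max_attempts: int = 3,
) -> dict:
    """Generate code for a task, run it, and retry using the error message."""
    error: Optional[str] = None
    for _ in range(max_attempts):
        # Hypothetical model call: receives the task and the last error, if any.
        code = generate_code(task, error)
        namespace: dict = {}
        try:
            # Caution: exec runs arbitrary code; a real agent should sandbox this.
            exec(code, namespace)
            return namespace
        except Exception:
            # Keep the traceback so the next attempt can use it as feedback.
            error = traceback.format_exc()
    raise RuntimeError(f"Task failed after {max_attempts} attempts:\n{error}")


def stub_generator(task: str, error: Optional[str]) -> str:
    """Stand-in for the model: the first attempt is buggy, the retry is fixed."""
    if error is None:
        return "result = undefined_name + 1"  # raises NameError when executed
    return "result = 41 + 1"


outcome = run_with_self_correction("add one to 41", stub_generator)
print(outcome["result"])
```

In this sketch the first attempt fails with a `NameError`, the traceback is passed back to the generator, and the second attempt succeeds.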
As an illustrative example, we test the assistant on the Adult Income Dataset from Kaggle. We start by downloading the dataset locally; the image below shows the kind of information it contains.
Adult income dataset.
Average years of education by workclass.
Distribution of workclass vs. level of education.
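For reference, the two analyses named in the captions above could be written by hand in pandas roughly as follows. This is only a sketch of the kind of code the assistant might generate, not its recorded output. The small inline table is made-up illustrative data rather than the real dataset, and the column names (`workclass`, `education`, `education_num`) are assumptions that may differ from the headers in the downloaded file.

```python
import pandas as pd

# Toy rows for illustration only; the real dataset would be loaded from the
# downloaded file, for example with pd.read_csv.
df = pd.DataFrame(
    {
        "workclass": ["Private", "Private", "State-gov", "State-gov", "Self-emp-not-inc"],
        "education": ["HS-grad", "Bachelors", "Bachelors", "Masters", "HS-grad"],
        "education_num": [9, 13, 13, 14, 9],  # assumed column: years of education
    }
)

# Average years of education by workclass, highest first.
avg_education = (
    df.groupby("workclass")["education_num"].mean().sort_values(ascending=False)
)

# Distribution of workclass vs. level of education, as a table of counts.
education_by_workclass = pd.crosstab(df["workclass"], df["education"])

print(avg_education)
print(education_by_workclass)
```

Either result can then be passed to a plotting library to produce charts like the ones shown.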