Custom Data Workflows

Learn how to manage and upload custom datasets to Alfa.

Overview

Custom datasets allow you to upload and manage your own time-series data for use with Alfa. This guide provides step-by-step instructions for creating, managing, and uploading data to your custom datasets.

Any data you upload is shared with your team.

Currently, we only support macroeconomic data (i.e., data that is not tied to any specific stock).

Managing Datasets

Creating a Dataset

To create a new dataset, you’ll need to provide a name. The system will return a unique dataset ID that you’ll use for future operations.

1. Make the API Request

Send a PUT request to create a new dataset:

create_dataset.py

import requests

BASE_URL = "https://alfa.boosted.ai/client"
API_KEY = "YOUR_API_KEY_HERE"

response = requests.put(
    f"{BASE_URL}/v2/datasets/create-macro-dataset",
    headers={"x-api-key": API_KEY},
    json={"dataset_name": "My Custom Dataset"}
)

dataset_id = response.json()["dataset_id"]
2. Handle the Response

The API will return a response containing:

  • dataset_id: A unique identifier for your new dataset

Store this ID; you’ll need it for future operations.
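If the request fails (bad API key, network error), the response body may not contain dataset_id at all, so it is worth guarding before reading it. A minimal sketch; extract_dataset_id is an illustrative helper of ours, not part of the API:

```python
def extract_dataset_id(payload: dict) -> str:
    """Pull dataset_id out of a create-dataset response, failing loudly if absent."""
    if "dataset_id" not in payload:
        raise ValueError(f"unexpected response payload: {payload!r}")
    return payload["dataset_id"]

# With a live requests response you would first call
# response.raise_for_status() to surface HTTP-level errors, then:
# dataset_id = extract_dataset_id(response.json())
print(extract_dataset_id({"dataset_id": "ds_123"}))  # ds_123
```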

Fetching Dataset Details

You can retrieve detailed information about a specific dataset, including its metadata and features.

1. Make the API Request

Send a GET request with your dataset ID:

get_dataset_info.py

import requests

BASE_URL = "https://alfa.boosted.ai/client"
API_KEY = "YOUR_API_KEY_HERE"
dataset_id = "YOUR_DATASET_ID"  # returned when you created the dataset

response = requests.get(
    f"{BASE_URL}/v2/datasets/{dataset_id}/get-info",
    headers={"x-api-key": API_KEY}
)

dataset_info = response.json()
2. Understand the Response

The response includes:

  • dataset_info: Basic metadata about the dataset
    • dataset_id: Unique identifier
    • dataset_name: Name you provided
    • dataset_type: Type of dataset (GLOBAL or STOCK)
    • data_start: Earliest data point
    • data_end: Latest data point
    • created_time: When the dataset was created
    • last_updated: Last modification time
    • approx_records: Approximate number of records
    • status: Current status (READY, INGESTING, ERROR, UNKNOWN)
    • owner_info: Information about the owner of the dataset and your access level
  • custom_features: List of features/columns in the dataset
  • upload_info: Information about any ongoing uploads
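Before building on a dataset, it is usually worth confirming it is in the READY state. A small guard using the status field above; assert_ready is our illustrative helper, not an API call:

```python
def assert_ready(dataset_info: dict) -> None:
    """Raise if the dataset is not in a usable state."""
    status = dataset_info.get("status", "UNKNOWN")
    if status != "READY":
        raise RuntimeError(
            f"dataset {dataset_info.get('dataset_id')} is {status}, not READY"
        )

assert_ready({"dataset_id": "ds_123", "status": "READY"})  # passes silently
```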

Listing All Datasets

You can view all your custom datasets, sorted by last update time.

1. Make the API Request

Send a GET request to list all datasets:

list_datasets.py

import requests

BASE_URL = "https://alfa.boosted.ai/client"
API_KEY = "YOUR_API_KEY_HERE"

response = requests.get(
    f"{BASE_URL}/v2/datasets/list-datasets",
    headers={"x-api-key": API_KEY},
    params={"limit": 100}
)

datasets = response.json()["dataset_listing"]
2. Understand the Response

The response contains a list of datasets, each with:

  • dataset_id: Unique identifier
  • dataset_name: Dataset name
  • dataset_type: Type of dataset
  • data_start: Earliest data point
  • data_end: Latest data point
  • created_time: Creation timestamp
  • last_updated: Last modification timestamp
  • approx_records: Approximate record count
  • status: Current dataset status
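Since the listing is plain JSON, it can be filtered client-side. For example, to surface only datasets that are ready to use, newest first (the sample records below are illustrative, not real API output):

```python
def ready_datasets(dataset_listing):
    """Keep READY datasets, most recently updated first."""
    ready = [d for d in dataset_listing if d.get("status") == "READY"]
    return sorted(ready, key=lambda d: d["last_updated"], reverse=True)

listing = [
    {"dataset_id": "a", "status": "READY", "last_updated": "2024-01-02T00:00:00Z"},
    {"dataset_id": "b", "status": "INGESTING", "last_updated": "2024-01-03T00:00:00Z"},
    {"dataset_id": "c", "status": "READY", "last_updated": "2024-01-04T00:00:00Z"},
]
print([d["dataset_id"] for d in ready_datasets(listing)])  # ['c', 'a']
```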

Deleting a Dataset

You can permanently delete a dataset when it’s no longer needed.

Once a dataset is deleted, it cannot be recovered, and any existing workplans that use its data will stop working.

1. Make the API Request

Send a DELETE request with your dataset ID:

delete_dataset.py

import requests

BASE_URL = "https://alfa.boosted.ai/client"
API_KEY = "YOUR_API_KEY_HERE"
dataset_id = "YOUR_DATASET_ID"  # ID of the dataset to delete

response = requests.delete(
    f"{BASE_URL}/v2/datasets/{dataset_id}",
    headers={"x-api-key": API_KEY}
)
2. Handle the Response

The API returns a success response when the deletion completes.

Uploading Data into an Existing Dataset

Uploading CSV Data

You can upload time-series data to your dataset using a CSV file.

1. Prepare Your CSV File

Ensure your CSV file has:

  • A date column in YYYY-MM-DD format
  • One or more feature columns with numeric values
  • Consistent date formatting across all rows
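These checks can be run locally before uploading. A minimal sketch using only the standard library; validate_rows is an illustrative helper of ours, not part of the API:

```python
import csv
import io
from datetime import datetime

def validate_rows(csv_text: str, date_column: str, feature_columns: list[str]) -> list[str]:
    """Return a list of human-readable problems found in the CSV text."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        try:
            datetime.strptime(row[date_column], "%Y-%m-%d")
        except (KeyError, TypeError, ValueError):
            problems.append(f"line {line_no}: bad or missing date {row.get(date_column)!r}")
        for col in feature_columns:
            try:
                float(row[col])
            except (KeyError, TypeError, ValueError):
                problems.append(f"line {line_no}: non-numeric value in {col!r}")
    return problems

sample = "date,value1\n2024-01-01,1.0\n2024-13-01,oops\n"
print(validate_rows(sample, "date", ["value1"]))  # two problems on line 3
```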
2. Make the API Request

Send a POST request with your CSV file:

your_data.csv

date,value1,value2,value3
2024-01-01,1.0,2.0,3.0
2024-01-02,1.7,2.2,3.3
2024-01-03,1.4,2.1,3.9
upload_data.py

import requests

BASE_URL = "https://alfa.boosted.ai/client"
API_KEY = "YOUR_API_KEY_HERE"
dataset_id = "YOUR_DATASET_ID"  # ID of the dataset receiving the upload

with open("your_data.csv", "rb") as f:
    files = {"file": f}
    data = {
        "date_column": "date",
        "feature_columns": "value1,value2,value3"
    }

    response = requests.post(
        f"{BASE_URL}/v2/datasets/{dataset_id}/upload-macro-data",
        headers={"x-api-key": API_KEY},
        files=files,
        data=data
    )

upload_info = response.json()
3. Handle the Response

The API returns:

  • dataset_id: The ID of the dataset receiving the upload
  • upload_id: A unique ID to track the upload progress

Store the upload_id so you can check the upload status.

Checking Upload Status

You can monitor the progress of your data upload.

1. Make the API Request

Send a GET request with your dataset ID and upload ID:

check_upload_status.py

import requests

BASE_URL = "https://alfa.boosted.ai/client"
API_KEY = "YOUR_API_KEY_HERE"
dataset_id = "YOUR_DATASET_ID"  # ID of the dataset receiving the upload
upload_id = "YOUR_UPLOAD_ID"    # returned by the upload request

response = requests.get(
    f"{BASE_URL}/v2/datasets/{dataset_id}/get-info",
    headers={"x-api-key": API_KEY},
    params={"upload_id": upload_id}
)

upload_status = response.json()["upload_info"]
2. Understand the Response

The response includes:

  • upload_id: The ID of the upload
  • warnings: Any warnings during processing
  • errors: Any errors that occurred
  • successful_rows: Number of rows successfully processed
  • total_rows: Total number of rows in the upload
  • upload_status: Current status (PROCESSING, ABORTED, SUCCESS, WARNING, ERROR)
3. Monitor Progress

  • Poll the status endpoint until upload_status is not “PROCESSING”
  • Check warnings and errors for any issues

The upload process may take some time depending on the size of your CSV file. It’s recommended to implement a polling mechanism to check the status periodically.
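The polling described above can be sketched as a small loop. Here fetch_status is any zero-argument callable that returns the upload_info dict (for example, a wrapper around the get-info request shown earlier); the function name and timeout defaults are our illustrative choices:

```python
import time

def wait_for_upload(fetch_status, poll_seconds=5.0, timeout_seconds=600.0):
    """Poll fetch_status() until the upload leaves PROCESSING or we time out."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        info = fetch_status()
        # Terminal states: ABORTED, SUCCESS, WARNING, ERROR
        if info.get("upload_status") != "PROCESSING":
            return info
        if time.monotonic() >= deadline:
            raise TimeoutError("upload still PROCESSING after timeout")
        time.sleep(poll_seconds)
```

Once it returns, inspect warnings, errors, successful_rows, and total_rows to confirm the upload went through cleanly.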

When uploading data, ensure your CSV file is properly formatted and contains all required columns. This will help avoid processing errors and warnings.

Using Your Data

Once your data is uploaded, you can use it like any other variable in Alfa. Use the @ typeahead or the # selector button to choose the variable for inclusion in your workplans.