
Hi Bumblebee!


Last updated 4 years ago


Bumblebee is the easiest and most powerful tool to clean, transform, and prepare data of any size for analysis, visualization, reporting, and machine learning, all in a spreadsheet-like interface.

Bumblebee can run over multiple engines, such as Pandas, Dask, cuDF, Dask-cuDF, Spark, or Vaex.

The engine you choose will depend on the resources you have at hand.
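
As a rough illustration of how available resources might map to an engine, here is a minimal Python sketch. The function `suggest_engine` and its parameters are hypothetical and are not part of Bumblebee. The mapping is only a rule of thumb based on what each library is generally designed for, not Bumblebee's own selection logic.

```python
def suggest_engine(fits_in_memory: bool, gpu_count: int = 0, has_cluster: bool = False) -> str:
    """Suggest an engine name from a coarse description of the hardware.

    Hypothetical helper for illustration only; not a Bumblebee API.
    """
    if has_cluster:
        # Spark is designed for distributed clusters.
        return "Spark"
    if gpu_count >= 2:
        # Dask-cuDF spreads cuDF dataframes across several GPUs.
        return "Dask-cuDF"
    if gpu_count == 1:
        # cuDF runs on a single GPU.
        return "cuDF"
    if fits_in_memory:
        # Pandas works on in-memory data on a single machine.
        return "Pandas"
    # Dask can process data larger than memory in partitions.
    # Vaex is another out-of-core option on a single machine.
    return "Dask"


print(suggest_engine(fits_in_memory=True))
print(suggest_engine(fits_in_memory=False, gpu_count=2))
```
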

Some features

This is what Bumblebee can offer you.

Clicks & Drags-and-Drops

Our spreadsheet-like interface makes it easier to wrangle data, correct wrong and duplicate values, and identify and group similar strings.

Automated Workflows

Make your cleaning repeatable by creating a “Data Recipe”. The next time you need to clean datasets from the same source, run your saved Data Recipe and let Bumblebee clean the data for you.
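
The idea of a replayable recipe can be sketched in plain Python as an ordered list of cleaning steps applied to each new batch of rows. This is a conceptual illustration only: `run_recipe`, `trim_whitespace`, and `drop_duplicate_rows` are hypothetical names, and Bumblebee's actual recipe format is not shown here.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
Step = Callable[[List[Row]], List[Row]]


def trim_whitespace(rows: List[Row]) -> List[Row]:
    """Strip leading and trailing spaces from every value."""
    return [{key: value.strip() for key, value in row.items()} for row in rows]


def drop_duplicate_rows(rows: List[Row]) -> List[Row]:
    """Keep only the first occurrence of each identical row."""
    seen = set()
    unique = []
    for row in rows:
        fingerprint = tuple(sorted(row.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(row)
    return unique


def run_recipe(recipe: List[Step], rows: List[Row]) -> List[Row]:
    """Apply each step in order, feeding the output of one into the next."""
    for step in recipe:
        rows = step(rows)
    return rows


# A saved "recipe" is just the ordered list of steps (hypothetical format).
RECIPE: List[Step] = [trim_whitespace, drop_duplicate_rows]

cleaned = run_recipe(RECIPE, [{"name": " Ana "}, {"name": "Ana"}])
print(cleaned)
```

Because the recipe is data (a list of steps), the same list can be reused on the next file from the same source without redoing the work by hand.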

Load data from anywhere

Load CSV, JSON, or Parquet data from a local file, a URL, or a remote storage system. You can also connect to databases and start wrangling your data in no time!
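
To make the combinations of location and format concrete, the sketch below classifies a source string using only the Python standard library. The function `classify_source` is hypothetical and not part of Bumblebee, and treating any non-HTTP scheme (for example `s3://`) as remote storage is an assumption made for illustration.

```python
from pathlib import PurePosixPath
from typing import Tuple
from urllib.parse import urlparse

# File formats named in the text above, keyed by file extension.
FORMATS = {".csv": "csv", ".json": "json", ".parquet": "parquet"}
WEB_SCHEMES = {"http", "https"}


def classify_source(source: str) -> Tuple[str, str]:
    """Return (location, format) for a data source string.

    Hypothetical helper for illustration only; not a Bumblebee API.
    """
    parsed = urlparse(source)
    if parsed.scheme in WEB_SCHEMES:
        location = "url"
    elif len(parsed.scheme) > 1:
        # Assumption: any other scheme (e.g. s3://) is remote storage.
        # Single-letter "schemes" are Windows drive letters such as C:.
        location = "remote storage"
    else:
        location = "local"

    suffix = PurePosixPath(parsed.path).suffix.lower()
    if suffix not in FORMATS:
        raise ValueError(f"Unsupported file format: {suffix or '(none)'}")
    return location, FORMATS[suffix]


print(classify_source("data/sales.csv"))
print(classify_source("https://example.com/data.json"))
```
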

Prerequisites

  • OS Support
    • Ubuntu 18.04 LTS/20*
    • Windows 10
  • GPU Support (Dask-cuDF, cuDF)
    • Pascal or Better
    • Compute Capability
  • CUDA Support (Dask-cuDF, cuDF)
    • 10.2
  • Python Support
    • 3.7
    • 3.8
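
Before installing, the Python and operating system requirements listed above can be checked with a short standard-library script. This is a minimal sketch: the helper names are hypothetical, and it only checks the Python minor version and the OS family. It does not verify the Ubuntu or Windows release, the GPU, or the CUDA version.

```python
import platform
import sys

# Values taken from the prerequisites list above.
SUPPORTED_PYTHON = {(3, 7), (3, 8)}
# platform.system() reports "Linux" on Ubuntu and "Windows" on Windows 10.
SUPPORTED_SYSTEMS = {"Linux", "Windows"}


def python_supported(version=None) -> bool:
    """Check whether a (major, minor, ...) version tuple is supported."""
    major, minor = tuple(version or sys.version_info)[:2]
    return (major, minor) in SUPPORTED_PYTHON


def os_supported(system=None) -> bool:
    """Check whether the OS family is one of the supported ones."""
    return (system or platform.system()) in SUPPORTED_SYSTEMS


print("Python OK:", python_supported(), "| OS OK:", os_supported())
```
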
