Bumblebee Docs

Which engine to use

Bumblebee supports several engines, each with specific features that can help you process your data faster. The table below lists the features available in each engine, followed by steps for selecting the engine best suited to your data.

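| Engine    | Out-of-Core | Cluster Support | CPU/GPU |
|-----------|-------------|-----------------|---------|
| pandas    | No          | No              | CPU     |
| Dask      | Yes         | Yes             | CPU     |
| cuDF      | No          | No              | GPU     |
| Dask-cuDF | Yes         | Yes             | GPU     |
| Spark     | No          | Yes             | CPU/GPU |
| Vaex      | Yes         | No              | CPU     |
| Ibis      | Yes         | No              | CPU     |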
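The feature table can also be written down as plain data and queried, which is handy when you want to narrow the options by requirement. The following is a minimal sketch only: `ENGINES` and `find_engines` are hypothetical names made up for this example and are not part of the Bumblebee API. The values are copied from the table on this page.

```python
# Feature table from this page, encoded as plain data.
# ENGINES and find_engines are illustrative names, not Bumblebee functions.
ENGINES = {
    "pandas":    {"out_of_core": False, "cluster": False, "hardware": {"CPU"}},
    "dask":      {"out_of_core": True,  "cluster": True,  "hardware": {"CPU"}},
    "cudf":      {"out_of_core": False, "cluster": False, "hardware": {"GPU"}},
    "dask_cudf": {"out_of_core": True,  "cluster": True,  "hardware": {"GPU"}},
    "spark":     {"out_of_core": False, "cluster": True,  "hardware": {"CPU", "GPU"}},
    "vaex":      {"out_of_core": True,  "cluster": False, "hardware": {"CPU"}},
    "ibis":      {"out_of_core": True,  "cluster": False, "hardware": {"CPU"}},
}


def find_engines(out_of_core=None, cluster=None, hardware=None):
    """Return the engines that match every requirement that is not None."""
    matches = []
    for name, features in ENGINES.items():
        if out_of_core is not None and features["out_of_core"] != out_of_core:
            continue
        if cluster is not None and features["cluster"] != cluster:
            continue
        if hardware is not None and hardware not in features["hardware"]:
            continue
        matches.append(name)
    return matches


# Engines that handle larger-than-memory data on a GPU.
print(find_engines(out_of_core=True, hardware="GPU"))  # ['dask_cudf']
```

Passing `None` for a requirement means "don't care", so `find_engines(cluster=True)` lists every engine with cluster support regardless of hardware.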

Follow these steps to select an engine:

  • Use pandas if your data fits comfortably in your local memory.

  • Use cuDF if you have a RAPIDS-compatible GPU and your data fits in memory.

  • Use Vaex if your data does not fit in memory.

  • Use a Dask, Dask-cuDF, or Spark cluster if you have one available.

  • Use a service like Coiled to get a Dask or Dask-cuDF cluster on demand and pay only for what you use.
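The steps above can be sketched as a small decision function. This is an illustration under stated assumptions, not Bumblebee code: `choose_engine` is a hypothetical helper, "fits comfortably" is assumed to mean the data takes at most half of the available memory, a GPU is assumed to be preferred over the CPU when both would work, and an available cluster is assumed to be preferred over Vaex for data that does not fit. The page itself does not rank these options.

```python
GB = 1024 ** 3


def choose_engine(data_bytes, memory_bytes, has_rapids_gpu=False,
                  has_cluster=False, comfort_ratio=0.5):
    """Suggest an engine following the selection steps on this page.

    memory_bytes is the memory of the device that will hold the data.
    comfort_ratio is an assumed threshold for "fits comfortably".
    """
    fits = data_bytes <= memory_bytes * comfort_ratio

    if fits and has_rapids_gpu:
        return "cudf"      # data fits and a RAPIDS-compatible GPU is present
    if fits:
        return "pandas"    # data fits comfortably in local memory
    if has_cluster:
        return "cluster"   # an existing Dask, Dask-cuDF, or Spark cluster
    # Data does not fit and no cluster is available. The other option the
    # page mentions is an on-demand cluster from a service like Coiled.
    return "vaex"


print(choose_engine(1 * GB, 16 * GB))                      # pandas
print(choose_engine(100 * GB, 16 * GB, has_cluster=True))  # cluster
```

Adjust `comfort_ratio` to your own workload; operations such as joins can need several times the size of the raw data in memory.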


Last updated 4 years ago
