
Submitted URL: http://dask.org/
Effective URL: https://dask.org/


Dask natively scales Python

Dask provides advanced parallelism for analytics, enabling performance at scale
for the tools you love



INTEGRATES WITH EXISTING PROJECTS


BUILT WITH THE BROADER COMMUNITY

Dask is open source and freely available. It is developed in coordination with
other community projects like NumPy, pandas, and scikit-learn.


NUMPY

Dask arrays scale NumPy workflows, enabling multi-dimensional data analysis in
earth science, satellite imagery, genomics, biomedical applications, and machine
learning algorithms.



PANDAS

Dask dataframes scale pandas workflows, enabling applications in time series,
business intelligence, and general data munging on big data.



SCIKIT-LEARN

Dask-ML scales machine learning APIs like scikit-learn and XGBoost, enabling
training and prediction on large models and large datasets.


--------------------------------------------------------------------------------


FAMILIAR FOR PYTHON USERS


AND EASY TO GET STARTED

Dask uses existing Python APIs and data structures, making it easy to switch
from NumPy, pandas, and scikit-learn to their Dask-powered equivalents.

You don't have to completely rewrite your code or retrain to scale up.


# Arrays implement the NumPy API
import dask.array as da
x = da.random.random(size=(10000, 10000),
                     chunks=(1000, 1000))
x + x.T - x.mean(axis=0)

# Dataframes implement the pandas API
import dask.dataframe as dd
df = dd.read_csv('s3://.../2018-*-*.csv')
df.groupby(df.account_id).balance.sum()

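# Dask-ML implements the scikit-learn API
from dask_ml.linear_model import LogisticRegression
lr = LogisticRegression()
# As in scikit-learn, fit() takes the features and the labels
# (X_train and y_train are placeholders for your own data)
lr.fit(X_train, y_train)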
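
The array expression above only describes a computation; Dask collections are lazy, and nothing runs until a result is requested. The following is a minimal sketch of that step, using a smaller array than the one above so it finishes quickly. The shapes and chunk sizes are illustrative assumptions, not recommendations.

```python
import numpy as np
import dask.array as da

# A 1000 x 1000 array split into a 4 x 4 grid of 250 x 250 chunks
# (sizes chosen only to keep the example fast)
x = da.random.random(size=(1000, 1000), chunks=(250, 250))

# Building the expression records a task graph; no values are produced yet
y = x + x.T - x.mean(axis=0)

# compute() executes the graph and returns an ordinary NumPy array
result = y.compute()
print(type(result).__name__, result.shape)
```

Calling `compute()` brings the whole result into memory, so it is best suited to results small enough to fit on one machine.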

--------------------------------------------------------------------------------


SCALE UP TO CLUSTERS


OR JUST USE IT ON YOUR LAPTOP

Dask's schedulers scale to thousand-node clusters and its algorithms have been
tested on some of the largest supercomputers in the world.

But you don't need a massive cluster to get started. Dask ships with schedulers
designed for use on personal machines. Many people use Dask today to scale
computations on their laptop, using multiple cores for computation and their
disk for excess storage.
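
As a rough illustration of the single-machine schedulers mentioned above, the sketch below runs the same small computation on the threaded scheduler and on the single-threaded synchronous scheduler. The array size is an arbitrary assumption chosen for speed.

```python
import dask
import dask.array as da

# 2000 x 2000 array of ones in 500 x 500 chunks; its sum is 4,000,000
x = da.ones((2000, 2000), chunks=(500, 500))
total = x.sum()

# Local thread pool: uses multiple cores on one machine
threaded = total.compute(scheduler="threads")

# Single-threaded execution: slower, but convenient for debugging
serial = total.compute(scheduler="synchronous")

# A scheduler can also be set for a whole block of code
with dask.config.set(scheduler="threads"):
    again = total.compute()

print(threaded, serial, again)
```

The scheduler is chosen at compute time, so the code that builds the computation does not change when the scheduler does. A process-based local scheduler (`scheduler="processes"`) is also available.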



--------------------------------------------------------------------------------


CUSTOMIZABLE


ENABLING YOU TO PARALLELIZE INTERNAL SYSTEMS

Not all computations fit into a big dataframe.

Dask exposes lower-level APIs letting you build custom systems for in-house
applications. This helps open source leaders parallelize their own packages and
helps business leaders scale custom business logic.
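
One of these lower-level APIs is `dask.delayed`, which turns ordinary Python functions into lazy tasks. The sketch below is a hypothetical three-step pipeline; the function names `load`, `clean`, and `summarize` and their toy logic are assumptions made up for illustration.

```python
from dask import delayed

@delayed
def load(i):
    # Stand-in for reading one partition of data (hypothetical)
    return list(range(i * 10, (i + 1) * 10))

@delayed
def clean(rows):
    # Stand-in for custom business logic: keep even values only
    return [r for r in rows if r % 2 == 0]

@delayed
def summarize(parts):
    # Combine the cleaned partitions into a single number
    return sum(sum(p) for p in parts)

# Calling delayed functions builds a task graph instead of running them
parts = [clean(load(i)) for i in range(4)]
total = summarize(parts)

# The four load/clean chains are independent, so they can run in parallel
result = total.compute()
print(result)
```

Here the functions stay plain Python; only the decorator and the final `compute()` call are Dask-specific.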



--------------------------------------------------------------------------------


POWERED BY DASK

These software projects are well-integrated with Dask, or use Dask to power
components of their infrastructure.

PANDAS

Tabular data analysis

NUMPY

Array and numerical computing

SCIKIT-LEARN

Machine learning in Python

SCIKIT-IMAGE

A collection of algorithms for image processing in Python

XGBOOST

Gradient boosted trees for machine learning

XGBoost can use Dask to bootstrap itself for distributed training

RAPIDS

GPU-accelerated libraries for data science

XARRAY

Brings the labeled data power of pandas to the physical sciences, by providing
N-dimensional variants of the core pandas data structures

IRIS

A Python library for analysing and visualising Earth science data

PANGEO

A community effort for big data geoscience in the cloud

PREFECT

A workflow management system, designed for modern infrastructure

NAPARI

Multi-dimensional image viewer for Python

SNORKEL

Programmatically build training data for machine learning

DATASHADER

Visualization packages for large data

INTAKE

A lightweight package for finding, investigating, loading and disseminating data

TPOT

A Python Automated Machine Learning tool that optimizes machine learning
pipelines using genetic programming

MDANALYSIS

A Python toolkit to analyze molecular dynamics trajectories generated by a wide
range of popular simulation packages

STUMPY

A Python library that can be used for a variety of time series data mining tasks

FEATURETOOLS

A Python framework for automated feature engineering

CESIUM-ML

Open-source machine learning for time series analysis

SKYPORTAL

An astronomical data platform

CONDA FORGE

Community effort to build and maintain Conda packages

DATAPREP

Data preparation in Python

LIGHTGBM

Gradient boosted trees for machine learning

LightGBM can use Dask to bootstrap itself for distributed training

XARRAY-SPATIAL

Geospatial raster analysis in Python; extensible with Numba, scalable with Dask

KARTOTHEK

Manage tabular data in a blob store

SATPY

Library for reading and manipulating meteorological remote sensing data and
writing it to various image and data file formats

STREAMZ

A package to help build pipelines to manage continuous streams of data

SCIKIT-ALLEL

Provides utilities for exploratory analysis of large scale genetic variation
data

TSFRESH

Automatic extraction of relevant features from time series

--------------------------------------------------------------------------------


SUPPORTED BY

We thank these institutions for generously supporting the project






--------------------------------------------------------------------------------

Dask is a fiscally sponsored project of NumFOCUS, a nonprofit dedicated to
supporting the open source scientific computing community. If you like Dask and
want to support our mission, please consider making a donation to support our
efforts.

© 2019 Dask core developers. New-BSD Licensed.