dask.org
Submitted URL: http://dask.org/
Effective URL: https://dask.org/
Submission: On January 19 via api from SG — Scanned from DE
Dask natively scales Python

Dask provides advanced parallelism for analytics, enabling performance at scale for the tools you love.

INTEGRATES WITH EXISTING PROJECTS
BUILT WITH THE BROADER COMMUNITY

Dask is open source and freely available. It is developed in coordination with other community projects like NumPy, pandas, and scikit-learn.

NUMPY
Dask arrays scale NumPy workflows, enabling multi-dimensional data analysis in earth science, satellite imagery, genomics, biomedical applications, and machine learning algorithms.

PANDAS
Dask dataframes scale pandas workflows, enabling applications in time series, business intelligence, and general data munging on big data.

SCIKIT-LEARN
Dask-ML scales machine learning APIs like scikit-learn and XGBoost, enabling training and prediction on large models and large datasets.

--------------------------------------------------------------------------------

FAMILIAR FOR PYTHON USERS
AND EASY TO GET STARTED

Dask uses existing Python APIs and data structures, so it is easy to switch from NumPy, pandas, and scikit-learn to their Dask-powered equivalents. You don't have to completely rewrite your code or retrain to scale up.
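A minimal way to check the "familiar API" claim, assuming only NumPy and Dask are installed, is to evaluate one NumPy-style expression both ways and compare the results. The array size and the 250 x 250 chunk shape are arbitrary choices for illustration:

```python
import numpy as np
import dask.array as da

# Same data in NumPy and as a Dask array split into a 4 x 4 grid of 250 x 250 chunks.
rng = np.random.default_rng(0)
x_np = rng.random((1000, 1000))
x_da = da.from_array(x_np, chunks=(250, 250))

# The expression is written identically for both libraries.
expected = x_np + x_np.T - x_np.mean(axis=0)
lazy = x_da + x_da.T - x_da.mean(axis=0)  # only builds a task graph
result = lazy.compute()                   # runs the graph, returns a NumPy array

print(x_da.numblocks, type(result).__name__, np.allclose(result, expected))
# → (4, 4) ndarray True
```

The only Dask-specific steps are choosing chunks when creating the array and calling `.compute()` at the end; the arithmetic in between is unchanged NumPy syntax.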
    # Arrays implement the NumPy API
    import dask.array as da

    x = da.random.random(size=(10000, 10000), chunks=(1000, 1000))
    z = x + x.T - x.mean(axis=0)   # lazy: builds a task graph; call z.compute() to evaluate

    # Dataframes implement the pandas API
    import dask.dataframe as dd

    df = dd.read_csv('s3://.../2018-*-*.csv')
    totals = df.groupby(df.account_id).balance.sum()   # also lazy until .compute()

    # Dask-ML implements the scikit-learn API
    from dask_ml.datasets import make_classification
    from dask_ml.linear_model import LogisticRegression

    # Synthetic features X and labels y as row-chunked Dask arrays
    X, y = make_classification(n_samples=10000, n_features=20, chunks=1000)
    lr = LogisticRegression()
    lr.fit(X, y)   # fit(features, labels), as in scikit-learn

--------------------------------------------------------------------------------

SCALE UP TO CLUSTERS
OR JUST USE IT ON YOUR LAPTOP

Dask's schedulers scale to thousand-node clusters, and its algorithms have been tested on some of the largest supercomputers in the world.

But you don't need a massive cluster to get started. Dask ships with schedulers designed for use on personal machines. Many people use Dask today to scale computations on their laptops, using multiple cores for computation and their disk for excess storage.

--------------------------------------------------------------------------------

CUSTOMIZABLE
ENABLING YOU TO PARALLELIZE INTERNAL SYSTEMS

Not all computations fit into a big dataframe. Dask exposes lower-level APIs that let you build custom systems for in-house applications. This helps open source leaders parallelize their own packages and helps business leaders scale custom business logic.

--------------------------------------------------------------------------------

POWERED BY DASK

These software projects are well-integrated with Dask, or use Dask to power components of their infrastructure.
* PANDAS: Tabular data analysis
* NUMPY: Array and numerical computing
* SCIKIT-LEARN: Machine learning in Python
* SCIKIT-IMAGE: A collection of algorithms for image processing in Python
* XGBOOST: Gradient boosted trees for machine learning. XGBoost can use Dask to bootstrap itself for distributed training.
* RAPIDS: GPU-accelerated libraries for data science
* XARRAY: Brings the labeled data power of pandas to the physical sciences by providing N-dimensional variants of the core pandas data structures
* IRIS: A Python library for analysing and visualising Earth science data
* PANGEO: A community effort for big data geoscience in the cloud
* PREFECT: A workflow management system, designed for modern infrastructure
* NAPARI: Multi-dimensional image viewer for Python
* SNORKEL: Programmatically build training data for machine learning
* DATASHADER: Visualization package for large data
* INTAKE: A lightweight package for finding, investigating, loading, and disseminating data
* TPOT: A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming
* MDANALYSIS: A Python toolkit to analyze molecular dynamics trajectories generated by a wide range of popular simulation packages
* STUMPY: A Python library for a variety of time series data mining tasks
* FEATURETOOLS: A Python framework for automated feature engineering
* CESIUM-ML: Open-source machine learning for time series analysis
* SKYPORTAL: An astronomical data platform
* CONDA FORGE: Community effort to build and maintain Conda packages
* DATAPREP: Data preparation in Python
* LIGHTGBM: Gradient boosted trees for machine learning. LightGBM can use Dask to bootstrap itself for distributed training.
* XARRAY-SPATIAL: Geospatial raster analysis in Python; extensible with Numba, scalable with Dask
* KARTOTHEK: Manage tabular data in a blob store
* SATPY: Library for reading and manipulating meteorological remote sensing data and writing it to various image and data file formats
* STREAMZ: A package to help build pipelines to manage continuous streams of data
* SCIKIT-ALLEL: Provides utilities for exploratory analysis of large-scale genetic variation data
* TSFRESH: Automatic extraction of relevant features from time series

--------------------------------------------------------------------------------

SUPPORTED BY

We thank these institutions for generously supporting the project.

--------------------------------------------------------------------------------

Dask is a fiscally sponsored project of NumFOCUS, a nonprofit dedicated to supporting the open source scientific computing community. If you like Dask and want to support our mission, please consider making a donation.

© 2019 Dask core developers. New-BSD Licensed.
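The lower-level interface described under CUSTOMIZABLE and the single-machine schedulers described under SCALE UP TO CLUSTERS can be combined in a small sketch. It assumes `dask.delayed` as the lower-level API (the page does not name one), and `load`, `clean`, and `summarize` are hypothetical placeholders for in-house logic:

```python
import dask
from dask import delayed

# Hypothetical stand-ins for in-house logic: ordinary Python functions.
def load(part):
    """Pretend to read one partition of records."""
    return list(range(part * 10, part * 10 + 10))

def clean(records):
    """Drop records that fail a made-up validity rule."""
    return [r for r in records if r % 3 != 0]

def summarize(partitions):
    """Combine per-partition results into (count, total)."""
    return sum(len(p) for p in partitions), sum(sum(p) for p in partitions)

# Wrapping calls in delayed records tasks in a graph instead of running them.
cleaned = [delayed(clean)(delayed(load)(i)) for i in range(4)]
summary = delayed(summarize)(cleaned)  # the list of delayed results becomes a dependency

# Run the same graph on two of Dask's single-machine schedulers.
threaded = summary.compute(scheduler="threads")
with dask.config.set(scheduler="synchronous"):  # single-threaded, handy for debugging
    sequential = summary.compute()

print(threaded, sequential)  # → (26, 507) (26, 507)
```

The threaded scheduler suits code that releases the GIL, such as NumPy and pandas operations; for pure-Python functions like these, `scheduler="processes"` or a `dask.distributed` cluster is the usual alternative, left out here to keep the sketch self-contained.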