Polars Essentials
This comprehensive course is designed for data professionals and enthusiasts who want to master Polars, a high-performance Python library for data manipulation and analysis. Polars Essentials is structured as two two-hour segments, each covering critical aspects of working with data in Polars.
Prerequisites
Participants should have intermediate-level Python programming skills and prior experience with standard Python tools for data analysis (notably NumPy, pandas, scikit-learn, and Jupyter).
Learning Objectives
Data Structures and Creation
- Understand the core data structures in Polars: Learn the characteristics and uses of DataFrames and Series, the primary data structures in Polars, for efficient columnar data manipulation.
- Create Polars DataFrames: Master various methods to instantiate DataFrames using Python data structures and by loading data from different file formats such as CSV, JSON, Parquet, as well as streaming data sources.
Data Manipulation and High-Performance Analysis
- Manipulate data using Polars: Develop the ability to perform essential DataFrame operations, including filtering, sorting, grouping, and aggregation, to analyze data effectively.
- Handle different data types and missing data: Gain proficiency in managing various data types (Numerical, String, Categorical, List, Struct, and Object) and learn strategies for addressing missing data (NaN vs. null).
- Optimize data analysis with advanced features: Learn how to use Polars’ high-performance features, such as lazy evaluation, multithreading, and streaming data processing, to enhance analysis efficiency.
- Apply statistical and window functions: Acquire the skills to perform advanced data analysis using Polars’ built-in statistical and window functions for insightful data exploration.
- Debug and profile Polars queries: Understand the tools and techniques for debugging and profiling Polars queries to optimize performance and troubleshoot issues.
Time Series Analysis
- Work with datetime operations in Polars: Become proficient in handling datetime data, including creation, parsing, and performing operations like comparisons and extractions, within Polars.
- Conduct advanced time series analysis: Learn how to apply time series-specific operations such as resampling, rolling-window calculations, and cross-time-zone comparisons to analyze temporal data effectively.
Interoperability and Application
- Integrate Polars with other Python libraries: Understand how to seamlessly use Polars in conjunction with other popular data science libraries such as Pandas, Arrow, Numba, and plotting libraries like Matplotlib, to enhance data analysis workflows.
By the end of this course, participants will be well equipped to use Polars for a wide range of data analysis and manipulation tasks, making them valuable assets in any data-driven organization or project.
Instructor Bio
Marco Gorelli is a core developer of pandas and Polars and works at Quansight Labs as a Senior Software Engineer. He also consults for and trains clients professionally on Polars. He wrote the first Polars Plugins Tutorial and has taught Polars Plugins to clients. He has a background in Mathematics, holds an MSc from the University of Oxford, and was one of the prize winners in the M6 Forecasting Competition (2nd place overall, Q1).