6 hours of instruction
Introduces participants to the Polars library for high-performance dataframe manipulation. Participants will explore the Polars API for creating and manipulating data and learn to tune data pipelines for speed.
PREREQUISITES
Participants should have prior experience with Python and, in particular, with standard Python tools for data analysis (notably NumPy, pandas, and Jupyter).
LEARNING OBJECTIVES
- Data creation / manipulation:
- Create / read Polars dataframes
- Select / add / drop / modify columns
- Filter rows based on conditions
- Aggregate data, including over windows and in group-bys
- Combine dataframes with joins and concatenations
- Work with different data types (including categorical and nested)
- High performance:
- Write Polars code in a way that lets Polars work efficiently
- Avoid performance footguns
- Use streaming mode to process larger-than-RAM amounts of data
- Debug queries
- Interoperate with other libraries: pandas, Arrow, Numba, and Plotly
- Time series:
- Parse datetimes
- Work with time zones, avoiding common pitfalls
- Use time-series-specific methods: groupby_dynamic, groupby_rolling, upsample