scikit-learn 2023 In-person Developer Sprint in Paris, France

In June of 2023 (the week of June 19 to 23), the scikit-learn team held its first developer sprint since 2019! The sprint took place at the Dataiku office in Paris, France. It was an in-person only event with 33 participants.

The following scikit-learn team members joined the sprint:

  1. Gaël Varoquaux (@GaelVaroquaux)
  2. Olivier Grisel (@ogrisel)
  3. Guillaume Lemaitre (@glemaitre)
  4. Tim Head (@betatim)
  5. Jérémie du Boisberranger (@jeremiedbb)
  6. Franck Charras (@fcharras)
  7. Maren Westermann
  8. Arturo Amor Quiroz
  9. Adrin Jalali
  10. Loïc Estève
  11. Thomas Fan
  12. Julien Jerphanion (@jjerphan)
  13. François Goupil (@francoisgoupil)
  14. Roman Yurchak
  15. Joris Van den Bossche

The following community members joined the sprint:

  1. Ines (intern at Dataiku)
  2. Chaine San Buenaventura
  3. Yann Lechelle
  4. Vincent Maladière
  5. Franck Charras
  6. Jovan Stojanovic
  7. Leo Dreyfus-Schmidt
  8. Alexandre Landeau
  9. Louis Fouquet
  10. Alexandre Vigny
  11. Harizo Rajaona
  12. Samuel Ronsin
  13. Riccardo Cappuzzo
  14. Camille Troillard
  15. Denis Engemann
  16. Léo Grinsztajn
  17. Lilian Boulard
scikit-learn Developer Sprint, Paris, June 2023; Photo credit: Copyright Inria / Photo B. Fourrier, June 2023; (from left to right, back to front):
Last Row: Denis Engemann, Riccardo Cappuzzo, François Goupil, Tim Head, Guillaume Lemaitre, Louis Fouquet, Jérémie du Boisberranger, Franck Charras, Léo Grinsztajn, Arturo Amor Quiroz.
Middle Row: Thomas Fan, Lilian Boulard, Gaël Varoquaux, Ines, Jovan Stojanovic, Chaine San Buenaventura.
First Row: Olivier Grisel, Harizo Rajaona, Vincent Maladière.

Sponsors

  • Dataiku provided the space and some of the food, as well as all of the coffee.
  • The scikit-learn Consortium organized the sprint and covered the lunch, travel, and accommodation expenses.

Topics covered at the sprint

  • PR #13649: monotonic constraints for tree-based models (see the usage sketch after this list)
  • Discussed the vision and future directions for the project: what is important to keep the project relevant in the future?
  • Should we share some points beyond the vision statement?
  • Thomas Fan will try to draft a vision statement.
  • Discussed which relevant technologies and developments people are keeping an eye on over a two-year time scale.
  • Tim: keep improving our documentation (not just expanding it but also “gardening” to keep it readable)
  • Tim: increase active outreach and communication about new features, improvements, and other changes. A lot of cool things in scikit-learn are virtually unknown to the wider public (e.g. HistGradientBoosting being on par with LightGBM in terms of performance, …)
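
For context on the monotonic constraints item above, here is a minimal sketch of how such constraints can be declared with scikit-learn's histogram-based gradient boosting estimators, where the monotonic_cst parameter already exists; PR #13649 works on bringing the same idea to the decision tree and forest estimators. The synthetic data below is purely illustrative.

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor

rng = np.random.RandomState(0)
X = rng.uniform(size=(1000, 2))
# Synthetic target: increases with feature 0, decreases with feature 1.
y = 5 * X[:, 0] - 3 * X[:, 1] + rng.normal(scale=0.1, size=1000)

# monotonic_cst: 1 = monotonically increasing, -1 = decreasing, 0 = unconstrained.
model = HistGradientBoostingRegressor(monotonic_cst=[1, -1], random_state=0)
model.fit(X, y)
```

With these constraints, the fitted model's predictions are guaranteed to be non-decreasing in the first feature and non-increasing in the second, regardless of noise in the training data.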

What is next?

We are discussing co-locating another developer sprint with OpenML in Berlin in 2024.

scikit-learn Developer Sprint, Paris, June 2023; Photo credit: Copyright Inria / Photo B. Fourrier, June 2023; (from left to right): Thomas Fan, Olivier Grisel
