Python Sprints - pandas visualisation vol.2

pandas visualisation vol.2

Let’s continue working on visualisation in pandas.

Plotting in pandas is very easy, mainly by using Series.plot() and DataFrame.plot() methods. There is a plotting subsystem in pandas based in matplotlib that implement different types of plots (e.g. lines, bars, boxplots, kde…).

In the last months, several new projects have been create to address new use cases for visualising in pandas. For example, to generate interactive plots. Some of these projects are:

https://hvplot.pyviz.org
https://github.com/PatrikHlobil/Pandas-Bokeh
https://github.com/altair-viz/pdvega/

Those libraries have been monkey patching pandas to be able to plot easily, so plotting can be done by using DataFrame.hvplot(), DataFrame.plot_bokeh()…

But a better design would be to decouple the existing plotting code in pandas into an extension registered with the pandas extension capabilities, and be able to select with an option with plotting backend the user wants to use. The resulting code would be:

pandas.set_option('plotting.backend', 'hvplot')
df.plot()

With this architecture, there are several advantages:

Developing new plotting backends for pandas becomes much simpler
Plotting backends share a common API
Users of pandas don’t need to learn a new syntax for each plugin
Migrating existing code becomes trivial, by just adding a single line of code setting the option for the backend
Internal pandas code becomes cleaner, with the plotting code fully decoupled from the rest

Work on this is already going on, with a first PR that decoupled the current plotting code:

https://github.com/pandas-dev/pandas/pull/26414

In this sprint we will continue the work, by working on different tasks:

Defining and documenting the pandas plotting API
Porting hvplot to the new plotting API
Improve pandas plotting documentation
Try to move plots like andrews_curves, lag_plot… to the “standard” .plot(kind=X) API

We also can work on other pandas issues, for example:

pytest-azurepipelines: https://github.com/pandas-dev/pandas/issues/26601
Disable codecov commenting: https://github.com/pandas-dev/pandas/issues/26896
Implement web service to write GitHub comments with the CI results https://github.com/pandas-dev/pandas/issues/26930
Validate title capitalisation in the docs https://github.com/pandas-dev/pandas/issues/26941
Fix Azure pipelines problem with $PATH https://github.com/pandas-dev/pandas/issues/26962
Fix pipe example in the documentation https://github.com/pandas-dev/pandas/issues/27054
Bug in the validation of docstrings https://github.com/pandas-dev/pandas/issues/27055
Feel free to propose yours

As usual, we will give priority to join the sprint to the next people:

Experienced open source contributors
People from underrepresented minorities in our sprints

Agenda

6:00pm: Food and networking
6:30pm: Presentation of the project and the sponsor
6:45pm: Coding

The day of the sprint

Bring your own laptop if you can
Join the Gitter channel of the sprint

Code of Conduct

Please be reminded that all participants are expected to follow the NumFOCUS Code of Conduct

Thanks to Ecom Recruitment for hosting this event!

ECOM is the digital talent brand of InterQuest Group. We have won the award for being LinkedIn’ most socially engaged staffing agency in 2016, 2017 and 2018. We operate at the forefront of digital transformation. We recruit the in-demand, technical skills that are shaping the new digital economy across analytics, information security, digital, information technology, networks & change management. We look to partner with our clients to bring optimized solutions to help them create opportunities for talented individuals across the verticals we serve.

Information

RSVP: Click here

Level: All

Date: 26 June 2019

Time: 18:00

Address:
Ecom Recruitment
Cannon Green, 27 Bush Lane, London, United Kingdom, EC4R 0AA