Reimagining data science with Python-based operators in Einblick’s visual canvas
Currently, data scientists are siloed from their teammates and relevant stakeholders. First, data scientists need to gather raw data or work with data engineers through the ETL or ELT pipeline. Then begins the process of cleaning the data and doing exploratory data analysis for the particular task or project at hand. This step can be quite time-consuming as it is iterative in nature. Only then can data scientists begin to build and tune machine learning models.
Einblick: collaborative data science at the speed of thought
At Einblick, one of our goals is to remove barriers for data scientists so you can spend less time on tedious setup and repetitive tasks, and more time extracting meaningful insights. In pursuit of our goals, Einblick reimagines the modern data science workflow in a collaborative data science canvas, rather than a linear notebook. Working in a canvas environment offers many advantages including live collaboration, an expansive visual interface, and a progressive computation engine. In this article, we’ll highlight one of the key ways we’re saving data scientists time–our operators. We’ll go through a couple of our core operators, why Python is such a crucial part of our software solution, and how we augmented our offerings with a user operator interface. The latter allows users to customize and use their own operators, which can be used in any Einblick canvas, and shared with other Einblick users.
Operators in a data science canvas
One of the main inefficiencies for data scientists now is that there are certain tasks or code snippets that get run all the time, like data exploration or feature engineering. Even though these tasks can be mundane and repetitive, they are critical to the data science workflow.
Operators are a core part of our platform. Each Einblick operator captures a defined set of steps in an analysis, and operators don't have to be arranged linearly in the canvas environment. The open space lets users work in the way their thought process naturally flows. A few of our core operators include:
- Python cells have traditionally been the only operator available to data scientists. Write code and reproduce your Jupyter notebook 1:1 in a browser-accessed Python runtime.
- Chart operators create different visualizations, including scatter plots, histograms, bar charts, line charts, and heat maps
- Expression operators support Python 3 syntax, take in a dataframe and add a new column based on a logical expression. We currently support many operations, including arithmetic, comparison, and bitwise operators, as well as mathematical functions
- AutoML operators build more accurate predictive models in much less time than it would take to hand-tune. You just have to select the target and feature columns from a dataframe, as well as the training and testing datasets.
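To make the Expression operator more concrete, here is a rough pandas equivalent of what such an expression does: it takes a dataframe and adds a new column computed from existing ones. This is only a sketch of the idea, not Einblick's implementation; the sample data and the column names petal_area and is_large are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; the values and column names are illustrative only.
df = pd.DataFrame({
    "petal_length": [1.4, 4.7, 6.0],
    "petal_width": [0.2, 1.4, 2.5],
})

# Arithmetic expression: derive a new numeric column from two existing ones.
df["petal_area"] = df["petal_length"] * df["petal_width"]

# Two comparisons combined with a bitwise AND: derive a boolean flag column.
df["is_large"] = (df["petal_length"] > 4.0) & (df["petal_width"] > 1.0)
```

Each assignment leaves the original columns untouched and appends one new column, which mirrors the "take in a dataframe and add a new column" behavior described above.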
Given Python's many libraries and frameworks for data science, statistics, and machine learning, such as statsmodels and sklearn, Python is an easy choice for modern data scientists. As such, Einblick utilizes Python in various parts of our codebase. Additionally, our Python cell operator is critical to our user experience, and connects the experience of working in a Python notebook with working in a data science canvas. We value the flexibility that Python gives to our users to not only augment Einblick's functionality, but also make the data science process more efficient and accessible.
Creating shareable user-defined operators
As we worked to make the data science process smoother and faster, we created user operators. Through an editor or by linking to a Git repository, our users can create their own Einblick operators, so they can easily and efficiently re-use their own code and processes. Given how important collaboration and communication are to the data science process, operators created by Einblick users can be shared easily, so that someone who knows what they want to do, such as convert a text column to a date-time column, can do it without needing to know the Python syntax.
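As an illustration of the kind of logic a shared operator might wrap, here is a minimal sketch of the text-to-date-time conversion mentioned above, written with pandas. The function name text_to_datetime, the sample dataframe, and the choice to coerce unparseable values are our assumptions for this example, not part of any built-in Einblick operator.

```python
import pandas as pd


def text_to_datetime(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return a copy of df with `column` parsed as datetimes."""
    out = df.copy()
    # errors="coerce" turns unparseable strings into NaT instead of raising.
    out[column] = pd.to_datetime(out[column], errors="coerce")
    return out


# Hypothetical input: a text column holding dates, plus one bad value.
orders = pd.DataFrame({"order_date": ["2022-01-15", "2022-02-01", "not a date"]})
converted = text_to_datetime(orders, "order_date")
```

Once logic like this is packaged as an operator, a teammate only needs to pick the column; the pandas call stays hidden inside the operator.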
For example, our linear regression and k-means clustering operators were both created through our user operator editor. The user operator editor is accessible through our Operators Menu, which can be toggled on and off from the User Settings menu.
In celebration of the roots of data science, and as a nod to shared experiences in the data science community, we created an operator to visualize a simple linear regression model. This user operator extends our built-in linear regression operator, which is accessible to everyone, to produce a nicely formatted linear regression graph. To illustrate the operator, we're using the iris dataset, via the seaborn library.
Determine operator inputs
In the Operator specification tab of our user operator editor, you can specify the following:
- Basic properties: includes dimensions of the operator in the canvas and description of the operator, which will appear on hover in the left hand panel, and in the canvas when using the operator.
- Dataframe inputs: allows you to name the dataframes that feed into the operator, accessible in the operator editor as a list dfs. You can index and access the dataframe inputs in the Script tab.
- Attribute selection inputs: specifies what you can and cannot select from the dataframe inputs, such as the column types allowed or number of selections per field allowed. You can access these using the attributes variable in the Script tab.
- Custom values: allows you to enter custom values such as text titles of graphs, numbers of bins in a histogram, or any other dynamic field you want to tailor for each time you use the operator
- Inner inputs: allows you to enter inputs within the operator, instead of on the left hand panel. This is mainly an aesthetic difference in the experience of using the operator in the canvas
Make sure to save your work at the bottom of the page!
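To give a feel for how these inputs might be used in the Script tab, here is a small sketch. In the editor, dfs and attributes are supplied for you; we define stand-ins below only so the snippet runs on its own. The exact shape of attributes is an assumption on our part (a dict from field name to selected column name), as are the field names x and y, so check the user operator documentation for the real structure.

```python
import pandas as pd

# Stand-ins for what the editor would supply; defined here only so the sketch runs.
dfs = [pd.DataFrame({"petal_width": [0.2, 1.4], "petal_length": [1.4, 4.7]})]
# ASSUMPTION: attributes maps each selection field to the chosen column name.
attributes = {"x": "petal_width", "y": "petal_length"}

# Dataframe inputs arrive as a list, so the first input is dfs[0].
data = dfs[0]

# Look up the columns the user selected, then keep only those columns.
x_col = attributes["x"]
y_col = attributes["y"]
selected = data[[x_col, y_col]]
```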
Operator Package Requirements
In the Requirements tab, you can list packages that need to be installed for your operator to run. You can think of this tab as a stand-in for a requirements.txt file in a GitHub repository.
Operator Script
The Script tab is where you can customize exactly what you want your operator to do. You can treat this as a Python script. Just like any Python script, you can start by importing relevant packages listed in your Requirements tab. You can then write your code as you normally would. You can also test out your script on any uploaded data files, while working in the editor. The built-in error messages will help you to debug your code as necessary.
For the regression visualization operator, we used Python's matplotlib and seaborn to create and label a scatter plot, along with the line of best fit, based on two inputs: a dataframe and the results of our built-in linear regression operator.
Since this is a user-defined operator, we can tailor the operator to our specific needs. In this case, we know that the linear regression operator returns a table with a column for each predictor or X variable, here petal_width, and a column for const, which represents the y-intercept of the regression line. For more advanced operators, you can always add more inputs or custom inputs in the Operator specification tab.
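The plotting logic described above can be sketched roughly as follows. This is not the published operator's script: it uses matplotlib alone (the original also uses seaborn), it generates synthetic data as a stand-in for iris so it runs offline, and it fits the line with numpy to imitate the regression results table. We assume that table has a single row, that petal_length is the response variable, and that the function name plot_simple_regression is ours.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_simple_regression(df, results, x_col, y_col, title="Simple linear regression"):
    """Scatter y_col against x_col and overlay the fitted regression line."""
    # ASSUMPTION: results has one row, a column named after the predictor
    # (its slope), and a "const" column (the y-intercept).
    slope = float(results[x_col].iloc[0])
    intercept = float(results["const"].iloc[0])

    fig, ax = plt.subplots(figsize=(6, 4))
    ax.scatter(df[x_col], df[y_col], alpha=0.7, label="observations")

    # Draw the fitted line across the observed range of the predictor.
    xs = np.linspace(df[x_col].min(), df[x_col].max(), 100)
    ax.plot(xs, intercept + slope * xs, color="red", label="best fit")

    ax.set_xlabel(x_col)
    ax.set_ylabel(y_col)
    ax.set_title(title)
    ax.legend()
    return fig


# Synthetic stand-in for the iris columns (not the real dataset).
rng = np.random.default_rng(0)
petal_width = rng.uniform(0.1, 2.5, size=50)
df = pd.DataFrame({
    "petal_width": petal_width,
    "petal_length": 1.0 + 2.2 * petal_width + rng.normal(0, 0.3, size=50),
})

# Stand-in for the linear regression operator's output table.
slope, intercept = np.polyfit(df["petal_width"], df["petal_length"], deg=1)
results = pd.DataFrame({"petal_width": [slope], "const": [intercept]})

fig = plot_simple_regression(df, results, "petal_width", "petal_length")
```

Keeping the plotting in one function that takes the dataframe and the results table mirrors the operator's two inputs, and a title parameter like the one here is the sort of thing a custom value could feed.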
Once you're satisfied with your code, you can use the Options hamburger menu at the top to publish your operator. As soon as you hit publish, you'll be able to use your operator in any Einblick canvas. You can also share your operator with any stakeholders or collaborators so they can use it as well. Read more about the keywords and concepts behind our user operators in Einblick's user operator documentation.
Why Python: advancing as a data science community
The simple linear regression example is a great introduction to the implementation of user operators in Einblick, but some of our users have created more advanced operators, such as an NBA shot chart visualizer, inspired by this blog post.
We know how important innovation, flexibility, and efficiency are to the data science and Python community. Our team is excited to see how other users will use our operators to create innovative solutions to data science challenges, and we look forward to fostering our community as we grow as a company.
Together, with our users, we will continue leveraging the power of Python to simplify and amplify modern data science. Try out Einblick and our user operators for free, and let us know what you think. We look forward to your feedback!