How to Run Python’s Scikit-Learn in R in 5 minutes
The 2 most popular data science languages – Python and R – are often pitted as rivals. This couldn’t be further from the truth. Data scientists who learn to use the strengths of both languages are valuable because they have NO LIMITS.
- Machine Learning: They can switch to Python to leverage scikit-learn and tensorflow.
- Data Wrangling, Visualization, Apps & Reporting: They can quickly change to R to use tidyverse, shiny and rmarkdown.
The bottom line is that knowing both R and Python makes you SUPER PRODUCTIVE.
This article has been updated. View the updated article at Business Science.
Have 5 minutes?
Then let’s set up Python Scikit Learn
We’re going to go through the essential setup tips of the pros – those that use Python from R via reticulate.
- Install the Anaconda Distribution
- Get Python Scikit Learn Setup in R
- Do a Cluster Analysis with Affinity Propagation Algorithm to make sure Scikit Learn is running.
Using Python Scikit Learn & R
How do I use them together for Business Projects???
Setting up Python in R is an insane productivity booster, but you still need to learn how to use Python and R together for real business projects. And, it’s impossible to teach you all the ins and outs in 1 short article. But, I have great news!
I just launched a NEW LEARNING LAB PYTHON + R SERIES (Register Here) that will show you how to use Python and R together on Real Business Projects – Human Resources Employee Clustering, Sales and Marketing, Finance, Energy, Social Media, and more! And it’s FREE to attend live.
Register here to attend Python + R Learning Labs live for free. I’ll notify you in advance of the accelerated 1-hour courses that you can attend via webinar.
2 Steps to Python
Yeah, you heard me right. With only 2 steps, we are able to use Python in R!
Step 1 – Reticulate Setup
Fire up an R Markdown document and load tidyverse and reticulate (with library(tidyverse) and library(reticulate)):
- tidyverse – Loads the core data wrangling and visualization packages needed to work in R.
- reticulate – The key link between R and Python.
Your R Markdown should have something that looks like this (possibly without the outline, but that’s where we are headed).
R Markdown (Rmd) File with reticulate
Step 2 – Conda Installation
Next, we need to make sure we have the Python environment set up that we want to use. For Python environments, we will use Anaconda (Conda), a Python environment management tool specifically developed for data scientists.
Download Conda
- Anaconda Distribution – Installation Instructions
Create a New Python Environment
- Run the following code in your terminal: conda create -n py3.8 python=3.8 scikit-learn pandas numpy matplotlib
This code does the following:
- Creates a new Python environment called “py3.8”
- Installs python version 3.8
- Installs the latest versions of scikit-learn, pandas, numpy, and matplotlib.
In the future you can always add more python packages (more on this in Pro Tips).
List your Conda Environments (in the Terminal)
- Use conda env list to list your Conda Environments in the Terminal.
- If you see py3.8, you are good to go.
List your Conda Environments (in R Markdown)
Back in R Markdown, we can do the same thing using reticulate::conda_list().
Set Your Conda Environment (in R Markdown)
Make sure your R Markdown document activates the “py3.8” environment using use_condaenv(), for example use_condaenv("py3.8", required = TRUE).
Double check that reticulate is actually using your new conda env, for example by running py_config().
You should see something like this where the python path is:
python: /Users/mdancho/opt/anaconda3/envs/py3.8/bin/python
It may not be exact, but you should see “py3.8” in the file path.
Python Tests
All of the code in this section uses python code chunks. This means you need to use {python} instead of {r} code chunks.
- Errors in this section: Are likely because you have a code chunk with {r} (it’s super easy to make this mistake)
- Solution: Replace {r} with {python}.
Spoiler alert – I have a PRO-TIP coming that helps big time.
Test 1 – Is Python working???
- Let’s add 1 + 1
- You should see 2
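A minimal sketch of this first check, meant to go in a {python} chunk. The variable name result is just an illustrative choice.

```python
# Simplest possible check that the Python engine is responding
result = 1 + 1
print(result)
```

If the chunk errors out instead of printing a number, the most likely culprit is a chunk header that still says {r}.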
Test 2 – Numpy & Pandas
- Import numpy and pandas using the import shorthand np and pd respectively.
  - numpy – Math Calculations
  - pandas – Data Wrangling
Numpy
Test numpy using the np.arange() function to create a sequence of numbers in an array.
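Here is one way the numpy check might look. The range 1 to 10 and the name seq are arbitrary choices for illustration.

```python
import numpy as np

# np.arange(start, stop) returns an array from start up to (not including) stop
seq = np.arange(1, 10)
print(seq)
```

Seeing an array printed means numpy was installed into the environment that reticulate is using.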
Pandas
Next, test pandas by creating a data frame df using pd.DataFrame().
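A small sketch of the pandas check. The column names sequence and value, and the sine transformation, are assumptions chosen only to give the data frame something to hold.

```python
import numpy as np
import pandas as pd

# Build a one-column data frame from a numpy sequence
df = pd.DataFrame({"sequence": np.arange(1, 20, 0.5)})

# Add a second column computed from the first
df = df.assign(value=np.sin(df["sequence"]))

print(df.head())
```

If the first few rows print with both columns, pandas and numpy are working together as expected.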
Test 3 – Matplotlib
Run the following pandas plotting code. If the visualization appears, matplotlib is installed.
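The plotting check could be sketched like this. The data frame is rebuilt here so the chunk stands alone, and the column names and title are illustrative assumptions.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Rebuild a small sine-wave data frame so this chunk is self-contained
df = pd.DataFrame({"sequence": np.arange(1, 20, 0.01)})
df = df.assign(value=np.sin(df["sequence"]))

# pandas delegates plotting to matplotlib, so this fails if matplotlib is missing
ax = df.plot(x="sequence", y="value", title="Matplotlib")
plt.show()
```

A sine wave in the chunk output means the pandas plotting backend found matplotlib.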
Test 4 – Scikit Learn
Run a test Random Forest using RandomForestClassifier from the sklearn.ensemble module of Scikit Learn.
Use the predict() method to make a prediction on the training data set.
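A minimal sketch of the Random Forest check, assuming a toy data set of 2 samples with 3 features each. The values in X and y are made up for illustration.

```python
from sklearn.ensemble import RandomForestClassifier

# Toy training data: 2 samples, 3 features each (illustrative values)
X = [[1, 2, 3],
     [11, 12, 13]]
y = [0, 1]  # class label for each sample

# Fit the model, then predict on the same training data
clf = RandomForestClassifier(random_state=0)
clf.fit(X, y)
pred = clf.predict(X)
print(pred)
```

The goal here is only to confirm that Scikit Learn imports, fits, and predicts without errors, not to build a useful model.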
Can you Run Affinity Propagation???
If you are planning to attend Learning Lab 33 – HR Analytics Employee Clustering with Python Scikit Learn (Register Here), you will need to be able to perform the following algorithms to complete an Employee Clustering and Termination Analysis Project:
- Affinity Propagation and DBSCAN Clustering Algorithms
- TSNE Manifold Embedding
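As a quick smoke test for the other two items in this list, the sketch below runs DBSCAN and TSNE on synthetic data. The blob data set and the eps, min_samples, and perplexity values are assumptions picked for illustration, not settings from the Learning Lab.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE

# Synthetic 2-D data with 3 blobs (illustrative only)
X, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.5, random_state=0)

# DBSCAN assigns a cluster label to each point (-1 marks noise)
db_labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

# TSNE embeds the points into 2 dimensions
embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

print(db_labels[:10])
print(embedding.shape)
```

If both calls finish without errors, the clustering and manifold modules are available in your environment.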
A simple test is to run the AffinityPropagation test from Scikit Learn’s website.
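The sketch below is loosely modeled on that kind of demo and is not a copy of it. The blob centers, the preference value, and the variable names are assumptions for illustration.

```python
from sklearn.cluster import AffinityPropagation
from sklearn.datasets import make_blobs

# Synthetic data around 3 centers (illustrative values)
centers = [[1, 1], [-1, -1], [1, -1]]
X, labels_true = make_blobs(
    n_samples=300, centers=centers, cluster_std=0.5, random_state=0
)

# Fit Affinity Propagation; preference influences how many exemplars are chosen
af = AffinityPropagation(preference=-50, random_state=0).fit(X)
cluster_centers_indices = af.cluster_centers_indices_
labels = af.labels_

n_clusters = len(cluster_centers_indices)
print("Estimated number of clusters:", n_clusters)
```

If a cluster count prints, Affinity Propagation is running and you should be ready for the clustering project.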
Become Great at Shiny
Up until now we haven’t talked about Shiny! It’s a web application framework that is used to take your python and R machine learning models into Production.
Business Science Application Library
A Meta-Application that houses Shiny Apps
R Shiny needs to be in your toolbox if you want to productionize Data Science. You simply cannot put machine learning applications into production with other “BI” Tools like Tableau, PowerBI, and QlikView.
CRITICAL POINT: You can USE SHINY to productionize python Scikit Learn and Tensorflow Models
If you need to learn R Shiny as fast as possible, I have the perfect program for you. It will accelerate your career. The 4-Course R-Track Bundle through Business Science.
Pro Tips (Python in R)
Now that you have python running in R, use these pro-tips to make your experience way more enjoyable.
Pro-Tip #1 – Python Chunk Keyboard Shortcut
I can’t stress this one enough – Set up a Keyboard shortcut for Python Code Chunks. This is a massive productivity booster for R Markdown documents.
- My preference: Ctrl + Alt + P
When you hit Ctrl + Alt + P, a {python} code chunk will appear in your R Markdown document.
Pro-Tip #2 – Use Python Interactively
For debugging Python Code Chunks in R Markdown, it can help to use repl_python() to convert your Console to a Python Code Console. To do so:
- In R Console, you can run python interactively using repl_python(). You will see >>> indicating you are in Python Mode.
- Make sure the correct Python / Conda Environment is selected.
- To escape Python in the console, just hit escape (typing exit also returns you to R).
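One way to check the environment from inside the Python console is sketched below. It assumes that sys.prefix points at the active environment's folder, which is typically the case for conda environments.

```python
import sys

# sys.prefix is the root folder of the Python installation in use;
# for a conda env it usually ends with the environment name (e.g. py3.8)
env_path = sys.prefix
print(env_path)

# The interpreter version should match what the environment was created with
print(sys.version)
```

If the printed path does not contain your environment name, go back and check the use_condaenv() step.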
Pro-Tip #3 – 4 Conda Terminal Commands
At some point you will need to create, modify, or add more packages to your Conda Environment(s). Here are 4 useful commands:
- Run conda env list to list the available conda environments
- Run conda activate <env_name> to activate a conda environment
- Run conda update --all to update all python packages in a conda environment.
- Run conda install <package_name> to install a new package
Have questions on using Python + R?
Make a comment in the chat below.
And, if you plan on using Python + R at work, it’s a no-brainer – attend my Learning Labs (they are FREE to attend live).