
Announcing pins for Python

[This article was first published on RStudio | Open source & professional software for data science teams on RStudio, and kindly contributed to R-bloggers.]

We’re excited to announce the release of pins for Python!

pins removes the hassle of managing data across projects, colleagues, and teams by providing a central place for people to store, version and retrieve data. If you’ve ever chased a CSV through a series of email exchanges, or had to decide between data-final.csv and data-final-final.csv, then pins is for you.

pins stores data on a board, which can be a local folder, RStudio Connect, or a cloud provider like Amazon S3. Each individual object (such as a dataframe, model, or another pickle-able Python object), together with some metadata, is called a pin.

The Python pins library works with its R counterpart, so that teams working across R and Python have a unified strategy for sharing data. This work emerged as part of RStudio’s investment in Python open source, in order to support bilingual data science teams.

Getting Started

The first step to using pins is installing it from PyPI.

python -m pip install pins

In the examples below, I’ll walk through the basics of pins using a temporary directory for a board, with board_temp(). This gets deleted after you close Python, so it is not ideal for collaboration! You can use other boards, like board_rsconnect(), board_folder(), and board_s3(), in more realistic settings.

import pins
from pins.data import mtcars

board = pins.board_temp()

You can “pin” (save) data to a board with the .pin_write() method. It requires three arguments: an object, a name, and a pin type:

board.pin_write(mtcars.head(), "mtcars", type="csv")
#> Meta(title='mtcars: a pinned 5 x 11 DataFrame', description=None, created='20220601T175057Z', pin_hash='120a54f7e0818041', file='mtcars.csv', file_size=249, type='csv', api_version=1, version=Version(created=datetime.datetime(2022, 6, 1, 17, 50, 57, 80318), hash='120a54f7e0818041'), name='mtcars', user={})
#> 
#> Writing to pin 'mtcars'

Above, we saved the data as a CSV, but depending on what you’re saving and who else you want to read it, you might use the type argument to instead save it as a feather, parquet, or joblib file.

You can later retrieve the pinned data with .pin_read():

board.pin_read("mtcars")
#>    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#> 0 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
#> 1 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
#> 2 22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
#> 3 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
#> 4 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2

You can search for data using .pin_search() and .pin_list().

# prints out a list of all pins
# board.pin_list()

# searches for pins containing "cars"
board.pin_search("cars")
#>      name type  ... file_size                                               meta
#> 0  mtcars  csv  ...       249  Meta(title='mtcars: a pinned 5 x 11 DataFrame'...
#> 
#> [1 rows x 6 columns]

Two more pieces of important functionality exist:

  • .pin_write() won’t overwrite existing data; each write is saved as a new version.
  • .pin_read() caches your data, so subsequent reads are much faster.

See getting started in the pins documentation for more information.

Interoperability with R pins

Pins stored with Python can be read with R, and vice-versa.

For example, here is R code that reads the mtcars pin we wrote to the board above. Note that TEMP_PATH refers to the temporary directory we created in this blog post for our Python board.

library(pins)

board <- board_folder(TEMP_PATH)
board %>% pin_read("mtcars")
#>    mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> 1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#> 2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#> 3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#> 4 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
#> 5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2

This is especially useful when colleagues prefer one language over the other. For real collaborative work like this, you would use a board like board_rsconnect() or board_s3().

Going further

The real power of pins comes when you share a board with multiple people. To get started, you can use board_folder() with a directory on a shared drive or in Dropbox, or, if you use RStudio Connect, you can use board_rsconnect():

board = pins.board_rsconnect()
board.pin_write(tidy_sales_data, "michael/sales-summary", type="csv")

Then, someone else (or an automated report) can read and use your pin:

board = pins.board_rsconnect()
board.pin_read("michael/sales-summary")

The pins package also includes boards that allow you to share data on services like Amazon’s S3 (board_s3()), with plans to support other backends such as Google Cloud Storage and Azure’s blob storage.

Get in touch

We are so happy about releasing pins for Python, and we want to make sure it supports your workflow. Join our discussion on RStudio Community to let us know what you’re working on, and how pins could help!
