
Community interviews about {data.table}

[This article was first published on Blog, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

What’s it all about?

One stipulation of NSF POSE-funded projects like this one was to conduct a series of interviews under NSF’s I-CORPS program (Winter 2024 Cohort) to gather information on how data.table, as an open-source project, can improve and remain sustainable. Over four weeks starting on the 17th of January, I conducted a total of 60 interviews with R users and data.table contributors. Issue #5880 on the data.table GitHub mentions this and links to the Google Doc that contains the list of people interviewed.

Project PI Toby Hocking assigned me to conduct these interviews and serve as the EL (Entrepreneurial Lead) for the data.table team. In addition to the interviews, this role involves tasks such as preparing and giving presentations. Now that I have successfully completed the program and conducted the interviews, it’s time to share the insights I gathered as an open resource for the community.

Before you read on, I would like to say a big thank you to everyone who took part, from making time and scheduling to providing comprehensive feedback, all while being extremely communicative. I sincerely appreciate it, not just for the insights you brought to the table but also for being great people to talk with in general!


Selected Quotes

We begin with some direct (anonymous) quotes from the interviews that give positive feedback or make general statements about data.table.

“data.table would be one of the few arguments (in addition to Shiny) that I could bring forward to make people use R instead of Python”

“Using data.table, it becomes easier to read, manipulate, and represent data in a more appropriate way for my needs”

“If I give my script to somebody, knowing that only data.table needs to be installed is reassuring”

“I like it for minimalism and since it’s backward compatible with data.frame”

“data.table was small enough to put my data into RAM (noticeable copy reduction compared to dplyr) and do analyses on my old laptop”

“Very convenient to operate on lists as columns, wherein the base structure in which mlr3 is programmed around is, in essence, a data.table”

“Not something I’d recommend to everyone because of its peculiar syntax, but once you are used to it, I believe it can be very expressive while reducing lines of code dramatically”

“I’m happy to see the community mobilization around this package since it brings such a valuable contribution to base R and is used by so many other packages.”

“All the f-xyz functions are super useful. For example, it takes ages to read a big CSV with read_csv, while using fread sometimes doesn’t even allow me to grab a cup of coffee :)”

Next, we summarize some recurring themes that will help guide the grant project going forward.


Theme 1: Contribution to package development

The first theme of the interviews was what motivates people to contribute to the data.table project, or prevents them from doing so. Answers varied from person to person, but I summarize a few common ones below.


Positive motivations


Barriers


Summary and Takeaways

Section added by Kelly Bodwin

Based on these findings, we on the grant team see five major directions for encouraging more contribution to data.table:

  1. Use in projects: Interviewees reported adding their own functionality to data.table to meet needs in personal or work projects. Others cited their own lack of data.table use as a reason for not being more involved. The more we can encourage practical adoption of data.table wherever it is useful, the more contributions we can expect from users.

  2. Feeling of community and culture of inclusion: This is already a focus of the grant project, and it is great to hear that users and community members value it! We hope to vastly expand the beginner-friendliness and language diversity of the documentation.

  3. Beginner-friendliness and support: Some interviewees reported lacking the programming skills to contribute to data.table. Going forward, we hope to better identify and highlight the areas of contribution suited to less experienced programmers, and to provide more resources that help new community members learn about the structure of the package.

  4. Financial and professional benefits: Contributors report that developing for data.table has a positive impact on their professional development and hireability, and that they would welcome financial incentives as well. I believe we should experiment with structures that support our developers in concrete ways.

  5. Pull Request process and timeline: We believe that the newly established Governance Document for the package will help clarify and streamline the contribution process going forward.


Theme 2: Adoption of data.table

The second theme is what drives people to become regular users of the data.table package. We mostly focused on barriers to adoption.


Individual reasons to not adopt

People cited various reasons for not using or switching to data.table:


Areas of improvement

Interviewees, including regular users, identified specific areas they would like to see improved or worked on.

In terms of technical improvements:

In terms of the community around data.table:


Summary and Takeaways

Section added by Kelly Bodwin

  1. Education and Resources: It has been clear from the outset of this project that data.table could benefit from a lot more documentation, guides, tutorials, etc. This is always a tough issue, because creating such materials can be a thankless task with little concrete payoff. However, thanks to the grant, we are able to fund time for this work! Expect good things on the horizon in this category.

  2. Syntax and the R sub-languages: The diversity of R syntax is a blessing and a curse, and everyone has their favorite syntax style, from tidyverse to formula style to base R to data.table, and every combination in between. Ultimately, our goal should be to be as flexible as possible and offer ways for data.table to interface smoothly with other styles, without losing its core syntax structure and personality! (dtplyr and tidyfast are lovely examples of such interfacing.)

  3. Applicability to the problem at hand: This is an interesting one. Can we do better at defining which dataset sizes and types are the best use cases for data.table? Can we provide more options for interfacing with databases, so that users can pull data using database tools but analyze it in memory with data.table?


Theme 3: Open-source sustainability

Finally, we asked interviewees what might be necessary to give data.table long-term sustainability.


Summary and Takeaways

Section added by Kelly Bodwin

I don’t have much to add to this one: it’s clear that we need more support structures for open-source maintenance, whether from private sources, public grants, or community sponsorship.


Ani’s Roadmap

Here is my list of ideas to pursue or keep in mind for the agenda going forward:
