Results of the 2023 survey

[This article was first published on Blog, and kindly contributed to R-bloggers].

Thanks to everyone who helped create, shared, or filled out the first data.table survey! The survey was officially open between October 17 and December 1 and it received 391 responses during this time.

This post provides a partial summary of the results. It covers all closed-ended questions and includes short, informal summaries of the answers to some of the open-ended questions.

I encourage you to explore the data yourself – you can find it here.

Respondents

A typical respondent was:

For a richer summary, here are the corresponding bar charts:

The good and the bad

What do users appreciate most about data.table? A scan of the answers to this open-ended question quickly reveals a clear winner: performance. Nearly every answer brings up speed or memory efficiency.

“Speed! When I need speed, I turn directly to data.table.”

The runner-up is syntax, with users praising its concision and expressiveness. At the same time, however, syntax often appears in the answers to the question about the biggest challenges in using data.table. For some users it is too concise or difficult to remember. Some users highlighted specific functionality that they find difficult to use: reshaping (dcast/melt) is brought up most often, followed by joining.

“Some queries are so surprisingly simple for complex operations”

“Still can’t get used to the syntax, have to look it up every time”

We explored this topic in a more structured way as well, by asking about the following areas:

The possible answers were Very dissatisfied, Somewhat dissatisfied, Neither satisfied nor dissatisfied, Somewhat satisfied, Very satisfied, which I mapped to -2:2 below. The majority of users are Very satisfied with performance (86.2%), minimal dependencies (77.8%), backward compatibility (60.8%), and syntax concision (57.1%). Syntax readability (35.5%), error messages (29.0%), and documentation (30.2%) lag behind.
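For readers who want to reproduce this kind of summary from the raw data, here is a minimal sketch in Python. It maps the five answer labels to the -2 to 2 scale and computes, for one area, the mean score and the share of Very satisfied answers. The function name `summarize` and the example answers are hypothetical and are not taken from the survey data.

```python
from collections import Counter
from statistics import mean

# Answer labels from the survey, mapped to the -2..2 scale used in the post.
LIKERT_SCORES = {
    "Very dissatisfied": -2,
    "Somewhat dissatisfied": -1,
    "Neither satisfied nor dissatisfied": 0,
    "Somewhat satisfied": 1,
    "Very satisfied": 2,
}


def summarize(responses):
    """Return (mean score, percent answering 'Very satisfied') for one area."""
    scores = [LIKERT_SCORES[r] for r in responses]
    share_top = 100 * Counter(responses)["Very satisfied"] / len(responses)
    return mean(scores), share_top


# Hypothetical answers for a single area (illustration only, not survey data).
example = [
    "Very satisfied",
    "Very satisfied",
    "Somewhat satisfied",
    "Neither satisfied nor dissatisfied",
    "Very satisfied",
]

avg, share = summarize(example)
print(f"mean score: {avg:.2f}, very satisfied: {share:.1f}%")
```

The percentages quoted above are of the second kind: the share of respondents who picked the top answer for each area.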

Does this pattern hold across all levels of data.table experience? The following plot shows the average (vertical red line) in addition to the distribution of answers across the different levels of experience.

One way to contextualize these results is to consider how important the different areas are. Another grid question featured this same set of areas, but asked about their importance to the user. I standardized the satisfaction and importance scores and plotted the averages below. The two areas that score relatively high in importance but relatively low in satisfaction are syntax readability and quality of documentation.
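The post does not spell out the exact standardization procedure, so the following Python sketch is only one plausible reading: within each grid question, all responses are z-scored together (mean 0, standard deviation 1) and then averaged per area. The function name, the numeric coding of the importance answers, and all the numbers are hypothetical.

```python
from statistics import mean, pstdev


def standardized_area_means(scores_by_area):
    """Z-score all responses of one grid together, then average per area."""
    pooled = [s for scores in scores_by_area.values() for s in scores]
    mu, sigma = mean(pooled), pstdev(pooled)
    return {
        area: mean((s - mu) / sigma for s in scores)
        for area, scores in scores_by_area.items()
    }


# Hypothetical responses on a -2..2 coding (illustration only, not survey data).
satisfaction = {
    "performance": [2, 2, 2, 1],
    "documentation": [0, 1, -1, 0],
    "dependencies": [2, 2, 1, 2],
}
importance = {
    "performance": [2, 2, 2, 2],
    "documentation": [2, 2, 1, 2],
    "dependencies": [0, 1, 0, -1],
}

sat_z = standardized_area_means(satisfaction)
imp_z = standardized_area_means(importance)

# Areas that are above average in importance but below average in satisfaction.
flagged = [area for area in sat_z if imp_z[area] > 0 and sat_z[area] < 0]
print(flagged)
```

Standardizing both grids puts satisfaction and importance on a common scale, so an area that sits above zero on one axis and below zero on the other stands out regardless of how generous respondents were overall.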

Another grid question asked about users’ satisfaction with:

While Very satisfied was the dominant response for every area, the results are consistent with the earlier qualitative observations: the share of users selecting this response is substantially lower for reshaping (49.2%) and joining (45.9%) than for the other areas (manipulation & aggregation 59.9%, import/export 62.7%, filtering 71.3%).

The next plot considers variation across levels of data.table experience. One area where beginners (less than 2 years of experience) are less satisfied compared to other users is importing & exporting.

Desired functionality

What extra functionality would users like to see? The answers to this question covered a lot of ground, but the three clear winners (with at least 10 mentions each) were:

Pipe integration was also the subject of a later question in the survey, with the majority of users (69.4%) indicating they would find a helper function for working with the pipe useful.

Another specific question asked about the alias for the walrus operator (:=). Interestingly, set() (47.3%) outperformed let() (39.2%), with setj() (13.6%) far behind.

Contributing to data.table

The good news for data.table is that many users indicated interest in contributing to the project. In particular, 80 respondents (20.8%) said Yes, and a further 191 (49.6%) answered Maybe.
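As a quick sanity check on these figures, the Python sketch below searches for the number of respondents that is consistent with both reported count and percentage pairs. It assumes the percentages were computed over the people who answered this question and rounded to one decimal place. The resulting total is an inference from the rounded numbers, not a figure stated in the survey write-up.

```python
def consistent_totals(counts_and_pcts, max_n):
    """Totals n for which every count / n rounds to its reported percentage."""
    return [
        n
        for n in range(1, max_n + 1)
        if all(round(100 * count / n, 1) == pct for count, pct in counts_and_pcts)
    ]


# (count, reported percent) pairs from the text; 391 is the total response count.
reported = [(80, 20.8), (191, 49.6)]
totals = consistent_totals(reported, max_n=391)
print(totals)  # prints [385]
```

Under those assumptions, the only total that fits is 385, which would mean that nearly all of the 391 respondents answered this question.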

We followed up on this question by asking for interest in specific activities. The orange bars in the following plot represent interest in contributing, whereas the darker parts indicate actual contribution in the past.

What would make contributing to data.table easier or more appealing? Setting aside personal reasons, such as lack of time or skill, the following areas were mentioned at least a few times each:

Conclusion

The responses to this survey make the value of data.table clear, but some users fear that the package may be abandoned or stagnating. Fear not, the project is again picking up steam! An important release with many new features and bug fixes recently landed on CRAN, and the project now has a governance document, which includes information on the different roles you can take. New contributions are very welcome, so check out the guidelines and take a look at the open issues – those labeled beginner-task are a particularly great place to start!

What can you do?

Are you interested in learning more, or helping grow the data.table community and infrastructure? Here are some places to start:
