I’ve added a new function to qeML 1.2, qeMittalGraph, based on an idea by my student Aditya Mittal. Below is an example that I think is rather compelling. The basic idea is quite simple (and not necessarily new, just something I had not seen before): Instead of comparing several curves ...
I’m very pleased to announce a new package, dsld, available on CRAN. This is the work of eight talented undergrad students. I provided the concept and some general guidance, but this is their work. The package is aimed at dealing with discrimination (race, gender, age) in the workplace, education, ...
Readers who are interested in the Data Privacy field may find our new paper (Perry, Matloff, Tendick) of interest, https://tdp.cat/issues21/tdp.a478a22.pdf…. There we introduce a new method that we call RWN, Randomization within Neighborhoods. We present a bit of supporting theory and do some ...
The famed physicist Richard Feynman once said, “I learned very early the difference between knowing the name of something and knowing something,” a lesson from his father. I think too often we in the statistics/machine learning field are guilty of “only knowing the name of something.” Well, in most ...
I’ve added a new function, qeNeuralTorch, to the qeML package, as an alternative to the package’s qeNeural. It is experimental at this point, but usable, and I urge everyone to try it out. In this post, I will (a) state why I felt it desirable to add such ...
In my December 22 blog post, I first introduced the classic parametric quantile regression (QR) concept. I then showed how one could use the qeML package to perform quantile regression nonparametrically, using the package’s qeKNN function for a k-Nearest Neighbors approach. A reader then asked if this could be applied to ...
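For readers curious about the mechanics, the k-Nearest Neighbors route to quantile regression can be sketched in a few lines of base R. This is a hypothetical helper, knnQuantile(), written only for illustration. It is not qeML's qeKNN, and it makes no claim about how that function works internally. The idea: to estimate the tau-th conditional quantile of Y at a point x0, take the sample tau-quantile of the Y values of the k training points nearest to x0.

```r
# Hypothetical helper (not part of qeML): nonparametric quantile regression
# via k-nearest neighbors. Estimates the tau-th conditional quantile of y at
# the point x0 as the sample tau-quantile of y among the k training rows
# closest to x0. Predictors are used unscaled, so they should be on
# comparable scales.
knnQuantile <- function(x, y, x0, k = 25, tau = 0.5) {
  x <- as.matrix(x)
  stopifnot(length(x0) == ncol(x), k <= nrow(x))
  # squared Euclidean distance from x0 to each training row
  d2 <- colSums((t(x) - x0)^2)
  # indices of the k nearest rows (ties broken by row order)
  nbrs <- order(d2)[seq_len(k)]
  unname(quantile(y[nbrs], probs = tau))
}

# Simulated data in which the spread of y grows with x
set.seed(1)
n <- 1000
x <- runif(n, 0, 10)
y <- 2 * x + rnorm(n, sd = 1 + x)

# Estimated 10th, 50th and 90th conditional percentiles of y at x = 5
sapply(c(0.1, 0.5, 0.9), function(tau) knnQuantile(x, y, 5, k = 50, tau = tau))
```

Because all three estimates come from the same neighbor set, they are automatically ordered, and no linearity assumption is needed, unlike classic parametric QR. The choice of k plays the same bias/variance role here as it does in ordinary k-NN regression.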
In this post, I will first introduce the concept of quantile regression (QR), a powerful technique that is rarely taught in stat courses. I’ll give an example from the quantreg package, and then will show how qeML can be used to do model-free QR estimation. Along the way, I ...
Is machine learning overrated, with traditional methods being underrated these days? Yes, ML has had some celebrated successes, but these have come after huge amounts of effort, and similar effort with traditional methods might have produced similar results. A related issue concerns the type of data. ...
In writing an R package, it is often useful to build up some function call in string form, then “execute” the string. To give a really simple example: Quite a lot of trouble to go to just to find that 1+1 = 2? Yes, but this trick can be extremely useful, as we’...
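For readers who haven't seen the trick, here is a minimal sketch using base R's parse() and eval(). The post's own example may differ, and the variable names fname and argName are purely illustrative.

```r
# The simplest case: a string holding an expression, parsed and evaluated
s <- "1+1"
eval(parse(text = s))            # 2

# More typical use in package code: assemble a function call from pieces
fname <- "mean"                  # illustrative: name of the function to call
argName <- "x"                   # illustrative: name of the argument variable
x <- c(3, 5, 10)
cmd <- paste0(fname, "(", argName, ")")   # the string "mean(x)"
eval(parse(text = cmd))          # evaluates mean(x), giving 6
```

When the pieces are just a function name and ready-made arguments, do.call(fname, list(x)) does the same job without going through text, which is often the safer choice. The string route earns its keep when the structure of the call itself is being assembled on the fly.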
What about variable selection? Which predictor variables/features should we use? No matter what anyone tells you, this is an unsolved problem. But there are lots of useful methods. See the qeML vignettes on feature selection and overfitting for detailed background on the issues involved. We note at the outset ...
Sorry I haven’t been very active on this blog lately, but now that I have more time, that will change. I’ve got myriad things to say. To begin with, then, I’ll announce a major new R package and my new book. qeML package (“quick and easy machine ...
I’ve recently completed fastStat, https://github.com/matloff/fastStat, a quick introduction to statistics for those who’ve had a calculus-based probability course. Many such people later need to do statistics, and this will give them quick access. It is modeled after my R tutorial, https://github.com/matloff/...
Many of you may have heard of ChatGPT, a dazzling new AI tool. We are hearing lots of gushing praise for the tool. Well, how well does it do in data science contexts? I tried a few queries here, and found interesting results. I first requested, “Write an R function ...
The field of data privacy has long been of broad interest. In a medical database, for instance, how can administrators enable statistical analysis by medical researchers, while at the same time protecting the privacy of individual patients? Over the years, many methods have been proposed and used. I’ve done ...
I have a new short writeup, showing common R design patterns, implemented side-by-side in base-R and Tidy. As readers of this blog know, I strongly believe that Tidy is a poor tool for teaching R learners who have no coding background. Relative to learning in a base-R environment, learners using ...
During the last year or so, I’ve been quite interested in the issue of fairness in machine learning. This area is especially personal for me, as it lies at the confluence of several interests of mine: My lifelong activity in probability theory, math stat and stat methodology (in which I ...
George Ostrouchov, one of R’s top parallel computing experts, will run a workshop on cluster parallel computing in R next week. BTW, even a multicore laptop is a “cluster,” so anyone can apply this material to their own work even if ...
As many readers of this blog know, I strongly believe that R learners should be taught base-R, not the tidyverse. Eventually the students may settle on using a mix of the two paradigms, but at the learning stage they will benefit from the fact that base-R is simple and more ...