No-code Machine Learning Cross-validation and Interpretability in techtonique.net
I’ve added a new feature to techtonique.net: No-code Machine Learning Cross-validation and Interpretability for tabular data (supervised learning, regression and classification). techtonique.net will remain free to use until December 30, 2024. As many others have already done, give it a try!
To use this new feature, you’ll need to navigate to https://www.techtonique.net/mlcvexplain, and select a file from Techtonique/datasets. The files to be used are those with names ending with a “2”, in folders /classification or /regression, i.e., tables with a training set index. This means you can create similar files with your own data, adding a training set index as an additional column for predictive purposes.
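To prepare your own file, one option is to append a 0/1 column flagging the training rows. The sketch below does this with Python's standard library only. The column name `training_index`, the 80/20 split, and the "1 = training row" convention are assumptions for illustration, so check a file ending with “2” in Techtonique/datasets for the exact layout expected.

```python
import csv
import io
import random


def add_training_index(rows, train_fraction=0.8, seed=123,
                       column="training_index"):  # column name is an assumption
    """Return copies of `rows` with a 0/1 column marking training rows."""
    n = len(rows)
    n_train = round(train_fraction * n)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train_ids = set(rng.sample(range(n), n_train))
    return [{**row, column: int(i in train_ids)} for i, row in enumerate(rows)]


# Tiny made-up dataset, only to show the mechanics
raw = "rooms,age,price\n3,10,250\n2,35,140\n4,5,390\n1,50,90\n3,20,230\n"
rows = list(csv.DictReader(io.StringIO(raw)))
flagged = add_training_index(rows)

# Write the flagged table back out as CSV text
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(flagged[0].keys()))
writer.writeheader()
writer.writerows(flagged)
print(out.getvalue())
```

In practice you would read from and write to files on disk instead of `io.StringIO`.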
Once the file is uploaded, you can click on the “Submit” button to see (work in progress):
- 5-fold cross-validation results: Balanced accuracy for classification, or Root Mean Squared Error (RMSE) for regression (how well the model performs on held-out folds of the training set)
- Out-of-sample errors: Balanced accuracy for classification, or Root Mean Squared Error (RMSE) for regression (how well the model performs on the test set of interest)
- Sensitivity analysis summary: impact of each feature on the response variable (e.g., which variables drive an increase in the price of an apartment, or in the probability of a patient having cancer)
- Parametric tests of significance (for now): identify the variables whose effect on the response variable is statistically different from zero, provided the tests’ assumptions hold (e.g., is the effect of the number of rooms on the price of an apartment statistically significant?)
- Feature effects’ heterogeneity: the distribution of the effects of each feature on the response variable (e.g., to what extent does the number of rooms affect the price of an apartment?)
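For reference, the two metrics reported in the first two items can be computed as follows. This is a minimal standard-library sketch of balanced accuracy (the mean of per-class recall), RMSE, and a 5-fold index split. It is not the code that techtonique.net runs, and the helper names are hypothetical.

```python
import math
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class weighs the same."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


def rmse(y_true, y_pred):
    """Root Mean Squared Error."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def kfold_indices(n, k=5):
    """Yield (train, validation) index lists for k contiguous folds."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, val
        start += size


# Imbalanced toy labels: recall is 3/4 for class 0 and 1/2 for class 1
print(balanced_accuracy([0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 1, 0]))  # 0.625
print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
for train, val in kfold_indices(10, k=5):
    print("validation fold:", val)
```

A cross-validation score is then the average of the metric over the 5 validation folds, with the model refitted on the corresponding training indices each time.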
As a reminder of techtonique.net features:
You can simulate predictive scenarios from R, Python, and Excel with the Techtonique API, available at https://www.techtonique.net. The input CSV files used in the examples are available in Techtonique/datasets.
The Excel version can be found in the Excel file https://github.com/thierrymoudiki/techtonique-excel-server/VBA-Web.xlsm (in ‘Sheet3’). Behind the scenes, I’m using Visual Basic for Applications (VBA) to send requests to the API. All you need to do to see it in action is get a token and press a button. Remember to enable macros in Excel when asked to do so (this is safe).
Here’s the Python version, which relies on the forecastingapi Python package:

    import forecastingapi as fapi
    import numpy as np
    import pandas as pd
    from time import time
    import matplotlib.pyplot as plt
    import ast

    # examples in https://github.com/Techtonique/datasets/tree/main/time_series
    path_to_file = '/Users/t/Documents/datasets/time_series/univariate/AirPassengers.csv'

    start = time()
    res_get_forecast = fapi.get_forecast(path_to_file,
                                         base_model="RidgeCV",
                                         n_hidden_features=5,
                                         lags=25,
                                         type_pi='scp2-kde',
                                         replications=10,
                                         h=10)
    print(f"Elapsed: {time() - start} seconds \n")
    print(res_get_forecast)

    # Convert lists to numpy arrays for easier handling
    mean = np.asarray(ast.literal_eval(res_get_forecast['mean'])).ravel()
    lower = np.asarray(ast.literal_eval(res_get_forecast['lower'])).ravel()
    upper = np.asarray(ast.literal_eval(res_get_forecast['upper'])).ravel()
    sims = np.asarray(ast.literal_eval(res_get_forecast['sims']))

    # Plotting
    plt.figure(figsize=(10, 6))

    # Plot the simulated lines (label only the first one in the legend)
    for sim in sims:
        plt.plot(sim, color='gray', linestyle='--', alpha=0.6,
                 label='Simulations' if 'Simulations' not in plt.gca().get_legend_handles_labels()[1] else "")

    # Plot the mean line
    plt.plot(mean, color='blue', linewidth=2, label='Mean')

    # Plot the lower and upper bounds as shaded areas
    plt.fill_between(range(len(mean)), lower, upper, color='lightblue', alpha=0.2, label='Confidence Interval')

    # Labels and title
    plt.xlabel('Time Point')
    plt.ylabel('Value')
    plt.title('Spaghetti Plot of Mean, Bounds, and Simulated Paths')
    plt.legend()
    plt.show()
The R version relies on the forecastingapi R package:

    path_to_file <- "/Users/t/Documents/datasets/time_series/univariate/AirPassengers.csv"

    forecastingapi::get_forecast(path_to_file)

    forecastingapi::get_forecast(path_to_file, type_pi='scp2-kde', h=10L, replications=10L)

    sims <- forecastingapi::get_forecast(path_to_file, type_pi="scp2-kde", replications=10L)$sims

    matplot(sims, type='l', lwd=2)
In addition, you can obtain insights from your tabular data by chatting with it in techtonique.net. No plotting yet (coming soon), but you can already ask questions like:
- What is the average of column A?
- Show me the first 5 rows of data
- Show me 5 random rows of data
- What is the sum of column B?
- What is the average of column A grouped by column B?
- …
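To make these questions concrete, here is what each one computes, sketched with Python's standard library on a made-up table. How the chat feature answers them internally is not described in this post, so treat this only as an illustration. The column names `A` and `B` mirror the questions above and the values are invented.

```python
import random
from collections import defaultdict
from statistics import mean

# Made-up table: A is numeric, B is a small integer code used for grouping
table = [
    {"A": 1.0, "B": 1},
    {"A": 2.0, "B": 2},
    {"A": 3.0, "B": 1},
    {"A": 4.0, "B": 2},
    {"A": 5.0, "B": 1},
    {"A": 6.0, "B": 2},
]

avg_a = mean(row["A"] for row in table)          # average of column A
first_rows = table[:5]                           # first 5 rows
random_rows = random.Random(0).sample(table, 5)  # 5 random rows (seeded)
sum_b = sum(row["B"] for row in table)           # sum of column B

# Average of column A grouped by column B
groups = defaultdict(list)
for row in table:
    groups[row["B"]].append(row["A"])
avg_a_by_b = {key: mean(values) for key, values in groups.items()}

print(avg_a, sum_b, avg_a_by_b)
```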
You can also run R or Python code interactively in your browser, on www.techtonique.net/consoles.
The Techtonique web app is a tool designed to help you make informed, data-driven decisions using Mathematics, Statistics, Machine Learning, and Data Visualization. As of September 2024, the tool is in its beta phase (subject to crashes) and will remain completely free to use until December 30, 2024. After registering, you will receive an email; check your spam folder if it does not show up in your inbox.
The tool is built on Techtonique and the powerful Python ecosystem. Both clickable web interfaces and Application Programming Interfaces (APIs, see below) are available.
Currently, the available functionalities include:
- Data visualization. Example: Which variables are correlated, and to what extent?
- Probabilistic forecasting. Example: What are my projected sales for next year, including lower and upper bounds?
- Machine Learning (regression or classification) for tabular datasets. Example: What is the price range of an apartment based on its age and number of rooms?
- Survival analysis, analyzing time-to-event data. Example: How long might a patient live after being diagnosed with Hodgkin’s lymphoma (cancer), and how accurate is this prediction?
- Reserving based on insurance claims data. Example: How much should I set aside today to cover potential accidents that may occur in the next few years?
As mentioned earlier, this tool includes both clickable web interfaces and Application Programming Interfaces (APIs).
APIs allow you to send requests from your computer to perform specific tasks on given resources. APIs are programming language-agnostic (supporting Python, R, JavaScript, etc.), relatively fast, and require no additional package installation before use. This means you can keep using your preferred programming language or legacy code/tool, as long as it can speak to the internet. What are requests and resources?
In Techtonique/APIs, resources are Statistical/Machine Learning (ML) model predictions or forecasts.
A common type of request might be to obtain sales, weather, or revenue forecasts for the next five weeks. In general, requests for tasks are short, typically involving a verb and a URL path, and each request leads to a response.
Below is an example. In this case, the resource we want to manage is a list of users.
- Request type (verb): GET
  - URL Path: http://users | Endpoint: users | API Response: Displays a list of all users
  - URL Path: http://users/:id | Endpoint: users/:id | API Response: Displays a specific user
- Request type (verb): POST
  - URL Path: http://users | Endpoint: users | API Response: Creates a new user
- Request type (verb): PUT
  - URL Path: http://users/:id | Endpoint: users/:id | API Response: Updates a specific user
- Request type (verb): DELETE
  - URL Path: http://users/:id | Endpoint: users/:id | API Response: Deletes a specific user
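The verb-plus-path pattern can be mimicked with a small in-memory dispatcher. The sketch below is purely illustrative: the `handle` function, the status codes it returns, and the `users` store are hypothetical, and no real server or network call is involved.

```python
users = {1: {"name": "Ada"}, 2: {"name": "Grace"}}  # hypothetical resource


def handle(verb, path, body=None):
    """Map a (verb, path) pair to an action on the in-memory `users` store."""
    parts = path.strip("/").split("/")
    if parts[0] != "users":
        return 404, None
    user_id = int(parts[1]) if len(parts) > 1 else None
    if verb == "GET" and user_id is None:
        return 200, list(users.values())   # list all users
    if verb == "POST" and user_id is None:
        new_id = max(users, default=0) + 1
        users[new_id] = body               # create a new user
        return 201, {"id": new_id}
    if user_id not in users:
        return 404, None
    if verb == "GET":
        return 200, users[user_id]         # show a specific user
    if verb == "PUT":
        users[user_id] = body              # update a specific user
        return 200, users[user_id]
    if verb == "DELETE":
        return 204, users.pop(user_id)     # delete a specific user
    return 405, None


print(handle("GET", "/users"))
print(handle("POST", "/users", {"name": "Linus"}))
print(handle("DELETE", "/users/1"))
```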
In Techtonique/APIs, a typical resource endpoint would be /MLmodel. Since the resources are predefined and do not need to be updated (PUT) or deleted (DELETE), every request will be a POST request to /MLmodel, with additional parameters for the ML model.
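As an illustration, a POST request of this kind could be assembled as below with Python's standard library. Everything specific here is an assumption: `/MLmodel` is the placeholder endpoint from the paragraph above, the bearer-token header and JSON payload are guesses at the format, and the parameter names are simply borrowed from the forecastingapi example. The actual endpoints and parameters are documented on the /howtoapi page. The request is built but deliberately not sent.

```python
import json
import urllib.request

BASE_URL = "https://www.techtonique.net"  # site URL from this post
ENDPOINT = "/MLmodel"                     # placeholder endpoint, not a real path
TOKEN = "YOUR_TOKEN"                      # hypothetical; use the token you obtained

# Hypothetical model parameters, named after the forecastingapi example
params = {"base_model": "RidgeCV", "h": 10}

request = urllib.request.Request(
    url=BASE_URL + ENDPOINT,
    data=json.dumps(params).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {TOKEN}",  # assumed authentication scheme
        "Content-Type": "application/json",  # assumed payload format
    },
    method="POST",
)

print(request.get_method(), request.full_url)
# Sending it would be: urllib.request.urlopen(request)  (not run here)
```

The same request can be expressed in any language with an HTTP client, which is what makes the API language-agnostic.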
After reading this, you can proceed to the /howtoapi page.