Introduction
While I much prefer R and Shiny for publishing solutions, I do not reject Python platforms for publishing data science projects. Below, I used PyCaret and Streamlit to build a blended machine learning model and published the application on Streamlit.
What I learned: PyCaret is an easy-to-use and powerful automated machine learning Python module, and Streamlit is very easy to use and lets you publish your application for free, just as you can a Shiny application.
Will I drop R and Shiny for the Python alternatives? No, unless a business case requires me to do so.
Get Data

import numpy as np
import pandas as pd
from pycaret.regression import *

dataset = pd.read_csv("insurance.csv")
dataset.head()
age  sex     bmi     children  smoker  region     charges
19   female  27.900  0         yes     southwest  16884.924
18   male    33.770  1         no      southeast  1725.552
28   male    33.000  3         no      southeast  4449.462
33   male    22.705  0         no      northwest  21984.471
32   male    28.880  0         no      northwest  3866.855
data = dataset.sample(frac=0.9, random_state=786)
data_unseen = dataset.drop(data.index)
data.reset_index(drop=True, inplace=True)
data_unseen.reset_index(drop=True, inplace=True)
print('Data for Modeling: ' + str(data.shape))
print('Unseen Data For Predictions: ' + str(data_unseen.shape))

Data for Modeling: (1204, 7)
Unseen Data For Predictions: (134, 7)
Pycaret Feature Engineering and Selection

s = setup(data, target = 'charges', session_id = 123,
          normalize = True, silent = True,
          polynomial_features = True, trigonometry_features = True,
          feature_interaction = True,
          bin_numeric_features = ['age', 'bmi'])
read.delim("texttable2.txt", sep = "\t") %>%
  as_tibble() %>%
  select(Description, Value) %>%
  gt() %>%
  tab_header(title = md("**Pycaret Data Features**")) %>%
  tab_style(style = list(cell_fill(color = "green")),
            locations = cells_body(columns = c(Description, Value), rows = Value == "True"))
Description                               Value
session_id                                123
Target                                    charges
Original Data                             (1204, 7)
Missing Values                            False
Numeric Features                          2
Categorical Features                      4
Ordinal Features                          False
High Cardinality Features                 False
High Cardinality Method                   None
Transformed Train Set                     (842, 58)
Transformed Test Set                      (362, 58)
Shuffle Train-Test                        True
Stratify Train-Test                       False
Fold Generator                            KFold
Fold Number                               10
CPU Jobs                                  -1
Use GPU                                   False
Log Experiment                            False
Experiment Name                           reg-default-name
USI                                       da99
Imputation Type                           simple
Iterative Imputation Iteration            None
Numeric Imputer                           mean
Iterative Imputation Numeric Model        None
Categorical Imputer                       constant
Iterative Imputation Categorical Model    None
Unknown Categoricals Handling             least_frequent
Normalize                                 True
Normalize Method                          zscore
Transformation                            False
Transformation Method                     None
PCA                                       False
PCA Method                                None
PCA Components                            None
Ignore Low Variance                       False
Combine Rare Levels                       False
Rare Level Threshold                      None
Numeric Binning                           True
Remove Outliers                           False
Outliers Threshold                        None
Remove Multicollinearity                  False
Multicollinearity Threshold               None
Remove Perfect Collinearity               True
Clustering                                False
Clustering Iteration                      None
Polynomial Features                       True
Polynomial Degree                         2
Trignometry Features                      True
Polynomial Threshold                      0.100000
Group Features                            False
Feature Selection                         False
Feature Selection Method                  classic
Features Selection Threshold              None
Feature Interaction                       True
Feature Ratio                             False
Interaction Threshold                     0.010000
Transform Target                          False
Transform Target Method                   box-cox
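Note how the transformed train set has grown from 7 raw columns to 58 engineered features, the combined effect of binning age and bmi, one-hot encoding the categoricals, and the polynomial, trigonometric, and interaction terms. To see exactly what setup() produced, PyCaret 2.x exposes its internals through get_config; a small sketch (get_config is already in scope via the wildcard import):

# X_train holds the engineered feature matrix (842 rows x 58 columns here)
X_train = get_config("X_train")
print(X_train.shape)
print(X_train.columns.tolist()[:10])  # peek at the first ten engineered feature names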
Modeling
compare_models(fold=20, n_select=10)
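As written, the call above discards its return value. With n_select=10, compare_models actually returns the ten best fitted estimators as a list, and pull() (a PyCaret 2.x helper) retrieves the comparison grid as a DataFrame; a brief sketch:

top10 = compare_models(fold=20, n_select=10)  # list of the ten best fitted models
results = pull()                              # the scoring grid as a pandas DataFrame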
tbl3 <- read.delim("texttable3.txt", sep = "\t") %>% as_tibble()
min_mse <- min(tbl3$MSE)
max_r2 <- max(tbl3$R2)
min_rmse <- min(tbl3$RMSE)
min_mae <- min(tbl3$MAE)
min_rmsle <- min(tbl3$RMSLE)
min_mape <- min(tbl3$MAPE)
tbl3 <- tbl3 %>%
  gt() %>%
  tab_header(title = md("**Model Performance Comparison**")) %>%
  tab_style(style = list(cell_fill(color = "green")), locations = cells_body(columns = c(MSE), rows = MSE == min_mse)) %>%
  tab_style(style = list(cell_fill(color = "green")), locations = cells_body(columns = c(MAE), rows = MAE == min_mae)) %>%
  tab_style(style = list(cell_fill(color = "green")), locations = cells_body(columns = c(R2), rows = R2 == max_r2)) %>%
  tab_style(style = list(cell_fill(color = "green")), locations = cells_body(columns = c(RMSE), rows = RMSE == min_rmse)) %>%
  tab_style(style = list(cell_fill(color = "green")), locations = cells_body(columns = c(RMSLE), rows = RMSLE == min_rmsle)) %>%
  tab_style(style = list(cell_fill(color = "green")), locations = cells_body(columns = c(MAPE), rows = MAPE == min_mape))
tbl3
Model                            MAE           MSE           RMSE          R2            RMSLE   MAPE       TT (Sec)
Lasso Least Angle Regression     2732.596      2.104770e+07  4523.254      0.8423        0.4220  0.3214     0.0145
Ridge Regression                 2728.936      2.108289e+07  4519.783      0.8420        0.4083  0.2998     0.0115
Bayesian Ridge                   2735.247      2.112505e+07  4526.634      0.8418        0.4085  0.3008     0.0160
Linear Regression                2727.752      2.105203e+07  4505.568      0.8416        0.4104  0.2984     1.1405
Lasso Regression                 2716.907      2.109591e+07  4515.338      0.8416        0.4089  0.2972     0.0140
Gradient Boosting Regressor      2553.426      2.117309e+07  4486.413      0.8406        0.4306  0.3055     0.0560
Orthogonal Matching Pursuit      2853.320      2.246594e+07  4695.386      0.8344        0.4432  0.3401     0.0105
Huber Regressor                  1756.587      2.263832e+07  4684.403      0.8319        0.3460  0.0731     0.0465
Passive Aggressive Regressor     1741.354      2.281087e+07  4705.221      0.8309        0.3474  0.0709     0.0745
Random Forest Regressor          2637.197      2.274487e+07  4651.607      0.8268        0.4543  0.3193     0.2225
Light Gradient Boosting Machine  2862.800      2.326736e+07  4718.931      0.8253        0.5139  0.3609     0.0310
Extra Trees Regressor            2568.497      2.493545e+07  4872.980      0.8094        0.4511  0.2984     0.2060
K Neighbors Regressor            3349.147      3.449971e+07  5815.685      0.7476        0.4683  0.3145     0.0390
AdaBoost Regressor               5023.447      3.325294e+07  5720.854      0.7473        0.7227  0.9520     0.0180
Decision Tree Regressor          2963.674      3.860766e+07  6095.674      0.7090        0.5118  0.3331     0.0115
Elastic Net                      6242.648      6.143278e+07  7808.173      0.5714        0.7065  0.8848     0.0125
Dummy Regressor                  9409.356      1.518491e+08  12221.629     -0.0293       1.0215  1.5963     0.0095
Least Angle Regression           15534816.771  7.669493e+15  27427002.595  -4.84973e+07  2.5624  3122.1669  0.0145
Cross Validation
Ridge Regression
ridge = create_model("ridge", fold=10)
read.delim("texttable4.txt", sep = "\t") %>%
  as_tibble() %>%
  gt() %>%
  tab_header(title = md("**Ridge Model Selected**"),
             subtitle = md("Ridge model selected after running many iterations.")) %>%
  gt_highlight_rows(rows = 11, font_weight = "normal")
Fold  MAE        MSE       RMSE       R2      RMSLE   MAPE
0     2644.4497  17458422  4178.3276  0.8888  0.3994  0.2997
1     2396.6917  13792027  3713.7617  0.9271  0.3248  0.2947
2     2661.2834  16095249  4011.8884  0.8754  0.3945  0.3242
3     2803.0896  24090658  4908.2236  0.7923  0.4632  0.2866
4     3306.9434  32729668  5720.9849  0.7929  0.4599  0.2634
5     2740.3174  20716682  4551.5581  0.8559  0.4226  0.3347
6     2968.8784  20332018  4509.1040  0.8883  0.3812  0.3050
7     2732.5020  23839016  4882.5215  0.7771  0.4448  0.3143
8     2188.8196  12012671  3465.9299  0.9407  0.3604  0.3137
9     2865.0833  28998558  5385.0308  0.7451  0.4542  0.2699
Mean  2730.8058  21006497  4532.7331  0.8483  0.4105  0.3006
Std   288.0337   6237200   678.8429   0.0638  0.0442  0.0216
tuned_ridge = tune_model(ridge)
read.delim("texttable5.txt", sep = "\t") %>%
  as_tibble() %>%
  gt() %>%
  tab_header(title = md("**Tuned Ridge Model Selected**")) %>%
  gt_highlight_rows(rows = 11, font_weight = "normal")
Fold  MAE        MSE       RMSE       R2      RMSLE   MAPE
0     2638.0696  17719748  4209.4829  0.8871  0.3964  0.2979
1     2464.8701  14317923  3783.9031  0.9243  0.3290  0.2996
2     2700.9890  16758014  4093.6553  0.8702  0.3928  0.3202
3     2821.5364  24214992  4920.8730  0.7913  0.4737  0.2893
4     3210.9170  31561262  5617.9409  0.8003  0.4497  0.2572
5     2770.4917  20765924  4556.9644  0.8556  0.4207  0.3365
6     3000.1746  20701708  4549.9131  0.8862  0.3835  0.3088
7     2752.6213  23780156  4876.4902  0.7776  0.4478  0.3218
8     2220.6887  12428095  3525.3503  0.9386  0.3579  0.3129
9     2808.0850  28607576  5348.6050  0.7485  0.4481  0.2645
Mean  2738.8443  21085540  4548.3178  0.8480  0.4100  0.3009
Std   256.7305   5783024   631.1453   0.0616  0.0437  0.0238
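By default, tune_model runs a randomized search over a predefined hyperparameter grid, which is why the tuned scores barely move here. If more control is wanted, it also accepts n_iter, custom_grid, and optimize arguments; a hedged sketch with an illustrative alpha grid (the grid values are assumptions, not what was run above):

# Hypothetical grid: more candidates over explicit ridge alphas, optimizing MAE
tuned_ridge_custom = tune_model(
    ridge,
    n_iter=50,
    custom_grid={"alpha": [0.01, 0.1, 1.0, 10.0, 100.0]},
    optimize="MAE",
)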
plot_model(ridge, plot="residuals")
plot_model(ridge, plot="error")
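The two calls above render the residual and prediction-error charts inline. As an aside, plot_model also accepts save=True, which writes the chart to the working directory, handy for embedding in a post like this one:

# save=True writes e.g. Residuals.png to the working directory
plot_model(ridge, plot="residuals", save=True)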
Lasso Regression
lasso = tune_model(create_model("lasso", fold=10))
read.delim("texttable6.txt", sep = "\t") %>%
  as_tibble() %>%
  gt() %>%
  tab_header(title = md("**Tuned Lasso Regression**")) %>%
  gt_highlight_rows(rows = 11, font_weight = "normal")
Fold  MAE        MSE       RMSE      R2      RMSLE   MAPE
0     2653.5310  17379078  4168.822  0.8893  0.4169  0.3015
1     2298.0444  11653536  3413.727  0.9384  0.3191  0.2889
2     2662.1392  15827221  3978.344  0.8774  0.3982  0.3281
3     2814.8789  24249588  4924.387  0.7910  0.4516  0.2794
4     3427.0925  34247512  5852.137  0.7833  0.4756  0.2746
5     2737.7075  20867530  4568.099  0.8548  0.4244  0.3338
6     2925.0276  20040368  4476.647  0.8899  0.3787  0.2985
7     2741.3254  24315856  4931.111  0.7726  0.4553  0.3135
8     2193.4858  11721570  3423.678  0.9421  0.3647  0.3170
9     2885.1521  29481826  5429.717  0.7408  0.4596  0.2691
Mean  2733.8385  20978409  4516.667  0.8480  0.4144  0.3004
Std   322.5177   6951552   760.347   0.0678  0.0470  0.0214
Gradient Boosting Regression Model
gbr = tune_model(create_model("gbr", fold=10))
read.delim("texttable7.txt", sep = "\t") %>%
  as_tibble() %>%
  gt() %>%
  tab_header(title = md("**Gradient Boosting Regression Model**")) %>%
  gt_highlight_rows(rows = 11, font_weight = "normal")
Fold  MAE        MSE       RMSE       R2      RMSLE   MAPE
0     3183.1979  24621592  4962.0149  0.8431  0.5329  0.4181
1     2398.5225  11020166  3319.6636  0.9417  0.6118  0.3837
2     2857.3699  20660735  4545.4082  0.8400  0.4839  0.3977
3     3906.2212  41214576  6419.8579  0.6447  0.7550  0.4224
4     3625.8021  37865167  6153.4679  0.7604  0.6096  0.3373
5     3164.0660  23417383  4839.1511  0.8371  0.6098  0.4298
6     3359.0022  26027473  5101.7127  0.8570  0.5525  0.4737
7     3163.6582  27129401  5208.5891  0.7463  0.6274  0.4616
8     3057.1768  22049487  4695.6881  0.8911  0.6044  0.5064
9     3127.0129  34563631  5879.0842  0.6962  0.5511  0.2870
Mean  3184.2030  26856961  5112.4638  0.8058  0.5939  0.4118
Std   386.1636   8465302   848.3368   0.0869  0.0688  0.0615
Passive Aggressive Regressor
par = tune_model(create_model("par", fold=10))
read.delim("texttable8.txt", sep = "\t") %>%
  as_tibble() %>%
  gt() %>%
  tab_header(title = md("**Passive Aggressive Regressor Model**")) %>%
  gt_highlight_rows(rows = 11, font_weight = "normal")
Fold  MAE        MSE       RMSE       R2      RMSLE   MAPE
0     1797.3083  20225171  4497.2403  0.8711  0.3831  0.0786
1     1283.8384  13518825  3676.7955  0.9285  0.1444  0.0440
2     1525.9976  16249886  4031.1147  0.8742  0.2937  0.0655
3     1837.2902  26450373  5142.9926  0.7720  0.4964  0.0792
4     2058.1164  29969980  5474.4844  0.8103  0.4379  0.0839
5     1927.5637  23508539  4848.5605  0.8365  0.3627  0.0828
6     1887.8624  22822462  4777.2861  0.8746  0.3313  0.0753
7     1746.1904  25535434  5053.2597  0.7612  0.4357  0.0773
8     1498.0157  16776104  4095.8642  0.9172  0.2440  0.0539
9     1902.7301  32862799  5732.6084  0.7111  0.4768  0.0798
Mean  1746.4913  22791957  4733.0206  0.8357  0.3606  0.0720
Std   225.7268   5881662   624.8782   0.0673  0.1047  0.0127
plot_model(par, plot="residuals")
plot_model(par, plot="feature")
plot_model(par, plot="error")
Blended Model
blender = blend_models(estimator_list=[tuned_ridge, lasso, gbr, par])
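blend_models wraps the four tuned estimators in a voting regressor that averages their predictions. In recent PyCaret 2.x releases it also takes an optional weights argument if the stronger base learners should count for more; a sketch with illustrative weights (the values are assumptions, not what was run here):

# Hypothetical weighting: lean on the boosted and passive-aggressive models
weighted_blender = blend_models(
    estimator_list=[tuned_ridge, lasso, gbr, par],
    weights=[0.2, 0.2, 0.3, 0.3],
)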
read.delim("texttable9.txt", sep = "\t") %>%
  as_tibble() %>%
  gt() %>%
  tab_header(title = md("**Blended Model**")) %>%
  gt_highlight_rows(rows = 11, font_weight = "normal")
Fold  MAE        MSE       RMSE      R2      RMSLE   MAPE
0     2408.0842  17302756  4159.658  0.8898  0.3869  0.2535
1     1863.4308  9320452   3052.941  0.9507  0.2753  0.2229
2     2282.9336  14762622  3842.216  0.8857  0.3599  0.2565
3     2654.2865  25483338  5048.102  0.7803  0.4712  0.2441
4     2829.9945  31288042  5593.572  0.8020  0.4538  0.2101
5     2481.5104  20045336  4477.202  0.8606  0.4005  0.2650
6     2564.7835  19534698  4419.807  0.8926  0.3474  0.2399
7     2478.1730  23479704  4845.586  0.7804  0.4929  0.2646
8     2064.9262  12575007  3546.126  0.9379  0.3530  0.2771
9     2410.9463  29548765  5435.878  0.7402  0.4522  0.1934
Mean  2403.9069  20334072  4442.109  0.8520  0.3993  0.2427
Std   265.2138   6810031   775.720   0.0683  0.0645  0.0252
plot_model(blender, plot="residuals")
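Before saving, the usual PyCaret workflow is to finalize the pipeline, retraining it on the full modeling data including the internal hold-out created by setup(). I save blender directly below; finalizing first is a common alternative worth knowing:

# finalize_model refits the blended pipeline on the entire modeling dataset
final_blender = finalize_model(blender)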
Save Blended Model
save_model(blender, model_name="pycaret_prod_example")
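As a sanity check before deployment, the 134-row data_unseen split created earlier can be scored with the blended pipeline; in PyCaret 2.x, predict_model appends the predictions as a Label column:

# Score the held-out rows the pipeline has never seen
predictions = predict_model(blender, data=data_unseen)
predictions[["charges", "Label"]].head()  # actual vs. predicted charges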
Application Deployment
For this example, the app has been hosted on streamlit.io. Here is the link to the app.
The application can also be run locally from the Anaconda Prompt by changing to the folder where app.py lives and typing streamlit run app.py.
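For reference, here is a minimal sketch of what app.py might look like. It is an assumption about the deployed app's shape, not its actual source: the widget names and defaults are illustrative, and it expects the saved pycaret_prod_example.pkl alongside the script (the Label column name is PyCaret 2.x behavior):

# app.py -- hypothetical minimal Streamlit front end for the saved pipeline
import pandas as pd
import streamlit as st
from pycaret.regression import load_model, predict_model

model = load_model("pycaret_prod_example")  # loads pycaret_prod_example.pkl

st.title("Insurance Charges Estimator")

age = st.number_input("Age", min_value=18, max_value=100, value=30)
sex = st.selectbox("Sex", ["female", "male"])
bmi = st.number_input("BMI", min_value=10.0, max_value=60.0, value=28.0)
children = st.number_input("Children", min_value=0, max_value=10, value=0)
smoker = st.selectbox("Smoker", ["yes", "no"])
region = st.selectbox("Region", ["southwest", "southeast", "northwest", "northeast"])

if st.button("Predict charges"):
    # Assemble a single-row frame matching the training schema
    row = pd.DataFrame([{"age": age, "sex": sex, "bmi": bmi,
                         "children": children, "smoker": smoker,
                         "region": region}])
    pred = predict_model(model, data=row)
    st.write(f"Estimated charges: {pred['Label'].iloc[0]:,.2f}")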