Inserting interactive scikit-learn diagrams into mkdocs
scikit-learn has a nice feature that lets you display an interactive visualization of a pipeline.
This post shows how to insert interactive diagrams into your mkdocs documentation, which is great for documenting your machine learning projects.
Here's an example of what it looks like [1]:
[Interactive diagram: a GridSearchCV wrapping a Pipeline. The 'preprocessor' step is a ColumnTransformer with a 'categorical' branch (SimpleImputer(fill_value='missing', strategy='constant'), then OneHotEncoder(handle_unknown='ignore')) on ['state', 'gender'] and a 'numerical' branch (SimpleImputer(), then StandardScaler()) on ['age', 'weight']. The 'classifier' step is a RandomForestClassifier().]
How it's done
To insert a pipeline visualization into a markdown document, first save it as an .html file:
from sklearn.utils import estimator_html_repr

with open("docs/assets/visualizations/gridsearch.html", "w") as f:
    f.write(estimator_html_repr(grid_search))
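For a version that runs end to end, here is a minimal sketch. It rebuilds a grid_search with the same structure as the diagram above and saves it. The save_diagram helper is a hypothetical name introduced here, and the parameter grid is shortened for illustration. The helper also creates the target directory, since open() fails when the directory does not exist yet.

```python
from pathlib import Path

from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.utils import estimator_html_repr

# Categorical columns: fill missing values with a constant, then one-hot encode.
categorical = Pipeline(steps=[
    ("imputation_constant", SimpleImputer(fill_value="missing", strategy="constant")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])

# Numerical columns: fill missing values with the mean, then standardize.
numerical = Pipeline(steps=[
    ("imputation_mean", SimpleImputer()),
    ("scaler", StandardScaler()),
])

preprocessor = ColumnTransformer(transformers=[
    ("categorical", categorical, ["state", "gender"]),
    ("numerical", numerical, ["age", "weight"]),
])

pipeline = Pipeline(steps=[
    ("preprocessor", preprocessor),
    ("classifier", RandomForestClassifier()),
])

# Shortened parameter grid (an assumption for illustration only).
grid_search = GridSearchCV(
    pipeline,
    param_grid={
        "classifier__criterion": ["gini", "entropy"],
        "classifier__max_depth": [4, 5, 6, 7, 8],
    },
    n_jobs=1,
)


def save_diagram(estimator, path):
    """Write the estimator's HTML diagram to `path`, creating parent folders."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(estimator_html_repr(estimator), encoding="utf-8")
    return path


# The estimator does not need to be fitted to render the diagram.
output = save_diagram(grid_search, "docs/assets/visualizations/gridsearch.html")
```
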
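Then insert the file into a mkdocs page using the snippets extension (see embedding external files), which includes an external file through its --8<-- notation followed by the quoted path to the saved .html file.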
Alternatively, you could use the markdown-exec package, or a mkdocs hook: a python script that is triggered when the docs are built (for example on the on_pre_build event).
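The hook approach could look roughly like the sketch below. It assumes a hypothetical hook file (say docs_hooks.py) and uses a small stand-in pipeline rather than the grid_search from this post. The names build_estimator, write_diagram and OUTPUT_PATH are illustrative, not part of mkdocs or scikit-learn. mkdocs calls on_pre_build before it reads the pages, so the .html file exists by the time the page that embeds it is rendered.

```python
"""Hypothetical mkdocs hook file, e.g. docs_hooks.py."""
from pathlib import Path

from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.utils import estimator_html_repr

# Assumed output location, relative to the directory mkdocs is run from.
OUTPUT_PATH = Path("docs/assets/visualizations/pipeline.html")


def build_estimator():
    """Return the estimator to document (a small stand-in pipeline here)."""
    return Pipeline(steps=[
        ("scaler", StandardScaler()),
        ("classifier", LogisticRegression()),
    ])


def write_diagram(path=OUTPUT_PATH):
    """Render the estimator diagram to HTML and write it to `path`."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(estimator_html_repr(build_estimator()), encoding="utf-8")
    return path


def on_pre_build(config, **kwargs):
    """mkdocs event hook: regenerate the diagram before each build."""
    write_diagram()
```

To activate it, the file would be listed under the hooks key in mkdocs.yml (available in mkdocs 1.4 and later). Regenerating the diagram on every build keeps it in sync with the code, at the cost of importing scikit-learn during the docs build.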
[1] The grid_search pipeline is from this example.