References
Abramson, I.S. (1982). On bandwidth variation in kernel estimates-a square root law. The Annals of Statistics 10(4), 1217–1223. DOI: 10.1214/aos/1176345986
Afendras, G., Markatou, M. (2019). Optimality of training/test size and resampling effectiveness in cross-validation. Journal of Statistical Planning and Inference 199, 286–301. DOI: 10.1016/j.jspi.2018.07.005
Aitchison, J., Aitken, C.G.G. (1976). Multivariate binary discrimination by the kernel method. Biometrika 63(3), 413–420. DOI: 10.1093/biomet/63.3.413
Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control 19(6), 716–723. DOI: 10.1109/TAC.1974.1100705
Akaike, H. (1972). Information theory and an extension of the maximum likelihood principle, in: Proceedings of the 2nd International Symposium on Information Theory. pp. 267–281.
Amit, Y., Geman, D. (1997). Shape quantization and recognition with randomized trees. Neural Computation 9(7), 1545–1588. DOI: 10.1162/neco.1997.9.7.1545
Andoni, A., Indyk, P., Razenshteyn, I. (2018). Approximate nearest neighbor search in high dimensions, in: Proceedings of the International Congress of Mathematicians (ICM 2018). pp. 3287–3318. DOI: 10.1142/9789813272880_0182
Arlot, S., Bach, F. (2009). Data-driven calibration of linear estimators with minimal penalties. arXiv (0909.1884). URL https://arxiv.org/abs/0909.1884
Arlot, S., Celisse, A. (2010). A survey of cross-validation procedures for model selection. Statistics Surveys 4, 40–79. DOI: 10.1214/09-SS054
Aslett, L.J.M. (2021). Statistical Machine Learning. URL https://www.louisaslett.com/StatML/notes/
Aslett, L.J.M., Esperança, P.M., Holmes, C.C. (2015). A review of homomorphic encryption and software tools for encrypted statistical machine learning. arXiv (1508.06574). URL https://arxiv.org/abs/1508.06574
Aurenhammer, F. (1991). Voronoi diagrams — a survey of a fundamental geometric data structure. ACM Computing Surveys 23(3), 345–405. DOI: 10.1145/116873.116880
Bach, F. (2021). Learning Theory from First Principles, Draft. ed. URL https://www.di.ens.fr/~fbach/ltfp_book.pdf
Barber, D. (2012). Bayesian Reasoning and Machine Learning, 1st ed. Cambridge University Press. URL http://web4.cs.ucl.ac.uk/staff/D.Barber/pmwiki/pmwiki.php?n=Brml.Online, ISBN: 978-0521518147
Bartlett, P., Freund, Y., Sun Lee, W., Schapire, R.E. (1998). Boosting the margin: A new explanation for the effectiveness of voting methods. The Annals of Statistics 26(5), 1651–1686. DOI: 10.1214/aos/1024691352
Bates, S., Hastie, T., Tibshirani, R. (2021). Cross-validation: What does it estimate and how well does it do it? arXiv (2104.00673). URL https://arxiv.org/abs/2104.00673
Belkin, M., Hsu, D., Ma, S., Mandal, S. (2019). Reconciling modern machine-learning practice and the classical bias–variance trade-off. Proceedings of the National Academy of Sciences 116(32), 15849–15854. DOI: 10.1073/pnas.1903070116
Belson, W.A. (1959). Matching and prediction on the principle of biological classification. Journal of the Royal Statistical Society: Series C 8(2), 65–75. DOI: 10.2307/2985543
Bengio, Y., Grandvalet, Y. (2004). No unbiased estimator of the variance of k-fold cross-validation. Journal of Machine Learning Research 5, 1089–1105. URL https://www.jmlr.org/papers/v5/grandvalet04a.html
Bentley, J.L. (1975). Multidimensional binary search trees used for associative searching. Communications of the ACM 18(9), 509–517. DOI: 10.1145/361002.361007
Bergmeir, C., Hyndman, R.J., Koo, B. (2018). A note on the validity of cross-validation for evaluating autoregressive time series prediction. Computational Statistics & Data Analysis 120, 70–83. DOI: 10.1016/j.csda.2017.11.003
Bernardo, J.M., Smith, A.F.M. (1994). Bayesian Theory, 1st ed, Wiley Series in Probability and Statistics. Wiley. ISBN: 978-0471494645
Beygelzimer, A., Kakade, S., Langford, J. (2006). Cover trees for nearest neighbor, in: Proceedings of the 23rd International Conference on Machine Learning. pp. 97–104. DOI: 10.1145/1143844.1143857
Bishop, C.M. (2006). Pattern Recognition and Machine Learning, 1st ed, Information Science and Statistics. Springer. URL https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf, ISBN: 978-0387310732
Bohanec, M., Bratko, I. (1994). Trading accuracy for simplicity in decision trees. Machine Learning 15, 223–250. DOI: 10.1023/A:1022685808937
Box, G.E.P., Draper, N.R. (1987). Empirical model-building and response surfaces, 1st ed, Wiley series in probability and statistics. Wiley. ISBN: 978-0471810339
Breiman, L. (2001a). Statistical modeling: The two cultures. Statistical Science 16(3), 199–231. DOI: 10.1214/ss/1009213726
Breiman, L. (2001b). Random forests. Machine Learning 45, 5–32. DOI: 10.1023/A:1010933404324
Breiman, L. (1998). Arcing classifiers. The Annals of Statistics 26(3), 801–849. DOI: 10.1214/aos/1024691079
Breiman, L. (1996). Bagging predictors. Machine Learning 24, 123–140. DOI: 10.1007/BF00058655
Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J. (1984). Classification and regression trees, 1st ed. Chapman & Hall/CRC. ISBN: 978-1351460484
Breiman, L., Meisel, W., Purcell, E. (1977). Variable kernel estimates of multivariate densities. Technometrics 19(2), 135–144. DOI: 10.1080/00401706.1977.10489521
Breiman, L., Spector, P. (1992). Submodel selection and evaluation in regression. The \(X\)-random case. International Statistical Review 60(3), 291–319. DOI: 10.2307/1403680
Bühlmann, P. (2006). Boosting for high-dimensional linear models. The Annals of Statistics 34(2), 559–583. DOI: 10.1214/009053606000000092
Bühlmann, P., Hothorn, T. (2007). Boosting algorithms: Regularization, prediction and model fitting. Statistical Science 22(4), 477–505. DOI: 10.1214/07-STS242
Caton, S., Haas, C. (2020). Fairness in machine learning: A survey. arXiv (2010.04053). URL https://arxiv.org/abs/2010.04053
Chen, G.H., Shah, D. (2018). Explaining the success of nearest neighbor methods in prediction. Foundations and Trends in Machine Learning 10(5-6), 337–588. DOI: 10.1561/2200000064
Chen, T., Guestrin, C. (2016). XGBoost: A scalable tree boosting system, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 785–794. DOI: 10.1145/2939672.2939785
Chen, T., He, T., Benesty, M., Khotilovich, V., Tang, Y., Cho, H., Chen, K., Mitchell, R., Cano, I., Zhou, T., Li, M., Xie, J., Lin, M., Geng, Y., Li, Y. (2021). xgboost: Extreme Gradient Boosting. R package version 1.4.1.1. URL https://CRAN.R-project.org/package=xgboost
Christodoulou, E., Ma, J., Collins, G.S., Steyerberg, E.W., Verbakel, J.Y., Van Calster, B. (2019). A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models. Journal of Clinical Epidemiology 110, 12–22. DOI: 10.1016/j.jclinepi.2019.02.004
Clopper, C.J., Pearson, E.S. (1934). The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika 26(4), 404–413. DOI: 10.1093/biomet/26.4.404
Collins, G.S., Moons, K.G.M. (2019). Reporting of artificial intelligence prediction models. Lancet 393, 1577–1579. DOI: 10.1016/S0140-6736(19)30037-6
Collins, G.S., Moons, K.G.M. (2012). Comparing risk prediction models. BMJ 344. DOI: 10.1136/bmj.e3186
Cover, T.M., Hart, P.E. (1967). Nearest neighbor pattern classification. IEEE Transactions on Information Theory 13(1), 21–27. DOI: 10.1109/TIT.1967.1053964
Cox, D.R. (1958). Two further applications of a model for binary regression. Biometrika 45(3-4), 562–565. DOI: 10.1093/biomet/45.3-4.562
Davison, A.C., Hinkley, D.V., Young, G.A. (2003). Recent developments in bootstrap methodology. Statistical Science 18(2), 141–157. DOI: 10.1214/ss/1063994969
DeGroot, M.H., Schervish, M.J. (2012). Probability and Statistics, 4th ed. Pearson. ISBN: 978-0321500465
Devroye, L., Györfi, L., Lugosi, G. (1996). A Probabilistic Theory of Pattern Recognition, 1st ed. Springer. ISBN: 978-0387946184
Dietterich, T.G. (2000). An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Machine Learning 40, 139–157. DOI: 10.1023/A:1007607513941
Dietterich, T.G., Kong, E.B. (1995). Machine learning bias, statistical bias, and statistical variance of decision tree algorithms. Technical Report, Department of Computer Science, Oregon State University. URL https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.38.2702&rep=rep1&type=pdf
Dwork, C., Roth, A. (2014). The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science 9(3-4), 211–407. DOI: 10.1561/0400000042
Efron, B. (2020). Prediction, estimation, and attribution. Journal of the American Statistical Association 115(530), 636–655. DOI: 10.1080/01621459.2020.1762613
Efron, B. (2004). The estimation of prediction error: Covariance penalties and cross-validation. Journal of the American Statistical Association 99(467), 619–632. DOI: 10.1198/016214504000000692
Efron, B. (1986). How biased is the apparent error rate of a prediction rule? Journal of the American Statistical Association 81(394), 461–470. DOI: 10.2307/2289236
Efron, B. (1983). Estimating the error rate of a prediction rule: Improvement on cross-validation. Journal of the American Statistical Association 78(382), 316–331. DOI: 10.2307/2288636
Efron, B. (1979). Bootstrap methods: Another look at the jackknife. The Annals of Statistics 7(1), 1–26. DOI: 10.1214/aos/1176344552
Efron, B., Tibshirani, R. (1997). Improvements on cross-validation: The .632+ bootstrap method. Journal of the American Statistical Association 92(438), 548–560. DOI: 10.2307/2965703
Epanechnikov, V.A. (1969). Non-parametric estimation of a multivariate probability density. Theory of Probability & Its Applications 14(1), 153–158. DOI: 10.1137/1114019
Fix, E., Hodges, Jr., J.L. (1951). Discriminatory analysis — nonparametric discrimination: Consistency properties. Technical Report: USAF School of Aviation Medicine (21-49-004.4). DOI: 10.2307/1403797
Freund, Y. (1995). Boosting a weak learning algorithm by majority. Information and Computation 121(2), 256–285. DOI: 10.1006/inco.1995.1136
Freund, Y., Schapire, R.E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences 55(1), 119–139. DOI: 10.1006/jcss.1997.1504
Friedman, J.H. (2001). Greedy function approximation: A gradient boosting machine. The Annals of Statistics 29(5), 1189–1232. DOI: 10.1214/aos/1013203451
Friedman, J.H. (1998). Data mining and statistics: What’s the connection? Computing Science and Statistics 29(1), 3–9.
Friedman, J.H. (1997). On bias, variance, 0/1-loss, and the curse-of-dimensionality. Data Mining and Knowledge Discovery 1, 55–77. DOI: 10.1023/A:1009778005914
Friedman, J., Hastie, T., Tibshirani, R. (2000). Additive logistic regression: A statistical view of boosting. The Annals of Statistics 28(2), 337–407. DOI: 10.1214/aos/1016218223
Friedman, J.H., Bentley, J.L., Finkel, R.A. (1977). An algorithm for finding best matches in logarithmic expected time. ACM Transactions on Mathematical Software 3(3), 209–226. DOI: 10.1145/355744.355745
García, S., Derrac, J., Cano, J.R., Herrera, F. (2012). Prototype selection for nearest neighbor classification: Taxonomy and empirical study. IEEE Transactions on Pattern Analysis and Machine Intelligence 34(3), 417–435. DOI: 10.1109/TPAMI.2011.142
Geisser, S. (1975). The predictive sample reuse method with applications. Journal of the American Statistical Association 70(350), 320–328. DOI: 10.2307/2285815
Gilmour, S.G. (1996). The interpretation of Mallows’s \(C_p\)-statistic. Journal of the Royal Statistical Society: Series D 45(1), 49–56. DOI: 10.2307/2348411
Glur, C. (2020). data.tree: General Purpose Hierarchical Data Structure. R package version 1.0.0. URL https://CRAN.R-project.org/package=data.tree
Gneiting, T., Raftery, A.E. (2007). Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association 102(477), 359–378. DOI: 10.1198/016214506000001437
Goodfellow, I., Bengio, Y., Courville, A. (2016). Deep Learning, 1st ed. MIT Press. URL https://www.deeplearningbook.org/, ISBN: 978-0262035613
Gorman, K.B., Williams, T.D., Fraser, W.R. (2014). Ecological sexual dimorphism and environmental variability within a community of Antarctic penguins (genus Pygoscelis). PLOS ONE 9(3). DOI: 10.1371/journal.pone.0090081
Gower, J.C. (1971). A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871. DOI: 10.2307/2528823
Greenwell, B., Boehmke, B., Cunningham, J., GBM Developers (2020). gbm: Generalized Boosted Regression Models. R package version 2.1.8. URL https://CRAN.R-project.org/package=gbm
Grund, B., Hall, P., Marron, J.S. (1994). Loss and risk in smoothing parameter selection. Journal of Nonparametric Statistics 4(2), 107–132. DOI: 10.1080/10485259408832605
H2O.ai (2021). R Interface for H2O. R package version 3.32.1.3. URL https://github.com/h2oai/h2o-3
Hall, P., Samworth, R.J. (2005). Properties of bagged nearest neighbour classifiers. Journal of the Royal Statistical Society: Series B 67(3), 363–379. DOI: 10.1111/j.1467-9868.2005.00506.x
Hand, D.J., Till, R.J. (2001). A simple generalisation of the area under the ROC curve for multiple class classification problems. Machine Learning 45, 171–186. DOI: 10.1023/A:1010920819831
Hand, D.J., Yu, K. (2007). Idiot’s Bayes — not so stupid after all? International Statistical Review 69(3), 385–398. DOI: 10.1111/j.1751-5823.2001.tb00465.x
Hart, P. (1968). The condensed nearest neighbor rule. IEEE Transactions on Information Theory 14(3), 515–516. DOI: 10.1109/TIT.1968.1054155
Hastie, T., Tibshirani, R., Friedman, J. (2009). The Elements of Statistical Learning, 2nd ed, Springer Series in Statistics. Springer. URL https://web.stanford.edu/~hastie/ElemStatLearn/printings/ESLII_print12_toc.pdf, ISBN: 978-0387848570
Hays, J., Efros, A.A. (2008). IM2GPS: estimating geographic information from a single image, in: 2008 IEEE Conference on Computer Vision and Pattern Recognition. pp. 1–8. DOI: 10.1109/CVPR.2008.4587784
Ho, T.K. (1998). The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence 20(8), 832–844. DOI: 10.1109/34.709601
Holmes, C.C., Adams, N.M. (2002). A probabilistic nearest neighbour method for statistical pattern recognition. Journal of the Royal Statistical Society: Series B 64, 295–306. DOI: 10.1111/1467-9868.00338
Horst, A.M., Hill, A.P., Gorman, K.B. (2020). palmerpenguins: Palmer Archipelago (Antarctica) penguin data. R package version 0.1.0. URL https://allisonhorst.github.io/palmerpenguins/
Hothorn, T., Hornik, K., Zeileis, A. (2006). Unbiased recursive partitioning: A conditional inference framework. Journal of Computational and Graphical Statistics 15(3), 651–674. DOI: 10.1198/106186006X133933
Hyafil, L., Rivest, R.L. (1976). Constructing optimal binary decision trees is NP-complete. Information Processing Letters 5(1), 15–17. DOI: 10.1016/0020-0190(76)90095-8
James, G., Witten, D., Hastie, T., Tibshirani, R. (2013). An Introduction to Statistical Learning, 1st ed, Springer Texts in Statistics. Springer. URL https://www.statlearning.com/s/ISLR-Seventh-Printing.pdf, ISBN: 978-1461471370
Jenkins, P.A., Johansen, A.M., Evers, L. (2021). APTS: Computer Intensive Statistics Notes. URL https://warwick.ac.uk/fac/sci/statistics/apts/students/resources/cis-notes.pdf
Jones, M.C., Marron, J.S., Sheather, S.J. (1996). A brief survey of bandwidth selection for density estimation. Journal of the American Statistical Association 91(433), 401–407. DOI: 10.2307/2291420
Jones, M.L. (2018). How we became instrumentalists (again): Data positivism since World War II. Historical Studies in the Natural Sciences 48(5), 673–684. DOI: 10.1525/hsns.2018.48.5.673
Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q., Liu, T.-Y. (2017). LightGBM: A highly efficient gradient boosting decision tree, in: Proceedings of the 30th International Conference on Neural Information Processing Systems. URL https://papers.nips.cc/paper/2017/hash/6449f44a102fde848669bdd9eb6b76fa-Abstract.html
Ke, G., Soukhavong, D., Lamb, J., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q., Liu, T.-Y. (2021). lightgbm: Light Gradient Boosting Machine. R package version 3.2.1. URL https://CRAN.R-project.org/package=lightgbm
Kearns, M. (1988). Thoughts on hypothesis boosting. Machine Learning Class Project (unpublished) 1–9. URL http://www.cis.upenn.edu/~mkearns/papers/boostnote.pdf
Kohler, M., Krzyżak, A., Walk, H. (2006). Rates of convergence for partitioning and nearest neighbor regression estimates with unbounded data. Journal of Multivariate Analysis 97(2), 311–323. DOI: 10.1016/j.jmva.2005.03.006
Kuhn, M., Johnson, K. (2020). Feature Engineering and Selection: A Practical Approach for Predictive Models, 1st ed, Data Science Series. Chapman & Hall/CRC. ISBN: 978-1138079229
Kuhn, M., Wickham, H. (2020). Tidymodels: a collection of packages for modeling and machine learning using tidyverse principles. URL https://www.tidymodels.org
Lang, M., Binder, M., Richter, J., Schratz, P., Pfisterer, F., Coors, S., Au, Q., Casalicchio, G., Kotthoff, L., Bischl, B. (2019). mlr3: A modern object-oriented machine learning framework in R. Journal of Open Source Software 4(44), 1903. DOI: 10.21105/joss.01903
Larson, S.C. (1931). The shrinkage of the coefficient of multiple correlation. Journal of Educational Psychology 22(1), 45–55. DOI: 10.1037/h0072400
Leslie, D. (2019). Understanding artificial intelligence ethics and safety: A guide for the responsible design and implementation of AI systems in the public sector. Technical Report, The Alan Turing Institute. DOI: 10.5281/zenodo.3240529
Li, K. (1984). Consistency for cross-validated nearest neighbor estimates in nonparametric regression. The Annals of Statistics 12(1), 230–240. DOI: 10.1214/aos/1176346403
Liaw, A., Wiener, M. (2002). Classification and regression by randomForest. R News 2(3), 18–22. URL https://www.r-project.org/doc/Rnews/Rnews_2002-3.pdf
Liley, J., Emerson, S.R., Mateen, B.A., Vallejos, C.A., Aslett, L.J.M., Vollmer, S.J. (2021). Model updating after interventions paradoxically introduces bias, in: Proceedings of the 24th International Conference on Artificial Intelligence and Statistics. pp. 3916–3924. URL http://proceedings.mlr.press/v130/liley21a.html
Long, P.M., Servedio, R.A. (2010). Random classification noise defeats all convex potential boosters. Machine Learning 78, 287–304. DOI: 10.1007/s10994-009-5165-z
Lugosi, G., Nobel, A. (1996). Consistency of data-driven histogram methods for density estimation and classification. The Annals of Statistics 24(2), 687–706. DOI: 10.1214/aos/1032894460
Majka, M. (2019). naivebayes: High Performance Implementation of the Naive Bayes Algorithm in R. R package version 0.9.7. URL https://CRAN.R-project.org/package=naivebayes
Malik, M.M. (2020). A hierarchy of limitations in machine learning. arXiv (2002.05193). URL https://arxiv.org/abs/2002.05193
Mallows, C.L. (1973). Some comments on \(C_p\). Technometrics 15(4), 661–675. DOI: 10.1080/00401706.1973.10489103
Maxwell, J.C. (1860). On the theory of compound colours, and the relations of the colours of the spectrum. Proceedings of the Royal Society of London 10, 404–409. DOI: 10.1098/rspl.1859.0074
Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K., Galstyan, A. (2021). A survey on bias and fairness in machine learning. ACM Computing Surveys 54(6), 1–35. DOI: 10.1145/3457607
Meyer, D., Dimitriadou, E., Hornik, K., Weingessel, A., Leisch, F. (2021). e1071: Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien. R package version 1.7-8. URL https://CRAN.R-project.org/package=e1071
Milborrow, S. (2021). rpart.plot: Plot ’rpart’ Models: An Enhanced Version of ’plot.rpart’. R package version 3.1.0. URL https://CRAN.R-project.org/package=rpart.plot
Mingers, J. (1989). An empirical comparison of pruning methods for decision tree induction. Machine Learning 4, 227–243. DOI: 10.1023/A:1022604100933
Mingers, J. (1987). Expert systems – rule induction with statistical data. Journal of the Operational Research Society 38, 39–47. DOI: 10.1057/jors.1987.5
Mohri, M., Rostamizadeh, A., Talwalkar, A. (2018). Foundations of Machine Learning, 2nd ed. MIT Press. URL https://www.dropbox.com/s/7voitv0vt24c88s/10290.pdf?dl=1, ISBN: 978-0262039406
Molnar, C., Casalicchio, G., Bischl, B. (2021). Interpretable machine learning – a brief history, state-of-the-art and challenges, in: ECML PKDD 2020 Workshops. pp. 417–431. DOI: 10.1007/978-3-030-65965-3_28
Moons, K.G.M., Altman, D.G., Reitsma, J.B., Ioannidis, J.P.A., Macaskill, P., Steyerberg, E.W., Vickers, A.J., Ransohoff, D.F., Collins, G.S. (2015). Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD): Explanation and Elaboration. Annals of Internal Medicine 162(1), W1–W73. DOI: 10.7326/M14-0698
Morgan, J.N., Sonquist, J.A. (1963). Problems in the analysis of survey data, and a proposal. Journal of the American Statistical Association 58(302), 415–434. DOI: 10.2307/2283276
Murphy, K.P. (2012). Machine Learning: A Probabilistic Perspective, 1st ed. MIT Press. ISBN: 978-0262018029
Murthy, S.K., Salzberg, S. (1995). Decision tree induction: How effective is the greedy heuristic?, in: Proceedings of the First International Conference on Knowledge Discovery and Data Mining. pp. 222–227. URL https://www.aaai.org/Library/KDD/kdd95contents.php
Nadaraya, E.A. (1964). On estimating regression. Theory of Probability & Its Applications 9(1), 141–142. DOI: 10.1137/1109020
Nagy, G. (1968). State of the art in pattern recognition. Proceedings of the IEEE 56(5), 836–863. DOI: 10.1109/PROC.1968.6414
Neyshabur, B., Tomioka, R., Srebro, N. (2015). In search of the real inductive bias: On the role of implicit regularization in deep learning, in: Proceedings of the Third International Conference on Learning Representations. URL https://arxiv.org/abs/1412.6614
Ng, A.Y., Jordan, M.I. (2001). On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes, in: Proceedings of the 14th International Conference on Neural Information Processing Systems. pp. 841–848. URL http://papers.nips.cc/paper/2020-on-discriminative-vs-generative-classifiers-a-comparison-of-logistic-regression-and-naive-bayes.pdf
Niculescu-Mizil, A., Caruana, R. (2005). Predicting good probabilities with supervised learning, in: Proceedings of the 22nd International Conference on Machine Learning. pp. 625–632. DOI: 10.1145/1102351.1102430
Nobel, A. (1996). Histogram regression estimation using data-dependent partitions. The Annals of Statistics 24(3), 1084–1105. DOI: 10.1214/aos/1032526958
Ogden, H.E., Davison, A.C., Forster, J.J., Woods, D.C., Overstall, A.M. (2021). APTS: Statistical Modelling Notes. URL https://warwick.ac.uk/fac/sci/statistics/apts/students/resources/statmod-notes.pdf
Parzen, E. (1962). On estimation of a probability density function and mode. The Annals of Mathematical Statistics 33(3), 1065–1076. DOI: 10.1214/aoms/1177704472
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, É. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12, 2825–2830. URL https://www.jmlr.org/papers/v12/pedregosa11a.html
Platt, J.C. (1999). Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods, in: Advances in Large Margin Classifiers. pp. 61–74. ISBN: 978-0262283977
Quinlan, J.R. (1993). C4.5: Programs for Machine Learning, 1st ed. Morgan Kaufmann Publishers. ISBN: 978-1558602380
Quinlan, J.R. (1986). Induction of decision trees. Machine Learning 1, 81–106. DOI: 10.1007/BF00116251
R Core Team (2021). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/
Robert, C.P. (2007). The Bayesian Choice, 2nd ed, Springer Texts in Statistics. Springer. ISBN: 978-0387715988
Robertson, T., Wright, F.T., Dykstra, R.L. (1988). Order restricted statistical inference, 1st ed. Wiley. ISBN: 978-0471917878
Rosenblatt, M. (1971). Curve estimates. The Annals of Mathematical Statistics 42(6), 1815–1842. DOI: 10.1214/aoms/1177693050
Rosenblatt, M. (1956). Remarks on some nonparametric estimates of a density function. The Annals of Mathematical Statistics 27(3), 832–837. DOI: 10.1214/aoms/1177728190
Rosset, S., Tibshirani, R.J. (2020). From fixed-x to random-x regression: Bias-variance decompositions, covariance penalties, and prediction error estimation. Journal of the American Statistical Association 115(529), 138–151. DOI: 10.1080/01621459.2018.1424632
Sachs, M.C. (2017). plotROC: A Tool for Plotting ROC Curves. Journal of Statistical Software, Code Snippets 79(2), 1–19. DOI: 10.18637/jss.v079.c02
Schapire, R.E. (1990). The strength of weak learnability. Machine Learning 5, 197–227. DOI: 10.1023/A:1022648800760
Schapire, R.E., Freund, Y. (2012). Boosting: Foundations and Algorithms, 1st ed. MIT Press. ISBN: 978-0262017183
Scott, D.W. (2015). Multivariate Density Estimation: Theory, Practice, and Visualization, 2nd ed, Wiley Series in Probability and Statistics. Wiley. ISBN: 978-1118575574
Shao, J. (1993). Linear model selection by cross-validation. Journal of the American Statistical Association 88(422), 486–494. DOI: 10.2307/2290328
Shaw, S.C., Rougier, J.C. (2020). APTS: Statistical Inference Notes. URL https://warwick.ac.uk/fac/sci/statistics/apts/students/resources/lecturenotes.pdf
Sheather, S.J. (2004). Density estimation. Statistical Science 19(4), 588–597. DOI: 10.1214/088342304000000297
Sheather, S.J., Jones, M.C. (1991). A reliable data-based bandwidth selection method for kernel density estimation. Journal of the Royal Statistical Society: Series B 53(3), 683–690. DOI: 10.1111/j.2517-6161.1991.tb01857.x
Shmueli, G. (2010). To explain or to predict? Statistical Science 25(3), 289–310. DOI: 10.1214/10-STS330
Silverman, B.W. (1998). Density Estimation for Statistics and Data Analysis, 1st ed. Chapman & Hall/CRC. ISBN: 978-0412246203
Stein, C.M. (1981). Estimation of the mean of a multivariate normal distribution. The Annals of Statistics 9(6), 1135–1151. DOI: 10.1214/aos/1176345632
Stone, C.J. (1977). Consistent nonparametric regression. The Annals of Statistics 5(4), 595–645. DOI: 10.1214/aos/1176343886
Stone, M. (1977). An asymptotic equivalence of choice of model by cross-validation and Akaike’s criterion. Journal of the Royal Statistical Society: Series B 39(1), 44–47. DOI: 10.1111/j.2517-6161.1977.tb01603.x
Stone, M. (1974). Cross-validatory choice and assessment of statistical predictions. Journal of the Royal Statistical Society: Series B 36(2), 111–133. DOI: 10.1111/j.2517-6161.1974.tb00994.x
Štrumbelj, E. (2018). Predictive model evaluation. Course Notes. URL https://file.biolab.si/textbooks/ml1/model-evaluation.pdf
Terrell, G.R., Scott, D.W. (1992). Variable kernel density estimation. The Annals of Statistics 20(3), 1236–1265. DOI: 10.1214/aos/1176348768
The CONSORT-AI and SPIRIT-AI Steering Group (2019). Reporting guidelines for clinical trials evaluating artificial intelligence interventions are needed. Nature Medicine 25, 1467–1468. DOI: 10.1038/s41591-019-0603-3
Therneau, T., Atkinson, B. (2019). rpart: Recursive Partitioning and Regression Trees. R package version 4.1-15. URL https://CRAN.R-project.org/package=rpart
The Turing Way Community, Arnold, B., Bowler, L., Gibson, S., Herterich, P., Higman, R., Krystalli, A., Morley, A., O’Reilly, M., Whitaker, K. (2019). The Turing Way: A Handbook for Reproducible Data Science. Zenodo v0.0.4. DOI: 10.5281/zenodo.3233986
Tibshirani, R.J. (2015). Degrees of freedom and model search. Statistica Sinica 25(3), 1265–1296. DOI: 10.5705/ss.2014.147
Tibshirani, R.J., Rosset, S. (2019). Excess optimism: How biased is the apparent error of an estimator tuned by SURE? Journal of the American Statistical Association 114(526), 697–712. DOI: 10.1080/01621459.2018.1429276
Tukey, J.W. (1962). The future of data analysis. The Annals of Mathematical Statistics 33(1), 1–67. DOI: 10.1214/aoms/1177704711
Van Calster, B., McLernon, D.J., Van Smeden, M., Wynants, L., Steyerberg, E.W. (2019). Calibration: The Achilles heel of predictive analytics. BMC Medicine 17(230). DOI: 10.1186/s12916-019-1466-7
Van Calster, B., Nieboer, D., Vergouwe, Y., De Cock, B., Pencina, M.J., Steyerberg, E.W. (2016). A calibration hierarchy for risk models was defined: From utopia to empirical data. Journal of Clinical Epidemiology 74, 167–176. DOI: 10.1016/j.jclinepi.2015.12.005
Van Calster, B., Vickers, A.J. (2014). Calibration of risk prediction models: Impact on decision-analytic performance. Medical Decision Making 35(2), 162–169. DOI: 10.1177/0272989X14547233
van der Laan, M.J., Polley, E.C., Hubbard, A.E. (2007). Super learner. Statistical Applications in Genetics and Molecular Biology 6(1). DOI: 10.2202/1544-6115.1309
van der Ploeg, T., Austin, P.C., Steyerberg, E.W. (2014). Modern modelling techniques are data hungry: A simulation study for predicting dichotomous endpoints. BMC Medical Research Methodology 14(137). DOI: 10.1186/1471-2288-14-137
Vapnik, V.N. (1998). The Nature of Statistical Learning Theory, 2nd ed. Springer. ISBN: 978-0387987804
Viering, T., Loog, M. (2021). The shape of learning curves: A review. arXiv (2103.10948). URL https://arxiv.org/abs/2103.10948
Wager, S. (2020). Cross-validation, risk estimation, and model selection: Comment on a paper by Rosset and Tibshirani. Journal of the American Statistical Association 115(529), 157–160. DOI: 10.1080/01621459.2020.1727235
Wand, M.P. (2021). KernSmooth: Functions for Kernel Smoothing Supporting Wand & Jones (1995). R package version 2.23-20. URL https://CRAN.R-project.org/package=KernSmooth
Wand, M.P., Jones, M.C. (1995). Kernel Smoothing, 1st ed, Monographs on Statistics and Applied Probability. Chapman & Hall/CRC. ISBN: 978-0412552700
Wang, J., Shen, X. (2006). Estimation of generalization error: Random and fixed inputs. Statistica Sinica 16(2), 569–588. URL http://www3.stat.sinica.edu.tw/statistica/J16N2/J16N213/J16N213.html
Warner, H.R., Toronto, A.F., Veasey, L.G., Stephenson, R. (1961). A mathematical approach to medical diagnosis: Application to congenital heart disease. JAMA 177(3), 177–183. DOI: 10.1001/jama.1961.03040290005002
Wasserman, L.A. (2014). Rise of the machines, in: Lin, X., Genest, C., Banks, D.L., Molenberghs, G., Scott, D.W., Wang, J.-L. (Eds.), Past, Present, and Future of Statistical Science. Chapman & Hall/CRC, pp. 525–536. DOI: 10.1201/b16720
Watson, G.S. (1964). Smooth regression analysis. Sankhyā: The Indian Journal of Statistics, Series A 26(4), 359–372. URL http://www.jstor.org/stable/25049340
Whittle, P. (1958). On the smoothing of probability density functions. Journal of the Royal Statistical Society: Series B 20(2), 334–343. DOI: 10.1111/j.2517-6161.1958.tb00298.x
Wilson, D.L. (1972). Asymptotic properties of nearest neighbor rules using edited data. IEEE Transactions on Systems, Man, and Cybernetics SMC-2(3), 408–421. DOI: 10.1109/TSMC.1972.4309137
Wolpert, D.H. (1996). The lack of a priori distinctions between learning algorithms. Neural Computation 8, 1341–1390. DOI: 10.1162/neco.1996.8.7.1341
Wong, W.H. (1983). On the consistency of cross-validation in kernel nonparametric regression. The Annals of Statistics 11(4), 1136–1141. DOI: 10.1214/aos/1176346327
Wright, M.N., Ziegler, A. (2017). ranger: A fast implementation of random forests for high dimensional data in C++ and R. Journal of Statistical Software 77(1), 1–17. DOI: 10.18637/jss.v077.i01
Yang, Y. (2005). Can the strengths of AIC and BIC be shared? A conflict between model identification and regression estimation. Biometrika 92(4), 937–950. DOI: 10.1093/biomet/92.4.937
Yousef, W.A. (2020). A leisurely look at versions and variants of the cross validation estimator. arXiv (1907.13413). URL https://arxiv.org/abs/1907.13413
Yu, Y. (2021). APTS: High-dimensional Statistics Notes. URL https://warwick.ac.uk/fac/sci/statistics/apts/students/resources/hdsnotes.pdf
Zadrozny, B., Elkan, C. (2002). Transforming classifier scores into accurate multiclass probability estimates, in: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 694–699. DOI: 10.1145/775047.775151
Zhang, C., Bengio, S., Hardt, M., Recht, B., Vinyals, O. (2017). Understanding deep learning requires rethinking generalization, in: Proceedings of the Fifth International Conference on Learning Representations. URL https://arxiv.org/abs/1611.03530
Epanechnikov, V.A. (1969). Non-parametric estimation of a multivariate probability density [in Russian]. Teoriya Veroyatnostei i ee Primeneniya 14(1), 156–161. URL http://mi.mathnet.ru/tvp1130