Issue 182, article 8

DOI:https://doi.org/10.15407/kvt182.02.086

Kibern. vyčisl. teh., 2015, Issue 182, pp.

Nastenko I.A., Boyko A.L., Nosovets O.K., Teplyakov K.I., Pavlov V.A.

National Technical University of Ukraine “Kiev Polytechnical Institute” (Kiev)

SYNTHESIS OF LOGISTIC REGRESSION BASED ON SELF-ORGANISATION PRINCIPLES OF MODELS

Introduction. Requirements for modeling algorithms and their implementations vary with the desired properties of the models, which must be obtained within the limits of the available computational resources. Examples of desired properties include accuracy, efficiency of the estimates, low sensitivity of the model error to changes in the data, the variance of the parameter estimates, p-values, etc. Depending on the intended use of the models, one or another of these criteria is taken as the basis for designing a specific modeling algorithm. However, the choice of the final model is usually left to the user. This article considers the automatic optimization of the parameters of a stepwise regression algorithm, based on the principles of self-organization, using the synthesis of a logistic model as an example.
The purpose of this article is to improve the quality of logistic regression classification models through automatic optimization of the parameters of the multivariate binary logistic regression algorithm.
Results. The essence of the modification of the standard stepwise logistic regression algorithm is as follows: a grid of entry and removal thresholds (p_enter, p_leave) is defined, and for each combination of thresholds the stepwise logistic algorithm is run and the corresponding value of the external criterion is calculated. The proposed external criterion reflects the classification accuracy on the training and test datasets on the one hand, and the requirement to balance the quality of recognition in each class on the other. The procedure is repeated for each subsequent point of the parameter grid. The final evaluation of the model is made on the examination sample. To build the logistic model and to compare classification quality between standard logistic regression (the glm function in R) and the proposed modified stepwise algorithm, data obtained in the laboratory of functional diagnostics at the Department of Physical Education of NTUU “KPI” were used. The purpose of the example is to obtain a classifying function that separates a group of subjects with certain states of the cardiovascular system from the rest of the sample. On the examination sample, the standard algorithm showed a classification quality of 81% and an area under the ROC curve of 0.8685. For the modified algorithm, the sensitivity and specificity graphs and the ROC curve showed a classification quality of 90.5% and an area under the ROC curve of 0.9717.
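The grid procedure described in the Results can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the formula of the external criterion, so `external_criterion` below is a hypothetical stand-in (mean class-balanced error on the training and test samples plus a penalty for the gap between sensitivity and specificity), and `fit_stepwise` is a hypothetical callable that is expected to run a stepwise logistic regression for the given thresholds and return a prediction function.

```python
from itertools import product


def confusion_rates(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    sens = tp / pos if pos else 0.0
    spec = tn / neg if neg else 0.0
    return sens, spec


def external_criterion(y_train, pred_train, y_test, pred_test):
    """Hypothetical external criterion (an assumption, not the paper's formula).

    Averages, over the training and test samples, the class-balanced error
    plus a penalty for unequal recognition quality in the two classes.
    Lower values are better.
    """
    total = 0.0
    for y, pred in ((y_train, pred_train), (y_test, pred_test)):
        sens, spec = confusion_rates(y, pred)
        error = 1.0 - (sens + spec) / 2.0   # accuracy requirement
        imbalance = abs(sens - spec)        # class-balance requirement
        total += error + imbalance
    return total / 2.0


def grid_search(fit_stepwise, train, test, p_enter_grid, p_leave_grid):
    """Try every (p_enter, p_leave) pair and keep the best-scoring model.

    fit_stepwise(X, y, p_enter, p_leave) is a hypothetical user-supplied
    function returning predict(X) -> list of 0/1 labels.
    train and test are (X, y) tuples.
    Returns (score, p_enter, p_leave, predict).
    """
    best = None
    for p_enter, p_leave in product(p_enter_grid, p_leave_grid):
        # Common stepwise convention: the removal threshold is not smaller
        # than the entry threshold, otherwise variables may cycle in and out.
        if p_leave < p_enter:
            continue
        predict = fit_stepwise(train[0], train[1], p_enter, p_leave)
        score = external_criterion(
            train[1], predict(train[0]), test[1], predict(test[0])
        )
        if best is None or score < best[0]:
            best = (score, p_enter, p_leave, predict)
    return best
```

The model returned by `grid_search` would then be evaluated once on the separate examination sample, which plays no part in choosing the thresholds.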
Conclusions. The article proposes an algorithm for synthesizing stepwise logistic regression based on the principles of self-organization. Optimizing the parameters of the algorithm with the proposed external criterion, which reflects the classification accuracy on the training and test samples and the requirement to balance the quality of recognition in each class, produced a positive effect. In the above example of classifying functional states of the cardiovascular system, comparison of the standard stepwise algorithm with the proposed algorithm showed an improvement in classification quality of about 10% on the examination sample.
Keywords: logistic regression, stepwise regression, self-organization principles.
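The quality measures used in the comparison (classification quality, sensitivity, specificity, and area under the ROC curve) can be computed on an examination sample roughly as follows. This is an illustrative sketch only: the function names, the 0.5 decision threshold, and the interpretation of classification quality as plain accuracy are assumptions, not details taken from the article.

```python
def accuracy(y_true, y_pred):
    """Share of correctly classified observations."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def sensitivity_specificity(y_true, scores, threshold=0.5):
    """Sensitivity and specificity when scores >= threshold are labelled 1.

    The default threshold of 0.5 is an assumption made for illustration.
    """
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    return tp / pos, tn / neg


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case; ties count as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

Here `scores` stands for the predicted probabilities of the positive class produced by a fitted logistic model; applying the three functions to the same examination sample for two models gives directly comparable figures.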



Received 15.06.2015