machine learning - Why use multiple features in Linear Regression? -


Linear regression defines y as a function of x. Using that function, you can predict y for values of x before they occur (ignoring outliers).

Univariate linear regression depends on a single variable. A more powerful form, multivariate linear regression, uses multiple parameters instead of the one parameter x. It can no longer be visualized on the x,y plane; with 3 parameters it can perhaps still be visualized, but with 4, 5, or 6 parameters (dimensions) it cannot.
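To make the univariate/multivariate contrast concrete, here is a minimal sketch using synthetic data (the coefficients and noise scale are made up for illustration): when y actually depends on two signals, a fit on x1 alone leaves a large residual, while the two-feature fit recovers the relationship.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y truly depends on two signals, x1 and x2.
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 3.0 * x1 + 2.0 * x2 + rng.normal(scale=0.1, size=n)

# Univariate fit: y ~ x1 only (design matrix with intercept column).
X_uni = np.column_stack([np.ones(n), x1])
coef_uni, *_ = np.linalg.lstsq(X_uni, y, rcond=None)
mse_uni = np.mean((y - X_uni @ coef_uni) ** 2)

# Multivariate fit: y ~ x1 + x2.
X_multi = np.column_stack([np.ones(n), x1, x2])
coef_multi, *_ = np.linalg.lstsq(X_multi, y, rcond=None)
mse_multi = np.mean((y - X_multi @ coef_multi) ** 2)

print(mse_uni, mse_multi)
```

The univariate fit's error is dominated by the unexplained contribution of x2; the multivariate fit's error drops to roughly the noise level.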

The idea is that more parameters give a better prediction. What is the basis for this? Why does using multiple features improve the quality of the prediction? Intuitively I understand that the more that is known about a problem, the more accurate a prediction can be made. But why does adding more features, or in other words dimensions, increase the function's accuracy? Does a formal definition of this exist?

Or is it trial and error: 1 feature may be enough, but you don't know for sure until you test multiple features.

A formal proof is simple: the true mapping f typically cannot be expressed exactly as a function of your features. You can only approximate it, and adding more variables always expands the space of possible approximators (to be more strict — it never reduces it). While it might be harder to find a good approximator in the larger space (and known algorithms may fail to do so), there is a greater chance that one exists. As a caveat: if you could create a perfect set of features, for example .... output values — then adding more would actually reduce the quality of the model. But in real life we humans are not capable of finding such predictors; we sample somewhat blindly from what can be obtained and measured in reality, and compared to simple random guessing, each additional piece of information might be useful.
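The "never reduces the space of approximators" claim is easy to check for linear models: each smaller feature set is a special case of the larger one (just set the extra coefficients to zero), so the best achievable training error can only go down or stay flat as features are added. A small sketch with made-up data:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
X_full = rng.normal(size=(n, 5))
# y depends on the first three features; the last two are irrelevant.
y = X_full @ np.array([1.0, -2.0, 0.5, 0.0, 0.0]) + rng.normal(scale=0.2, size=n)

def train_mse(k):
    # Fit OLS on the first k features (plus intercept).
    X = np.column_stack([np.ones(n), X_full[:, :k]])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.mean((y - X @ coef) ** 2)

mses = [train_mse(k) for k in range(1, 6)]
# Nested hypothesis spaces: training error is monotone non-increasing.
print([round(m, 4) for m in mses])
```

Note this is a statement about the space of approximators (training fit), not about generalization — that distinction is exactly the bias-variance point made further down.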

If you prefer a more mathematical argument, consider f to be a function of an unknown set of true features:

f(x1, ..., xm) ∈ R

Now suppose we can only measure features from an infinite space of raw signals r1, r2, .... For each subset of raw signals there is a mapping onto the true features of f, with varying degrees of correctness — say g1(r1, r2, r3) = (x1 + er1, 0, x3 + er3, ...); g2(r1) = (0, 0, x3 + er4, ...), etc. When we try to build a function from a finite subset of raw signals r that approximates f, the greater the number of r we include, the better our chance of capturing the elements that make approximating f possible. Unfortunately, we can also add many redundant signals, or signals uncorrelated with the true features. This can be seen as the classic bias-variance problem: the more features we add — assuming we draw from the whole spectrum of possible signals rather than only those related to the nature of f — the more variance we introduce. On the other hand, a small set of features introduces high bias error, due to the strong assumptions forced on which signals matter and how they correlate with the true features.

In particular, linear regression is not well suited to working with highly correlated signals. As a particular statistical model, adding new signals can even destroy the model: there is a strong underlying assumption in LR that f is a linear model of the predictors, with normally distributed errors and equal variances across each dimension.
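The collinearity problem can be seen directly in the stability of the fitted coefficients. In this sketch (the noise scales are made up), x2 is a near-copy of x1; refitting on fresh draws shows the coefficient estimates swinging wildly even though predictions from the pair stay reasonable:

```python
import numpy as np

rng = np.random.default_rng(3)

def fit_coefs(copy_noise):
    n = 100
    x1 = rng.normal(size=n)
    # x2 is (almost) a copy of x1; smaller copy_noise = stronger collinearity.
    x2 = x1 + rng.normal(scale=copy_noise, size=n)
    y = x1 + x2 + rng.normal(scale=0.5, size=n)
    X = np.column_stack([np.ones(n), x1, x2])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1], coef[2]

# Repeat the fit on fresh data and look at the spread of the x1 coefficient.
spread_indep = np.std([fit_coefs(1.0)[0] for _ in range(200)])
spread_collinear = np.std([fit_coefs(0.01)[0] for _ in range(200)])
print(spread_indep, spread_collinear)
```

When x1 and x2 are nearly identical, the model cannot decide how to split the shared effect between them, so individual coefficients become unstable — which is what "adding new signals can destroy the model" looks like in practice.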
