This is the fourth part of a posting series (I, II and III) which deals with pushing the declarative boundary of SQL engines in the domain of statistics as far as possible.

In the previous examples, we have gathered samples for (piecewise-linear) risk probabilities.

Now we would rather use these samples to regress a typical exponential risk probability disttibution. The appeal of this particular family of distributions is not only that it can adapt to the most reasonable shapes (increasing, decreasing, even a kind of normal distribution), but that it has most interesting numeric properties.

In particular, by transforming the exponential regression problem with a double-logarithmic representation, we gather the simplest linear regression problem (one variable, one target) which we can even compute “by hand”.

The remainder of this posting is now rather trivial, given the already presented definitions of the database schema:

select material
-- the exponential shape parameter corresponds to the linear slope parameter
, beta
-- the exponential acceleration parameter can be computed out of the linear horizontal translation
-- be aware of zero slopes
, exp(avg_ln_age-case when beta 0 then avg_ln_ln_surv/beta else 0 end) as alpha
from (
-- Aggregate the linear coefficients per material and solve the equation
select material
, (count(*)*sum(ln_age_surv)-sum(ln_age_sum_surv))/(count(*)*sum(ln_age_sqr)-sum(ln_age)*sum(ln_age)) as beta
-- prepare computation of horizontal translation
, avg(ln_age) as avg_ln_age
, avg(ln_ln_surv) as avg_ln_ln_surv
from (
-- Next we prepare the equation solving by some
-- windowing/aggregation
select material
, ln_age
, ln_ln_surv
, ln_age*ln_age as ln_age_sqr
, ln_age*ln_ln_surv as ln_age_surv
, ln_age*sum(ln_ln_surv) over (partition by material) as ln_age_sum_surv
from (
-- Here, we transform the original problem into the
-- logarithmic scale, edge probabilities are somehow avoided
select material
, ln(age) as ln_age
, ln(ln(1/(1-greatest(least(probability,0.999999),0.00001)))) as ln_ln_surv
from repair_estimations
) transf(material, ln_age, ln_ln_surv)
) compl(material,ln_age,ln_ln_surv,ln_age_sqr, ln_age_surv, ln_age_sum_surv)
group by material
) final(material,beta,avg_ln_age,avg_ln_ln_surv)
;

### Like this:

Like Loading...

*Related*