Progressively Fusing Independent Probabilities using Proprietary SQL Windowing (Teradata version)

As announced in the second part of this post series, the scheme of using an aggregate User-Defined Function (UDF) to fuse probabilities works in a similar fashion on all major databases, such as Oracle and Teradata.
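As a refresher: two independent event probabilities p and q fuse to p + q − p·q (inclusion–exclusion for the union of independent events). Here is a minimal plain-Java sketch of that fold, independent of any database machinery (all names are illustrative):

```java
public class ProbabilityUnionDemo {

    /** Union of two independent probabilities: P(A or B) = p + q - p*q. */
    static double unite(double p, double q) {
        return p + q - p * q;
    }

    /** Folds the union over any number of independent probabilities. */
    static double uniteAll(double... probabilities) {
        double result = 0.0; // neutral element: the empty set has probability 0
        for (double p : probabilities) {
            result = unite(result, p);
        }
        return result;
    }

    public static void main(String[] args) {
        // 1 - (1 - 0.5)^3 = 0.875
        System.out.println(uniteAll(0.5, 0.5, 0.5));
    }
}
```

Because the operation is associative and commutative, the fold can be computed per row first and partial results merged later in any order, which is what makes it a good fit for a distributed aggregate UDF.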

Similar to the Oracle formulation (in which the aggregation object itself keeps both state and functional logic), here is the Teradata version, consisting of a static logic class and a separate state class that is serialized and passed between aggregation phases:

```java
package de.jasna.teradata.repair;

import java.io.Serializable;
import java.sql.SQLException;

import com.teradata.fnc.Context;
import com.teradata.fnc.Phase;

/**
* Storage class for ProbabilityUnion Java User Defined Function
* Remember that you have to reserve the appropriate context memory for its serialized form
* in the final function definition
*/
class ProStor implements Serializable {

    // memory
    private double probability = 0;

    /**
     * Default constructor
     */
    public ProStor() {
    }

    /**
     * @return current probability
     */
    public double getProbability() {
        return this.probability;
    }

    /**
     * @param aProbability to set
     */
    public void setProbability(double aProbability) {
        this.probability = aProbability;
    }
}

/**
 * Teradata version of the union of independent probabilities
 */
public class ProbabilityUnion {

    /**
     * Compute method
     * @param phase of aggregation
     * @param context of aggregation including states
     * @param aProbability value to integrate
     * @return union of probabilities
     */
    public static double unite(Phase phase
            , Context[] context
            , double aProbability)
            throws SQLException {

        try {
            switch (phase.getPhase()) {

            /* The AGR_INIT phase is executed once per group. */
            case Phase.AGR_INIT:
                ProStor newStor = new ProStor();
                context[0].initCtx(newStor);
                context[0].setObject(1, newStor);
                return 0;

            /* The AGR_DETAIL phase is executed once per row. */
            case Phase.AGR_DETAIL:
                ProStor yet = (ProStor) context[0].getObject(1);
                yet.setProbability(yet.getProbability() + aProbability
                        - yet.getProbability() * aProbability);
                context[0].setObject(1, yet);
                return 0;

            /* AGR_COMBINE merges the partial results of the distributed aggregation */
            case Phase.AGR_COMBINE:
                ProStor res1 = (ProStor) context[0].getObject(1);
                ProStor res2 = (ProStor) context[0].getObject(2);
                res1.setProbability(res1.getProbability()
                        + res2.getProbability() - res1.getProbability()
                        * res2.getProbability());
                context[0].setObject(1, res1);
                return 0;

            /* we are through */
            case Phase.AGR_FINAL:
                ProStor finale = (ProStor) context[0].getObject(1);
                return finale.getProbability();

            /* the probability of the empty set is zero */
            case Phase.AGR_NODATA:
                return 0;

            default:
                throw new SQLException("Invalid Phase", "38905");
            }
        } catch (Exception ex) {
            throw new SQLException(ex.getMessage(), "38101");
        }
    }
}
```
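To see why the split into a per-row phase and a combine phase is sound, note that the union operation is associative and commutative, so partial results computed on different AMPs can be merged in any order. A small stand-alone simulation of the phase logic (plain Java, no Teradata classes; all names are illustrative):

```java
import java.util.Arrays;

public class PhaseSimulation {

    static double unite(double p, double q) {
        return p + q - p * q;
    }

    /** Simulates AGR_INIT plus AGR_DETAIL: folds all rows of one partition into one partial state. */
    static double detailPhase(double[] rows) {
        double state = 0.0; // AGR_INIT
        for (double row : rows) {
            state = unite(state, row); // AGR_DETAIL, once per row
        }
        return state;
    }

    public static void main(String[] args) {
        double[] allRows = {0.1, 0.2, 0.3, 0.4};

        // single-node result: one pass over all rows
        double singlePass = detailPhase(allRows);

        // distributed result: two partitions, then AGR_COMBINE of the partials
        double partial1 = detailPhase(Arrays.copyOfRange(allRows, 0, 2));
        double partial2 = detailPhase(Arrays.copyOfRange(allRows, 2, 4));
        double combined = unite(partial1, partial2);

        System.out.println(singlePass + " == " + combined);
    }
}
```

Both paths arrive at the same value, which is exactly the property the AGR_COMBINE phase relies on.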

Once you have deployed the jar file under a unique id (e.g., “de.jasna.teradata.repair”, for which your user needs “execute procedure” grants on the ‘SQLJ’ database), you can define the aggregate UDF as follows. Note that the storage class above needs more aggregate class space than the default of 64 bytes, hence the explicit size in the class aggregate clause.
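The interim storage size must cover the Java-serialized form of the state class. A quick way to check how much space such a state object actually occupies is to serialize it by hand; the sketch below uses a stand-in for ProStor (names are illustrative, and the exact byte count depends on the fully qualified class name):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class StateSizeCheck {

    /** Stand-in for the ProStor state class from above. */
    static class ProStor implements Serializable {
        double probability = 0.875;
    }

    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new ProStor());
        }
        // this size must fit into the class aggregate interim storage
        System.out.println("serialized size: " + bytes.size() + " bytes");

        // round trip: the probability survives serialization
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            ProStor copy = (ProStor) in.readObject();
            System.out.println("probability: " + copy.probability);
        }
    }
}
```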

```sql
create function union_indep_prop(aProbability float)
returns float
class aggregate(128)
specific union_indep_prop
language java
no sql
no external data
parameter style java
deterministic
called on null input
-- external name = '<jar id>:<fully qualified class>.<method>'
external name 'de.jasna.teradata.repair:de.jasna.teradata.repair.ProbabilityUnion.unite'
;
```

Finally, we can define the hierarchical decomposition and probability propagation query (given the STRUCTURED_MATERIAL definition as in the previous post) as follows:

```sql
/*
* A recursive view to decompose the bill of materials down to the leaf level
*/
create recursive view MATERIAL_HIERARCHY(root_material, component_material,depth) as (
SELECT STRUCTURED_MATERIAL.composite_material as root_material
, STRUCTURED_MATERIAL.component_material as component_material
, 0 as depth
FROM STRUCTURED_MATERIAL
union all
SELECT MATERIAL_HIERARCHY.root_material
, STRUCTURED_MATERIAL.component_material
, MATERIAL_HIERARCHY.depth+1 as depth
FROM MATERIAL_HIERARCHY
join STRUCTURED_MATERIAL on MATERIAL_HIERARCHY.component_material=STRUCTURED_MATERIAL.composite_material
);

/*
* Here come the propagated probabilities
*/
insert into repair_estimations
select MATERIAL_HIERARCHY.root_material
, AGE.age
, min(AGE.age-1) over (partition by MATERIAL_HIERARCHY.root_material
order by AGE.age
rows between 1 following
and unbounded following) as next_age
, union_indep_prop(ALL_REPAIRS.probability) as probability
, union_indep_prop(ALL_REPAIRS.lower_bound_95) as lower_bound_95
, union_indep_prop(ALL_REPAIRS.upper_bound_95) as upper_bound_95
, max(MATERIAL_HIERARCHY.depth) as depth
from MATERIAL_HIERARCHY
join REPAIR_ESTIMATIONS AGE on AGE.material=MATERIAL_HIERARCHY.component_material and AGE.depth=0
join MATERIAL_HIERARCHY ALL_PARTS on ALL_PARTS.root_material=MATERIAL_HIERARCHY.root_material
join REPAIR_ESTIMATIONS ALL_REPAIRS on ALL_REPAIRS.material=ALL_PARTS.component_material and ALL_REPAIRS.depth=0
and AGE.age between ALL_REPAIRS.age and coalesce(ALL_REPAIRS.next_age, AGE.age)
group by MATERIAL_HIERARCHY.root_material
, AGE.age
;
```
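For intuition, the recursive view does nothing more than flatten the bill of materials into (root_material, component_material, depth) tuples. The same decomposition in plain Java over a toy hierarchy (all names and the sample data are made up for illustration):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BomFlattening {

    /** One (root_material, component_material, depth) row, as produced by the view. */
    record Row(String root, String component, int depth) {}

    /** composite -> direct components, i.e. the STRUCTURED_MATERIAL table. */
    static final Map<String, List<String>> BOM = new HashMap<>();

    static List<Row> materialHierarchy() {
        List<Row> rows = new ArrayList<>();
        // seed branch: depth 0 is every direct composite/component pair
        for (Map.Entry<String, List<String>> e : BOM.entrySet()) {
            for (String component : e.getValue()) {
                rows.add(new Row(e.getKey(), component, 0));
            }
        }
        // recursive branch: descend from each component that is itself a composite
        for (int i = 0; i < rows.size(); i++) {
            Row row = rows.get(i);
            for (String sub : BOM.getOrDefault(row.component(), List.of())) {
                rows.add(new Row(row.root(), sub, row.depth() + 1));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        BOM.put("car", List.of("engine", "wheel"));
        BOM.put("engine", List.of("piston"));
        for (Row row : materialHierarchy()) {
            System.out.println(row);
        }
    }
}
```

The join against the depth-0 estimations then attaches the leaf-level repair probabilities to every root, and the aggregate UDF fuses them per root and age.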

Some helpful links to get this implementation flying were this debugging guide and the Java Aggregate Function tutorial.