Calculating Entropy for Data Miners
Joint Entropy Code
The JointEntropy.php class below extends the Output.php class, as does
the ConditionalEntropy.php class. The Output.php class currently contains
methods, including showJointFrequency(), showJointProbability(), and
showConditionalProbability(), that render joint and conditional entropy
object data as HTML tables. Feel free to override or extend the Output.php
class with other tabular or graphical methods. The JointEntropy.php class
contains only the "business logic" of the computation. Its analyze() method
populates the joint entropy object (frequencies, probabilities, and the
entropy score in bits) that the Output.php class accesses for rendering purposes.
<?php
/**
 * @package IT
 */
require_once "Output.php";

/**
 * Computes the joint entropy between two columns.
 */
class JointEntropy extends Output {

    var $n = 0;
    var $columns = array();
    var $row_freqs = array();
    var $col_freqs = array();
    var $row_probs = array();
    var $col_probs = array();
    var $row_labels = array();
    var $col_labels = array();
    var $joint_freqs = array();
    var $joint_probs = array();
    var $bits = 0;
    var $data = array();
    var $table = "";
    var $select = "";
    var $where = "";

    /* Methods for handling database table input */

    function setTable($table) {
        $this->table = $table;
    }

    function setSelect($sql) {
        $this->select = $sql;
    }

    function setWhere($sql) {
        $this->where = " WHERE ".$sql;
    }

    function getSQL() {
        if (empty($this->select)) {
            $sql = " SELECT ".$this->columns[0].",".$this->columns[1];
            $sql .= " FROM $this->table ";
            if (empty($this->where))
                return $sql;
            else
                return $sql . $this->where;
        } else
            return $this->select;
    }

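    function getFrequenciesFromTable() {
        global $db; // database connection object; rows are read by column name
        $sql = $this->getSQL();
        $result = $db->query($sql);
        if (DB::isError($result))
            die($result->getMessage());
        else {
            $n = 0;
            while ($row = $result->fetchRow()) {
                $a = $row[$this->columns[0]];
                $b = $row[$this->columns[1]];
                // Initialize counters on first sight to avoid undefined-index notices.
                if (!isset($this->joint_freqs[$a][$b])) $this->joint_freqs[$a][$b] = 0;
                if (!isset($this->row_freqs[$a])) $this->row_freqs[$a] = 0;
                if (!isset($this->col_freqs[$b])) $this->col_freqs[$b] = 0;
                $this->joint_freqs[$a][$b]++;
                $this->row_freqs[$a]++; // aka row marginals
                $this->col_freqs[$b]++; // aka col marginals
                $n++;
            }
            $this->n = $n;
        }
        return true;
    }
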
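    /* Methods for handling array input */

    function setArray($data) {
        $this->data = $data;
    }

    function getFrequenciesFromArray() {
        // Expects a zero-indexed list of rows, each holding the two values at [0] and [1].
        $this->n = count($this->data);
        for ($i = 0; $i < $this->n; $i++) {
            $a = $this->data[$i][0];
            $b = $this->data[$i][1];
            // Initialize counters on first sight to avoid undefined-index notices.
            if (!isset($this->joint_freqs[$a][$b])) $this->joint_freqs[$a][$b] = 0;
            if (!isset($this->row_freqs[$a])) $this->row_freqs[$a] = 0;
            if (!isset($this->col_freqs[$b])) $this->col_freqs[$b] = 0;
            $this->joint_freqs[$a][$b]++;
            $this->row_freqs[$a]++; // aka row marginals
            $this->col_freqs[$b]++; // aka col marginals
        }
    }
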
    /* Shared methods */

    function setColumns($columns) {
        $parts = explode(",", $columns);
        $this->columns[0] = trim($parts[0]);
        $this->columns[1] = trim($parts[1]);
    }

    function clear() {
        $this->n = 0;
        $this->row_freqs = array();
        $this->col_freqs = array();
        $this->row_probs = array();
        $this->col_probs = array();
        $this->row_labels = array();
        $this->col_labels = array();
        $this->joint_freqs = array();
        $this->joint_probs = array();
        $this->bits = 0;
    }

    function analyze() {
        $this->clear();
        if (empty($this->table))
            $this->getFrequenciesFromArray();
        else
            $this->getFrequenciesFromTable();
        $this->row_labels = array_keys($this->row_freqs);
        $this->col_labels = array_keys($this->col_freqs);
        $this->getProbabilities();
        $this->getJointEntropyScore();
    }

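    function getProbabilities() {
        foreach ($this->joint_freqs AS $key1=>$array) {
            foreach ($array AS $key2=>$val2) {
                $this->joint_probs[$key1][$key2] =
                    $this->joint_freqs[$key1][$key2] / $this->n;
                // Marginal probabilities are the row and column sums of the joint table.
                if (!isset($this->row_probs[$key1])) $this->row_probs[$key1] = 0;
                if (!isset($this->col_probs[$key2])) $this->col_probs[$key2] = 0;
                $this->row_probs[$key1] += $this->joint_probs[$key1][$key2];
                $this->col_probs[$key2] += $this->joint_probs[$key1][$key2];
            }
        }
    }
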
    function getJointEntropyScore() {
        // Sum -p * log2(p) over every observed cell of the joint table.
        foreach ($this->joint_probs AS $key1=>$array) {
            foreach ($array AS $key2=>$val2) {
                $this->bits -= $this->joint_probs[$key1][$key2] *
                    log($this->joint_probs[$key1][$key2], 2);
            }
        }
    }

}
?>
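To see the entropy calculation in isolation, here is a minimal standalone sketch of the same arithmetic that getFrequenciesFromArray(), getProbabilities(), and getJointEntropyScore() perform: count each (a, b) combination, divide by the number of rows, and sum -p * log2(p) over the observed cells. The function name joint_entropy_bits() and the sample data are made up for illustration and are not part of JointEntropy.php.

```php
<?php
/**
 * Hypothetical standalone helper (not part of JointEntropy.php):
 * returns the joint entropy, in bits, of a list of [a, b] pairs.
 */
function joint_entropy_bits($pairs) {
    $n = count($pairs);
    if ($n == 0) {
        return 0.0;
    }
    // Count how often each (a, b) combination occurs.
    $joint_freqs = array();
    foreach ($pairs as $pair) {
        list($a, $b) = $pair;
        if (!isset($joint_freqs[$a][$b])) {
            $joint_freqs[$a][$b] = 0;
        }
        $joint_freqs[$a][$b]++;
    }
    // H(A,B) = - sum over observed cells of p(a,b) * log2 p(a,b)
    $bits = 0.0;
    foreach ($joint_freqs as $row) {
        foreach ($row as $freq) {
            $p = $freq / $n;
            $bits -= $p * log($p, 2);
        }
    }
    return $bits;
}

// Illustrative data: four equally likely combinations of two binary attributes.
$pairs = array(
    array("sunny", "yes"),
    array("sunny", "no"),
    array("rainy", "yes"),
    array("rainy", "no"),
);
printf("%.3f bits\n", joint_entropy_bits($pairs)); // prints "2.000 bits"
```

Each of the four cells has probability 0.25, and -0.25 * log2(0.25) = 0.5, so the four terms add up to 2 bits. Only observed combinations appear in the frequency table, so the sum never has to evaluate log(0).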
There are two forms of accepted input for this script:
From a database table: getFrequenciesFromTable()
From a passed-in two-dimensional array: getFrequenciesFromArray()
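The array form reads each row's two values at numeric positions 0 and 1, whereas the table form reads them by column name. As a rough sketch of how data held as associative rows might be reshaped for array input, the hypothetical helper rows_to_pairs() below (not part of the article's classes) picks two named columns out of each row, parsing the column list the same way setColumns() does. The sample rows are invented for illustration.

```php
<?php
/**
 * Hypothetical helper (not part of the article's classes): converts
 * associative rows into the [a, b] pairs that setArray() expects.
 */
function rows_to_pairs($rows, $columns) {
    // Same parsing as setColumns(): a comma-separated pair of column names.
    $parts = explode(",", $columns);
    $first = trim($parts[0]);
    $second = trim($parts[1]);
    $pairs = array();
    foreach ($rows as $row) {
        $pairs[] = array($row[$first], $row[$second]);
    }
    return $pairs;
}

// Illustrative rows, keyed by column name.
$rows = array(
    array("outlook" => "sunny", "windy" => "no", "play" => "yes"),
    array("outlook" => "rainy", "windy" => "yes", "play" => "no"),
);
$pairs = rows_to_pairs($rows, "outlook, play");
print_r($pairs);

// With JointEntropy.php and Output.php available, the pairs could then
// be analyzed using the methods from the listing:
// $je = new JointEntropy();
// $je->setArray($pairs);
// $je->analyze();
// echo $je->bits;
```

Note that analyze() chooses the table path whenever a table name has been set, so an object used for array input should not have had setTable() called on it.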



