O'Reilly Databases

Calculating Entropy for Data Miners

Joint Entropy Code

The JointEntropy class below (JointEntropy.php) extends the Output class (Output.php), as does the ConditionalEntropy class (ConditionalEntropy.php). Output currently provides methods such as showJointFrequency(), showJointProbability(), and showConditionalProbability(), which render joint and conditional entropy object data as HTML tables. Feel free to override or extend Output with other tabular or graphical methods. JointEntropy itself contains only the "business logic" of the computation: its analyze() method populates the joint entropy object that the Output methods then read for rendering.

<?php
/**
* @package IT
*/
require_once "Output.php";
/**
* Computes the joint entropy between two columns.
*/
class JointEntropy extends Output {

  var $n = 0;

  var $columns = array();

  var $row_freqs = array();
  var $col_freqs = array();

  var $row_probs = array();
  var $col_probs = array();
 
  var $row_labels = array();
  var $col_labels = array();
 
  var $joint_freqs = array();
  var $joint_probs = array();

  var $bits = 0;

  var $data = array();

  var $table  = "";
  var $select = "";
  var $where  = "";

  /* Methods for handling database table input */

  function setTable($table) {
    $this->table = $table;
  }

  function setSelect($sql) {
    $this->select = $sql;
  }

  function setWhere($sql) {
    $this->where = " WHERE ".$sql;
  }

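  function getSQL() {
    // A custom SELECT set through setSelect() takes precedence
    // and is returned as-is (any WHERE clause is ignored).
    if (!empty($this->select)) {
      return $this->select;
    }
    $sql  = " SELECT ".$this->columns[0].",".$this->columns[1];
    $sql .= " FROM $this->table ";
    // $this->where is "" unless setWhere() was called.
    return $sql . $this->where;
  }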

  function getFrequenciesFromTable() {
    global $db;
    $sql    = $this->getSQL();
    $result = $db->query($sql);
    if (DB::isError($result))
      die($result->getMessage());
    else {
      $n = 0;
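      // Rows are read by column name, so request associative rows explicitly.
      while($row = $result->fetchRow(DB_FETCHMODE_ASSOC)) {
        $a = $row[$this->columns[0]];
        $b = $row[$this->columns[1]];
        // Start each counter at 1 on first sight to avoid undefined-index notices.
        $this->joint_freqs[$a][$b] = isset($this->joint_freqs[$a][$b]) ? $this->joint_freqs[$a][$b] + 1 : 1;
        $this->row_freqs[$a] = isset($this->row_freqs[$a]) ? $this->row_freqs[$a] + 1 : 1; // aka row marginals
        $this->col_freqs[$b] = isset($this->col_freqs[$b]) ? $this->col_freqs[$b] + 1 : 1; // aka col marginals
        $n++;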
      }
      $this->n = $n;
    }
    return true;
  }

  /* Methods for handling array input */

  function setArray($data) {
    $this->data = $data;
  }

  function getFrequenciesFromArray() {
    $this->n = count($this->data);
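    // foreach also copes with arrays whose keys are not 0..n-1.
    foreach ($this->data as $pair) {
      $a = $pair[0];
      $b = $pair[1];
      // Start each counter at 1 on first sight to avoid undefined-index notices.
      $this->joint_freqs[$a][$b] = isset($this->joint_freqs[$a][$b]) ? $this->joint_freqs[$a][$b] + 1 : 1;
      $this->row_freqs[$a] = isset($this->row_freqs[$a]) ? $this->row_freqs[$a] + 1 : 1; // aka row marginals
      $this->col_freqs[$b] = isset($this->col_freqs[$b]) ? $this->col_freqs[$b] + 1 : 1; // aka col marginals
    }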
  }

  /* Shared methods */

  function setColumns($columns) {
    $parts = explode(",",$columns);
    $this->columns[0] = trim($parts[0]);
    $this->columns[1] = trim($parts[1]);
  }

  function clear() {
    $this->n           = 0;
    $this->row_freqs   = array();
    $this->col_freqs   = array();
    $this->row_probs   = array();
    $this->col_probs   = array();
    $this->row_labels  = array();
    $this->col_labels  = array();
    $this->joint_freqs = array();
    $this->joint_probs = array();
    $this->bits        = 0;
  }

  function analyze() {
    $this->clear();
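    // Use the database when a table or a custom SELECT was supplied;
    // otherwise fall back to the array passed to setArray().
    if (empty($this->table) && empty($this->select))
      $this->getFrequenciesFromArray();
    else
      $this->getFrequenciesFromTable();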

    $this->row_labels = array_keys($this->row_freqs);
    $this->col_labels = array_keys($this->col_freqs);
    $this->getProbabilities();
    $this->getJointEntropyScore();
  }

  function getProbabilities() {
    foreach($this->joint_freqs AS $key1=>$array) {
      foreach($array AS $key2=>$val2) {
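        $p = $this->joint_freqs[$key1][$key2] / $this->n;
        $this->joint_probs[$key1][$key2] = $p;
        // Marginals accumulate the joint probabilities along each row and column;
        // start from 0 on first sight to avoid undefined-index notices.
        $this->row_probs[$key1] = (isset($this->row_probs[$key1]) ? $this->row_probs[$key1] : 0) + $p;
        $this->col_probs[$key2] = (isset($this->col_probs[$key2]) ? $this->col_probs[$key2] : 0) + $p;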
      }
    }
  }

  function getJointEntropyScore() {
    // Sum -p * log2(p) over the observed cells only, so p > 0 in every term.
    foreach($this->joint_probs AS $key1=>$array) {
      foreach($array AS $key2=>$p) {
        $this->bits -= $p * log($p, 2);
      }
    }
  }

}
?>
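
For reference, the quantities that getProbabilities() and getJointEntropyScore() compute can be written in standard information-theoretic notation. The symbols here are introduced only for illustration and do not appear in the code: n(x, y) stands for a count in $joint_freqs, N for $n, and X and Y for the two columns.

```latex
p(x, y) = \frac{n(x, y)}{N}, \qquad
p(x) = \sum_{y} p(x, y), \qquad
p(y) = \sum_{x} p(x, y)

H(X, Y) = -\sum_{x} \sum_{y} p(x, y) \, \log_2 p(x, y)
```

Combinations that never occur have p(x, y) = 0 and, by the usual convention that 0 log 0 = 0, contribute nothing to the sum. That is why the class can loop over the observed cells only and never has to take the logarithm of zero.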

There are two accepted forms of input for this class:

  1. A database table, read by getFrequenciesFromTable()
  2. A two-dimensional array passed in through setArray(), read by getFrequenciesFromArray()
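
To make the array form concrete, here is a minimal standalone sketch of the same computation on a small two-column array. It does not use the JointEntropy or Output classes. The function name joint_entropy_bits() and the sample values are hypothetical and chosen only for illustration.

```php
<?php
// Hypothetical standalone helper (not part of the article's classes):
// computes the joint entropy, in bits, of an array of [a, b] pairs.
function joint_entropy_bits(array $pairs) {
    $n = count($pairs);
    if ($n === 0) {
        return 0.0; // no observations, nothing to measure
    }

    // Count how often each (a, b) combination occurs.
    $joint_freqs = array();
    foreach ($pairs as $pair) {
        list($a, $b) = $pair;
        if (!isset($joint_freqs[$a][$b])) {
            $joint_freqs[$a][$b] = 0;
        }
        $joint_freqs[$a][$b]++;
    }

    // Sum -p * log2(p) over the observed cells only.
    $bits = 0.0;
    foreach ($joint_freqs as $row) {
        foreach ($row as $freq) {
            $p = $freq / $n;
            $bits -= $p * log($p, 2);
        }
    }
    return $bits;
}

// Sample data (made up): four distinct pairs, each seen once,
// so every cell has probability 1/4.
$data = array(
    array("sunny", "play"),
    array("sunny", "stay"),
    array("rainy", "play"),
    array("rainy", "stay"),
);

echo joint_entropy_bits($data), "\n"; // prints 2
```

Four equally likely combinations need two bits to identify, which matches the printed score. With the class above, the same array would go through setArray() followed by analyze(), after which the score is held in the $bits property. Table input would instead be configured with setTable() and setColumns(), optionally narrowed with setWhere().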

O'Reilly Media

©2009, O'Reilly Media, Inc.