O'Reilly Databases

ANOVA Statistical Programming with PHP

The bulk of the code calculates the values of various instance variables used in subsequent reporting steps. Most of these instance variables are associative arrays with keys such as "total", "between", and "within". This is because the ANOVA procedure computes the total variability in our test scores (the total sum of squares) and partitions it into between-group (i.e., between treatment levels) and within-group (i.e., within a treatment level) components, from which the between-group and within-group variance estimates are derived.
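The following is a minimal PHP sketch of the partitioning step, not the article's analyze() code. It assumes the scores have already been grouped into an array that maps each treatment level to its list of scores, and the function name anova_sums_of_squares() is hypothetical. It returns an associative array keyed "total", "between", and "within", mirroring the instance variables described above.

```php
<?php
/**
 * Hypothetical helper: partition the total sum of squares into
 * between-group and within-group components.
 *
 * $groups maps each treatment level to an array of scores, e.g.
 * ["low" => [26, 34, ...], "moderate" => [...], "high" => [...]].
 * Returns an associative array keyed "total", "between", "within".
 */
function anova_sums_of_squares(array $groups): array {
    // Grand mean over every observation in every group
    $all       = array_merge(...array_values($groups));
    $grandMean = array_sum($all) / count($all);

    $ss = ["total" => 0.0, "between" => 0.0, "within" => 0.0];
    foreach ($groups as $scores) {
        $n         = count($scores);
        $groupMean = array_sum($scores) / $n;
        // Between: group size times squared distance of group mean from grand mean
        $ss["between"] += $n * ($groupMean - $grandMean) ** 2;
        foreach ($scores as $y) {
            $ss["within"] += ($y - $groupMean) ** 2;  // spread around the group mean
            $ss["total"]  += ($y - $grandMean) ** 2;  // spread around the grand mean
        }
    }
    return $ss;
}
```

Whatever data you feed it, the returned "between" and "within" values should add up to "total", which makes a quick sanity check on any implementation of this step.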

At the end of the analyze() method, we evaluate the probability of the observed F score by first creating an FDistribution object from our between-group and within-group degrees of freedom:

$F = new FDistribution($this->df["between"], $this->df["within"]);
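In a standard one-way ANOVA, these two parameters are k - 1 between groups and N - k within groups, for k treatment levels and N scores in total. Dividing each sum of squares by its degrees of freedom gives a mean square, and the F score is the ratio of the between-group mean square to the within-group mean square. A sketch of that arithmetic follows; the function name and the array layout are hypothetical.

```php
<?php
/**
 * Hypothetical helper: degrees of freedom, mean squares and the F ratio
 * for a one-way ANOVA with $k treatment levels and $n observations.
 * $ss must contain the "between" and "within" sums of squares.
 */
function anova_f_ratio(array $ss, int $k, int $n): array {
    $df = [
        "between" => $k - 1,   // numerator degrees of freedom
        "within"  => $n - $k,  // denominator degrees of freedom
    ];
    $ms = [
        "between" => $ss["between"] / $df["between"],
        "within"  => $ss["within"]  / $df["within"],
    ];
    // F compares the between-group variance estimate with the within-group one
    return ["df" => $df, "ms" => $ms, "f" => $ms["between"] / $ms["within"]];
}
```

The resulting $df["between"] and $df["within"] values are the kind of parameters passed to the FDistribution constructor above.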

To obtain the probability of an F score at least as large as the one observed (assuming the null hypothesis is true), we subtract from 1 the value returned by the cumulative distribution function (CDF) evaluated at the observed F score:

$this->p = 1 - $F->CDF($this->f);
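The FDistribution class itself is not shown on this page, so as an illustration only: when the between-group degrees of freedom equal 2 (three treatment levels, as in the test-anxiety data), the F distribution's CDF has the closed form 1 - (1 + 2f/df2)^(-df2/2), which means 1 - CDF(f) can be computed directly. The hypothetical helper below is limited to that special case; a general implementation needs the regularized incomplete beta function.

```php
<?php
/**
 * Hypothetical helper: upper-tail probability P(F > f) for an F
 * distribution with 2 numerator and $df2 denominator degrees of freedom.
 * Valid ONLY for df1 == 2, where 1 - CDF(f) = (1 + 2f/df2)^(-df2/2).
 */
function f_upper_tail_df1_2(float $f, float $df2): float {
    if ($f <= 0.0) {
        return 1.0;  // the F distribution puts no mass below zero
    }
    return (1.0 + 2.0 * $f / $df2) ** (-$df2 / 2.0);
}
```

For example, f_upper_tail_df1_2($f, 27) gives the p value for an observed F score $f with 2 and 27 degrees of freedom.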

Finally, we invoke the inverse cumulative distribution function with 1 minus our alpha setting (e.g., 1 - 0.05) to obtain the critical F value. This value defines the decision criterion for rejecting the null hypothesis, which states that there is no difference between the treatment-level means.

$this->crit = $F->inverseCDF(1 - $this->alpha);
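Under the same df1 = 2 assumption, the inverse CDF can also be solved in closed form: setting (1 + 2c/df2)^(-df2/2) = alpha and solving for c gives c = (df2/2)(alpha^(-2/df2) - 1). This hypothetical helper illustrates what inverseCDF(1 - alpha) returns in that special case only.

```php
<?php
/**
 * Hypothetical helper: critical F value for a right-tailed test at
 * level $alpha (the inverse CDF at 1 - $alpha), for df1 == 2 ONLY.
 * Solves (1 + 2c/df2)^(-df2/2) = alpha for c.
 */
function f_critical_df1_2(float $alpha, float $df2): float {
    if ($alpha <= 0.0 || $alpha >= 1.0) {
        throw new InvalidArgumentException("alpha must lie strictly between 0 and 1");
    }
    return ($df2 / 2.0) * ($alpha ** (-2.0 / $df2) - 1.0);
}
```

With alpha = 0.05 and 27 within-group degrees of freedom this gives a critical value of about 3.35, in line with published F tables for F(2, 27).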

If the observed F score is greater than the critical F score, we can reject the null hypothesis and conclude that at least one of the treatment-level means differs significantly from the others. Equivalently, a p value (i.e., $this->p) less than 0.05 (or whatever alpha level you have set) leads you to reject the null hypothesis.

The formula for decomposing the total sum of squares (first term) into a between-groups component (second term) and a within-group component (third term) appears in Figure 1.

\sum_{t=1}^{k}\sum_{i=1}^{n_t}(y_{ti} - \bar{y})^2 = \sum_{t=1}^{k}n_t(\bar{y}_t - \bar{y})^2 + \sum_{t=1}^{k}\sum_{i=1}^{n_t}(y_{ti} - \bar{y}_t)^2
Figure 1. Formula for decomposing the sum of squares.

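Here \bar{y} (y overbar) denotes the grand mean and \bar{y}_t denotes the mean of treatment level t; y_{ti} is the ith score at level t, n_t is the number of scores at that level, and k is the number of treatment levels.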
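As a worked check of Figure 1, here is the decomposition applied by hand to the test-anxiety scores listed in Table 1 (k = 3 treatment levels, n_t = 10 scores each, N = 30). The group sums are 484, 480, and 575, and the three within-group sums of squares are 1598.4 (low), 1352.0 (moderate), and 862.5 (high):

```latex
\begin{aligned}
\bar{y}_{\mathrm{low}} &= 484/10 = 48.4, \quad \bar{y}_{\mathrm{moderate}} = 480/10 = 48.0, \quad \bar{y}_{\mathrm{high}} = 575/10 = 57.5, \quad \bar{y} = 1539/30 = 51.3 \\
\sum_{t=1}^{3} n_t(\bar{y}_t - \bar{y})^2 &= 10\left[(-2.9)^2 + (-3.3)^2 + 6.2^2\right] = 10(8.41 + 10.89 + 38.44) = 577.4 \\
\sum_{t=1}^{3}\sum_{i=1}^{10}(y_{ti} - \bar{y}_t)^2 &= 1598.4 + 1352.0 + 862.5 = 3812.9 \\
\sum_{t=1}^{3}\sum_{i=1}^{10}(y_{ti} - \bar{y})^2 &= 4390.3 = 577.4 + 3812.9
\end{aligned}
```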

Step 2: Show Raw Data

It is always good to begin your analysis by making sure that you've properly loaded your data. We can call the showRawData() method to dump our test-anxiety data table to a web browser.

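<?php
/*
 * Output the contents of the analysis table as an HTML table.
 * This is a method of the analysis class (it uses $this->table);
 * $db is a connected PEAR DB object created elsewhere in the script.
 */
function showRawData() {
  global $db;
  // DB_TABLEINFO_ORDER adds an "order" entry mapping column names to positions
  $data        = $db->tableInfo($this->table, DB_TABLEINFO_ORDER);
  $columns     = array_keys($data["order"]);
  $num_columns = count($columns);

  ?>

  <table cellspacing='0' cellpadding='0'>
    <tr>
      <td>
        <table border='1' cellspacing='0' cellpadding='3'>
        <?php
          // Header row: one cell per column name
          print "<tr bgcolor='#ffffcc'>";

          for ($i=0; $i < $num_columns; $i++) {
            print "<td align='center'><b>".htmlspecialchars($columns[$i])."</b></td>";
          }

          print "</tr>";

          // Select every row, listing the columns in table order
          $fields = implode(",", $columns);
          $sql    = "SELECT $fields FROM $this->table";
          $result = $db->query($sql);

          if (DB::isError($result)) {
            die($result->getMessage());
          } else {
            while ($row = $result->fetchRow()) {
              print "<tr>";

              // Escape each value before writing it into the HTML cell
              foreach ($row as $value) {
                print "<td>".htmlspecialchars($value)."</td>";
              }

              print "</tr>";
            }
          }
          ?>
        </table>
      </td>
    </tr>
  </table>
  <?php
}
?>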

This code generates the following output:

Table 1. Show Raw Data

id  anxiety   score
1   low       26
2   low       34
3   low       46
4   low       48
5   low       42
6   low       49
7   low       74
8   low       61
9   low       51
10  low       53
11  moderate  51
12  moderate  50
13  moderate  33
14  moderate  28
15  moderate  47
16  moderate  50
17  moderate  48
18  moderate  60
19  moderate  71
20  moderate  42
21  high      52
22  high      64
23  high      39
24  high      54
25  high      58
26  high      53
27  high      77
28  high      56
29  high      63
30  high      59

A tip for data miners: you may already have data in your databases to which this code can be adapted. Look for tables that have an enum column to act as your treatment-level field and a corresponding integer or float column that measures some response associated with each treatment level.
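As an illustration of that adaptation, the sketch below groups fetched rows into one array of scores per treatment level, which is the shape a one-way ANOVA works with. It assumes the rows were fetched as associative arrays (for example with PEAR DB's DB_FETCHMODE_ASSOC), and the function name group_by_treatment() is hypothetical.

```php
<?php
/**
 * Hypothetical helper: group (treatment level, response) rows into
 * level => list of scores. $rows is a list of associative arrays;
 * $levelColumn names the enum-style column and $responseColumn the
 * numeric response column.
 */
function group_by_treatment(array $rows, string $levelColumn, string $responseColumn): array {
    $groups = [];
    foreach ($rows as $row) {
        $level = $row[$levelColumn];
        // Cast to float, since database drivers often return numbers as strings
        $groups[$level][] = (float) $row[$responseColumn];
    }
    return $groups;
}
```

With the column names from Table 1, the call would be group_by_treatment($rows, "anxiety", "score"). Levels appear in the order they are first encountered in $rows.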

©2009, O'Reilly Media, Inc.