PHP DevCenter

Building a Simple Search Engine with PHP

by Daniel Solin
10/24/2002

A little while ago, I was working on an intranet site for a mid-sized company. As the site grew in both size and popularity, the client asked me to add a search feature. Because one of the intranet's rules was that all application logic be written in-house, using an existing open source engine was not an option.

Within a day, the engine was essentially complete, and the result turned out better than expected. With PHP, MySQL, and a few simple techniques, small projects like this are easy to build. This article presents a cut-down version of that search engine. I hope it encourages you to develop an engine that suits your particular needs, with exactly the features you want.

Database Design and Logic

We'll use MySQL as a database backend to store our search data. It's possible to shell out to Unix commands such as grep and find, but that would mean running the search engine on the machine hosting the files. It would also be more difficult to index pages served from a database. We'll tackle the database first.

The database for the search engine consists of three tables: page, word, and occurrence. page holds all indexed web pages, and word holds all of the words found on the indexed pages. The rows in occurrence correlate words to their containing pages; each row represents one occurrence of one particular word on one particular page. The SQL for creating these tables is shown below.

-- One row per indexed web page.
CREATE TABLE page (
   page_id int(10) unsigned NOT NULL auto_increment,
   page_url varchar(200) NOT NULL default '',
   PRIMARY KEY (page_id)
) ENGINE=MyISAM;  -- ENGINE replaces the older TYPE keyword, which newer MySQL versions reject

-- One row per distinct word found on the indexed pages.
CREATE TABLE word (
   word_id int(10) unsigned NOT NULL auto_increment,
   word_word varchar(50) NOT NULL default '',
   PRIMARY KEY (word_id)
) ENGINE=MyISAM;

-- One row per occurrence of a word on a page.
CREATE TABLE occurrence (
   occurrence_id int(10) unsigned NOT NULL auto_increment,
   word_id int(10) unsigned NOT NULL default '0',
   page_id int(10) unsigned NOT NULL default '0',
   PRIMARY KEY (occurrence_id)
) ENGINE=MyISAM;

While page and word hold actual data, occurrence acts only as a reference table. By joining occurrence with page and word, we can determine which pages contain a word, as well as how many times the word occurs. Before that, though, we need some data.
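
To make the join described above concrete, here is a minimal sketch against the three tables. The sample URLs, words, and explicit IDs are made up for illustration and are not part of the original engine; the query is one plausible way to write the lookup, not necessarily the one the finished search engine uses.

```sql
-- Hypothetical sample rows; URLs and words are invented for this example.
INSERT INTO page (page_id, page_url) VALUES
   (1, 'http://intranet.example/index.html'),
   (2, 'http://intranet.example/news.html');

INSERT INTO word (word_id, word_word) VALUES
   (1, 'php'),
   (2, 'mysql');

-- One row per occurrence: 'php' appears twice on page 1 and once on page 2;
-- 'mysql' appears once on page 2.
INSERT INTO occurrence (word_id, page_id) VALUES
   (1, 1), (1, 1), (1, 2), (2, 2);

-- Pages containing the word 'php', with the most occurrences first.
SELECT p.page_url, COUNT(*) AS occurrences
FROM word w
JOIN occurrence o ON o.word_id = w.word_id
JOIN page p ON p.page_id = o.page_id
WHERE w.word_word = 'php'
GROUP BY p.page_id, p.page_url
ORDER BY occurrences DESC;
```

With these sample rows, the query lists index.html with two occurrences, followed by news.html with one. Counting rows in occurrence is what lets the number of hits per page double as a simple ranking.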
