O'Reilly Databases

oreilly.comSafari Books Online.Conferences.

We've expanded our coverage and improved our search! Search for all things Database across O'Reilly!

Search Search Tips

advertisement
AddThis Social Bookmark Button

Print Subscribe to Databases Subscribe to Newsletters

MySQL FULLTEXT Searching

by Joe Stump
06/26/2003

Have you ever wanted to search text stored in your database, but couldn't figure out how to do it efficiently? Are you lazy like me and don't enjoy maintaining reverse indexes, dictionaries, and word scores? You're in luck. The release of MySQL 4.0 has made searching text stored in databases available to the masses.

MySQL has had FULLTEXT searching in one form or another since version 3.23.23. FULLTEXT indices in MySQL allow database administrators and programmers to designate any character-based field (CHAR, VARCHAR, or TEXT) as a FULLTEXT index, which allows for complex text searching against data stored in those fields.

This feature is not to be confused with the LIKE function in MySQL. LIKE works more along the lines of a regular expression. On the other hand, FULLTEXT indices are fully indexed fields which support stopwords, boolean searches, and relevancy ratings.

This article assumes you have a working installation of MySQL, a good understanding of how MySQL works, and a basic understanding of web programming (with PHP, Perl, or something similar). Further, this article may not be of any interest to those who are already using FULLTEXT indices in a production environment.

How it Works

The MySQL team has made it extremely easy to add FULLTEXT searching to your tables. They are created much like regular KEY and PRIMARY KEY indices. For the purpose of this article we are going to make a basic blog table, put some data into it, and start searching. Before we get ahead of ourselves we need to create some tables.

Creating the tables

-- The main blog table with our FULLTEXT index
-- Nothing extreme here, but you get the idea
CREATE TABLE blog_entries
(
  entryID INT(9) UNSIGNED NOT NULL DEFAULT '0' AUTO_INCREMENT,
  posted INT(11) UNSIGNED NOT NULL DEFAULT '0',
  categoryID TINYINT(2) UNSIGNED NOT NULL DEFAULT '0',
  title CHAR(255) NOT NULL DEFAULT '',
  entry TEXT NOT NULL DEFAULT '',
  PRIMARY KEY (entryID),
  KEY (posted),
  KEY (categoryID),
  FULLTEXT(title,entry)
);

-- A category table so you can organize your posts and 
-- later do some fun searching based on such data.
CREATE TABLE blog_categories
(
  categoryID TINYINT(2) UNSIGNED NOT NULL DEFAULT '0' AUTO_INCREMENT,
  name CHAR(75) NOT NULL DEFAULT '',
  PRIMARY KEY (categoryID)
);

-- Add some data into your blog
INSERT INTO blog_categories VALUES (1,'Personal');
INSERT INTO blog_categories VALUES (2,'Work');
INSERT INTO blog_categories VALUES (3,'Editorials');

INSERT INTO blog_entries 
VALUES (1,
        1050942000,
        1,
        'About miester',
        'I was born in michigan in 1980 in a small town called Adrian. 
         My mother is named Sue, while my father is named Mike. 
         They currently live in a small town called East Jordan. On April 
         27th, 2003 I will graduate from Eastern Michigan University with a 
         degree in Computer Information Systems.');

INSERT INTO blog_entries 
VALUES (2,
        1050942000,
        2,
        'Today at work',
        'While I was at work today I was having some problems
        with the RAID array. It seems that we have a rogue cron script that 
        is causing problems. When I find out more info I will post it here.');

INSERT INTO blog_entries 
VALUES (3,
        1050942000,
        1,
        'Vacation!',
        'After I graduate I am taking a 2 week vacation. On my 
         agenda is a trip to Washington DC to see my girlfriend\'s sister 
         as well as throwing a few discs at the local disc golf course.');

INSERT INTO blog_entries 
VALUES (4,
        1050942000,
        1,
        'Vacation!',
        'I have had a horrible cold for the last few days. Today I drank a
         revive vitamin water with 150% of my daily dose of vitamin C. That
         should help things out.');

Querying the Data

Now that we have data in our tables we can begin to query it. There are some restrictions to FULLTEXT searching, which are covered below. You will want to read over the restrictions before you use FULLTEXT indices in a production environment. For now we are going to do a simple query for the word mother.

mysql> SELECT entryID,title
    -> FROM blog_entries
    -> WHERE MATCH (title,entry) AGAINST('mother');
+---------+---------------+
| entryID | title         |
+---------+---------------+
|       1 | About miester |
+---------+---------------+
1 row in set (0.00 sec)

There are a few things to note when querying FULLTEXT indices. First, MySQL automatically orders the results by their relevancy rating. Second, queries that are longer than 20 characters will not be sorted. Third, and most importantly, the fields in the MATCH() should be identical to the fields listed in the table's FULLTEXT definition.

All other MySQL syntax works as you'd expect with a FULLTEXT search, meaning you can further limit your search terms. We could search blog entries based on posting date or category. If you let your imagination wander you can think of all sorts of ways to filter your data. Let's look for blog entries that only appear in the Personal category and match the term michigan.

mysql> SELECT E.entryID, E.title, C.name
    -> FROM blog_entries AS E, blog_categories AS C
    -> WHERE E.categoryID=C.categoryID AND
    ->       MATCH (E.title, E.entry) AGAINST ('michigan') AND
    ->       E.categoryID=1;
+---------+---------------+----------+
| entryID | title         | name     |
+---------+---------------+----------+
|       1 | About miester | Personal |
+---------+---------------+----------+
1 row in set (0.00 sec)

Note that we not only did a join but also filtered the results based on the category. Another thing to note is that FULLLTEXT indices are not case sensitive. If you would like to use MySQL's relevancy rating in your code you can add the MATCH() ... AGAINST() clause to your SELECT statement as well.

mysql> SELECT E.entryID, E.title, C.name,
    ->        MATCH (E.title, E.entry) AGAINST ('michigan') AS score
    -> FROM blog_entries AS E, blog_categories AS C
    -> WHERE E.categoryID=C.categoryID AND
    ->       MATCH (E.title, E.entry) AGAINST ('michigan') AND
    ->       E.categoryID=1;
+---------+---------------+----------+-----------------+
| entryID | title         | name     | score           |
+---------+---------------+----------+-----------------+
|       1 | About miester | Personal | 1.2635315656662 |
+---------+---------------+----------+-----------------+
1 row in set (0.00 sec)

Boolean Searches

Probably the most anticipated feature in MySQL 4.0's FULLTEXT is the ability to do boolean searches without having to process the query strings. This means you can add +s and -s to your queries, along with a host of other commands, and MySQL will interpret them for you.

mysql> SELECT E.entryID,E.title,C.name
    -> FROM blog_entries AS E, blog_categories AS C
    -> WHERE E.categoryID=C.categoryID AND
    ->       MATCH (E.title,E.entry) AGAINST ('+vacation -washington' IN BOOLEAN MODE) AND
    ->       E.categoryID=1;
+---------+-----------+----------+
| entryID | title     | name     |
+---------+-----------+----------+
|       4 | Vacation! | Personal |
+---------+-----------+----------+
1 row in set (0.00 sec)

We have two entries with the word vacation in the title, but since we removed washington, entryID 4 does not show up in the result. You can read all about IN BOOLEAN MODE on MySQL's FULLTEXT manual page.

Related Reading

MySQL Pocket Reference
By George Reese

Restrictions

A few restrictions affect MySQL FULLTEXT indices. Some of the default behaviors of these restrictions can be changed in your my.cnf or using the SET command.

  • FULLTEXT indices are NOT supported in InnoDB tables.
  • MySQL requires that you have at least three rows of data in your result set before it will return any results.
  • By default, if a search term appears in more than 50% of the rows then MySQL will not return any results.
  • By default, your search query must be at least four characters long and may not exceed 254 characters.
  • MySQL has a default stopwords file that has a list of common words (i.e., the, that, has) which are not returned in your search. In other words, searching for the will return zero rows.
  • According to MySQL's manual, the argument to AGAINST() must be a constant string. In other words, you cannot search for values returned within the query.

Altering FULLTEXT's Default Behavior

There are several ways to alter the default behavior of FULLTEXT. MySQL has some tips for fine tuning the FULLTEXT search, but the details are a little sparse. The most common problem is the four character minimum word length on queries. Before we go over that, let's review the variables associated with the FULLTEXT searching.

mysql> SHOW VARIABLES LIKE 'ft%';
+--------------------------+----------------+
| Variable_name            | Value          |
+--------------------------+----------------+
| ft_boolean_syntax        | + -><()~*:""&| |
| ft_min_word_len          | 2              |
| ft_max_word_len          | 254            |
| ft_max_word_len_for_sort | 20             |
| ft_stopword_file         | (built-in)     |
+--------------------------+----------------+
5 rows in set (0.00 sec)

The variable we wish to change is ft_min_word_len. According to the manual we should be able to change this via the SET VARIABLE command, but, in reality, this does not work. After asking the mailing list about this problem I was told this had to be specified as a startup option. To change the minimum query string to three characters, start the MySQL server as follows.

$ cd /path/to/mysql
$ ./bin/safe_mysqld --user=mysql -O ft_min_word_len=3 &

After you have restarted your MySQL server (don't forget to change your startup scripts) you have to rebuild the FULLTEXT indices. The manual suggests the following command:

-- Replace tbl_name with the name of your table
mysql> REPAIR TABLE tbl_name QUICK;

After you have rebuilt your indices, you should be able to search with query strings of three or more characters instead of the default four character limit.

Having fun with MySQL FULLTEXT Searching

I'll save the implementation details for a later article, but here are some interesting ways in which you could use MySQL FULLTEXT searching to finding data on your website more interesting.

PHP

A great way to add cross referencing to articles would be to store a query (i.e., linux for a post about Debian GNU/Linux) with each article. If an article had a query, PHP could then perform a FULLTEXT search, returning those results as "Related Articles". Furthermore, you could use PHP to create an advanced search script that allowed users to search the database based on category, criteria, pricing, etc.

ispell

ispell is a great tool for open source developers to use to make their applications more user friendly. By adding ispell to your search engine, you can check the spelling of each query, offering alternate queries if the query was spelled wrong. Everyone's favorite search engine does something similar.

Looking towards the Future

According to the manual, the MySQL team has a lot they still want to implement into FULLTEXT searching. Here is a brief overview of those enhancements:

  • Proximity operators
  • Making the stopword lists depend on the language of the data
  • Stemming

The proximity operators will really make FULLTEXT searching impressive. This will allow you to do searches on words based on how close together they are. For example, if you currently searched for 'mysql search' you would get results even where mysql and search appear at opposite ends of the document. With proximity operators, the scoring algorithm gauges how close together the words are. Documents where mysql and search appeared directly next to one another would score higher than documents where they were not close together.

Stemming is a great way to make search engines smarter. This would allow MySQL to search for words that share the same lexical root. For example, queries for running would return documents with ran and run as well as running.

Conclusion

As you can see, the possibilities of FULLTEXT are almost endless. For those of us looking for an easy and powerful solution to our searching woes, MySQL has come up with an answer.

Joe Stump is the Lead Architect for Digg where he spends his time partitioning data, creating internal services, and ensuring the code frameworks are in working order.


Return to ONLamp.com.


Comments on this article
Full Threads Oldest First

Showing messages 1 through 8 of 8.

  • This can be easily done via DBSight
    2007-03-28 17:27:22  chrislusf [View]

    I just need 2 SQLs:
    select * from blog_entries
    select * from blog_categories where categoryID = ?

    Then I can use DBSight to create a Lucene based full text search.

    Please take a look at this

    http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes

    You can create a full-text database search service, return results as HTML/XML/JSON. It uses the Lucene directly in java, but can be easily used with Ruby, PHP, or any existing database web applicatoins.

    You can easily index, re-index, incremental-index. It's also highly scalable and easily customizable.

    The best thing is, it's super easy. You can create a production-level search in 3 minutes.

  • rule of 50%
    2005-05-08 05:52:42  sosingus [View]

    Is there any way to get around this rule, for I need my results to display no mather of % of their existance in a table
  • Fulltext seach and joins
    2004-10-12 13:32:23  johnv8 [View]

    In one of the queries above there is a join in the same clause as the fulltext search. This works in theory but as soon as you start using real databases you will run into significantly long queries when you do something like
    SELECT *
    FROM fulltext_table AS FT
    INNER JOIN other_table AS OT ON OT.id = FT.id
    WHERE
    MATCH(FT.text) AGAINST ('search_string') LIMIT 1000

    Even small databases of 2-4 million rows often take 2-3 minutes in computing this.

    Disclaimer before you bash this:
    yes, the rows are indexed and optimized. This occured on a fast server (dual processor 2Ghz, 2Gb ram). There are ways to make it fast but the example above will be painfully slow if you have a large DB.
    • Fulltext seach and joins
      2004-10-12 13:34:35  johnv8 [View]

      Slight error with that query, I meant to say:

      SELECT *
      FROM fulltext_table AS FT
      INNER JOIN other_table AS OT ON OT.id = FT.id
      WHERE
      OT.id = 123 AND
      MATCH(FT.text) AGAINST ('search_string') LIMIT 1000
  • Inaccuracies in the article.
    2003-07-03 11:06:33  anonymous2 [View]

    This comment is ambiguous: "Queries that are longer than 20 characters will not be sorted." Each of the example queries are longer than 20 characters. Besides, it's the results that are sorted, not the queries. Perhaps the author is referring to the search term, but I can't find any analog in the MySQL docs. 20 characters seems short anyway for multi-word search terms.

    This statement is incorrect: "MySQL requires that you have at least three rows of data in your result set before it will return any results." If your query only has one matching row, MySQL still gives it to you. The requirement is one of the dataset, not the result set. The requirement is a result of the next item in the list (the 50% rule) since a match of 1 out of two rows is 50%.

    This article isn't quite accurate enough to let you avoid reading the documentation:
    http://www.mysql.com/doc/en/Fulltext_Search.html
  • i18n support
    2003-07-01 11:49:11  joestump [View]

    I swore the MySQL team had this on their TODO someplace (allowing multiple charsets), but I can't find it anywhere. I would assume that allowing multiple charsets would answer the i18n case sensitivity question you posed. The main MySQL list may have more info on the subject (as I'm very US-centric).
  • I18n
    2003-07-01 09:28:23  anonymous2 [View]

    You mention that FULLTEXT Searching is case insensitive, but case sensitivity depends on many factors, like the charset used for the text and the locale used... How are these question solved?
  • Full text searching with PostgreSQL
    2003-06-30 02:46:41  anonymous2 [View]

    Please, take a look on tsearch v2 contrib module
    for PostgreSQL, which provides more options
    and flexibility of full text searching to
    PostgreSQL.

    Home page is:
    http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/

    Regards,

    Oleg Bartunov


Tagged Articles

Post to del.icio.us

This article has been tagged:

mysql

Articles that share the tag mysql:

MySQL FULLTEXT Searching (155 tags)

Live Backups of MySQL Using Replication (152 tags)

Advanced MySQL Replication Techniques (125 tags)

Ten MySQL Best Practices (59 tags)

Rolling with Ruby on Rails (56 tags)

View All

search

Articles that share the tag search:

MySQL FULLTEXT Searching (93 tags)

Find What You Want with Plucene (22 tags)

Building a Vector Space Search Engine in Perl (18 tags)

Google Your Desktop (14 tags)

Dreaming of an Atom Store: A Database for the Web (14 tags)

View All

fulltext

Articles that share the tag fulltext:

MySQL FULLTEXT Searching (84 tags)

View All

database

Articles that share the tag database:

MySQL FULLTEXT Searching (54 tags)

Live Backups of MySQL Using Replication (53 tags)

Advanced MySQL Replication Techniques (53 tags)

Dreaming of an Atom Store: A Database for the Web (49 tags)

How to Misuse SQL's FROM Clause (38 tags)

View All

php

Articles that share the tag php:

Understanding MVC in PHP (477 tags)

The PHP Scalability Myth (123 tags)

The Dynamic Duo of PEAR::DB and Smarty (53 tags)

PHP Form Handling (43 tags)

Very Dynamic Web Interfaces (39 tags)

View All

Sponsored Resources

  • Inside Lightroom

Related to this Article

Designing Great Data Products Designing Great Data Products
March 2012
$0.00 USD

Planning for Big Data Planning for Big Data
March 2012
$0.00 USD

Advertisement
Sign up today to receive special discounts,
product alerts, and news from O'Reilly.
Privacy Policy >
View Sample Newsletter >
  • Youtube
  • http://www.youtube.com/OreillyMedia
  • Twitter
  • Subscribe
  • View All RSS Feeds >
O'Reilly Media

800-889-8969 or 707-827-7019
Monday-Friday 7:30am-5pm PT
©2011, O'Reilly Media, Inc.
All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.
  • About O'Reilly
  • Academic Solutions
  • Contacts
  • Customer Service
  • Careers
  • Press Room
  • Privacy Policy
  • Terms of Service
  • Writing for O'Reilly
  • Community
  • Authors
  • Forums
  • Membership
  • Newsletters
  • RSS Feeds
  • User Groups
  • More O'Reilly Sites
  • igniteshow.com
  • makerfaire.com
  • makezine.com
  • craftzine.com
  • labs.oreilly.com
  • Partner Sites
  • PayPal Developer Zone
  • O'Reilly Insights on Forbes.com