MySQL Manual | 6.8.3 Full-text Search TODO

6.8.3 Full-text Search TODO

* Make all operations with FULLTEXT index faster.
* Proximity operators.
* Support for "always-index words". These could be any strings the user wants to treat as words, for example "C++", "AS/400", "TCP/IP", etc.
* Support for full-text search in MERGE tables.
* Support for multi-byte character sets.
* Make the stopword list depend on the language of the data.
* Stemming (dependent on the language of the data, of course).
* A generic user-suppliable UDF preparser.
* Make the model more flexible (by adding some adjustable parameters to FULLTEXT in CREATE/ALTER TABLE).
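A minimal Python sketch of the "always-index words" idea from the list above: a tokenizer that first tries a user-supplied list of protected strings and only then falls back to ordinary word characters. ALWAYS_INDEX and tokenize are hypothetical names used for illustration; this is not MySQL's parser, and protected strings are matched even at the start of longer tokens (so "TCP/IPv6" splits into "tcp/ip" and "v6").

```python
import re

# Hypothetical list of strings the user wants indexed as whole words.
ALWAYS_INDEX = ["C++", "AS/400", "TCP/IP"]


def tokenize(text, always_index=ALWAYS_INDEX):
    """Split text into lowercase words, keeping always-index strings intact."""
    # Try longer protected strings first so they win over shorter overlaps.
    protected = sorted(always_index, key=len, reverse=True)
    alternatives = [re.escape(p) for p in protected] + [r"\w+"]
    pattern = re.compile("|".join(alternatives), re.IGNORECASE)
    return [m.group(0).lower() for m in pattern.finditer(text)]


if __name__ == "__main__":
    print(tokenize("Porting TCP/IP code from C++ to AS/400"))
    print(tokenize("Porting TCP/IP code from C++ to AS/400", always_index=[]))
```

Without the protected list, "C++" collapses to "c" and "TCP/IP" to "tcp" and "ip", which is exactly the loss the TODO item describes.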

User Comments

Posted by Nicolas Ross on Friday May 17 2002, @6:24am

I agree 100%. I am building a database with full-text search and users aren't able to find parts of words. For example:

L'industrie du ...

(This is French, by the way.) If I search for "industrie", this word is not found. Neither are plurals: "industries" is not found either. Partial-word search would be very much appreciated.

Posted by pa44 on Friday May 17 2002, @6:24am

This page seems a bit self-contradictory:

"If a thread obtains a READ lock on a table, that thread...can only read from the table" says that the locking thread can't update a READ locked table, whereas "no other thread can update a READ-locked table" implies it can. Experiments show that the first bit is true: no thread at all can update a READ locked table.

There appear to be no provisions for locking a table such that only one thread can update while any thread can read. I would have found this useful.

Posted by [name withheld] on Friday May 17 2002, @6:24am

In 3.23.42-log, you cannot use MATCH ... AGAINST on a MERGE table. For example, the first three queries work but the last one fails:

SELECT * FROM table_a WHERE MATCH(description) AGAINST ('blah');
SELECT * FROM table_b WHERE MATCH(description) AGAINST ('blah');
SELECT * FROM table_c WHERE MATCH(description) AGAINST ('blah');
SELECT * FROM table_abc_m WHERE MATCH(description) AGAINST ('blah');

ERROR 1030: Got error -1 from table handler

Posted by John Lucas on Friday May 17 2002, @6:24am

We desperately need the enhancements to full-text search. In particular, it would be ideal to have the following facilities, with full-text results scored on them: 1. word proximity; 2. stemming of words. In our collection, a document in which the words of a search phrase are close together should be scored higher than another document, even if that other document has many occurrences of just one or two of the phrase's words spread far apart. The proximity of words should be specifiable either as an operator in a particular search or as a default defined in the environment.
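A minimal sketch of the proximity scoring requested above, assuming documents are already tokenized into word lists. It finds the shortest span of tokens containing every query term (a standard sliding-window technique) and turns that width into a score; the scoring formula and function names are hypothetical, not anything MySQL provides.

```python
from collections import Counter


def smallest_window(tokens, terms):
    """Length in tokens of the shortest span containing every term, or None."""
    needed = set(terms)
    if not needed:
        return None
    counts = Counter()
    have = 0          # how many distinct terms the current window contains
    best = None
    left = 0
    for right, tok in enumerate(tokens):
        if tok in needed:
            counts[tok] += 1
            if counts[tok] == 1:
                have += 1
        # Shrink from the left while the window still covers every term.
        while have == len(needed):
            width = right - left + 1
            if best is None or width < best:
                best = width
            left_tok = tokens[left]
            if left_tok in needed:
                counts[left_tok] -= 1
                if counts[left_tok] == 0:
                    have -= 1
            left += 1
    return best


def proximity_score(tokens, terms):
    """Hypothetical score: 1.0 when all terms are adjacent, lower as they spread out."""
    width = smallest_window(tokens, terms)
    if width is None:
        return 0.0
    return len(set(terms)) / width


if __name__ == "__main__":
    close = "the quick brown fox jumps".split()
    spread = "fox fox fox and then a long way later quick".split()
    print(proximity_score(close, ["quick", "fox"]))
    print(proximity_score(spread, ["quick", "fox"]))
```

The second document repeats "fox" three times but still scores lower, which is the behaviour the comment asks for.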

Posted by Lindsay Pallickal on Friday May 17 2002, @6:24am

I agree with Mr. Barszczewski that we need to be able to select word delimiters. I am working on a system where users search through a table of file names, and characters such as brackets and underscores are often used in place of spaces. I can't imagine it would be hard to implement the ability to let the user specify additional delimiters. Please try to include this in the next release. For now, I am working around the limitation by keeping two synchronized copies of the table I need to search: one with the original file names, and a mirrored copy in which every delimiter I specify is replaced by a space. The mirrored table is used for the search, and the unique IDs of the records found there are used to locate the actual file names in the original table.
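The mirrored-table workaround described above can be sketched in Python with dictionaries standing in for the two tables. The delimiter set, the function names and the sample rows are hypothetical; in the real setup the normalized text would live in a column with a FULLTEXT index.

```python
import re


def normalize_for_search(filename, delimiters="_[](){}.-"):
    """Replace each user-chosen delimiter with a space, as in the mirrored copy."""
    table = str.maketrans({ch: " " for ch in delimiters})
    # Collapse the runs of spaces left behind by adjacent delimiters.
    return re.sub(r" +", " ", filename.translate(table)).strip()


def find_ids(mirror, word):
    """Return the IDs of mirrored rows that contain word as a whole word."""
    word = word.lower()
    return [rid for rid, text in mirror.items() if word in text.lower().split()]


if __name__ == "__main__":
    originals = {1: "my_holiday[2002].jpg", 2: "work-report.doc"}
    mirror = {rid: normalize_for_search(name) for rid, name in originals.items()}
    for rid in find_ids(mirror, "holiday"):
        print(originals[rid])  # look the real file name up by ID
```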

Posted by Mattias on Friday May 17 2002, @6:24am

There is a description of SQL Server transaction isolation levels which I believe reflects MySQL's behaviour fairly well too. URL:
http://www.swynk.com/friends/achigrik/TIL.asp

Posted by Monte Ohrt on Friday May 17 2002, @6:24am

Partial-word matching is a good idea, but only if proper stemming is done. You want to make sure that the matching words share the same morpheme (the same basic meaning).

Example:
if you search for the word "runs", it should also
match "run", "running", "runner" since all of
these have the same morpheme of "run". However,
you would NOT want matches such as "rune", "runt",
"rung", "runic" since these words do NOT hold the
same meaning, and would be worthless in your
search.

Another example: a search for "sock" should return "sock" and "socks", but should NOT return "socket".

Of course, another _great_ addition to the search
engine would be thesaurus matches, so a search for
"doctor" could return "physician", but only if
asked to do so in the query like "@doctor" or
something like that.
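A deliberately tiny suffix-stripping stemmer, sketched in Python, reproduces Monte's examples: "runs", "running" and "runner" reduce to "run", while "rune", "runt", "rung", "runic" and "socket" are left alone. The suffix list and the doubled-consonant rule are hypothetical simplifications; a real stemmer (for example Porter's algorithm) handles far more cases, and this toy version will mangle some words (it turns "ball" into "bal").

```python
# Hypothetical, deliberately tiny suffix list; checked in this order.
SUFFIXES = ("ing", "er", "s")


def toy_stem(word):
    """Crude suffix stripping followed by undoubling a final consonant."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Only strip when at least three characters of stem would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    # "running" -> "runn" -> "run": undo a doubled final consonant.
    if len(word) > 3 and word[-1] == word[-2] and word[-1] not in "aeiou":
        word = word[:-1]
    return word


if __name__ == "__main__":
    for w in ["runs", "running", "runner", "rune", "runt", "socks", "socket"]:
        print(w, "->", toy_stem(w))
```

Matching on stems rather than raw prefixes is what keeps "socket" out of a search for "sock".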

Posted by fw4 on Friday May 17 2002, @6:24am

With the MATCH ... AGAINST syntax, how can one search for, say, "file.gif"? It seems to accept only alphanumeric characters; the period (.) is dropped. Let me know!
fw4@tvd.be
Thanks

Posted by Erlend Stromsvik on Friday May 17 2002, @6:24am

Joe:
apple* will match "apple", "apples", "applesauce", and "applet".
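The truncation rule Erlend describes (a trailing * matches any word that starts with the preceding letters) can be modelled in a few lines of Python over a plain word list. This only illustrates the matching rule; it says nothing about how MySQL evaluates the operator against its index, and the function name is hypothetical.

```python
def truncation_match(query, words):
    """Trailing '*' means prefix match; otherwise require an exact (case-insensitive) match."""
    if query.endswith("*"):
        prefix = query[:-1].lower()
        return [w for w in words if w.lower().startswith(prefix)]
    return [w for w in words if w.lower() == query.lower()]


if __name__ == "__main__":
    words = ["apple", "apples", "applesauce", "applet", "grape"]
    print(truncation_match("apple*", words))
    print(truncation_match("apple", words))
```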

fw4@tvd.be:
Read section 6.8 to see which characters are indexed. I guess the '.' is dropped, and therefore 'gif' is dropped too (since only words longer than 3 characters are indexed).
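Why "file.gif" fails can be shown with a small Python model of the word splitting, assuming (as the reply above suggests) that '.' acts as a delimiter and that words of 3 characters or fewer are not indexed. The regular expression and MIN_WORD_LEN value are assumptions for illustration, not MySQL's actual parser.

```python
import re

MIN_WORD_LEN = 4  # assumed default: words of 3 characters or fewer are ignored


def indexed_words(text, min_len=MIN_WORD_LEN):
    """Return the words that would survive splitting and the length cut-off."""
    # Assumed word definition: a run of letters, digits or underscores.
    words = re.findall(r"\w+", text.lower())
    return [w for w in words if len(w) >= min_len]


if __name__ == "__main__":
    print(indexed_words("file.gif"))             # only "file" is left
    print(indexed_words("file.gif", min_len=3))  # "gif" survives a lower cut-off
```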

Posted by Mike Brittain on Friday May 17 2002, @6:24am

Search results are ordered by descending relevance (as noted above). If you include an ORDER BY clause in your query, MySQL apparently still uses the relevance score as a secondary sort key.

"ORDER BY title ASC"

is essentially the same as saying

"ORDER BY title ASC, relevance DESC"

(I wasn't sure whether relevance scoring remained intact if you ordered by another column, but this does seem to be the case from the testing I did.)
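SQL does not guarantee any particular order for rows whose ORDER BY keys are equal, so the tie-break observed above is safer spelled out than relied on, for example by aliasing the MATCH ... AGAINST expression as relevance in the select list and writing ORDER BY title ASC, relevance DESC. The Python sketch below only models what that two-key ordering means for a list of hypothetical (title, relevance) rows.

```python
def order_by_title_then_relevance(rows):
    """Sort (title, relevance) rows by title ascending, then relevance descending."""
    # Negating the numeric relevance turns the second key into a descending sort.
    return sorted(rows, key=lambda row: (row[0], -row[1]))


if __name__ == "__main__":
    rows = [("B", 0.5), ("A", 0.2), ("A", 0.9)]
    print(order_by_title_then_relevance(rows))
```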

Posted by jody.whitfill on Monday July 22 2002, @12:13pm

The parser needs to be a little more sophisticated. I have a database where users need to search on words that contain characters such as quotes (inches) and slashes (e.g. 3/4). In a LIKE query I can use the escape character "\", but in full-text mode this does not work. I have a TEXT field that I cannot index with anything other than a FULLTEXT index. So searching for something like 3/4" does not work at all, because the parser splits it into the words "3" and "4" and drops the other characters.

Posted by Brian Cunningham on Friday May 17 2002, @6:24am

In response to Monte Ohrt above: I would like to be able to choose how a query string is treated. That is, I would like to see an (optional) search function that takes the field and the search method as parameters, so I could override the default method with something like SELECT MATCH (tekst) AGAINST INDEXTYPE ('run', stemmed) AS x FROM info. Or perhaps this functionality could be added to MATCH itself. Similarly, I'd like to be able to specify the minimum word length to index in my CREATE statement. I personally use partial-word searches when I don't know how to spell something. A dumb example: is it "socket" or "sockit"? My search for "sock" should be able to return more than just "sock" and "socks". Then, in an application, I might let the user override the default search method with radio buttons.

Posted by [name withheld] on Friday May 17 2002, @6:24am

Note that you can only set ft_min_word_len in version 4 and above. For those of us with 3.23.x, you have to change the following line in myisam/ftdefs.h, where n is the minimum word length to index:

#define MIN_WORD_LEN n

and then recompile!

Posted by frederic vandenplas on Saturday June 15 2002, @11:27am

This SELECT returns all product names containing "plas"; now I'm figuring out how to pass % through the query string:

SELECT * FROM Products WHERE Product_name LIKE '%plas%' ORDER BY product_id ASC
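One way around passing % in the URL is to send only the search term and add the wildcards on the server, using a parameterized LIKE query. The sketch below uses Python's built-in sqlite3 in-memory database as a stand-in for MySQL; the table and column names come from the comment, while the query-string parameter name q and the choice of '!' as the LIKE escape character are assumptions. If the % signs really must travel in the URL, each one has to be encoded as %25.

```python
import sqlite3
from urllib.parse import parse_qs, quote


def like_pattern(term):
    """Wrap term in % wildcards, escaping LIKE metacharacters with '!'."""
    # Escape the escape character first, then the two LIKE wildcards.
    escaped = term.replace("!", "!!").replace("%", "!%").replace("_", "!_")
    return "%" + escaped + "%"


def find_products(conn, query_string):
    """Search product names for the (hypothetical) 'q' parameter of a query string."""
    term = parse_qs(query_string).get("q", [""])[0]
    sql = ("SELECT product_id, Product_name FROM Products "
           "WHERE Product_name LIKE ? ESCAPE '!' ORDER BY product_id ASC")
    return conn.execute(sql, (like_pattern(term),)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Products (product_id INTEGER, Product_name TEXT)")
    conn.executemany("INSERT INTO Products VALUES (?, ?)",
                     [(1, "Plastic cup"), (2, "Glass jar"), (3, "Explaster")])
    print(find_products(conn, "q=plas"))  # rows 1 and 3; SQLite's LIKE ignores ASCII case
    # If the % signs really must be sent in the URL, encode each one as %25:
    print(quote("%plas%"))
```

Building the pattern on the server also means a user typing a literal % or _ searches for that character instead of triggering a wildcard.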

Posted by NTABUYE BUTERA Paul on Monday August 19 2002, @8:48am

As an extension of the full-text capability, we should have a "file path" column type in MySQL. Such a column would hold the path to a text file, and full-text indexing of that file would be performed at a requested indexing time. This would avoid the considerable time lost loading data into a table that already has a FULLTEXT index.

Posted by Martyn Allan on Thursday January 30 2003, @8:08pm

I would like boolean mode to behave the same as non-boolean mode in this respect:

Every correct word in the collection and in the query is weighted according to its significance in the query or collection.

Weighting based on the column would also be good, so that hits on, say, a name could be weighted higher than hits on the description (either by a setting in the query or in the full-text setup).
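The column weighting requested above amounts to a weighted sum of per-column relevance scores. The Python sketch below assumes those per-column scores are already available (for instance as separate MATCH ... AGAINST results); the weights, the default weight of 1.0 and the function name are hypothetical.

```python
def weighted_score(column_scores, weights, default_weight=1.0):
    """Combine per-column relevance scores using per-column weights."""
    return sum(weights.get(col, default_weight) * score
               for col, score in column_scores.items())


if __name__ == "__main__":
    weights = {"name": 3.0, "description": 1.0}
    hit_in_name = {"name": 1.0, "description": 0.0}
    hit_in_description = {"name": 0.0, "description": 1.5}
    print(weighted_score(hit_in_name, weights))
    print(weighted_score(hit_in_description, weights))
```

With a name weight of 3, a single hit in the name outranks a somewhat stronger hit in the description.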

Posted by Frank Mayer on Wednesday February 5 2003, @7:10pm

I'd like to know whether it will be possible (in the future) to weight parts of the MATCH condition more precisely, especially when I have to suppress a large number of irrelevant hits. In these cases it may be more useful to give the exclusion conditions individual weights than simply to add a "+" or a "-".

Are there any plans or possibilities?

Yours,
Frank Mayer

Posted by Jay G on Sunday March 16 2003, @6:36pm

True phonetic searching and partial-word searching would be nice to have with full-text searching. As it is, I am basically duplicating exactly how MySQL does full-text word searching, with a few additions of my own.

By the way, what relevance algorithm does MySQL use? Mine uses an inverse-log-based algorithm built on the totals assigned to words according to their number of occurrences within the set of strings searched.
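The inverse-log weighting Jay describes resembles the textbook inverse document frequency (IDF) idea: a word's weight is log(N / df), where N is the number of documents and df is the number of documents containing the word. The Python sketch below shows that generic formula only; it is not a description of MySQL's own relevance calculation, which this page does not document, and splitting on whitespace is a simplifying assumption.

```python
import math


def idf_weights(documents):
    """Generic IDF: words that appear in fewer documents get larger weights."""
    n = len(documents)
    df = {}
    for doc in documents:
        # Count each word once per document (document frequency, not term count).
        for word in set(doc.lower().split()):
            df[word] = df.get(word, 0) + 1
    return {word: math.log(n / count) for word, count in df.items()}


if __name__ == "__main__":
    docs = ["mysql full text", "mysql index", "mysql search"]
    for word, weight in sorted(idf_weights(docs).items()):
        print(word, round(weight, 3))
```

A word present in every document ("mysql" here) gets weight 0, so it contributes nothing to ranking.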

Posted by Ron Carlton on Friday March 28 2003, @3:10pm

Please include a description of how the scores of a full-text search are calculated, their range of values, etc. With a 10-word query with wildcards, I get scores between 5 and 11. The 11 is pretty much an exact match, but a score of 5 has only a single matching word. Is there an answer to the question: "11 out of how many?"

Posted by Guido Serra on Tuesday July 8 2003, @3:54pm

* Make all operations with FULLTEXT index faster.

Have "reverse indexes" already been implemented? Is someone working on them?

And what about "reindexing"? I have worked on situations where, with closed-source full-text retrieval software, people had to recreate all the indexes whenever something new was inserted into the DB.