I have a mysql table where an indexed INT column is going to be 0 for 90% of the rows. If I change those rows to use NULL instead of 0, will they be left out of the index, making the index about 90% smaller?
MySQL can perform the same optimization on col_name IS NULL that it can use for col_name = constant_value. For example, MySQL can use indexes and ranges to search for NULL with IS NULL
It looks like it does index the
Be careful when you run this because MySQL will LOCK the table for WRITES during the index creation. Building the index can take a while on large tables even if the column is empty (all nulls).
Allowing a column to be null will add a byte to the storage requirements of the column. This will lead to an increased index size which is probably not good. That said if a lot of your queries are changed to use “IS NULL” or “NOT NULL” they might be overall faster than doing value comparisons.
My gut would tell me not null, but there’s one answer: test!
No, it will continue to include them, but don’t make too many assumptions about what the consequences are in either case. A lot depends on the range of other values (google for “cardinality”).
MSSQL has a new index type called a “filtered index” for this type of situation (i.e. includes records in the index based on a filter). dBASE-type systems used to have a similar capability, and it was pretty handy.
Each index has a cardinality means how many distinct values are indexed. AFAIK it’s not a reasonable idea to say indexes repeat the same value for many rows but the index will only addresses a repeated value to the clustered index of many rows (rows having null value for this field) and keeping the reference ID of the clustered index means : each row with a NULL value indexed field wastes a size as large as the PK (for this reason experts recommend to have a reasonable PK size if you have composite PK).