Sunday, October 4, 2009

Rolling Time-based Partitions

Fairly often I hear customers say that they are planning a table that accumulates millions of rows per day, and they want to keep around, say, the last 30 days' worth of data. (For the sake of the examples, I'm going to make it the last 3 days.) So this is a kind of round-robin table, with rolling addition of new data and removal of the oldest data.

With a high volume of data, this sounds like a job for a table partitioned on day boundaries, which MySQL 5.1 supports. See Sarah's blog and her links for a quick ramp-up on time-based table partitioning (http://everythingmysql.ning.com/profiles/blogs/partitioning-by-dates-the). One great benefit of table partitioning is that you can drop a partition to lose millions of rows in one quick statement, much faster than deleting those millions of rows with DELETE. Sort of like a partial TRUNCATE TABLE.

First, create the table with 4 partitions. Then, once a day, drop the oldest partition and add another partition to store the next day's rows. (The table will hold exactly 3 days' worth of data at the moment of each such rotation, and will accumulate one more day's worth until the next one.)
CREATE DATABASE IF NOT EXISTS demotimeparts;
USE demotimeparts;

DROP TABLE IF EXISTS pagehits;
CREATE TABLE pagehits (
  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  urlviewed VARCHAR(255),
  whodone VARCHAR(40) DEFAULT NULL,
  whendone DATETIME NOT NULL DEFAULT '0001-01-01 00:00:00',
  PRIMARY KEY (id, whendone)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
PARTITION BY RANGE (TO_DAYS(whendone))
(PARTITION p20090930 VALUES LESS THAN (TO_DAYS('2009-09-30')),
 PARTITION p20091001 VALUES LESS THAN (TO_DAYS('2009-10-01')),
 PARTITION p20091002 VALUES LESS THAN (TO_DAYS('2009-10-02')),
 PARTITION p20091003 VALUES LESS THAN (TO_DAYS('2009-10-03')));
A few notes about this table:
  1. The date boundary of a partition is in the partition's name. We'll use this. (It's also convenient for metadata reports.)
  2. The column whendone is included in the primary key because MySQL requires every unique key on a partitioned table (the primary key included) to use every column in the partitioning expression.
  3. For this time-based table to benefit from partition pruning, in which the optimizer eliminates some of the partitions from query execution, the partitioning column must be of type DATE or DATETIME, not TIMESTAMP. Pruning on such a column works when the partitioning expression uses TO_DAYS() or YEAR(), as this one does.
  4. The default value on the column whendone is there so the table works under an SQL_MODE that includes NO_ZERO_DATE and NO_ZERO_IN_DATE.
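Notes 1 and 3 can be made concrete with a small model. The Python sketch below is a simplified stand-in for RANGE pruning, not MySQL's optimizer: given the four partitions defined above, it lists the ones a query restricted to a date range would have to read. It also shows a consequence of the naming scheme: each name is the partition's exclusive upper bound, so rows dated 2009-10-01 live in p20091002. `PARTITIONS` and `partitions_for_range` are hypothetical names; in MySQL 5.1 itself, EXPLAIN PARTITIONS shows which partitions a query actually uses.

```python
from datetime import date
from typing import List, Optional, Tuple

# (partition name, exclusive upper-bound date), copied from the CREATE TABLE above.
PARTITIONS: List[Tuple[str, date]] = [
    ("p20090930", date(2009, 9, 30)),
    ("p20091001", date(2009, 10, 1)),
    ("p20091002", date(2009, 10, 2)),
    ("p20091003", date(2009, 10, 3)),
]


def partitions_for_range(parts: List[Tuple[str, date]], lo: date, hi: date) -> List[str]:
    """Hypothetical helper: names of partitions whose [lower, upper) range
    overlaps the inclusive query range [lo, hi].

    Comparing dates gives the same answer as comparing TO_DAYS() values,
    because TO_DAYS() increases with the date. This models RANGE pruning
    only; it is not MySQL's optimizer.
    """
    hits = []
    lower: Optional[date] = None  # the first partition has no lower bound
    for name, upper in parts:
        if lo < upper and (lower is None or hi >= lower):
            hits.append(name)
        lower = upper
    return hits


# A query for rows dated 2009-10-01 only needs one partition.
print(partitions_for_range(PARTITIONS, date(2009, 10, 1), date(2009, 10, 1)))
# ['p20091002']
```

A wider range, such as 2009-09-29 through 2009-10-01, would touch the first three partitions and skip p20091003.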
So far, so good: the table is set up to have 4 partitions, one for each of 4 consecutive days. Now, how to accomplish the "rolling" part? Here's one way. The procedure below takes a DATETIME argument and "rolls" the table to accept rows up to, but not including, that date. It uses prepared statements to drop the oldest partition in the table and then add a new partition whose upper limit is the date you've given.
USE demotimeparts;
DROP PROCEDURE IF EXISTS RotateTimePartition;
DELIMITER ;;
CREATE PROCEDURE RotateTimePartition (newPartValue DATETIME)
BEGIN
  -- Setup: save the caller's @stmt so it can be restored at the end.
  DECLARE keepStmt VARCHAR(2000) DEFAULT @stmt;
  DECLARE partitionToDrop VARCHAR(64);

  -- Find and drop the first (oldest) partition in the table.
  SELECT partition_name
    INTO partitionToDrop
    FROM INFORMATION_SCHEMA.PARTITIONS
   WHERE table_schema='demotimeparts'
     AND table_name='pagehits'
     AND partition_ordinal_position=1;
  SET @stmt = CONCAT('ALTER TABLE pagehits DROP PARTITION ',
                     partitionToDrop);
  PREPARE pStmt FROM @stmt;
  EXECUTE pStmt;
  DEALLOCATE PREPARE pStmt;

  -- Add a new partition using the input date for a value limit.
  -- DATE_FORMAT discards any time-of-day part of newPartValue.
  SET @stmt = CONCAT('ALTER TABLE pagehits ADD PARTITION (PARTITION p',
                     DATE_FORMAT(newPartValue, '%Y%m%d'),
                     ' VALUES LESS THAN (TO_DAYS(\'',
                     DATE_FORMAT(newPartValue, '%Y-%m-%d'),
                     '\')))');
  PREPARE pStmt FROM @stmt;
  EXECUTE pStmt;
  DEALLOCATE PREPARE pStmt;

  -- Cleanup: restore the caller's @stmt.
  SET @stmt = keepStmt;
END;;
DELIMITER ;
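To see exactly what the procedure hands to PREPARE, here is a Python sketch that builds the same two statements from an ordered list of partition names. `rotation_statements` is a hypothetical helper that only mirrors the CONCAT() calls above. It never connects to MySQL, and it assumes the names arrive in partition_ordinal_position order, so the first one is the partition the procedure would drop.

```python
from datetime import date, datetime
from typing import List, Tuple


def rotation_statements(partition_names: List[str], new_part_value: date) -> Tuple[str, str]:
    """Hypothetical helper mirroring the two CONCAT() calls in RotateTimePartition.

    partition_names must be in partition_ordinal_position order, so element 0
    is the partition the procedure would drop. new_part_value may be a date or
    a datetime; as with DATE_FORMAT, only the date part is used.
    """
    drop_stmt = "ALTER TABLE pagehits DROP PARTITION " + partition_names[0]
    add_stmt = (
        "ALTER TABLE pagehits ADD PARTITION (PARTITION p"
        + new_part_value.strftime("%Y%m%d")
        + " VALUES LESS THAN (TO_DAYS('"
        + new_part_value.strftime("%Y-%m-%d")
        + "')))"
    )
    return drop_stmt, add_stmt


names = ["p20090930", "p20091001", "p20091002", "p20091003"]
for stmt in rotation_statements(names, date(2009, 10, 4)):
    print(stmt)
# A DATETIME argument with a time of day yields the same ADD statement.
print(rotation_statements(names, datetime(2009, 10, 4, 17, 30))[1])
```

Because the procedure formats its argument with DATE_FORMAT rather than using it as is, the time of day in a DATETIME such as NOW() + INTERVAL 1 DAY is simply dropped, and the new boundary always falls at the start of a day.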
So, before calling RotateTimePartition, the pagehits table definition includes:
/*!50100 PARTITION BY RANGE (to_days(whendone))
(PARTITION p20090930 VALUES LESS THAN (734045) ENGINE = MyISAM,
PARTITION p20091001 VALUES LESS THAN (734046) ENGINE = MyISAM,
PARTITION p20091002 VALUES LESS THAN (734047) ENGINE = MyISAM,
PARTITION p20091003 VALUES LESS THAN (734048) ENGINE = MyISAM) */
Then, after running:
CALL RotateTimePartition('2009-10-04');
the table definition includes:
/*!50100 PARTITION BY RANGE (to_days(whendone))
(PARTITION p20091001 VALUES LESS THAN (734046) ENGINE = MyISAM,
PARTITION p20091002 VALUES LESS THAN (734047) ENGINE = MyISAM,
PARTITION p20091003 VALUES LESS THAN (734048) ENGINE = MyISAM,
PARTITION p20091004 VALUES LESS THAN (734049) ENGINE = MyISAM) */
Notice that the generated partition names let you easily see the date boundaries.
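SHOW CREATE TABLE lists the boundaries as plain integers because MySQL evaluates TO_DAYS() when each partition is defined. To check those numbers, or to predict the layout a rotation will produce, here is a Python sketch. It assumes MySQL's TO_DAYS() equals Python's date.toordinal() plus 365 (Python counts 0001-01-01 as day 1, MySQL counts from year 0), which reproduces the 734045 shown above for 2009-09-30. `to_days` and `rotate` are illustrative helpers, not MySQL functions.

```python
from datetime import date, timedelta
from typing import List, Tuple


def to_days(d: date) -> int:
    # Assumption: MySQL's TO_DAYS() = Python's ordinal + 365. Python numbers
    # 0001-01-01 as day 1, MySQL counts from year 0; valid for Gregorian dates.
    return d.toordinal() + 365


def rotate(parts: List[Tuple[str, int]], new_boundary: date) -> List[Tuple[str, int]]:
    """Hypothetical helper: drop the oldest partition, append one ending at new_boundary."""
    return parts[1:] + [("p" + new_boundary.strftime("%Y%m%d"), to_days(new_boundary))]


first_day = date(2009, 9, 30)
before = [("p" + d.strftime("%Y%m%d"), to_days(d))
          for d in (first_day + timedelta(days=i) for i in range(4))]
after = rotate(before, date(2009, 10, 4))
for name, limit in after:
    print(f"PARTITION {name} VALUES LESS THAN ({limit})")
```

The printed lines match the partition list shown above after CALL RotateTimePartition('2009-10-04'): p20091001 through p20091004, with limits 734046 through 734049.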

Then, on the date of this post (2009-10-04), you can issue:
CALL RotateTimePartition(NOW() + INTERVAL 1 DAY);
to get:
/*!50100 PARTITION BY RANGE (to_days(whendone))
(PARTITION p20091002 VALUES LESS THAN (734047) ENGINE = MyISAM,
PARTITION p20091003 VALUES LESS THAN (734048) ENGINE = MyISAM,
PARTITION p20091004 VALUES LESS THAN (734049) ENGINE = MyISAM,
PARTITION p20091005 VALUES LESS THAN (734050) ENGINE = MyISAM) */
There: lose a partition, add a partition. Now you can easily write an event (using the Event Scheduler in MySQL 5.1) to call this procedure daily and maintain the table's storage automatically.
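One timing detail is worth checking before scheduling that event. The procedure's argument is an exclusive upper bound, so a daily call with NOW() + INTERVAL 1 DAY made on day T creates the partition for day T's own rows. Until the event fires, rows stamped with day T have no partition, and MySQL rejects the INSERT, since a RANGE-partitioned table without a MAXVALUE partition cannot store values at or past its last bound. The Python sketch below models only the list of upper bounds just before a firing, in steady state. `layout_before_run`, `accepts`, and `days_ahead` are hypothetical names, and this is a model, not MySQL.

```python
from datetime import date, timedelta
from typing import List


def layout_before_run(today: date, days_ahead: int, n_parts: int = 4) -> List[date]:
    """Hypothetical model: exclusive upper bounds, in steady state, just before
    a daily event calls RotateTimePartition(NOW() + INTERVAL days_ahead DAY) on `today`.

    Yesterday's call added the bound yesterday + days_ahead; each earlier
    call added the bound one day before that.
    """
    last = today - timedelta(days=1) + timedelta(days=days_ahead)
    return [last - timedelta(days=n_parts - 1 - i) for i in range(n_parts)]


def accepts(bounds: List[date], row_date: date) -> bool:
    # With no MAXVALUE partition, a RANGE-partitioned table rejects any row
    # whose value is at or beyond the last VALUES LESS THAN bound.
    return row_date < bounds[-1]


today = date(2009, 10, 5)
for days_ahead in (1, 2):
    bounds = layout_before_run(today, days_ahead)
    print(days_ahead, [b.isoformat() for b in bounds], accepts(bounds, today))
```

With days_ahead=1, the modeled layout just before the 2009-10-05 run is the one shown above (p20091002 through p20091005), and rows dated 2009-10-05 have no partition until the event fires. Passing NOW() + INTERVAL 2 DAY instead keeps a day of headroom. The trade-off is that the newest of the four partitions is always empty right after a run, so you keep one fewer full day of history unless you add a partition.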

This table design and procedure work just as well for a table with 31 day-sized partitions, 5 week-sized partitions, 25 month-sized partitions, or whatever. The procedure takes a date input, so it doesn't care whether you're using day, week, month, or any other intervals.
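Since the procedure only ever sees a date, choosing the interval is the caller's job: compute the next boundary and pass it in. Here is a minimal Python sketch of that calculation. `next_boundary` and its `unit` values are hypothetical, and month-sized partitions are assumed to be bounded on the first of each month.

```python
from datetime import date, timedelta


def next_boundary(boundary: date, unit: str) -> date:
    """Hypothetical helper: the next exclusive upper bound after `boundary`.

    unit is "day", "week", or "month"; month-sized partitions are assumed
    to be bounded on the 1st of each month.
    """
    if unit == "day":
        return boundary + timedelta(days=1)
    if unit == "week":
        return boundary + timedelta(weeks=1)
    if unit == "month":
        if boundary.day != 1:
            raise ValueError("month boundaries are assumed to fall on the 1st")
        # December rolls over to January of the next year.
        return date(boundary.year + boundary.month // 12, boundary.month % 12 + 1, 1)
    raise ValueError(f"unknown unit: {unit!r}")


for unit, last in (("day", date(2009, 10, 5)), ("week", date(2009, 10, 5)), ("month", date(2009, 12, 1))):
    nxt = next_boundary(last, unit)
    # The date to pass to RotateTimePartition, and the partition name it produces.
    print(unit, nxt.isoformat(), "p" + nxt.strftime("%Y%m%d"))
```

For the month case, for example, the value to pass would be CALL RotateTimePartition('2010-01-01'), which adds partition p20100101.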

Enjoy!

Comments:

  1. I think the key here is "no fragmentation if you use partitioning". Also dropping the partitions is very fast.

  2. Hi Glynn,
    Eoin here. Once again, thanks for the excellent course in San Francisco the other week. I have spread the word among colleagues, and you might have a few others attend your class as a result!

    I have a question for you! I don't have your personal email so I hope you don't mind me posting here - it's to do with partitioning....

    I wrote the following:


    CREATE TABLE Eoin_part (id INT, purchased DATE)
    ENGINE=MyISAM DEFAULT CHARSET=latin1
    PARTITION BY RANGE( YEAR(purchased) )
    SUBPARTITION BY HASH( TO_DAYS(purchased) ) (
    PARTITION p0 VALUES LESS THAN (1990) (
    SUBPARTITION s0
    DATA DIRECTORY = 'C:\\Database\\Disk1'
    INDEX DIRECTORY = 'C:\\Database\\Disk1',
    SUBPARTITION s1
    DATA DIRECTORY = 'C:\\Database\\Disk2'
    INDEX DIRECTORY = 'C:\\Database\\Disk2'
    ),
    PARTITION p1 VALUES LESS THAN MAXVALUE (
    SUBPARTITION s2
    DATA DIRECTORY = 'C:\\Database\\Disk3'
    INDEX DIRECTORY = 'C:\\Database\\Disk3',
    SUBPARTITION s3
    DATA DIRECTORY = 'C:\\Database\\Disk4'
    INDEX DIRECTORY = 'C:\\Database\\Disk4'
    )
    );


    The problem is that when I try to insert or select, I get error 1017 'can't find file'.

    I have tried playing around with my.ini settings, changing the datadir and various sql_modes, but I haven't had any success. I then tried Google and mysql.com, to no avail.

    Can you shed some light on this issue? I am using version 5.1.45 winx64.

    Happy travels
    Eoin

  6. Hi, nice article.
    I have one question.
    I want to use table partitioning, but I'm using AWS MySQL RDS.
    I can't partition the table on the slave only; the partitioning will be on both master and slave. I want to partition on date. Will it help performance or not?
