Monday, January 11, 2010

Finding Updateable Columns

A customer says: "I am building a form that presents columns in a table as fields in the form. I want to use the end user's database account to discover which columns are updateable and present the corresponding fields appropriately to the end user: I will gray-out the fields that are not updateable. How do I discover the updateable columns in a table?"

This is a straight-forward set of queries on the data dictionary, which is presented in the information_schema database. The only thing to note is that a user's access to table columns is determined at 4 levels: server level, database level, table level, and column level. These privileges are reported in the information_schema tables USER_PRIVILEGES, SCHEMA_PRIVILEGES, TABLE_PRIVILEGES, and COLUMN_PRIVILEGES. The access I have is the union of all the privileges I have been granted. So, if I have been granted access to all tables in the world database with, say:

GRANT UPDATE ON world.* TO 'user';

, and also to the world.city table with:

GRANT UPDATE ON world.city TO 'user';

, the second GRANT doesn't really matter at all, as the first GRANT already gives me UPDATE access to all the tables in the world database, regardless of access given at the finer level of just one table. So, to find if I have access to a given column, I need to look in up to 4 data dictionary tables, but once I find a suitable privilege, I need look no further to be sure I have that access.

If I'm working interactively, I can always find out the total privileges I have by executing the command SHOW GRANTS, but it's a different matter to write a program that responds to this information dynamically.

Following are a couple of stored functions to do the job. Note that both functions use SECURITY INVOKER, as we are interested in the access allowed to the invoker of the function, not the definer. Note also that both functions attempt to avoid redundant work: once a result is known, a return is made without further database queries. (And by the way, don't forget to grant execute on these functions to the users you want to have access to them.)

I. Function 'updateable'
Takes as input database, table, and column names, and returns a boolean result to tell the user whether the column in question is updateable.

II. Function 'updatelist'
Takes as input a database and table name, and a flag. If the flag is TRUE, returns the columns in the table that are updateable for this user; if FALSE, returns the columns that are not updateable for this user. In both cases, the return is a string containing a comma-separated list of column names.

I. Function 'updateable' definition:

DROP FUNCTION IF EXISTS Updateable;
DELIMITER ;;
CREATE FUNCTION Updateable(db VARCHAR(64),
tab VARCHAR(64),
col VARCHAR(64)) RETURNS BOOL
DETERMINISTIC
SQL SECURITY INVOKER
BEGIN

# Given a database, table, and column in parameters (db, tab, col),
# return TRUE if the function invoker can update that column,
# FALSE otherwise.

DECLARE userId VARCHAR(81);
DECLARE rowCount INT;

# Put userId into form 'user'@'host' (with quotes) for matches
# with information_schema GRANTEE column values.
SET userId = CONCAT("'",
REPLACE(CURRENT_USER(), "@", "'@'"),
"'");

# Look for UPDATE privilege server-wide.
SELECT COUNT(*)
INTO rowCount
FROM information_schema.USER_PRIVILEGES
WHERE GRANTEE = userId
AND PRIVILEGE_TYPE = 'UPDATE';
IF rowCount > 0 THEN RETURN TRUE; END IF;

# Look for database-level privilege.
SELECT COUNT(*)
INTO rowCount
FROM information_schema.SCHEMA_PRIVILEGES
WHERE GRANTEE = userId
AND TABLE_SCHEMA = db
AND PRIVILEGE_TYPE = 'UPDATE';
IF rowCount > 0 THEN RETURN TRUE; END IF;

# Look for table-level privilege.
SELECT COUNT(*)
INTO rowCount
FROM information_schema.TABLE_PRIVILEGES
WHERE GRANTEE = userId
AND TABLE_SCHEMA = db
AND TABLE_NAME = tab
AND PRIVILEGE_TYPE = 'UPDATE';
IF rowCount > 0 THEN RETURN TRUE; END IF;

# Look for column-level privilege.
SELECT COUNT(*)
INTO rowCount
FROM information_schema.COLUMN_PRIVILEGES
WHERE GRANTEE = userId
AND TABLE_SCHEMA = db
AND TABLE_NAME = tab
AND COLUMN_NAME = col
AND PRIVILEGE_TYPE = 'UPDATE';
IF rowCount > 0 THEN RETURN TRUE; END IF;

# No UPDATE privilege found: return FALSE.
RETURN FALSE;
END;;
DELIMITER ;

II. Function 'updatelist' definition:

DROP FUNCTION IF EXISTS UpdateList;
DELIMITER ;;
CREATE FUNCTION UpdateList(db VARCHAR(64),
tab VARCHAR(64),
upd BOOL) RETURNS MEDIUMTEXT
DETERMINISTIC
SQL SECURITY INVOKER
BEGIN

# Given database and table name in parameters (db, tab),
# and an update flag in parameter (upd):
# if upd is TRUE, return a list of the names of columns
# that the invoker can update;
# if upd is FALSE, return a list of the names of columns
# that the invoker cannot update.

DECLARE userId VARCHAR(81);
DECLARE rowCount INT;
DECLARE allColumnsUpdateable BOOL DEFAULT FALSE;
DECLARE result MEDIUMTEXT DEFAULT '';

# Put userId into form 'user'@'host' (with quotes) for matches
# with information_schema GRANTEE column values.
SET userId = CONCAT("'",
REPLACE(CURRENT_USER(), "@", "'@'"),
"'");

# Big IF-THEN structure to find if all columns are updateable.
# Look for UPDATE privilege server-wide.
SELECT COUNT(*)
INTO rowCount
FROM information_schema.USER_PRIVILEGES
WHERE GRANTEE = userId
AND PRIVILEGE_TYPE = 'UPDATE';
IF rowCount > 0 THEN
SET allColumnsUpdateable = TRUE;
ELSE
# Look for database-level privilege.
SELECT COUNT(*)
INTO rowCount
FROM information_schema.SCHEMA_PRIVILEGES
WHERE GRANTEE = userId
AND TABLE_SCHEMA = db
AND PRIVILEGE_TYPE = 'UPDATE';
IF rowCount > 0 THEN
SET allColumnsUpdateable = TRUE;
ELSE
# Look for table-level privilege.
SELECT COUNT(*)
INTO rowCount
FROM information_schema.TABLE_PRIVILEGES
WHERE GRANTEE = userId
AND TABLE_SCHEMA = db
AND TABLE_NAME = tab
AND PRIVILEGE_TYPE = 'UPDATE';
IF rowCount > 0 THEN
SET allColumnsUpdateable = TRUE;
END IF;
END IF;
END IF;

IF upd THEN
# Get updateable column list.
IF allColumnsUpdateable THEN
# As determined above, report all columns.
SELECT GROUP_CONCAT(COLUMN_NAME)
INTO result
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = db
AND TABLE_NAME = tab;
ELSE
# Report column-level privilege, or empty string for none.
SELECT IFNULL(GROUP_CONCAT(COLUMN_NAME), '')
INTO result
FROM information_schema.COLUMN_PRIVILEGES
WHERE GRANTEE = userId
AND TABLE_SCHEMA = db
AND TABLE_NAME = tab
AND PRIVILEGE_TYPE = 'UPDATE';
END IF;
ELSE
# Get non-updateable column list.
IF allColumnsUpdateable THEN
SET result = '';
ELSE
SELECT IFNULL(GROUP_CONCAT(COLUMN_NAME), '')
INTO result
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = db
AND TABLE_NAME = tab
AND COLUMN_NAME NOT IN (
SELECT COLUMN_NAME
FROM information_schema.COLUMN_PRIVILEGES
WHERE GRANTEE = userId
AND TABLE_SCHEMA = db
AND TABLE_NAME = tab
AND PRIVILEGE_TYPE = 'UPDATE');
END IF;
END IF;

RETURN result;
END;;
DELIMITER ;

You could write similar functions to discover which columns are insertable or selectable for a user.

Enjoy!

Sunday, October 4, 2009

Rolling Time-based Partitions

Fairly often I hear customers say that they plan a table that accumulates millions of rows per day, and they want to keep around, say, the last 30 days worth of data. (For the sake of examples, I'm going to make it the last 3 days.) So this is a kind of round-robin table, with rolling addition of new data and removal of the oldest data.

With a high volume of data, this sounds like a table partitioned on day boundaries (in MySQL 5.1). See Sarah's blog and her links for a quick ramp-up on time-based table partitioning (http://everythingmysql.ning.com/profiles/blogs/partitioning-by-dates-the). One great benefit of table partitioning is that you can drop a partition to lose millions of rows in one quick statement, much faster than deleting millions of rows. Sort of like a partial TRUNCATE TABLE.

First create the table with 4 partitions, and then, once a day, drop the oldest partition and add another partition to store the next day's rows. (The table will really have exactly 3 days worth data at the moment of this transformation and will accumulate one more day's worth until the next such transformation.)
CREATE DATABASE IF NOT EXISTS demotimeparts;
USE demotimeparts;

DROP TABLE IF EXISTS pagehits;
CREATE TABLE pagehits (
id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
urlviewed VARCHAR(255),
whodone VARCHAR(40) DEFAULT NULL,
whendone DATETIME NOT NULL DEFAULT '0001-01-01 00:00:00',
PRIMARY KEY (id, whendone)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
PARTITION BY RANGE (to_days(whendone))
(PARTITION p20090930 VALUES LESS THAN (TO_DAYS('2009-09-30')),
PARTITION p20091001 VALUES LESS THAN (TO_DAYS('2009-10-01')),
PARTITION p20091002 VALUES LESS THAN (TO_DAYS('2009-10-02')),
PARTITION p20091003 VALUES LESS THAN (TO_DAYS('2009-10-03')));
A few notes about this table:
  1. The date boundary of a partition is in the partition's name. We'll use this. (It's also convenient for metadata reports.)
  2. The column whendone is included in the primary key because of the rule that your partitioning column must participate in every unique index.
  3. For this time-based table to benefit from partition pruning, in which the optimizer eliminates some of the partitions from query execution, your partitioned column must be of type DATE or DATETIME, not TIMESTAMP.
  4. The default value on the column whendone is to accommodate a SQL_MODE including NO_ZERO_DATE, NO_ZERO_IN_DATE.
So far, so good: The table is set up to have 4 partitions, one for each of 4 consecutive days. Now, how to accomplish the "rolling" part? Here's one way. The procedure below takes a DATETIME argument and "rolls" the table to accept rows up to the limit of that date, not inclusive. It uses prepared statements to drop the oldest partition in the table, and add a new partition using the DATE limit you've given.
USE demotimeparts;
DROP PROCEDURE IF EXISTS RotateTimePartition;
DELIMITER ;;
CREATE PROCEDURE RotateTimePartition (newPartValue DATETIME)
BEGIN
-- Setup
DECLARE keepStmt VARCHAR(2000) DEFAULT @stmt;
DECLARE partitionToDrop VARCHAR(64);

-- Find and drop the first partition in the table.
SELECT partition_name
INTO partitionToDrop
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE table_schema='demotimeparts'
AND table_name='pagehits'
AND partition_ordinal_position=1;
SET @stmt = CONCAT('ALTER TABLE pagehits DROP PARTITION ',
partitionToDrop);
PREPARE pStmt FROM @stmt;
EXECUTE pStmt;
DEALLOCATE PREPARE pStmt;

-- Add a new partition using the input date for a value limit.
SET @stmt = CONCAT('ALTER TABLE pagehits ADD PARTITION (PARTITION p',
DATE_FORMAT(newPartValue, '%Y%m%d'),
' VALUES LESS THAN (TO_DAYS(\'',
DATE_FORMAT(newPartValue, '%Y-%m-%d'),
'\')))');
PREPARE pStmt FROM @stmt;
EXECUTE pStmt;
DEALLOCATE PREPARE pStmt;

-- Cleanup
SET @stmt = keepStmt;
END;;
DELIMITER ;
So, before calling RotateTimePartition, the pagehits table definition includes:
/*!50100 PARTITION BY RANGE (to_days(whendone))
(PARTITION p20090930 VALUES LESS THAN (734045) ENGINE = MyISAM,
PARTITION p20091001 VALUES LESS THAN (734046) ENGINE = MyISAM,
PARTITION p20091002 VALUES LESS THAN (734047) ENGINE = MyISAM,
PARTITION p20091003 VALUES LESS THAN (734048) ENGINE = MyISAM) */
Then, after:
CALL RotateTimePartition('2009-10-04');
, the table includes:
/*!50100 PARTITION BY RANGE (to_days(whendone))
(PARTITION p20091001 VALUES LESS THAN (734046) ENGINE = MyISAM,
PARTITION p20091002 VALUES LESS THAN (734047) ENGINE = MyISAM,
PARTITION p20091003 VALUES LESS THAN (734048) ENGINE = MyISAM,
PARTITION p20091004 VALUES LESS THAN (734049) ENGINE = MyISAM) */
Notice that the generated partition names let you easily see the date boundaries.

Then you can issue:
CALL RotateTimePartition(NOW() + INTERVAL 1 DAY);
, to get:
/*!50100 PARTITION BY RANGE (to_days(whendone))
(PARTITION p20091002 VALUES LESS THAN (734047) ENGINE = MyISAM,
PARTITION p20091003 VALUES LESS THAN (734048) ENGINE = MyISAM,
PARTITION p20091004 VALUES LESS THAN (734049) ENGINE = MyISAM,
PARTITION p20091005 VALUES LESS THAN (734050) ENGINE = MyISAM) */
There: Lose a partition, add a partition. Now you can easily write an event to call this procedure daily, to automatically maintain your storage for the table.

This table design and procedure work just as well for a table with 31 day-sized partitions, 5 week-sized partitions, 25 month-sized partitions, or whatever. The procedure takes a date input, so it doesn't care whether you're using day, week, month, or any other intervals.

Enjoy!

Sunday, September 6, 2009

SQL from SQL

This is a simple technique, to write a report that generates SQL syntax, that you can then turn around and execute. I've heard this called "SQL from SQL" or sometimes "dynamic SQL"--although it doesn't use prepared statements or anything like that.

There are a few simple options in SQL*Plus (Oracle's command-line tool) to accomplish this, and a few simple--but different--options in mysql (MySQL's command-line tool).

Try this line:

shell> mysql --silent --skip-column-names -e "select concat('describe ', table_name, ';') from information_schema.tables where table_schema = 'world'"

(Note I'm supplying my username and password to the session from an option file.) This sends the following to standard output:

describe City;
describe Country;
describe CountryLanguage;

The trick is to select the data you want from your system, combined with the literal syntax you want to generate. It's a simple matter to execute these commands, by directing this output stream into another mysql session:

shell> mysql --silent --skip-column-names -e "select concat('describe ', table_name, ';') from information_schema.tables where table_schema = 'world'" | mysql -v -v -v world


(The -v -v -v options give extra verbose output suitable to this particular report.) If you like, save the final result in a file:

shell> mysql --silent --skip-column-names -e "select concat('describe ', table_name, ';') from information_schema.tables where table_schema = 'world'" | mysql -v -v -v world > worldtables.txt


It should come as no surprise that the meta-SQL statements you write will often use the meta-data in your database: the information_schema tables. You can use this technique to optimize all the base tables in a database, collect checksum reports, and so forth.

Enjoy!

Introduction

Folks ---

I'm inspired by two of my colleagues: Sarah Sproehnle (http://sarahdba.blogspot.com) and George Trujillo (http://mysql-dba-journey.blogspot.com/). Sarah and George are two excellent MySQL instructors, and their blogs clearly work as tools of the trade. I intend to use this space for a similar purpose: to address questions and techniques of general interest that arise in my MySQL classes.

Enjoy!

--- Glynn