-
由于需要大概研究了一下MySQL的随机抽取实现方法。如:要从tablename表中随机提取一条记录,大家一般的写法就是:SELECT * FROM tablename ORDER BY RAND() LIMIT 1。
-
但是,在MYSQL的官方手册,里面针对RAND()的提示大概意思就是,在ORDER BY从句里面不能使用RAND()函数,因为这样会导致数据列被多次扫描。但是在MYSQL 3.23版本中,仍然可以通过ORDER BY RAND()来实现随机。
You cannot use a column with RAND() values in an ORDER BY clause, because ORDER BY would evaluate the column multiple times. -
但是真正测试一下才发现这样效率非常低。一个15万余条的库,查询5条数据,居然要8秒以上。查看官方手册,也说rand()放在ORDER BY 子句中会被执行多次,自然效率及很低。
-
开启mysql的profiling变量,可以查看sql语句的执行时间。(具体对MySQL变量的认知参考我的另一篇文章MySQL变量)
mysql> show variables like "%pro%";
+--------------------------+-------+
| Variable_name | Value |
+--------------------------+-------+
| profiling | ON |msqyl> set profiling=1;
mysql> show profiles;
+----------+------------+------------------------------------------------------------------------------------------------------------+
| Query_ID | Duration | Query |
+----------+------------+------------------------------------------------------------------------------------------------------------+ |
| 7 | 0.00039500 | select * from test as t1 join (select ROUND(RAND()*(select MAX(id) from test)) as id) as t2 on t1.id>t2.id |
| 8 | 0.00050100 | select * from test as t1 join (select ROUND(RAND()*MAX(id)) as id from test) as t2 on t1.id>t2.id |
| 9 | 0.00029700 | select * from test where id> (SELECT ROUND( MAX(id) * RAND()) FROM test ) |
| 11 | 0.00022700 | select * from test where id> (SELECT ROUND(RAND()*(select MAX(id) FROM test))) |
+----------+------------+------------------------------------------------------------------------------------------------------------+
我执行的分别是一下四条语句:
mysql> select * from test as t1 join (select ROUND(RAND()*(select MAX(id) from test)) as id) as t2 on t1.id>t2.id;
+----+-----+-----+-----------+------+
| id | age | sex | name | id |
+----+-----+-----+-----------+------+
| 6 | 0 | 0 | 个 | 5 |
| 7 | 1 | 1 | 第三方 | 5 |
| 9 | 1 | 1 | 阿斯顿飞 | 5 |
| 10 | 1 | 1 | 张 | 5 |
+----+-----+-----+-----------+------+
4 rows in set (0.00 sec)mysql> select * from test as t1 join (select ROUND(RAND()*MAX(id)) as id from test) as t2 on t1.id>t2.id;
+----+-----+-----+-----------+------+
| id | age | sex | name | id |
+----+-----+-----+-----------+------+
| 5 | 1 | 1 | 王 | 4 |
| 6 | 0 | 0 | 个 | 4 |
| 7 | 1 | 1 | 第三方 | 4 |
| 9 | 1 | 1 | 阿斯顿飞 | 4 |
| 10 | 1 | 1 | 张 | 4 |
+----+-----+-----+-----------+------+
5 rows in set (0.00 sec)mysql> select * from test where id> (SELECT ROUND( MAX(id) * RAND()) FROM test );
+----+-----+-----+----------+
| id | age | sex | name |
+----+-----+-----+----------+
| 3 | 1 | 1 | 萨芬 |
| 5 | 1 | 1 | 王 |
| 6 | 0 | 0 | 个 |
| 9 | 1 | 1 | 阿斯顿飞 |
| 10 | 1 | 1 | 张 |
+----+-----+-----+-----------+
5 rows in set (0.00 sec)mysql> select * from test where id> (SELECT ROUND(RAND()*(select MAX(id) FROM test)));
+----+-----+-----+--------------+
| id | age | sex | name |
+----+-----+-----+--------------+
| 3 | 1 | 1 | 萨芬 |
| 4 | 78 | 0 | 的 |
| 5 | 1 | 1 | 王 |
| 7 | 1 | 1 | 第三方 |
| 9 | 1 | 1 | 阿斯顿飞 |
| 10 | 1 | 1 | 张 |
+----+-----+-----+--------------+
6 rows in set (0.00 sec)
可以看到:
1. SELECT ROUND(RAND()*(select MAX(id) FROM test
比
SELECT ROUND( MAX(id) * RAND()) FROM test
要快;
2. 采用JOIN的语法比直接在WHERE中使用函数效率要低
但是最后我还有个问题:
大家可以看到我的前两个用了join的语句的返回结果里,表里数据后面几条id为6 7 9 10,但是第3条语句结果里没有id=7和4那条数据,第4条语句结果里没有id=6那条数据。
不知各位老铁怎么看。咕~~(╯﹏╰)b