• 由于需要大概研究了一下MySQL的随机抽取实现方法。如:要从tablename表中随机提取一条记录,大家一般的写法就是:SELECT * FROM tablename ORDER BY RAND() LIMIT 1。

  • 但是,在MYSQL的官方手册,里面针对RAND()的提示大概意思就是,在ORDER BY从句里面不能使用RAND()函数,因为这样会导致数据列被多次扫描。但是在MYSQL 3.23版本中,仍然可以通过ORDER BY RAND()来实现随机。
    You cannot use a column with RAND() values in an ORDER BY clause, because ORDER BY would evaluate the column multiple times.

  • 但是真正测试一下才发现这样效率非常低。一个15万余条的库,查询5条数据,居然要8秒以上。查看官方手册,也说rand()放在ORDER BY 子句中会被执行多次,自然效率及很低。

  • 开启mysql的profiling变量,可以查看sql语句的执行时间。(具体对MySQL变量的认知参考我的另一篇文章MySQL变量)

mysql> show variables like "%pro%";
+--------------------------+-------+
| Variable_name            | Value |
+--------------------------+-------+
| profiling                | ON    |msqyl> set profiling=1;
mysql> show profiles;
+----------+------------+------------------------------------------------------------------------------------------------------------+
| Query_ID | Duration   | Query                                                                                                      |
+----------+------------+------------------------------------------------------------------------------------------------------------+                                                                                       |
|        7 | 0.00039500 | select * from test as t1 join (select ROUND(RAND()*(select MAX(id) from test)) as id) as t2 on t1.id>t2.id |
|        8 | 0.00050100 | select * from test as t1 join (select ROUND(RAND()*MAX(id)) as id from test) as t2 on t1.id>t2.id          |
|        9 | 0.00029700 | select * from test where id> (SELECT ROUND( MAX(id) * RAND()) FROM test )                                  |
|       11 | 0.00022700 | select * from test where id> (SELECT ROUND(RAND()*(select MAX(id) FROM test)))                             |
+----------+------------+------------------------------------------------------------------------------------------------------------+

我执行的分别是一下四条语句:

mysql> select * from test as t1 join (select ROUND(RAND()*(select MAX(id) from test)) as id) as t2 on t1.id>t2.id;
+----+-----+-----+-----------+------+
| id | age | sex | name      | id   |
+----+-----+-----+-----------+------+
|  6 |   0 |   0 | 个        |    5 |
|  7 |   1 |   1 | 第三方     |    5 |
|  9 |   1 |   1 | 阿斯顿飞   |    5 |
| 10 |   1 |   1 | 张        |    5 |
+----+-----+-----+-----------+------+
4 rows in set (0.00 sec)mysql> select * from test as t1 join (select ROUND(RAND()*MAX(id)) as id from test) as t2 on t1.id>t2.id;
+----+-----+-----+-----------+------+
| id | age | sex | name      | id   |
+----+-----+-----+-----------+------+
|  5 |   1 |   1 | 王        |    4 |
|  6 |   0 |   0 | 个        |    4 |
|  7 |   1 |   1 | 第三方     |    4 |
|  9 |   1 |   1 | 阿斯顿飞   |    4 |
| 10 |   1 |   1 | 张        |    4 |
+----+-----+-----+-----------+------+
5 rows in set (0.00 sec)mysql> select * from test where id> (SELECT ROUND( MAX(id) * RAND()) FROM test );
+----+-----+-----+----------+
| id | age | sex | name     |
+----+-----+-----+----------+
|  3 |   1 |   1 | 萨芬      |
|  5 |   1 |   1 | 王        |
|  6 |   0 |   0 | 个        |
|  9 |   1 |   1 | 阿斯顿飞   |
| 10 |   1 |   1 | 张        |
+----+-----+-----+-----------+
5 rows in set (0.00 sec)mysql> select * from test where id> (SELECT ROUND(RAND()*(select MAX(id) FROM test)));
+----+-----+-----+--------------+
| id | age | sex | name         |
+----+-----+-----+--------------+
|  3 |   1 |   1 | 萨芬         |
|  4 |  78 |   0 | 的           |
|  5 |   1 |   1 | 王           |
|  7 |   1 |   1 | 第三方       |
|  9 |   1 |   1 | 阿斯顿飞     |
| 10 |   1 |   1 | 张           |
+----+-----+-----+--------------+
6 rows in set (0.00 sec)

可以看到:
1. SELECT ROUND(RAND()*(select MAX(id) FROM test

SELECT ROUND( MAX(id) * RAND()) FROM test
要快;
2. 采用JOIN的语法比直接在WHERE中使用函数效率要低

但是最后我还有个问题:

大家可以看到我的前两个用了join的语句的返回结果里,表里数据后面几条id为6 7 9 10,但是第3条语句结果里没有id=7和4那条数据,第4条语句结果里没有id=6那条数据。
不知各位老铁怎么看。咕~~(╯﹏╰)b