博客
关于我
hive窗口函数练习题
阅读量:615 次
发布时间:2019-03-13

本文共 21800 字,大约阅读时间需要 72 分钟。

一、第一套练习

需求:

1、求用户明细并统计每天的用户总数
2、计算从第一天到现在的所有 score 大于80分的用户总数
3、计算每个用户到当前日期分数大于80的天数

test_window.txt数据:

20191020,11111,8520191020,22222,8320191020,33333,8620191021,11111,8720191021,22222,6520191021,33333,9820191022,11111,6720191022,22222,3420191022,33333,8820191023,11111,9920191023,22222,33

建表:

0: jdbc:hive2://hadoop:11240> create table test_window(logday string,userid string,score int). . . . . . . . . . . . . . > row format delimited. . . . . . . . . . . . . . > fields terminated by ',';0: jdbc:hive2://hadoop:11240> load data local inpath '/home/xiaokang/hivedata/test_window.txt'. . . . . . . . . . . . . . > into table test_window;0: jdbc:hive2://hadoop:11240> select * from test_window;+---------------------+---------------------+--------------------+| test_window.logday  | test_window.userid  | test_window.score  |+---------------------+---------------------+--------------------+| 20191020            | 11111               | 85                 || 20191020            | 22222               | 83                 || 20191020            | 33333               | 86                 || 20191021            | 11111               | 87                 || 20191021            | 22222               | 65                 || 20191021            | 33333               | 98                 || 20191022            | 11111               | 67                 || 20191022            | 22222               | 34                 || 20191022            | 33333               | 88                 || 20191023            | 11111               | 99                 || 20191023            | 22222               | 33                 |+---------------------+---------------------+--------------------+

1、求用户明细并统计每天的用户总数

0: jdbc:hive2://hadoop:11240> select *,count()over(partition by logday)as day_total from test_window;+---------------------+---------------------+--------------------+------------+| test_window.logday  | test_window.userid  | test_window.score  | day_total  |+---------------------+---------------------+--------------------+------------+| 20191020            | 33333               | 86                 | 3          || 20191020            | 22222               | 83                 | 3          || 20191020            | 11111               | 85                 | 3          || 20191021            | 33333               | 98                 | 3          || 20191021            | 22222               | 65                 | 3          || 20191021            | 11111               | 87                 | 3          || 20191022            | 33333               | 88                 | 3          || 20191022            | 22222               | 34                 | 3          || 20191022            | 11111               | 67                 | 3          || 20191023            | 22222               | 33                 | 2          || 20191023            | 11111               | 99                 | 2          |+---------------------+---------------------+--------------------+------------+

2、计算从第一天到现在的所有 score 大于80分的用户总数

0: jdbc:hive2://hadoop:11240> select *,count()over(order by logday rows between unbounded preceding and current row) as total from test_window where score>80;+---------------------+---------------------+--------------------+--------+| test_window.logday  | test_window.userid  | test_window.score  | total  |+---------------------+---------------------+--------------------+--------+| 20191020            | 33333               | 86                 | 1      || 20191020            | 22222               | 83                 | 2      || 20191020            | 11111               | 85                 | 3      || 20191021            | 33333               | 98                 | 4      || 20191021            | 11111               | 87                 | 5      || 20191022            | 33333               | 88                 | 6      || 20191023            | 11111               | 99                 | 7      |+---------------------+---------------------+--------------------+--------+

3、计算每个用户到当前日期分数大于80的天数

0: jdbc:hive2://hadoop:11240> select *,count()over(partition by userid order by logday rows between unbounded preceding and current row) as total. . . . . . . . . . . . . . > from test_window where score>80 order by logday,userid;+---------------------+---------------------+--------------------+--------+| test_window.logday  | test_window.userid  | test_window.score  | total  |+---------------------+---------------------+--------------------+--------+| 20191020            | 11111               | 85                 | 1      || 20191020            | 22222               | 83                 | 1      || 20191020            | 33333               | 86                 | 1      || 20191021            | 11111               | 87                 | 2      || 20191021            | 33333               | 98                 | 2      || 20191022            | 33333               | 88                 | 3      || 20191023            | 11111               | 99                 | 3      |+---------------------+---------------------+--------------------+--------+

二、第二套练习

需求:

1、查询在2017年4月份购买过的顾客及总人数
2、查询顾客的购买明细及月购买总额
3、查询顾客的购买明细及到目前为止每个顾客购买总金额
4、查询顾客上次的购买时间----lag()over()偏移量分析函数的运用

数据:

jack,2017-01-01,10tony,2017-01-02,15jack,2017-02-03,23tony,2017-01-04,29jack,2017-01-05,46jack,2017-04-06,42tony,2017-01-07,50jack,2017-01-08,55mart,2017-04-08,62mart,2017-04-09,68neil,2017-05-10,12mart,2017-04-11,75neil,2017-06-12,80mart,2017-04-13,94

建表:

0: jdbc:hive2://hadoop:11240> create table business(name string,orderdate string,cost int). . . . . . . . . . . . . . > row format delimited. . . . . . . . . . . . . . > fields terminated by ',';0: jdbc:hive2://hadoop:11240> load data local inpath "/home/xiaokang/hivedata/business.txt". . . . . . . . . . . . . . > into table business;0: jdbc:hive2://hadoop:11240> select * from business;+----------------+---------------------+----------------+| business.name  | business.orderdate  | business.cost  |+----------------+---------------------+----------------+| jack           | 2017-01-01          | 10             || tony           | 2017-01-02          | 15             || jack           | 2017-02-03          | 23             || tony           | 2017-01-04          | 29             || jack           | 2017-01-05          | 46             || jack           | 2017-04-06          | 42             || tony           | 2017-01-07          | 50             || jack           | 2017-01-08          | 55             || mart           | 2017-04-08          | 62             || mart           | 2017-04-09          | 68             || neil           | 2017-05-10          | 12             || mart           | 2017-04-11          | 75             || neil           | 2017-06-12          | 80             || mart           | 2017-04-13          | 94             |+----------------+---------------------+----------------+

1、查询在2017年4月份购买过的顾客及总人数

在本例中:

  • over()必须跟在聚合函数(本例中count())后面,over()叫做开窗函数。
  • 开窗的意义在于它开了一个窗口,这个窗口叫做数据集。
  • 开窗的作用范围:仅仅是给前面的聚合函数count()使用的
  • 开窗等于开一部分数据集出来
  • over()中为空,表示对整个数据集开窗
0: jdbc:hive2://hadoop:11240> select name,count(*) over(). . . . . . . . . . . . . . > from business. . . . . . . . . . . . . . > where substring(orderdate,1,7)='2017-04'+-------+-----------------+| name  | count_window_0  |+-------+-----------------+| mart  | 5               || mart  | 5               || mart  | 5               || mart  | 5               || jack  | 5               |+-------+-----------------+

2、查询顾客的购买明细及所有顾客的购买总额

所有人的花费求和

0: jdbc:hive2://hadoop:11240> select *,sum(cost)over() . . . . . . . . . . . . . . > from business;+----------------+---------------------+----------------+---------------+| business.name  | business.orderdate  | business.cost  | sum_window_0  |+----------------+---------------------+----------------+---------------+| mart           | 2017-04-13          | 94             | 661           || neil           | 2017-06-12          | 80             | 661           || mart           | 2017-04-11          | 75             | 661           || neil           | 2017-05-10          | 12             | 661           || mart           | 2017-04-09          | 68             | 661           || mart           | 2017-04-08          | 62             | 661           || jack           | 2017-01-08          | 55             | 661           || tony           | 2017-01-07          | 50             | 661           || jack           | 2017-04-06          | 42             | 661           || jack           | 2017-01-05          | 46             | 661           || tony           | 2017-01-04          | 29             | 661           || jack           | 2017-02-03          | 23             | 661           || tony           | 2017-01-02          | 15             | 661           || jack           | 2017-01-01          | 10             | 661           |+----------------+---------------------+----------------+---------------+

3、上述的场景,要将 cost 按照日期进行累加

0: jdbc:hive2://hadoop:11240> select orderdate,cost,sum(cost) over(order by orderdate). . . . . . . . . . . . . . > from business;+-------------+-------+---------------+|  orderdate  | cost  | sum_window_0  |+-------------+-------+---------------+| 2017-01-01  | 10    | 10            || 2017-01-02  | 15    | 25            || 2017-01-04  | 29    | 54            || 2017-01-05  | 46    | 100           || 2017-01-07  | 50    | 150           || 2017-01-08  | 55    | 205           || 2017-02-03  | 23    | 228           || 2017-04-06  | 42    | 270           || 2017-04-08  | 62    | 332           || 2017-04-09  | 68    | 400           || 2017-04-11  | 75    | 475           || 2017-04-13  | 94    | 569           || 2017-05-10  | 12    | 581           || 2017-06-12  | 80    | 661           |+-------------+-------+---------------+

4、查询顾客的购买明细以及每位顾客的总花费

按人分组求和

0: jdbc:hive2://hadoop:11240> select *,sum(cost) over(distribute by name). . . . . . . . . . . . . . > from business;+----------------+---------------------+----------------+---------------+| business.name  | business.orderdate  | business.cost  | sum_window_0  |+----------------+---------------------+----------------+---------------+| jack           | 2017-01-05          | 46             | 176           || jack           | 2017-01-08          | 55             | 176           || jack           | 2017-01-01          | 10             | 176           || jack           | 2017-04-06          | 42             | 176           || jack           | 2017-02-03          | 23             | 176           || mart           | 2017-04-13          | 94             | 299           || mart           | 2017-04-11          | 75             | 299           || mart           | 2017-04-09          | 68             | 299           || mart           | 2017-04-08          | 62             | 299           || neil           | 2017-05-10          | 12             | 92            || neil           | 2017-06-12          | 80             | 92            || tony           | 2017-01-04          | 29             | 94            || tony           | 2017-01-02          | 15             | 94            || tony           | 2017-01-07          | 50             | 94            |+----------------+---------------------+----------------+---------------+

5、查询顾客的购买明细及到目前为止每个顾客购买总金额

按人分组,按时间排序,花费累加

# 方法一:0: jdbc:hive2://hadoop:11240> select * ,sum(cost) over(distribute by name sort by orderdate). . . . . . . . . . . . . . > from business;+----------------+---------------------+----------------+---------------+| business.name  | business.orderdate  | business.cost  | sum_window_0  |+----------------+---------------------+----------------+---------------+| jack           | 2017-01-01          | 10             | 10            || jack           | 2017-01-05          | 46             | 56            || jack           | 2017-01-08          | 55             | 111           || jack           | 2017-02-03          | 23             | 134           || jack           | 2017-04-06          | 42             | 176           || mart           | 2017-04-08          | 62             | 62            || mart           | 2017-04-09          | 68             | 130           || mart           | 2017-04-11          | 75             | 205           || mart           | 2017-04-13          | 94             | 299           || neil           | 2017-05-10          | 12             | 12            || neil           | 2017-06-12          | 80             | 92            || tony           | 2017-01-02          | 15             | 15            || tony           | 2017-01-04          | 29             | 44            || tony           | 2017-01-07          | 50             | 94            |+----------------+---------------------+----------------+---------------+
#方法二:0: jdbc:hive2://hadoop:11240> select *,. . . . . . . . . . . . . . > sum(cost). . . . . . . . . . . . . . > over(partition by name. . . . . . . . . . . . . . > order by orderdate rows between unbounded preceding and current row). . . . . . . . . . . . . . > as total_amount. . . . . . . . . . . . . . > from business;+----------------+---------------------+----------------+---------------+| business.name  | business.orderdate  | business.cost  | total_amount  |+----------------+---------------------+----------------+---------------+| jack           | 2017-01-01          | 10             | 10            || jack           | 2017-01-05          | 46             | 56            || jack           | 2017-01-08          | 55             | 111           || jack           | 2017-02-03          | 23             | 134           || jack           | 2017-04-06          | 42             | 176           || mart           | 2017-04-08          | 62             | 62            || mart           | 2017-04-09          | 68             | 130           || mart           | 2017-04-11          | 75             | 205           || mart           | 2017-04-13          | 94             | 299           || neil           | 2017-05-10          | 12             | 12            || neil           | 2017-06-12          | 80             | 92            || tony           | 2017-01-02          | 15             | 15            || tony           | 2017-01-04          | 29             | 44            || tony           | 2017-01-07          | 50             | 94            |+----------------+---------------------+----------------+---------------+

6、查询顾客上次的购买时间----lag()over()偏移量分析函数的运用

0: jdbc:hive2://hadoop:11240> select *,#如果上次的购买时间为null,将其处理为1970-01-01. . . . . . . . . . . . . . > lag(orderdate,1,'1970-01-01') over(partition by name order by orderdate) last_date. . . . . . . . . . . . . . > from business;+----------------+---------------------+----------------+-------------+| business.name  | business.orderdate  | business.cost  |  last_date  |+----------------+---------------------+----------------+-------------+| jack           | 2017-01-01          | 10             | 1970-01-01  || jack           | 2017-01-05          | 46             | 2017-01-01  || jack           | 2017-01-08          | 55             | 2017-01-05  || jack           | 2017-02-03          | 23             | 2017-01-08  || jack           | 2017-04-06          | 42             | 2017-02-03  || mart           | 2017-04-08          | 62             | 1970-01-01  || mart           | 2017-04-09          | 68             | 2017-04-08  || mart           | 2017-04-11          | 75             | 2017-04-09  || mart           | 2017-04-13          | 94             | 2017-04-11  || neil           | 2017-05-10          | 12             | 1970-01-01  || neil           | 2017-06-12          | 80             | 2017-05-10  || tony           | 2017-01-02          | 15             | 1970-01-01  || tony           | 2017-01-04          | 29             | 2017-01-02  || tony           | 2017-01-07          | 50             | 2017-01-04  |+----------------+---------------------+----------------+-------------+

7、查询顾客下一次的购买时间

0: jdbc:hive2://hadoop:11240> select *,. . . . . . . . . . . . . . > lead(orderdate,1,'9999-99-99') over(partition by name order by orderdate) last_date. . . . . . . . . . . . . . > from business;+----------------+---------------------+----------------+-------------+| business.name  | business.orderdate  | business.cost  |  last_date  |+----------------+---------------------+----------------+-------------+| jack           | 2017-01-01          | 10             | 2017-01-05  || jack           | 2017-01-05          | 46             | 2017-01-08  || jack           | 2017-01-08          | 55             | 2017-02-03  || jack           | 2017-02-03          | 23             | 2017-04-06  || jack           | 2017-04-06          | 42             | 9999-99-99  || mart           | 2017-04-08          | 62             | 2017-04-09  || mart           | 2017-04-09          | 68             | 2017-04-11  || mart           | 2017-04-11          | 75             | 2017-04-13  || mart           | 2017-04-13          | 94             | 9999-99-99  || neil           | 2017-05-10          | 12             | 2017-06-12  || neil           | 2017-06-12          | 80             | 9999-99-99  || tony           | 2017-01-02          | 15             | 2017-01-04  || tony           | 2017-01-04          | 29             | 2017-01-07  || tony           | 2017-01-07          | 50             | 9999-99-99  |+----------------+---------------------+----------------+-------------+

三、第三套练习

需求:

1、每门学科学生成绩排名(是否并列排名、空位排名三种实现)
2、每门学科成绩排名top n的学生

score.txt

name	subject	score孙悟空	语文	87孙悟空	数学	95孙悟空	英语	68大海	语文	94大海	数学	56大海	英语	84宋宋	语文	64宋宋	数学	86宋宋	英语	84婷婷	语文	65婷婷	数学	85婷婷	英语	78

建表:

0: jdbc:hive2://hadoop:11240> create table score(name string,subject string,score int). . . . . . . . . . . . . . > row format delimited fields terminated by "\t";0: jdbc:hive2://hadoop:11240> load data local inpath '/home/xiaokang/hivedata/score.txt' into table score;0: jdbc:hive2://hadoop:11240> select * from score;+-------------+----------------+--------------+| score.name  | score.subject  | score.score  |+-------------+----------------+--------------+| 孙悟空         | 语文             | 87           || 孙悟空         | 数学             | 95           || 孙悟空         | 英语             | 68           || 大海          | 语文             | 94           || 大海          | 数学             | 56           || 大海          | 英语             | 84           || 宋宋          | 语文             | 64           || 宋宋          | 数学             | 86           || 宋宋          | 英语             | 84           || 婷婷          | 语文             | 65           || 婷婷          | 数学             | 85           || 婷婷          | 英语             | 78           |+-------------+----------------+--------------+

1、每门学科学生成绩排名(是否并列排名、空位排名三种实现)

  • row_number()按照值排序时产生一个自增编号,不会重复(如:1、2、3、4、5、6)
  • rank() 按照值排序时产生一个自增编号,值相等时会重复,会产生空位(如:1、2、3、3、3、6)
  • dense_rank() 按照值排序时产生一个自增编号,值相等时会重复,不会产生空位(如:1、2、3、3、3、4)
0: jdbc:hive2://hadoop:11240> select *,. . . . . . . . . . . . . . > row_number()over(partition by subject order by score desc) as row_number_method,. . . . . . . . . . . . . . > rank()over(partition by subject order by score desc) as rank_method,. . . . . . . . . . . . . . > dense_rank()over(partition by subject order by score desc) as dense_rank_method. . . . . . . . . . . . . . > from score;+-------------+----------------+--------------+--------------------+--------------+--------------------+| score.name  | score.subject  | score.score  | row_number_method  | rank_method  | dense_rank_method  |+-------------+----------------+--------------+--------------------+--------------+--------------------+| 孙悟空         | 数学             | 95           | 1                  | 1            | 1                  || 宋宋          | 数学             | 86           | 2                  | 2            | 2                  || 婷婷          | 数学             | 85           | 3                  | 3            | 3                  || 大海          | 数学             | 56           | 4                  | 4            | 4                  || 宋宋          | 英语             | 84           | 1                  | 1            | 1                  || 大海          | 英语             | 84           | 2                  | 1            | 1                  || 婷婷          | 英语             | 78           | 3                  | 3            | 2                  || 孙悟空         | 英语             | 68           | 4                  | 4            | 3                  || 大海          | 语文             | 94           | 1                  | 1            | 1                  || 孙悟空         | 语文             | 87           | 2                  | 2            | 2                  || 婷婷          | 语文             | 65           | 3                  | 3            | 3                  || 宋宋          | 语文             | 64           | 4                  | 4            | 4                  |+-------------+----------------+--------------+--------------------+--------------+--------------------+

2、每门学科成绩排名前三的学生

0: jdbc:hive2://hadoop:11240> select * from (. . . . . . . . . . . . . . > select *,. . . . . . . . . . . . . . > row_number() over(partition by subject order by score desc) as rmp. . . . . . . . . . . . . . > from score. . . . . . . . . . . . . . > ) as t. . . . . . . . . . . . . . > where t.rmp<=3;+---------+------------+----------+--------+| t.name  | t.subject  | t.score  | t.rmp  |+---------+------------+----------+--------+| 孙悟空     | 数学         | 95       | 1      || 宋宋      | 数学         | 86       | 2      || 婷婷      | 数学         | 85       | 3      || 宋宋      | 英语         | 84       | 1      || 大海      | 英语         | 84       | 2      || 婷婷      | 英语         | 78       | 3      || 大海      | 语文         | 94       | 1      || 孙悟空     | 语文         | 87       | 2      || 婷婷      | 语文         | 65       | 3      |+---------+------------+----------+--------+

转载地址:http://xnpaz.baihongyu.com/

你可能感兴趣的文章
mysql之连接查询,多表连接
查看>>
mysql乐观锁总结和实践 - 青葱岁月 - ITeye博客
查看>>
mysql也能注册到eureka_SpringCloud如何向Eureka中进行注册微服务-百度经验
查看>>
mysql乱码
查看>>
Mysql事务。开启事务、脏读、不可重复读、幻读、隔离级别
查看>>
MySQL事务与锁详解
查看>>
MySQL事务原理以及MVCC详解
查看>>
MySQL事务及其特性与锁机制
查看>>
mysql事务理解
查看>>
MySQL事务详解结合MVCC机制的理解
查看>>
MySQL事务隔离级别:读未提交、读已提交、可重复读和串行
查看>>
MySQL事务隔离级别:读未提交、读已提交、可重复读和串行
查看>>
webpack css文件处理
查看>>
mysql二进制包安装和遇到的问题
查看>>
MySql二进制日志的应用及恢復
查看>>
mysql互换表中两列数据方法
查看>>
mysql五补充部分:SQL逻辑查询语句执行顺序
查看>>
mysql交互式连接&非交互式连接
查看>>
MySQL什么情况下会导致索引失效
查看>>
Mysql什么时候建索引
查看>>