本文共 21800 字,大约阅读时间需要 72 分钟。
需求:
1、求用户明细并统计每天的用户总数 2、计算从第一天到现在的所有 score 大于80分的用户总数 3、计算每个用户到当前日期分数大于80的天数test_window.txt数据:
20191020,11111,8520191020,22222,8320191020,33333,8620191021,11111,8720191021,22222,6520191021,33333,9820191022,11111,6720191022,22222,3420191022,33333,8820191023,11111,9920191023,22222,33
建表:
0: jdbc:hive2://hadoop:11240> create table test_window(logday string,userid string,score int). . . . . . . . . . . . . . > row format delimited. . . . . . . . . . . . . . > fields terminated by ',';0: jdbc:hive2://hadoop:11240> load data local inpath '/home/xiaokang/hivedata/test_window.txt'. . . . . . . . . . . . . . > into table test_window;0: jdbc:hive2://hadoop:11240> select * from test_window;+---------------------+---------------------+--------------------+| test_window.logday | test_window.userid | test_window.score |+---------------------+---------------------+--------------------+| 20191020 | 11111 | 85 || 20191020 | 22222 | 83 || 20191020 | 33333 | 86 || 20191021 | 11111 | 87 || 20191021 | 22222 | 65 || 20191021 | 33333 | 98 || 20191022 | 11111 | 67 || 20191022 | 22222 | 34 || 20191022 | 33333 | 88 || 20191023 | 11111 | 99 || 20191023 | 22222 | 33 |+---------------------+---------------------+--------------------+
1、求用户明细并统计每天的用户总数
0: jdbc:hive2://hadoop:11240> select *,count()over(partition by logday)as day_total from test_window;+---------------------+---------------------+--------------------+------------+| test_window.logday | test_window.userid | test_window.score | day_total |+---------------------+---------------------+--------------------+------------+| 20191020 | 33333 | 86 | 3 || 20191020 | 22222 | 83 | 3 || 20191020 | 11111 | 85 | 3 || 20191021 | 33333 | 98 | 3 || 20191021 | 22222 | 65 | 3 || 20191021 | 11111 | 87 | 3 || 20191022 | 33333 | 88 | 3 || 20191022 | 22222 | 34 | 3 || 20191022 | 11111 | 67 | 3 || 20191023 | 22222 | 33 | 2 || 20191023 | 11111 | 99 | 2 |+---------------------+---------------------+--------------------+------------+
2、计算从第一天到现在的所有 score 大于80分的用户总数
0: jdbc:hive2://hadoop:11240> select *,count()over(order by logday rows between unbounded preceding and current row) as total from test_window where score>80;+---------------------+---------------------+--------------------+--------+| test_window.logday | test_window.userid | test_window.score | total |+---------------------+---------------------+--------------------+--------+| 20191020 | 33333 | 86 | 1 || 20191020 | 22222 | 83 | 2 || 20191020 | 11111 | 85 | 3 || 20191021 | 33333 | 98 | 4 || 20191021 | 11111 | 87 | 5 || 20191022 | 33333 | 88 | 6 || 20191023 | 11111 | 99 | 7 |+---------------------+---------------------+--------------------+--------+
3、计算每个用户到当前日期分数大于80的天数
0: jdbc:hive2://hadoop:11240> select *,count()over(partition by userid order by logday rows between unbounded preceding and current row) as total. . . . . . . . . . . . . . > from test_window where score>80 order by logday,userid;+---------------------+---------------------+--------------------+--------+| test_window.logday | test_window.userid | test_window.score | total |+---------------------+---------------------+--------------------+--------+| 20191020 | 11111 | 85 | 1 || 20191020 | 22222 | 83 | 1 || 20191020 | 33333 | 86 | 1 || 20191021 | 11111 | 87 | 2 || 20191021 | 33333 | 98 | 2 || 20191022 | 33333 | 88 | 3 || 20191023 | 11111 | 99 | 3 |+---------------------+---------------------+--------------------+--------+
需求:
1、查询在2017年4月份购买过的顾客及总人数 2、查询顾客的购买明细及月购买总额 3、查询顾客的购买明细及到目前为止每个顾客购买总金额 4、查询顾客上次的购买时间----lag()over()偏移量分析函数的运用数据:
jack,2017-01-01,10tony,2017-01-02,15jack,2017-02-03,23tony,2017-01-04,29jack,2017-01-05,46jack,2017-04-06,42tony,2017-01-07,50jack,2017-01-08,55mart,2017-04-08,62mart,2017-04-09,68neil,2017-05-10,12mart,2017-04-11,75neil,2017-06-12,80mart,2017-04-13,94
建表:
0: jdbc:hive2://hadoop:11240> create table business(name string,orderdate string,cost int). . . . . . . . . . . . . . > row format delimited. . . . . . . . . . . . . . > fields terminated by ',';0: jdbc:hive2://hadoop:11240> load data local inpath "/home/xiaokang/hivedata/business.txt". . . . . . . . . . . . . . > into table business;0: jdbc:hive2://hadoop:11240> select * from business;+----------------+---------------------+----------------+| business.name | business.orderdate | business.cost |+----------------+---------------------+----------------+| jack | 2017-01-01 | 10 || tony | 2017-01-02 | 15 || jack | 2017-02-03 | 23 || tony | 2017-01-04 | 29 || jack | 2017-01-05 | 46 || jack | 2017-04-06 | 42 || tony | 2017-01-07 | 50 || jack | 2017-01-08 | 55 || mart | 2017-04-08 | 62 || mart | 2017-04-09 | 68 || neil | 2017-05-10 | 12 || mart | 2017-04-11 | 75 || neil | 2017-06-12 | 80 || mart | 2017-04-13 | 94 |+----------------+---------------------+----------------+
1、查询在2017年4月份购买过的顾客及总人数
在本例中:
0: jdbc:hive2://hadoop:11240> select name,count(*) over(). . . . . . . . . . . . . . > from business. . . . . . . . . . . . . . > where substring(orderdate,1,7)='2017-04'+-------+-----------------+| name | count_window_0 |+-------+-----------------+| mart | 5 || mart | 5 || mart | 5 || mart | 5 || jack | 5 |+-------+-----------------+
2、查询顾客的购买明细及所有顾客的购买总额
所有人的花费求和0: jdbc:hive2://hadoop:11240> select *,sum(cost)over() . . . . . . . . . . . . . . > from business;+----------------+---------------------+----------------+---------------+| business.name | business.orderdate | business.cost | sum_window_0 |+----------------+---------------------+----------------+---------------+| mart | 2017-04-13 | 94 | 661 || neil | 2017-06-12 | 80 | 661 || mart | 2017-04-11 | 75 | 661 || neil | 2017-05-10 | 12 | 661 || mart | 2017-04-09 | 68 | 661 || mart | 2017-04-08 | 62 | 661 || jack | 2017-01-08 | 55 | 661 || tony | 2017-01-07 | 50 | 661 || jack | 2017-04-06 | 42 | 661 || jack | 2017-01-05 | 46 | 661 || tony | 2017-01-04 | 29 | 661 || jack | 2017-02-03 | 23 | 661 || tony | 2017-01-02 | 15 | 661 || jack | 2017-01-01 | 10 | 661 |+----------------+---------------------+----------------+---------------+
3、上述的场景,要将 cost 按照日期进行累加
0: jdbc:hive2://hadoop:11240> select orderdate,cost,sum(cost) over(order by orderdate). . . . . . . . . . . . . . > from business;+-------------+-------+---------------+| orderdate | cost | sum_window_0 |+-------------+-------+---------------+| 2017-01-01 | 10 | 10 || 2017-01-02 | 15 | 25 || 2017-01-04 | 29 | 54 || 2017-01-05 | 46 | 100 || 2017-01-07 | 50 | 150 || 2017-01-08 | 55 | 205 || 2017-02-03 | 23 | 228 || 2017-04-06 | 42 | 270 || 2017-04-08 | 62 | 332 || 2017-04-09 | 68 | 400 || 2017-04-11 | 75 | 475 || 2017-04-13 | 94 | 569 || 2017-05-10 | 12 | 581 || 2017-06-12 | 80 | 661 |+-------------+-------+---------------+
4、查询顾客的购买明细以及每位顾客的总花费
按人分组求和0: jdbc:hive2://hadoop:11240> select *,sum(cost) over(distribute by name). . . . . . . . . . . . . . > from business;+----------------+---------------------+----------------+---------------+| business.name | business.orderdate | business.cost | sum_window_0 |+----------------+---------------------+----------------+---------------+| jack | 2017-01-05 | 46 | 176 || jack | 2017-01-08 | 55 | 176 || jack | 2017-01-01 | 10 | 176 || jack | 2017-04-06 | 42 | 176 || jack | 2017-02-03 | 23 | 176 || mart | 2017-04-13 | 94 | 299 || mart | 2017-04-11 | 75 | 299 || mart | 2017-04-09 | 68 | 299 || mart | 2017-04-08 | 62 | 299 || neil | 2017-05-10 | 12 | 92 || neil | 2017-06-12 | 80 | 92 || tony | 2017-01-04 | 29 | 94 || tony | 2017-01-02 | 15 | 94 || tony | 2017-01-07 | 50 | 94 |+----------------+---------------------+----------------+---------------+
5、查询顾客的购买明细及到目前为止每个顾客购买总金额
按人分组,按时间排序,花费累加# 方法一:0: jdbc:hive2://hadoop:11240> select * ,sum(cost) over(distribute by name sort by orderdate). . . . . . . . . . . . . . > from business;+----------------+---------------------+----------------+---------------+| business.name | business.orderdate | business.cost | sum_window_0 |+----------------+---------------------+----------------+---------------+| jack | 2017-01-01 | 10 | 10 || jack | 2017-01-05 | 46 | 56 || jack | 2017-01-08 | 55 | 111 || jack | 2017-02-03 | 23 | 134 || jack | 2017-04-06 | 42 | 176 || mart | 2017-04-08 | 62 | 62 || mart | 2017-04-09 | 68 | 130 || mart | 2017-04-11 | 75 | 205 || mart | 2017-04-13 | 94 | 299 || neil | 2017-05-10 | 12 | 12 || neil | 2017-06-12 | 80 | 92 || tony | 2017-01-02 | 15 | 15 || tony | 2017-01-04 | 29 | 44 || tony | 2017-01-07 | 50 | 94 |+----------------+---------------------+----------------+---------------+
#方法二:0: jdbc:hive2://hadoop:11240> select *,. . . . . . . . . . . . . . > sum(cost). . . . . . . . . . . . . . > over(partition by name. . . . . . . . . . . . . . > order by orderdate rows between unbounded preceding and current row). . . . . . . . . . . . . . > as total_amount. . . . . . . . . . . . . . > from business;+----------------+---------------------+----------------+---------------+| business.name | business.orderdate | business.cost | total_amount |+----------------+---------------------+----------------+---------------+| jack | 2017-01-01 | 10 | 10 || jack | 2017-01-05 | 46 | 56 || jack | 2017-01-08 | 55 | 111 || jack | 2017-02-03 | 23 | 134 || jack | 2017-04-06 | 42 | 176 || mart | 2017-04-08 | 62 | 62 || mart | 2017-04-09 | 68 | 130 || mart | 2017-04-11 | 75 | 205 || mart | 2017-04-13 | 94 | 299 || neil | 2017-05-10 | 12 | 12 || neil | 2017-06-12 | 80 | 92 || tony | 2017-01-02 | 15 | 15 || tony | 2017-01-04 | 29 | 44 || tony | 2017-01-07 | 50 | 94 |+----------------+---------------------+----------------+---------------+
6、查询顾客上次的购买时间----lag()over()偏移量分析函数的运用
0: jdbc:hive2://hadoop:11240> select *,#如果上次的购买时间为null,将其处理为1970-01-01. . . . . . . . . . . . . . > lag(orderdate,1,'1970-01-01') over(partition by name order by orderdate) last_date. . . . . . . . . . . . . . > from business;+----------------+---------------------+----------------+-------------+| business.name | business.orderdate | business.cost | last_date |+----------------+---------------------+----------------+-------------+| jack | 2017-01-01 | 10 | 1970-01-01 || jack | 2017-01-05 | 46 | 2017-01-01 || jack | 2017-01-08 | 55 | 2017-01-05 || jack | 2017-02-03 | 23 | 2017-01-08 || jack | 2017-04-06 | 42 | 2017-02-03 || mart | 2017-04-08 | 62 | 1970-01-01 || mart | 2017-04-09 | 68 | 2017-04-08 || mart | 2017-04-11 | 75 | 2017-04-09 || mart | 2017-04-13 | 94 | 2017-04-11 || neil | 2017-05-10 | 12 | 1970-01-01 || neil | 2017-06-12 | 80 | 2017-05-10 || tony | 2017-01-02 | 15 | 1970-01-01 || tony | 2017-01-04 | 29 | 2017-01-02 || tony | 2017-01-07 | 50 | 2017-01-04 |+----------------+---------------------+----------------+-------------+
7、查询顾客下一次的购买时间
0: jdbc:hive2://hadoop:11240> select *,. . . . . . . . . . . . . . > lead(orderdate,1,'9999-99-99') over(partition by name order by orderdate) last_date. . . . . . . . . . . . . . > from business;+----------------+---------------------+----------------+-------------+| business.name | business.orderdate | business.cost | last_date |+----------------+---------------------+----------------+-------------+| jack | 2017-01-01 | 10 | 2017-01-05 || jack | 2017-01-05 | 46 | 2017-01-08 || jack | 2017-01-08 | 55 | 2017-02-03 || jack | 2017-02-03 | 23 | 2017-04-06 || jack | 2017-04-06 | 42 | 9999-99-99 || mart | 2017-04-08 | 62 | 2017-04-09 || mart | 2017-04-09 | 68 | 2017-04-11 || mart | 2017-04-11 | 75 | 2017-04-13 || mart | 2017-04-13 | 94 | 9999-99-99 || neil | 2017-05-10 | 12 | 2017-06-12 || neil | 2017-06-12 | 80 | 9999-99-99 || tony | 2017-01-02 | 15 | 2017-01-04 || tony | 2017-01-04 | 29 | 2017-01-07 || tony | 2017-01-07 | 50 | 9999-99-99 |+----------------+---------------------+----------------+-------------+
需求:
1、每门学科学生成绩排名(是否并列排名、空位排名三种实现) 2、每门学科成绩排名top n的学生score.txt
name subject score孙悟空 语文 87孙悟空 数学 95孙悟空 英语 68大海 语文 94大海 数学 56大海 英语 84宋宋 语文 64宋宋 数学 86宋宋 英语 84婷婷 语文 65婷婷 数学 85婷婷 英语 78
建表:
0: jdbc:hive2://hadoop:11240> create table score(name string,subject string,score int). . . . . . . . . . . . . . > row format delimited fields terminated by "\t";0: jdbc:hive2://hadoop:11240> load data local inpath '/home/xiaokang/hivedata/score.txt' into table score;0: jdbc:hive2://hadoop:11240> select * from score;+-------------+----------------+--------------+| score.name | score.subject | score.score |+-------------+----------------+--------------+| 孙悟空 | 语文 | 87 || 孙悟空 | 数学 | 95 || 孙悟空 | 英语 | 68 || 大海 | 语文 | 94 || 大海 | 数学 | 56 || 大海 | 英语 | 84 || 宋宋 | 语文 | 64 || 宋宋 | 数学 | 86 || 宋宋 | 英语 | 84 || 婷婷 | 语文 | 65 || 婷婷 | 数学 | 85 || 婷婷 | 英语 | 78 |+-------------+----------------+--------------+
1、每门学科学生成绩排名(是否并列排名、空位排名三种实现)
0: jdbc:hive2://hadoop:11240> select *,. . . . . . . . . . . . . . > row_number()over(partition by subject order by score desc) as row_number_method,. . . . . . . . . . . . . . > rank()over(partition by subject order by score desc) as rank_method,. . . . . . . . . . . . . . > dense_rank()over(partition by subject order by score desc) as dense_rank_method. . . . . . . . . . . . . . > from score;+-------------+----------------+--------------+--------------------+--------------+--------------------+| score.name | score.subject | score.score | row_number_method | rank_method | dense_rank_method |+-------------+----------------+--------------+--------------------+--------------+--------------------+| 孙悟空 | 数学 | 95 | 1 | 1 | 1 || 宋宋 | 数学 | 86 | 2 | 2 | 2 || 婷婷 | 数学 | 85 | 3 | 3 | 3 || 大海 | 数学 | 56 | 4 | 4 | 4 || 宋宋 | 英语 | 84 | 1 | 1 | 1 || 大海 | 英语 | 84 | 2 | 1 | 1 || 婷婷 | 英语 | 78 | 3 | 3 | 2 || 孙悟空 | 英语 | 68 | 4 | 4 | 3 || 大海 | 语文 | 94 | 1 | 1 | 1 || 孙悟空 | 语文 | 87 | 2 | 2 | 2 || 婷婷 | 语文 | 65 | 3 | 3 | 3 || 宋宋 | 语文 | 64 | 4 | 4 | 4 |+-------------+----------------+--------------+--------------------+--------------+--------------------+
2、每门学科成绩排名前三的学生
0: jdbc:hive2://hadoop:11240> select * from (. . . . . . . . . . . . . . > select *,. . . . . . . . . . . . . . > row_number() over(partition by subject order by score desc) as rmp. . . . . . . . . . . . . . > from score. . . . . . . . . . . . . . > ) as t. . . . . . . . . . . . . . > where t.rmp<=3;+---------+------------+----------+--------+| t.name | t.subject | t.score | t.rmp |+---------+------------+----------+--------+| 孙悟空 | 数学 | 95 | 1 || 宋宋 | 数学 | 86 | 2 || 婷婷 | 数学 | 85 | 3 || 宋宋 | 英语 | 84 | 1 || 大海 | 英语 | 84 | 2 || 婷婷 | 英语 | 78 | 3 || 大海 | 语文 | 94 | 1 || 孙悟空 | 语文 | 87 | 2 || 婷婷 | 语文 | 65 | 3 |+---------+------------+----------+--------+
转载地址:http://xnpaz.baihongyu.com/