python数据分析期末测验,python数据分析基础题库

统计购买总金额不低于800的用户的总购买金额，总订单数和总购买商品数，订单号相同的算作一单。类型），用户每登录一次，该表中就有一条对应的记录，每天至少有一个用户有登录行为，请完成下面的查询。有如下所示的建库建表SQL语句，请在MySQL中完成数据库和表的创建，然后解决后续的问题。统计购买过"cateA"产品的用户和他们的平均订单金额，订单号相同的算作一单。统计所有城市（包含无购买记录的城市）的总店

M3666789

928人浏览 · 2024-03-09 13:35:46

M3666789 · 2024-03-09 13:35:46 发布

本篇文章给大家谈谈python数据分析基础题库exe，以及python数据分析选择题题目，希望对各位有所帮助，不要忘了收藏本站喔。

Python【数据分析第二阶段测试】

第一部分：SQL查询

数据库中有如下所示的表1（tb_user）和表2（tb_recharge），其中表2包含了2021年一季度的用户充值数据。

表1：tb_userpython自动化运维应用。

user_id	user_name
1	张三
2	李四
3	王五
4	赵六
5	周七

表2：tb_recharge。

pay_month	user_id	pay_amount
一月	1	1000
一月	3	2000
一月	1	3000
二月	5	800
二月	3	900
三月	3	700
三月	4	600
三月	5	500
三月	5	400

查询在2021年一季度充值最多的用户姓名。（5分）

select t1.user_name
 from tb_user as t1 natural join tb_recharge as t2
 where t2.pay_amount in 
	(select max(pay_amount) 
    from tb_recharge);

查询在2021年一季度没有充值的用户姓名。（5分）

select user_name
 from tb_user
 where user_id not in 
 (select user_id from tb_recharge);

查看在2021年一季度连续三个月有充值行为的用户姓名。（5分）

select t1.user_name
 from tb_user as t1 natural join tb_recharge as t2
group by t1.user_id
having count(distinct t2.pay_month) = 3;

查询每个用户在2021年一月、二月、三月的充值金额，如下表所示。（5分）

姓名	一月	二月	三月
张三	4000	0	0
李四	0	0	0
王五	2000	900	700
赵六	0	0	600
周七	0	800	900

select t1.user_name as '姓名',
	sum(case t2.pay_month when '一月' then t2.pay_amount else 0 end) as '一月',
	sum(case t2.pay_month when '二月' then t2.pay_amount else 0 end) as '二月',
	sum(case t2.pay_month when '三月' then t2.pay_amount else 0 end) as '三月'
 from tb_user as t1 natural join tb_recharge as t2
 group by user_id;

数据库中有如下所示的“用户登录日志表”（表中仅展示了部分数据以供参考），其中，log_id是自动编号（bigint类型），user_id是用户编号（varchar类型），login_date是用户的登录日期（date类型），用户每登录一次，该表中就有一条对应的记录，每天至少有一个用户有登录行为，请完成下面的查询。

表：tb_login_log

log_id	user_id	login_date
1	C10000	2021-5-20
2	C10001	2021-5-20
3	C10000	2021-5-20
4	C10002	2021-5-20
5	C10000	2021-5-20

查询单日登录次数超过3次的用户。（10分）

select user_id 
from tb_login_log 
group by user_id,login_date
having count(*) >3;

查询2021年12月每一天的活跃用户数。（10分）

select login_date,
	count(distinct user_id) as '活跃用户数'
from tb_login_log
where login_date >='2021-12-01' and login_date <='2021-12-31'
group by login_date;

第二部分：Pandas的应用

有如下所示的建库建表SQL语句，请在MySQL中完成数据库和表的创建，然后解决后续的问题。

create database my_exam default charset utf8mb4;

use my_exam;

create table tb_product
(
prod_id varchar(50) not null comment '商品号',
category varchar(50) not null comment '种类',
price integer not null comment '价格',
primary key (prod_id)
) engine=innodb comment='产品表';

insert into tb_product values 
	('prodA', 'cateA', 100),
    ('prodB', 'cateB', 200),
    ('prodC', 'cateC', 300),
    ('prodD', 'cateD', 400);

create table tb_order
(
id integer not null auto_increment,
order_no varchar(20) not null comment '订单号',
user_id varchar(50) not null comment '用户号',
order_date date not null comment '下单日期',
store varchar(50) not null comment '店铺号',
product varchar(50) not null comment '商品号',
quantity integer not null comment '购买数量',
primary key (id)
) engine=innodb comment='订单表';

insert into tb_order 
	(order_no, user_id, order_date, store, product, quantity) 
values 
	('D001', 'customerA', '2018-01-01', 'storeA', 'prodA', 1),
    ('D001', 'customerA', '2018-01-01', 'storeA', 'prodB', 1),
    ('D001', 'customerA', '2018-01-01', 'storeA', 'prodC', 1),
    ('D002', 'customerB', '2018-01-12', 'storeB', 'prodB', 1),
    ('D002', 'customerB', '2018-01-12', 'storeB', 'prodD', 1),
    ('D003', 'customerC', '2018-01-12', 'storeC', 'prodB', 1),
    ('D003', 'customerC', '2018-01-12', 'storeC', 'prodC', 1),
    ('D003', 'customerC', '2018-01-12', 'storeC', 'prodD', 1),
    ('D004', 'customerA', '2018-01-01', 'storeD', 'prodD', 2),
    ('D005', 'customerB', '2018-01-23', 'storeB', 'prodA', 1);
    
create table tb_store
(
store_id varchar(50) not null comment '店铺号',
city varchar(20) not null comment '城市',
primary key (store_id)
) engine=innodb comment='店铺表';

insert into tb_store values 
	('storeA', 'cityA'),
    ('storeB', 'cityA'),
    ('storeC', 'cityB'),
    ('storeD', 'cityC'),
    ('storeE', 'cityD'),
    ('storeF', 'cityB');

从数据库中读取三张表（产品表、订单表、店铺表），分别命名为product_df、order_df和store_df。（5分）

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

plt.rcParams['font.sans-serif'].insert(0, 'SimHei')
plt.rcParams['axes.unicode_minus'] = False
import pymysql
conn = pymysql.connect(
                        host='localhost',port=3307,
                        user='guest',password='zhang.123456',
                        database = 'my_exam',charset='utf8mb4')
product_df = pd.read_sql('select * from tb_product',conn,index_col="prod_id")
order_df = pd.read_sql('select * from tb_order',conn,index_col="order_no")
store_df = pd.read_sql('select * from tb_store',conn,index_col="store_id")

连接product_df和order_df两张表并将其命名为order_product_df。（5分）

order_product_df = order_df.merge(product_df,left_on = "product",right_on="prod_id",how = "left")

统计购买总金额不低于800的用户的总购买金额，总订单数和总购买商品数，订单号相同的算作一单。（10分）

order_product_df["orderAmount"]= order_product_df.price * order_product_df.quantity
temp1 = pd.pivot_table(
order_product_df,
    index = "user_id",
    values = ["orderAmount","quantity","order_no"],
    aggfunc = {
    "orderAmount":"sum",
        "quantity":"sum",
        "order_no":"nunique"
    }
).reindex(columns = ["orderAmount","quantity","order_no"]
).rename(
columns = {
    "orderAmount":"总购买金额",
    "quantity":"总购买商品数",
    "order_no":"总订单数"    
}
)
temp1 = temp1[temp1["总购买金额"]>=800]
temp1

统计所有城市（包含无购买记录的城市）的总店铺数，总购买人数和总购买金额。（10分）

order_product_store_df= store_df.merge(order_product_df,left_on = "store_id",right_on ="store",how = "left")
temp2 = pd.pivot_table(
order_product_store_df,
  index = "city",
    values = ("store_id","user_id","orderAmount"),
    aggfunc = {
    "store_id":"nunique",
    "user_id":"nunique",
        "orderAmount":"sum"
    }
).rename(columns = {
"store_id":"总店铺数",
 "user_id":"总购买人数",
   "orderAmount":"总购买金额" 
})
temp2

统计购买过"cateA"产品的用户和他们的平均订单金额，订单号相同的算作一单。（10分）

cateA=order_product_df[order_product_df["category"] == "cateA"]
temp3 = pd.pivot_table(
cateA,
   index =  "user_id",
    values = ("order_no","orderAmount"),
    aggfunc = {
    "order_no":"nunique",
        "orderAmount":"sum"
    }
).rename(columns = {
    "order_no":"总订单数",
   "orderAmount":"总购买金额" 
})
temp3['平均订单金额'] = temp3["总购买金额"]/temp3["总订单数"]

第三部分：分析思维和分析工具

产品发布新版本后，DAU出现严重下滑，作为数据分析师你会如何分析下滑原因。（5分）

答：DAU是日活用户数，在发生DAU数据严重下滑时，首先查看异常数据的真实性，在BI看板上看到的数据去和数据库或者是数仓或者EXCEL多方进行检验，确定异常数据的真实性；其次，对周期性进行衡量判断，数据是否具有周期性规律，如周，月，季度，年度等规律而产生的异常；第三，外部数据的情况，是否是竞争对手，行业信息，政策变动等影响了指标异常；第四，进行指标拆解，将大指标层层拆解，定位异常问题；最后，监控效果，跟踪和监控更新的效果，发现改进的地方并持续更新。

写出5个你曾经使用过的DAX函数并说明其用法。（5分）

答：1.Filter函数：筛选数据,从下个集合中拿出一个子集;
2.Calculate函数：筛选器修改上下文中的表达式;
3.Switch函数：根据值列表计算表达式；
4.Related函数：从另一个表返回一个相关值；
5.sum函数：求和；

请说明你用过哪些Tableau中的快速表计算功能，并说明应用场景。（5分）

答：
1.建立北极星指标时，需要求出销售额时，进行‘汇总’；
2.使用到快速表计算功能的‘计数'选项可以查看每个类别中的数量;
3.可以使用‘百分比差异’来计算每个类别所占的百分比；
4.分析排名时，可以使用‘排序’；
5.观察二八分布的时候可以使用‘合计百分比’；

根据你现在的理解，说一下数据分析师这个岗位的日常工作和现实意义。（5分）

答：数据分析师日常工作包括数据收集、清理、分析和报告。主要负责业务部门的常规数据支撑工作，通过数据分析，针对业务工作中的薄弱环节和漏洞，做出分析报告并提出高校且可以落地的方案，资深的数据分析师可以透过数据发掘潜在的价值，为企业提供洞察，用专业知识和技能来帮助管理、发现和利用业务数据，实现更高的效率和业务增长。

天启AI社区

GitCode 天启AI是一款由 GitCode 团队打造的智能助手，基于先进的LLM（大语言模型）与多智能体 Agent 技术构建，致力于为用户提供高效、智能、多模态的创作与开发支持。它不仅支持自然语言对话，还具备处理文件、生成 PPT、撰写分析报告、开发 Web 应用等多项能力，真正做到“一句话，让 Al帮你完成复杂任务”。

更多推荐