Commit aec806ca, authored by qhz

[Dev] Add scan-code optimization plan

Parent
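# Background
1. The ydd_fw_scan_awards and ydd_fw_scans tables have reached the ten-million-row scale.
2. ydd_fw_scan_awards and ydd_fw_scans are large wide tables with a great many columns. Different business features use different columns, so for any single feature most of the columns are dead weight.
3. ydd_fw_scan_awards and ydd_fw_scans have many indexes, but the scan statistics queries can rarely use them.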
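# Changes
1. Version 2: link scan time to shipment time by grouping on `scans_day` (daily), `out_day`, `dealer_id`, `store_id`, `product_id`. This makes the row count grow sharply; see the figures in the scan-volume section for details.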
# SQL for the salesperson, dealer, and store clients
## Salesperson client
### Data overview
```
select
count(distinct(a.member_id)) as user_num,
SUM(if(a.is_point=1,a.point_num,0)) as grant_point_num,
SUM(if(a.is_bonus=1,a.bonus_num,0)) as grant_bonus_num,
SUM(if(a.is_gift=1,1,0)) as grant_gift_number,
SUM(if(a.is_many_awards=1 and a.is_award=1,1,0)) as is_many_awards_num,
SUM(if(s.is_first=1 and a.is_award=1,1,0)) as no_many_awards_num
from
ydd_fw_scan_awards as a
left join
ydd_fw_scans as s on a.scan_id = s.id
where
a.day BETWEEN 20240101 AND 20240530 and
s.product_id in (130433, 153727, 163915, 195850, 195852, 195860, 195861) and
s.dealer_id in (280, 259)
limit 1;
```
### Data details
```
select a.member_name,a.created_time,a.is_many_awards,a.activity_name,a.award_name,a.award_content,a.is_award,b.headimgurl,s.province,s.city,s.district,s.area
from
ydd_fw_scans as s
left join ydd_fw_scan_awards as a on s.id = a.scan_id
left join ydd_member as b on a.member_id = b.id
where
a.day BETWEEN 20240101 AND 20240630 and
s.product_id in (130433, 153727, 163915, 195850, 195852, 195860, 195861) and
s.dealer_id in (280, 259)
order by a.id desc
limit 0, 20
;
```
## Dealer client
### Data overview
```
select
count(distinct(a.member_id)) as user_num,
SUM(if(a.is_point=1,a.point_num,0)) as grant_point_num,
SUM(if(a.is_bonus=1,a.bonus_num,0)) as grant_bonus_num,
SUM(if(a.is_gift=1,1,0)) as grant_gift_number,
SUM(if(a.is_many_awards=1 and a.is_award=1,1,0)) as is_many_awards_num,
SUM(if(s.is_first=1 and a.is_award=1,1,0)) as no_many_awards_num
from
ydd_fw_scan_awards as a
left join
ydd_fw_scans as s on a.scan_id = s.id
where
a.created_time BETWEEN 20240101 AND 20240530 and
s.province_id in (130433, 153727, 163915, 195850, 195852, 195860, 195861) and
s.city_id in (280, 259) and
s.district_id in (111, 222)
limit 1;
```
### Data details
```
select a.member_name,a.created_time,a.is_many_awards,a.activity_name,a.award_name,a.award_content,a.is_award,b.headimgurl,s.province,s.city,s.district,s.area
from
ydd_fw_scans as s
left join ydd_fw_scan_awards as a on s.id = a.scan_id
left join ydd_member as b on a.member_id = b.id
where
a.created_time BETWEEN 20240101 AND 20240530 and
s.province_id in (130433, 153727, 163915, 195850, 195852, 195860, 195861) and
s.city_id in (280, 259) and
s.district_id in (111, 222)
order by a.id desc
limit 0, 20
;
```
## Store client
### Data overview
```
select
count(distinct(a.member_id)) as user_num,
SUM(if(a.is_point=1,a.point_num,0)) as grant_point_num,
SUM(if(a.is_bonus=1,a.bonus_num,0)) as grant_bonus_num,
SUM(if(a.is_gift=1,1,0)) as grant_gift_number,
SUM(if(a.is_many_awards=1 and a.is_award=1,1,0)) as is_many_awards_num,
SUM(if(a.is_many_awards=0 and a.is_award=1,1,0)) as no_many_awards_num,
-- SUM, not COUNT: COUNT(if(..., 1, 0)) counts every row because 0 is non-NULL
SUM(if(s.is_first=1,1,0)) as first_scan_num
from
ydd_fw_scan_awards as a
left join
ydd_fw_scans as s on a.scan_id = s.id
where
a.day BETWEEN 20240101 AND 20240530 and
s.day BETWEEN 20240101 AND 20240530 and
a.out_day BETWEEN 20240101 AND 20240530 and
s.out_day BETWEEN 20240101 AND 20240530 and
((s.product_id = 111 and
s.product_sku_sn = 'xxxxx') or s.product_id in (1111,2222)) and
a.created_time BETWEEN 20240101 AND 20240530 and
s.province_id in (130433, 153727, 163915, 195850, 195852, 195860, 195861) and
s.city_id in (280, 259) and
s.district_id in (111, 222) and
s.is_first = 1
limit 1;
```
### Data details
```
select
a.member_name,a.created_time,a.is_many_awards,a.activity_name,a.award_name,a.award_content,a.is_award,b.headimgurl,s.province,s.city,s.district,s.area
from
ydd_fw_scans as s
left join ydd_fw_scan_awards as a on s.id = a.scan_id
left join ydd_member as b on a.member_id = b.id
where
a.day BETWEEN 20240101 AND 20240530 and
s.day BETWEEN 20240101 AND 20240530 and
a.out_day BETWEEN 20240101 AND 20240530 and
s.out_day BETWEEN 20240101 AND 20240530 and
((s.product_id = 111 and
s.product_sku_sn = 'xxxxx') or s.product_id in (1111,2222)) and
a.created_time BETWEEN 20240101 AND 20240530 and
s.province_id in (130433, 153727, 163915, 195850, 195852, 195860, 195861) and
s.city_id in (280, 259) and
s.district_id in (111, 222) and
s.is_first = 1
order by a.id desc
limit 0, 20
;
```
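# Secondary aggregation plan
## Overview
- Split the scan data into two tables: 1. scan data overview 2. scan data details
- Write data by date: per day and per shipment date. Writes can overwrite existing rows.
- The scan data overview table holds the statistics that are written once per day.
- The scan data details table exists to trim columns and reduce multi-table joins.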
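## Data-write SQL
### Scan data overview
The unique key is `scans_day` (daily), `out_day`, `dealer_id`, `store_id`, `product_id`; it is what allows rows to be overwritten.
To link scan time with shipment time, `out_day` must be one of the grouping dimensions.
The columns are: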
```
user_num,
grant_point_num,
grant_bonus_num,
grant_gift_number,
is_many_awards_num,
no_many_awards_num,
scans_day,
out_day,
province_id,
city_id,
district_id,
store_id,
dealer_id,
product_id
```
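The overwrite behaviour described above can be sketched as follows. This is a minimal illustration using Python's built-in `sqlite3`, not the project's actual write path: the table name `scan_daily_summary`, the column types, and the sample values are all assumptions, and only two of the metric columns are included for brevity. On MySQL the equivalent write would typically be `REPLACE INTO` or `INSERT ... ON DUPLICATE KEY UPDATE` against the same unique key.

```python
import sqlite3

# Hypothetical summary table: column names follow the field list above,
# but the table name and column types are assumptions for this sketch.
DDL = """
CREATE TABLE scan_daily_summary (
    scans_day       INTEGER NOT NULL,
    out_day         INTEGER NOT NULL,
    dealer_id       INTEGER NOT NULL,
    store_id        INTEGER NOT NULL,
    product_id      INTEGER NOT NULL,
    user_num        INTEGER NOT NULL,
    grant_point_num INTEGER NOT NULL,
    UNIQUE (scans_day, out_day, dealer_id, store_id, product_id)
)
"""

# INSERT OR REPLACE overwrites the row that collides on the unique key,
# so re-running the daily job for the same day does not create duplicates.
WRITE = """
INSERT OR REPLACE INTO scan_daily_summary
    (scans_day, out_day, dealer_id, store_id, product_id, user_num, grant_point_num)
VALUES (?, ?, ?, ?, ?, ?, ?)
"""


def write_summary(conn, rows):
    """Write (or overwrite) aggregated rows for one statistics run."""
    conn.executemany(WRITE, rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(DDL)

# First run for the day, then a re-run with corrected numbers (sample values).
write_summary(conn, [(20240212, 20240201, 280, 0, 195853, 10, 500)])
write_summary(conn, [(20240212, 20240201, 280, 0, 195853, 12, 650)])

result = conn.execute(
    "SELECT user_num, grant_point_num FROM scan_daily_summary"
).fetchall()
row_count = conn.execute("SELECT count(*) FROM scan_daily_summary").fetchone()[0]
print(result, row_count)
```

Because the second write hits the same unique key, the table still holds a single row carrying the values from the re-run.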
## Scan volume
Comparison of grouping by `scans_day` (daily), `dealer_id`, `store_id`, `product_id` against grouping by `scans_day` (daily), `out_day`, `dealer_id`, `store_id`, `product_id`:
```
select count(*) as k,day, dealer_id,store_id, product_id from ydd_fw_scans group by day, dealer_id, store_id, product_id having k > 100 order by k desc limit 10;
```
```
+-------+----------+-----------+----------+------------+
| k | day | dealer_id | store_id | product_id |
+-------+----------+-----------+----------+------------+
| 28856 | 20240212 | 0 | 0 | 195853 |
| 27812 | 20240212 | 0 | 0 | 0 |
| 27127 | 20240211 | 0 | 0 | 0 |
| 26914 | 20240211 | 0 | 0 | 195853 |
| 26675 | 20240215 | 0 | 0 | 195853 |
| 25655 | 20240213 | 0 | 0 | 195853 |
| 23430 | 20240214 | 0 | 0 | 195853 |
| 22789 | 20240213 | 0 | 0 | 0 |
| 22514 | 20240209 | 0 | 0 | 195853 |
| 21629 | 20240209 | 0 | 0 | 0 |
+-------+----------+-----------+----------+------------+
```
```
select count(*) from (select day, product_id, dealer_id, count(distinct(member_id)) as user_num, sum(if(is_first=1,1,0)) as first_scan_num from ydd_fw_scans where day = 20240212 group by day, product_id, dealer_id) as a;
+----------+
| count(*) |
+----------+
| 201 |
+----------+
```
```
select count(*) from (select out_day, day, dealer_id, product_id, count(distinct(member_id)) as user_num, sum(if(is_first=1,1,0)) as first_scan_num from ydd_fw_scans where day = 20240212 group by day, out_day, dealer_id, product_id) as a;
+----------+
| count(*) |
+----------+
| 6722 |
+----------+
1 row in set (0.83 sec)
```
`Row-count increase: 3244.3%`
Next, compute how much the `scans_day` (daily), `out_day`, `dealer_id`, `store_id`, `product_id` grouping compresses the original data, using day=20240212 as the reference.
```
+-------+----------+---------+-----------+----------+------------+
| k | day | out_day | dealer_id | store_id | product_id |
+-------+----------+---------+-----------+----------+------------+
| 28856 | 20240212 | 0 | 0 | 0 | 195853 |
| 27812 | 20240212 | 0 | 0 | 0 | 0 |
| 27127 | 20240211 | 0 | 0 | 0 | 0 |
| 26914 | 20240211 | 0 | 0 | 0 | 195853 |
| 26675 | 20240215 | 0 | 0 | 0 | 195853 |
| 25655 | 20240213 | 0 | 0 | 0 | 195853 |
| 23430 | 20240214 | 0 | 0 | 0 | 195853 |
| 22789 | 20240213 | 0 | 0 | 0 | 0 |
| 22514 | 20240209 | 0 | 0 | 0 | 195853 |
| 21629 | 20240209 | 0 | 0 | 0 | 0 |
+-------+----------+---------+-----------+----------+------------+
```
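`Compression rate: 76.71%; the compressed data is about 23.29% of the original volume`
At a growth of 20,000 to 30,000 rows per day and this compression rate, one year comes to roughly 2.4 million rows.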
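The figures in this section can be reproduced with a few lines of arithmetic. The sketch below is one plausible reading, not a derivation given in the original: it assumes the raw baseline for day=20240212 is the 28856-row group from the result table, and that the yearly estimate simply repeats that day's 6722 aggregated rows for 365 days.

```python
# Group counts for day=20240212, taken from the query results above.
groups_without_out_day = 201
groups_with_out_day = 6722

# Assumption: the raw-row baseline is the largest group in the result table.
raw_rows = 28856

# Relative growth in row count caused by adding out_day to the grouping key.
increase_pct = (groups_with_out_day - groups_without_out_day) / groups_without_out_day * 100

# Share of the raw rows that disappears after aggregation.
compression_pct = (1 - groups_with_out_day / raw_rows) * 100

# Assumption: every day of the year produces as many groups as the reference day.
rows_per_year = groups_with_out_day * 365

print(round(increase_pct, 1))     # 3244.3
print(round(compression_pct, 2))  # 76.71
print(rows_per_year)              # 2453530
```

Under these assumptions the yearly total lands at about 2.45 million rows, in line with the rough estimate above.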
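## Scan data details
For now, the existing detail table can still be used for queries.
<!-- The unique key is scans_day (daily), dealer_id, product_id; it is what allows rows to be overwritten.
The required columns are much the same as in the data-detail queries of the salesperson, dealer, and store SQL section -->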
\ No newline at end of file
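# Secondary aggregation plan
## Overview
- Split the scan data into two tables: 1. scan data overview 2. scan data details
- Write data by date, per day. Writes can overwrite existing rows.
- The scan data overview table holds the statistics that are written once per day.
- The scan data details table exists to trim columns and reduce multi-table joins.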
## Data-write SQL
### Scan data overview
The unique key is `scans_day` (daily), `dealer_id`, `product_id`; it is what allows rows to be overwritten.
The columns are:
```
user_num,
grant_point_num,
grant_bonus_num,
grant_gift_number,
is_many_awards_num,
no_many_awards_num,
scans_day,
awards_day,
scans_outday,
awards_outday,
province_id,
city_id,
district_id,
dealer_id,
product_id
```
## Scan volume
```
select count(*) as k,day, product_id, dealer_id from ydd_fw_scans group by day, product_id, dealer_id having k > 100 order by k desc limit 10;
```
```
+-------+----------+------------+-----------+
| k | day | product_id | dealer_id |
+-------+----------+------------+-----------+
| 28856 | 20240212 | 195853 | 0 |
| 27812 | 20240212 | 0 | 0 |
| 27127 | 20240211 | 0 | 0 |
| 26914 | 20240211 | 195853 | 0 |
| 26675 | 20240215 | 195853 | 0 |
| 25655 | 20240213 | 195853 | 0 |
| 23430 | 20240214 | 195853 | 0 |
| 22789 | 20240213 | 0 | 0 |
| 22514 | 20240209 | 195853 | 0 |
| 21629 | 20240209 | 0 | 0 |
+-------+----------+------------+-----------+
```
```
select count(*) from (select day, product_id, dealer_id, count(distinct(member_id)) as user_num, sum(if(is_first=1,1,0)) as first_scan_num from ydd_fw_scans where day = 20240212 group by day, product_id, dealer_id) as a;
```
```
+----------+
| count(*) |
+----------+
| 201 |
+----------+
```
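## Scan data details
The unique key is `scans_day` (daily), `dealer_id`, `product_id`; it is what allows rows to be overwritten.
The required columns are much the same as in the data-detail queries of the salesperson, dealer, and store SQL section.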
\ No newline at end of file