
[fix](cloud) batch process ttl cache block gc to limit lock held time once in a time #50387


Merged: 1 commit merged on Apr 25, 2025


freemandealer
Contributor

@freemandealer freemandealer commented Apr 24, 2025

Garbage-collecting too many TTL cache blocks at once holds the cache lock long enough to spike lock latency, which in turn increases query latency. This change splits the GC into batches so the lock is released between them.
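The batching idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from this PR: `TtlBlockCache`, `gc_expired_in_batches`, and `batch_size` are hypothetical names, and a single `std::mutex` stands in for the cache lock. Each pass holds the lock only long enough to erase at most `batch_size` expired blocks, then releases it and yields so that queries waiting on the lock can proceed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <string>
#include <thread>
#include <unordered_map>

// Hypothetical cache: maps a block key to its TTL expiration timestamp.
class TtlBlockCache {
public:
    void add_block(const std::string& key, int64_t expire_at) {
        std::lock_guard<std::mutex> lock(mutex_);
        blocks_[key] = expire_at;
    }

    size_t size() {
        std::lock_guard<std::mutex> lock(mutex_);
        return blocks_.size();
    }

    // Removes expired blocks at most `batch_size` per lock acquisition,
    // releasing the lock between batches. Returns the total removed.
    size_t gc_expired_in_batches(int64_t now, size_t batch_size) {
        // A zero batch size could never make progress; treat it as a caller error.
        assert(batch_size > 0);
        if (batch_size == 0) return 0;

        size_t total_removed = 0;
        while (true) {
            size_t removed_this_batch = 0;
            {
                std::lock_guard<std::mutex> lock(mutex_);
                for (auto it = blocks_.begin();
                     it != blocks_.end() && removed_this_batch < batch_size;) {
                    if (it->second <= now) {
                        it = blocks_.erase(it);  // erase returns the next iterator
                        ++removed_this_batch;
                    } else {
                        ++it;
                    }
                }
            }  // lock released here, before the next batch
            total_removed += removed_this_batch;
            if (removed_this_batch < batch_size) {
                break;  // the scan reached the end: no expired blocks remain
            }
            std::this_thread::yield();  // let threads waiting on the lock run
        }
        return total_removed;
    }

private:
    std::mutex mutex_;
    std::unordered_map<std::string, int64_t> blocks_;
};
```

In this sketch each batch rescans the map from the beginning, so live blocks are revisited on every pass; a production version would more likely index blocks by expiration time so expired ones can be found without a full scan. The trade-off is the same either way: a smaller `batch_size` shortens each lock hold at the cost of more lock acquisitions overall.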

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

Garbage-collecting too many TTL cache blocks at once spikes cache
lock latency and thus increases query latency. Split the GC into
batches so the lock is released between them.

Signed-off-by: zhengyu <[email protected]>
@Thearas
Contributor

Thearas commented Apr 24, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@freemandealer
Contributor Author

run buildall

@freemandealer freemandealer changed the title [fix](cloud) limit amount of ttl cache blocks gc once in a time [fix](cloud) batch process ttl cache block gc to limit lock held time once in a time Apr 24, 2025
Contributor

@dataroaring dataroaring left a comment


LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Apr 24, 2025
Contributor

PR approved by at least one committer and no changes requested.

Contributor

PR approved by anyone and no changes requested.

@doris-robot

TPC-H: Total hot run time: 34231 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 1b3cfeeefe009b77e10656104eec6f3853f7f3e7, data reload: false

------ Round 1 ----------------------------------
q1	26445	5104	5038	5038
q2	2084	275	188	188
q3	10499	1248	703	703
q4	10239	1017	545	545
q5	7667	2441	2377	2377
q6	193	168	136	136
q7	938	741	626	626
q8	9340	1344	1195	1195
q9	6883	5130	5173	5130
q10	6824	2312	1890	1890
q11	488	280	277	277
q12	347	353	211	211
q13	17789	3736	3049	3049
q14	230	225	214	214
q15	542	480	486	480
q16	452	444	403	403
q17	604	888	374	374
q18	7623	7239	7219	7219
q19	1364	954	571	571
q20	336	336	220	220
q21	3908	3386	2426	2426
q22	1074	1038	959	959
Total cold run time: 115869 ms
Total hot run time: 34231 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5191	5165	5109	5109
q2	241	328	238	238
q3	2148	2657	2306	2306
q4	1440	1826	1514	1514
q5	4577	4494	4413	4413
q6	250	167	128	128
q7	2036	1923	1741	1741
q8	2597	2629	2513	2513
q9	7308	7263	7175	7175
q10	3017	3216	2755	2755
q11	577	510	493	493
q12	709	762	599	599
q13	3638	4048	3318	3318
q14	316	309	281	281
q15	521	491	472	472
q16	449	495	462	462
q17	1169	1600	1341	1341
q18	7915	7499	7519	7499
q19	802	831	810	810
q20	1996	1954	1797	1797
q21	5206	4782	4667	4667
q22	1053	1044	1004	1004
Total cold run time: 53156 ms
Total hot run time: 50635 ms
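The TPC-H rows carry no column headers. Judging from the data itself, each row appears to list one cold run, then two hot runs, then the faster of those two hot runs; the reported totals match the sum of the first column (cold) and the sum of the last column (hot) in both rounds. The sketch below reproduces that arithmetic; `QueryTiming`, `Totals`, and `summarize` are hypothetical names, not part of the Doris benchmark tools.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// One report row: the cold run followed by two hot runs, in milliseconds.
struct QueryTiming {
    int64_t cold_ms;
    int64_t hot1_ms;
    int64_t hot2_ms;
};

struct Totals {
    int64_t cold_ms = 0;
    int64_t hot_ms = 0;
};

// Sums cold runs, and the faster of the two hot runs per query.
Totals summarize(const std::vector<QueryTiming>& rows) {
    Totals t;
    for (const auto& r : rows) {
        t.cold_ms += r.cold_ms;
        t.hot_ms += std::min(r.hot1_ms, r.hot2_ms);  // the per-row "best" column
    }
    return t;
}
```

Fed the 22 Round 1 rows, this yields 115869 ms cold and 34231 ms hot, matching the totals printed above.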

@doris-robot

TPC-DS: Total hot run time: 185546 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 1b3cfeeefe009b77e10656104eec6f3853f7f3e7, data reload: false

query1	1025	493	496	493
query2	6572	1886	1804	1804
query3	6747	222	214	214
query4	26463	23434	23189	23189
query5	4324	614	486	486
query6	298	193	179	179
query7	4610	476	287	287
query8	293	243	231	231
query9	8619	2575	2575	2575
query10	447	317	296	296
query11	15265	15247	14861	14861
query12	177	109	109	109
query13	1645	514	400	400
query14	8773	6233	6157	6157
query15	214	188	167	167
query16	7149	616	451	451
query17	1140	781	550	550
query18	1970	392	291	291
query19	182	178	153	153
query20	120	121	116	116
query21	214	121	103	103
query22	4122	4178	4060	4060
query23	34027	32952	32833	32833
query24	8454	2354	2340	2340
query25	542	455	380	380
query26	1227	263	150	150
query27	2774	497	326	326
query28	4326	2120	2096	2096
query29	738	558	422	422
query30	279	216	196	196
query31	939	875	762	762
query32	73	71	65	65
query33	571	375	316	316
query34	804	834	514	514
query35	770	822	728	728
query36	973	983	898	898
query37	119	111	78	78
query38	4264	4097	4039	4039
query39	1440	1408	1407	1407
query40	216	115	109	109
query41	57	60	53	53
query42	120	104	111	104
query43	481	507	480	480
query44	1276	788	779	779
query45	176	170	171	170
query46	832	1020	614	614
query47	1795	1825	1760	1760
query48	365	404	301	301
query49	779	532	421	421
query50	639	656	401	401
query51	4081	4154	4028	4028
query52	107	101	92	92
query53	214	252	184	184
query54	581	571	504	504
query55	90	85	92	85
query56	312	293	303	293
query57	1160	1169	1081	1081
query58	263	264	242	242
query59	2643	2668	2617	2617
query60	320	317	320	317
query61	130	125	127	125
query62	813	723	701	701
query63	223	186	178	178
query64	4351	987	663	663
query65	4304	4284	4320	4284
query66	1139	403	308	308
query67	16003	15397	15511	15397
query68	7813	864	502	502
query69	469	289	262	262
query70	1177	1174	1081	1081
query71	417	309	289	289
query72	5565	4782	4981	4782
query73	709	661	339	339
query74	8989	8890	8944	8890
query75	3389	3242	2749	2749
query76	3342	1186	750	750
query77	666	459	280	280
query78	10096	10379	9269	9269
query79	2449	821	573	573
query80	645	526	424	424
query81	494	259	216	216
query82	220	125	99	99
query83	253	247	236	236
query84	278	102	82	82
query85	760	425	308	308
query86	373	313	296	296
query87	4328	4576	4231	4231
query88	3790	2189	2175	2175
query89	374	316	284	284
query90	2023	219	208	208
query91	141	141	121	121
query92	79	63	59	59
query93	2248	935	564	564
query94	655	415	301	301
query95	368	285	290	285
query96	485	571	273	273
query97	3076	3200	3126	3126
query98	226	210	207	207
query99	1355	1417	1272	1272
Total cold run time: 273411 ms
Total hot run time: 185546 ms

@doris-robot

ClickBench: Total hot run time: 29.84 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 1b3cfeeefe009b77e10656104eec6f3853f7f3e7, data reload: false

query1	0.04	0.04	0.03
query2	0.13	0.11	0.11
query3	0.26	0.21	0.19
query4	1.88	0.20	0.20
query5	0.60	0.59	0.60
query6	1.32	0.73	0.72
query7	0.02	0.02	0.01
query8	0.05	0.04	0.03
query9	0.60	0.52	0.52
query10	0.57	0.57	0.57
query11	0.16	0.11	0.12
query12	0.15	0.11	0.11
query13	0.60	0.60	0.59
query14	1.16	1.21	1.20
query15	0.87	0.83	0.84
query16	0.38	0.38	0.38
query17	1.08	1.01	1.04
query18	0.21	0.20	0.20
query19	1.89	1.78	1.78
query20	0.02	0.01	0.01
query21	15.41	0.93	0.55
query22	0.74	1.19	0.69
query23	14.87	1.41	0.60
query24	7.50	0.97	1.32
query25	0.44	0.26	0.09
query26	0.62	0.16	0.14
query27	0.05	0.05	0.04
query28	9.67	0.82	0.45
query29	12.59	3.94	3.30
query30	0.25	0.09	0.06
query31	2.81	0.58	0.37
query32	3.24	0.56	0.47
query33	3.15	3.05	3.11
query34	15.71	5.08	4.52
query35	4.52	4.57	4.54
query36	0.65	0.49	0.49
query37	0.09	0.06	0.06
query38	0.05	0.04	0.03
query39	0.03	0.03	0.03
query40	0.17	0.15	0.13
query41	0.09	0.03	0.02
query42	0.03	0.03	0.02
query43	0.04	0.04	0.03
Total cold run time: 104.71 s
Total hot run time: 29.84 s

@doris-robot

BE UT Coverage Report

Increment line coverage 100.00% (10/10) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.75% (14595/27151)
Line Coverage 42.58% (126691/297530)
Region Coverage 41.38% (64768/156505)
Branch Coverage 35.93% (32558/90608)

@hello-stephen
Contributor

BE Regression P0 && UT Coverage Report

Increment line coverage 20.00% (2/10) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage
Line Coverage
Region Coverage
Branch Coverage

@dataroaring dataroaring added dev/3.0.x usercase Important user case type label labels Apr 25, 2025
Labels
approved Indicates a PR has been approved by one committer. dev/3.0.6-merged reviewed usercase Important user case type label
6 participants