Commit 51e10be

Merge PR#4 into main as a single commit
Commands:
pushd /tmp; git clone https://github.com/letsencrypt/mariadb-sequential-partition-manager-py.git; popd
pushd /tmp/mariadb-sequential-partition-manager-py; git checkout -b pr-branch origin/pr-branch; popd
git checkout main; git fetch origin; git reset --hard main
cp -a /tmp/mariadb-sequential-partition-manager-py/* .
git commit -a
1 parent 9b707e1 commit 51e10be

13 files changed

+1057
-710
lines changed

README.md

Lines changed: 67 additions & 12 deletions
@@ -1,6 +1,8 @@
 [![Build Status](https://circleci.com/gh/letsencrypt/mariadb-sequential-partition-manager-py.svg?style=shield)](https://circleci.com/gh/letsencrypt/mariadb-sequential-partition-manager-py)
 ![Maturity Level: Beta](https://img.shields.io/badge/maturity-beta-blue.svg)

+# Partman
+
 This tool partitions and manages MariaDB tables by sequential IDs.

 This is primarily a mechanism for dropping large numbers of rows of data without using `DELETE` statements.
@@ -11,24 +13,60 @@ Similar tools:
 * https://github.com/davidburger/gomypartition, intended for tables with date-based partitions
 * https://github.com/yahoo/mysql_partition_manager, which is archived and in pure SQL

-# Usage
+## Usage
+
+```sh
+→ git clone https://github.com/letsencrypt/mariadb-sequential-partition-manager-py.git
+cd mariadb-sequential-partition-manager-py
+→ python3 -m venv .venv
+. .venv/bin/activate
+→ python3 -m pip install .
+→ tee /tmp/partman.conf.yml <<EOF
+partitionmanager:
+  num_empty: 2
+  partition_period:
+    days: 90
+  dburl: "sql://user:password@localhost3306:/test_db"
+  tables:
+    cats: {}
+    dogs:
+      partition_period:
+        days: 30
+  prometheus_stats: "/tmp/prometheus-textcollect-partition-manager.prom"
+EOF
+→ partition-manager --config /tmp/partman.conf.yml maintain --noop
+INFO:root:No-op mode
+INFO:partition:Evaluating Table dogs (duration=30 days, 0:00:00) (pos={'id': 150})
+INFO:partition:Table dogs planned SQL: ALTER TABLE `dogs` REORGANIZE PARTITION `p_20201204` INTO (PARTITION `p_20210422` VALUES LESS THAN (221), PARTITION `p_20210522` VALUES LESS THAN MAXVALUE);
+
+dogs:
+ sql: ALTER TABLE `dogs` REORGANIZE PARTITION `p_20201204` INTO (PARTITION `p_20210422` VALUES LESS THAN (221), PARTITION `p_20210522` VALUES LESS THAN MAXVALUE);
+ noop: True
+```
+
+### Running `partman` in your development environment

 ```sh
-→ pip install --editable .
+→ git clone https://github.com/letsencrypt/mariadb-sequential-partition-manager-py.git
+cd mariadb-sequential-partition-manager-py
+→ python3 -m venv .venv
+. .venv/bin/activate
+→ python3 -m pip install --editable .
 → partition-manager --log-level=debug \
     --mariadb test_tools/fake_mariadb.sh \
-    add --noop --table tablename
+    maintain --noop --table tablename
 DEBUG:root:Auto_Increment column identified as id
 DEBUG:root:Partition range column identified as id
 DEBUG:root:Found partition before = (100)
 DEBUG:root:Found tail partition named p_20201204
 INFO:root:No-op mode

 ALTER TABLE `dbname`.`tablename` REORGANIZE PARTITION `p_20201204` INTO (PARTITION `p_20201204` VALUES LESS THAN (3101009), PARTITION `p_20210122` VALUES LESS THAN MAXVALUE);
-
 ```

-You can also use a yaml configuration file with the `--config` parameter of the form:
+## Configuration
+You can use a yaml configuration file with the `--config` parameter of the form:
+
 ```yaml
 partitionmanager:
   dburl: sql://user:password@localhost/db-name
@@ -48,6 +86,7 @@ partitionmanager:
     table3:
       retention:
         days: 14
+    table4: {}
 ```

 For tables which are either partitioned but not yet using this tool's schema, or which have no empty partitions, the `bootstrap` command can be useful for proposing alterations to run manually. Note that `bootstrap` proposes commands that are likely to require partial copies of each table, so likely they will require a maintenance period.
@@ -63,12 +102,20 @@ INFO:calculate_sql_alters:Reading prior state information
 INFO:calculate_sql_alters:Table orders, 24.0 hours, [9236] - [29236], [20000] pos_change, [832.706363653845]/hour
 orders:
 - ALTER TABLE `orders` REORGANIZE PARTITION `p_20210405` INTO (PARTITION `p_20210416` VALUES LESS THAN (30901), PARTITION `p_20210516` VALUES LESS THAN (630449), PARTITION `p_20210615` VALUES LESS THAN MAXVALUE);
-
 ```

-# Algorithm
+## Getting started
+
+### Configuring `partman`
+
+- At start, if any configuration file specified as a CLI argument, read that configuration file to set all other values.
+- Then, process all remaining command line arguments, overriding values loaded from the configuration file in case of conflicts.
+- From those command-line arguments, determine whether to collect statistics `stats`, determine an initial partition layout `bootstrap`, or operate in the normal `maintain` mode.
+- Use the configuration information as inputs to the required algorithm.
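
Editorial sketch (not part of the commit): the precedence described in the list above, where configuration-file values are loaded first and explicit command-line arguments override them, could look roughly like the following. The function name `resolve_settings`, the `--num-empty` flag, and the plain dict standing in for the parsed YAML file are assumptions made for illustration; only the three mode names and `--noop` come from the README text.

```python
import argparse


def resolve_settings(file_values, argv):
    """Start from config-file values, then let explicit CLI flags win.

    `file_values` is a plain dict standing in for the parsed YAML file
    (hypothetical; the real tool reads it via --config).
    """
    parser = argparse.ArgumentParser(prog="partition-manager-sketch")
    # default=None lets us tell "flag not given" apart from a real value.
    parser.add_argument("--num-empty", type=int, default=None)  # hypothetical flag
    parser.add_argument("--noop", action="store_true", default=None)
    parser.add_argument("command", choices=["stats", "bootstrap", "maintain"])
    args = parser.parse_args(argv)

    settings = dict(file_values)
    for key, value in vars(args).items():
        if value is not None:
            # A flag given on the command line overrides the file value.
            settings[key] = value
    return settings


if __name__ == "__main__":
    print(resolve_settings({"num_empty": 2}, ["maintain", "--noop"]))
```

Using `None` as the default for every optional flag is the design choice that makes the override rule simple: anything still `None` after parsing was not supplied, so the file value survives.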

-The core algorithm is implemented in a method `plan_partition_changes` in `table_append_partition.py`. That algorithm is:
+### How does `partman` determine when an additional partition is needed?
+
+The core algorithm is implemented in a method `get_pending_sql_reorganize_partition_commands` in `table_append_partition.py`. That algorithm is:

 For a given table and that table's intended partition period, desired end-state is to have:
 - All the existing partitions containing data,
@@ -105,9 +152,17 @@ Procedure:
 - Append the new partition to the intended empty partition list.
 - Return the lists of non-empty partitions, the current empty partitions, and the post-algorithm intended empty partitions.

-# TODOs
+#### How do I run `partman` in `noop` mode?
+
+The results of the algorithm are converted into `ALTER` statements; if the user configured `--noop` they're emitted to console and the logs for each table. If not set to `--noop`, the application will execute the ALTERs at the database server and emit the results, including execution time as prometheus statistics if so configured.
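
Editorial sketch (not part of the commit): the `--noop` behaviour described above amounts to always reporting the planned SQL and only executing it when no-op mode is off. The names `apply_alters` and `execute` below are hypothetical, and the report shape loosely mirrors the `sql:` / `noop:` output shown in the README example rather than the tool's actual data structures.

```python
def apply_alters(planned_sql, noop, execute):
    """Report the planned ALTERs per table; run them only when noop is False.

    planned_sql: dict mapping table name -> ALTER statement (string)
    execute:     callable that runs one statement and returns its output
                 (a stand-in for the real database command object)
    """
    report = {}
    for table, sql in planned_sql.items():
        entry = {"sql": sql, "noop": noop}
        if not noop:
            # Only touch the database outside of no-op mode.
            entry["output"] = execute(sql)
        report[table] = entry
    return report


if __name__ == "__main__":
    plan = {"dogs": "ALTER TABLE `dogs` REORGANIZE PARTITION ...;"}
    print(apply_alters(plan, noop=True, execute=lambda sql: "unused"))
```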
+
+#### "Bootstrap" algorithm
+
+The bootstrap mode is a limited form of the "Maintain" Algorithm, using a temporary state file to determine rates-of-change. The bootstrap mode also does not limit itself to only affecting empty partitions, it can and will request changes that will prompt row copies, in order to prepare a table for future use of the "Maintain" algorithm.
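
Editorial sketch (not part of the commit): the rate-of-change that bootstrap derives from its state file can be illustrated as the position delta between two snapshots divided by the elapsed time in rate units. `RATE_UNIT` matches the one-hour constant in `bootstrap.py`; the function name, its signature, and the sample numbers are assumptions for illustration only.

```python
from datetime import datetime, timedelta

RATE_UNIT = timedelta(hours=1)  # same unit as bootstrap.py


def rate_of_change(prior_time, prior_positions, now_time, now_positions):
    """Return positions gained per RATE_UNIT for each range column."""
    elapsed_units = (now_time - prior_time) / RATE_UNIT  # timedelta / timedelta -> float
    if elapsed_units <= 0:
        raise ValueError("the prior snapshot must be older than the current one")
    return [
        (now - prior) / elapsed_units
        for prior, now in zip(prior_positions, now_positions)
    ]


if __name__ == "__main__":
    start = datetime(2021, 4, 5)
    print(rate_of_change(start, [9236], start + timedelta(hours=24), [29236]))
```

The longer the gap between the two snapshots, the less a short burst of inserts distorts the estimate.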
+
+## TODOs

 Lots:
-[X] Support for tables with partitions across multiple columns.
-[ ] A drop mechanism, for one. Initially it should take a retention period and log proposed `DROP` statements, not perform them.
-[ ] Yet more tests, particularly live integration tests with a test DB.
+- [x] Support for tables with partitions across multiple columns.
+- [ ] A drop mechanism, for one. Initially it should take a retention period and log proposed `DROP` statements, not perform them.
+- [ ] Yet more tests, particularly live integration tests with a test DB.

partitionmanager/bootstrap.py

Lines changed: 33 additions & 31 deletions
@@ -8,18 +8,9 @@
 import operator
 import yaml

-from partitionmanager.types import (
-    ChangePlannedPartition,
-    MaxValuePartition,
-    NewPlannedPartition,
-)
-from partitionmanager.table_append_partition import (
-    table_is_compatible,
-    get_current_positions,
-    get_partition_map,
-    generate_sql_reorganize_partition_commands,
-)
-from .tools import iter_show_end
+import partitionmanager.table_append_partition as pm_tap
+import partitionmanager.tools
+import partitionmanager.types

 RATE_UNIT = timedelta(hours=1)
 MINIMUM_FUTURE_DELTA = timedelta(hours=2)
@@ -35,12 +26,14 @@ def write_state_info(conf, out_fp):
     log.info("Writing current state information")
     state_info = {"time": conf.curtime, "tables": dict()}
     for table in conf.tables:
-        problem = table_is_compatible(conf.dbcmd, table)
-        if problem:
-            raise Exception(problem)
+        problems = pm_tap.get_table_compatibility_problems(conf.dbcmd, table)
+        if problems:
+            raise Exception("; ".join(problems))

-        map_data = get_partition_map(conf.dbcmd, table)
-        positions = get_current_positions(conf.dbcmd, table, map_data["range_cols"])
+        map_data = pm_tap.get_partition_map(conf.dbcmd, table)
+        positions = pm_tap.get_current_positions(
+            conf.dbcmd, table, map_data["range_cols"]
+        )

         log.info(f'(Table("{table.name}"): {positions}),')
         state_info["tables"][str(table.name)] = positions
@@ -60,21 +53,26 @@ def _get_time_offsets(num_entries, first_delta, subseq_delta):
     while len(time_units) < num_entries:
         prev = time_units[-1]
         time_units.append(prev + subseq_delta)
-
     return time_units


 def _plan_partitions_for_time_offsets(
     now_time, time_offsets, rate_of_change, ordered_current_pos, max_val_part
 ):
     """
-    Return a list of PlannedPartitions, starting from now, corresponding to
-    each supplied offset that will represent the positions then from the
-    supplied current positions and the rate of change. The first planned
-    partition will be altered out of the supplied MaxValue partition.
+    Return a list of PlannedPartitions whose positions are predicted to
+    lie upon the supplied time_offsets, given the initial conditions supplied
+    in the other parameters.
+
+    types:
+        time_offsets: an ordered list of timedeltas to plan to reach
+
+        rate_of_change: an ordered list of positions per RATE_UNIT.
     """
     changes = list()
-    for (i, offset), is_final in iter_show_end(enumerate(time_offsets)):
+    for (i, offset), is_final in partitionmanager.tools.iter_show_end(
+        enumerate(time_offsets)
+    ):
         increase = [x * offset / RATE_UNIT for x in rate_of_change]
         predicted_positions = [
             int(p + i) for p, i in zip(ordered_current_pos, increase)
@@ -84,13 +82,15 @@ def _plan_partitions_for_time_offsets(
         part = None
         if i == 0:
             part = (
-                ChangePlannedPartition(max_val_part)
+                partitionmanager.types.ChangePlannedPartition(max_val_part)
                 .set_position(predicted_positions)
                 .set_timestamp(predicted_time)
             )

         else:
-            part = NewPlannedPartition().set_timestamp(predicted_time)
+            part = partitionmanager.types.NewPlannedPartition().set_timestamp(
+                predicted_time
+            )

         if is_final:
             part.set_columns(len(predicted_positions))
@@ -130,12 +130,12 @@ def calculate_sql_alters_from_state_info(conf, in_fp):
             log.info(f"Skipping {table_name} as it is not in the current config")
             continue

-        problem = table_is_compatible(conf.dbcmd, table)
+        problem = pm_tap.get_table_compatibility_problems(conf.dbcmd, table)
         if problem:
             raise Exception(problem)

-        map_data = get_partition_map(conf.dbcmd, table)
-        current_positions = get_current_positions(
+        map_data = pm_tap.get_partition_map(conf.dbcmd, table)
+        current_positions = pm_tap.get_current_positions(
             conf.dbcmd, table, map_data["range_cols"]
         )

@@ -150,7 +150,7 @@ def calculate_sql_alters_from_state_info(conf, in_fp):
         rate_of_change = list(map(lambda pos: pos / time_delta, delta_positions))

         max_val_part = map_data["partitions"][-1]
-        if not isinstance(max_val_part, MaxValuePartition):
+        if not isinstance(max_val_part, partitionmanager.types.MaxValuePartition):
             log.error(f"Expected a MaxValue partition, got {max_val_part}")
             raise Exception("Unexpected part?")

@@ -163,6 +163,9 @@ def calculate_sql_alters_from_state_info(conf, in_fp):
         if table.partition_period:
             part_duration = table.partition_period

+        # Choose the times for each partition that we are configured to
+        # construct, beginning in the near future (see MINIMUM_FUTURE_DELTA),
+        # to provide a quick changeover into the new partition schema.
         time_offsets = _get_time_offsets(
             1 + conf.num_empty, MINIMUM_FUTURE_DELTA, part_duration
         )
@@ -176,7 +179,6 @@ def calculate_sql_alters_from_state_info(conf, in_fp):
         )

         commands[table.name] = list(
-            generate_sql_reorganize_partition_commands(table, changes)
+            pm_tap.generate_sql_reorganize_partition_commands(table, changes)
         )
-
     return commands
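
Editorial sketch (not part of the commit): the hunks above choose one time offset per planned partition and then project each range column forward by `rate * offset / RATE_UNIT`. The standalone version below illustrates that arithmetic. The start of `_get_time_offsets` is not visible in this diff, so seeding the list with `first_delta` is an assumption, and both function names are hypothetical stand-ins for the private helpers.

```python
from datetime import timedelta

RATE_UNIT = timedelta(hours=1)  # same unit as bootstrap.py


def time_offsets(num_entries, first_delta, subseq_delta):
    """First offset is first_delta; each later one adds subseq_delta."""
    offsets = [first_delta]  # assumed seed; not shown in the diff
    while len(offsets) < num_entries:
        offsets.append(offsets[-1] + subseq_delta)
    return offsets


def predict_positions(current_positions, rate_of_change, offset):
    """Project each position forward by its rate over the given offset."""
    units = offset / RATE_UNIT  # timedelta / timedelta -> float
    return [
        int(pos + rate * units)
        for pos, rate in zip(current_positions, rate_of_change)
    ]


if __name__ == "__main__":
    for off in time_offsets(3, timedelta(hours=2), timedelta(days=30)):
        print(off, predict_positions([150], [10.0], off))
```

Keeping the first offset short (two hours in `MINIMUM_FUTURE_DELTA`) means the first boundary sits just ahead of the current position, which matches the in-code comment about a quick changeover into the new partition schema.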

partitionmanager/bootstrap_test.py

Lines changed: 0 additions & 1 deletion
@@ -37,7 +37,6 @@ def run(self, cmd):

         if "SELECT" in cmd:
             return [{"id": 150}]
-
         return self.response

     def db_name(self):
