An Information Theory of Finite Abstractions and their Fundamental Scalability Limits
Abstract
Finite abstractions are discrete approximations of dynamical systems, such that the set of abstraction trajectories contains, in a formal sense, all system trajectories. There is a consensus that abstractions suffer from the curse of dimensionality: for the same “accuracy” (how closely the abstraction represents the system), the abstraction size scales poorly with system dimensions. And, yet, after decades of research on abstractions, there are no formal results concerning their accuracy-size tradeoff. In this work, we derive a statistical, quantitative theory of abstractions’ accuracy-size tradeoff and uncover fundamental limits on their scalability, through rate-distortion theory – the branch of information theory studying lossy compression. Abstractions are viewed as encoder-decoder pairs, encoding trajectories of dynamical systems in a higher-dimensional ambient space. Rate represents abstraction size, while distortion describes abstraction accuracy, defined as the spatial average deviation between abstract trajectories and system ones. We obtain a fundamental lower bound on the minimum abstraction distortion, given the system dynamics and a threshold on abstraction size. The bound depends on the complexity of the dynamics, through generalized entropy. We demonstrate the bound’s tightness on certain dynamical systems. Finally, we showcase how the developed theory can be employed to construct optimal abstractions, in terms of the size-accuracy tradeoff, through an example on a chaotic system.
1 Introduction
Modern engineering systems are becoming more complex and must meet intricate specifications in safety-critical situations. For instance, a self-driving car must follow traffic rules, avoid collisions, and optimize speed and fuel consumption. Due to the complexity of these systems, traditional analytic methods for verification and control are intractable. For over two decades, abstraction-based methods have flourished as a way to address verification and control of complex dynamics and objectives [1, 2]. Given a dynamical system, these methods construct a finite system (the abstraction), arising from partitioning the state (and control) space of the original system, such that all trajectories of the original system are contained, in a formal sense, in the set of abstraction trajectories. Employing this property, one may solve an intractable verification or control problem for the original system over the finite abstraction, with formal guarantees of correctness. Over the years, research on abstractions has spanned deterministic systems [3, 4, 5], stochastic systems [6, 7, 8], and, lately, data-driven scenarios [9, 10, 11, 12].
Despite their immense success, there is a consensus that abstractions suffer from the curse of dimensionality, limiting their practical relevance; for a given accuracy (how closely the abstraction describes the true dynamics), the abstraction size scales poorly with system dimensions. And even though abstractions have received considerable interest in the past decades, there are still no formal results concerning their curse of dimensionality and accuracy-size tradeoff.
Contributions
In this work, we derive a statistical, quantitative theory of abstractions’ accuracy-size tradeoff and uncover fundamental limits on their scalability. To that end, we establish connections with rate-distortion theory – the branch of information theory studying lossy compression [13, Chapter 10]. The key observation for the whole theory is that abstractions are information-theoretic encoder-decoder pairs, encoding trajectories of dynamical systems, in a higher-dimensional, ambient space. Rate represents abstraction size, while distortion is defined as the spatial average deviation between abstract trajectories and system ones, thus capturing the average accuracy of an abstraction. Then, building on recent developments in rate-distortion theory for generalized measurable sets [14, 15], we derive fundamental limits of abstractions’ accuracy-size tradeoff: for given system dynamics, we obtain a fundamental lower bound on the minimum abstraction distortion, for a given threshold on abstraction size. The fundamental lower bound depends on the complexity of the system’s dynamics, through generalized entropy. We demonstrate the tightness of the bound on certain dynamical systems. Finally, we showcase how the developed theory can be employed to construct optimal abstractions, in terms of the size-accuracy tradeoff, through an example on a chaotic system, and we provide a discussion towards a general procedure for constructing optimal abstractions.
Related work
Through decades of research, there has been considerable effort to construct scalable abstractions. Indicatively, [16, 17, 18] adapt the partition’s resolution depending on the local uncertainty that a given state-space region induces in the abstraction. Further, [19] constructs multi-resolution abstractions, employing feedback-refinement relations. The work [20] employs optimal control, such that the generated trajectories result in smaller abstraction cells and only a portion of the state space needs to be partitioned. Although the above methods result in more scalable abstractions, they neither provide quantitative results on the accuracy-size tradeoff, nor optimize some metric describing it. Another approach to deriving more accurate abstractions is to introduce memory [21, 22], based on sequences of outputs. In [23], it is shown that the size of such memory-based abstractions increases exponentially with the sequence length for deterministic chaotic systems. Apart from adaptive-partitioning techniques, compositional methods [5, 24] decompose the system into smaller ones that are abstracted more efficiently. However, they do not address scalability issues of abstracting each subsystem. Further, it is also worth mentioning [25], which, for a particular class of stochastic abstractions, demonstrates that partitioning the control space is unnecessary.
The connection between information theory and symbolic dynamics is well-known [26]; listing the whole literature on the topic is impossible. Worth mentioning is the work in [27], which employs rate-distortion theory to characterize the complexity of dynamical systems and their relationship with so-called shifts (a class of discrete systems; abstractions can be cast as shifts). Nonetheless, this work does not consider the deviation between a shift and a dynamical system, but rather focuses on asymptotic results (arbitrarily large partition size, steady-state trajectories) and the qualitative question of whether a system can be embedded into a shift. Thus, it does not provide a quantitative theory of the accuracy-size tradeoff. Finally, the works [28, 29, 30] employ rate-distortion theory to compress models that are already discrete, and do not focus on abstracting continuous dynamics with formal guarantees.
2 Preliminaries
2.1 Measure spaces, Hausdorff measure, generalized entropy
For our purposes, we make use of information theory over general measurable spaces, based on [14, 15]. Thus, we first recall some related notions. We denote the -dimensional Hausdorff measure by . (The Hausdorff measure is a generalization of the Lebesgue measure, and measures the size of a given set; e.g., the 1-dimensional Hausdorff measure of the unit circle embedded in the plane is its length, 2π.) Denote the restriction of to the compact set by . Consider a measure over the measure space , where is the Borel σ-algebra of . When is absolutely continuous w.r.t. (denoted by ), we denote the Radon–Nikodym derivative by . When (assuming is dimensional), then is the probability distribution associated to . Absolute continuity suggests that is not concentrated in arbitrarily small balls in . We denote the volume of the unit ball in as , where the Gamma function .
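As a small numerical aid, the volume of the unit Euclidean ball can be evaluated through the Gamma function using the standard closed form π^(m/2) / Γ(m/2 + 1). The sketch below is illustrative only; the function name `unit_ball_volume` and the symbol `m` for the dimension are our own choices, not notation taken from the text.

```python
import math


def unit_ball_volume(m: int) -> float:
    """Volume of the unit Euclidean ball in R^m: pi^(m/2) / Gamma(m/2 + 1)."""
    return math.pi ** (m / 2) / math.gamma(m / 2 + 1)


# Dimensions 1, 2, 3 recover the familiar values 2, pi and 4*pi/3.
for m in (1, 2, 3):
    print(m, unit_ball_volume(m))
```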
Let be a finite union of compact, -dimensional, -manifolds. Denote by the constant such that
| (1) |
where . The constant always exists and is finite, as per [14, Lemma 1].
Consider a random variable , distributed over the measure space with probability measure . The generalized entropy of (w.r.t the Hausdorff measure) is
| (2) |
where denotes the expectation operator w.r.t. the random variable . The generalized entropy is the extension of the classical Shannon entropy to continuous spaces, and is a measure of uncertainty or complexity of a random variable. For a measure space with : a) is maximized for the uniform distribution , b) is bounded, when . Finally, employing a similar generalization as in (2), let us denote the generalized Rényi entropy with parameter by .
The example below shows how the above notions apply to computing the entropy of a dynamical system’s trajectories.
Example 2.1 (Entropy of trajectories of the doubling map).
Consider the dynamical system , where the doubling map . Consider the set of -length trajectories of the system . Notice that is the union of 4 straight-line segments:
Further, consider random initial conditions , where is the uniform distribution over . It is well known that is invariant under the doubling map. Thus, the random variable , that is the system trajectories, is uniformly distributed across ; i.e., the probability measure . Its generalized entropy is
where we have used that the total length of the line segments is , and that , as is a probability measure.
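The computation in this example can be checked numerically. The sketch below assumes trajectories made of three consecutive states (the choice that yields four straight-line segments), natural logarithms, and the standard fact that a uniform distribution on a set of total length L has generalized entropy log L. All function names are hypothetical.

```python
import math


def doubling(x: float) -> float:
    """Doubling map x -> 2x mod 1."""
    return (2.0 * x) % 1.0


def trajectory(x0: float, length: int = 3) -> list:
    """The first `length` states of the trajectory starting at x0."""
    traj = [x0]
    for _ in range(length - 1):
        traj.append(doubling(traj[-1]))
    return traj


def segment_length(a: float, b: float, length: int = 3) -> float:
    """Length of the image of [a, b) under the trajectory map.

    The map is affine on [a, b), so the segment length is twice the
    distance between the images of the left endpoint and the midpoint.
    """
    mid = 0.5 * (a + b)
    return 2.0 * math.dist(trajectory(a, length), trajectory(mid, length))


def behavior_length(length: int = 3) -> float:
    """Total length of all straight-line segments of the behavior."""
    pieces = 2 ** (length - 1)  # intervals on which the trajectory map is affine
    return sum(
        segment_length(i / pieces, (i + 1) / pieces, length) for i in range(pieces)
    )


total = behavior_length(3)   # equals sqrt(1 + 4 + 16) = sqrt(21)
entropy = math.log(total)    # entropy of the uniform distribution, in nats
print(total, entropy)
```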
2.2 Rate-distortion theory on measurable spaces
A typical setting in information theory is source coding, see Fig. 1. A source emits a message , which is a random variable over the measure space , with associated probability measure , where is assumed to be -dimensional. The encoder , where is finite, outputs the coded message . Finally, the decoder , upon receiving , decodes it into . Compression takes place by encoding the continuous message into a low-dimensional, finite coded message . The encoder cardinality determines the compression, and the compression rate is defined by . A distortion function measures the deviation of from the original message . A typical distortion function, when , is the squared error . Here, we present a simplified setting of source coding, where the encoder-decoder is deterministic, and the source emits a single message; for the general theory, see [13, 15].
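To make the source-coding setting concrete, the following sketch builds a deterministic encoder-decoder pair for a message that is uniformly distributed on [0, 1]: a uniform scalar quantizer with `n_cells` cells, whose rate is taken to be log(n_cells) and whose distortion is the squared error. This is a textbook illustration under our own assumptions (uniform source, midpoint decoder, hypothetical function names), not a construction taken from the paper.

```python
import math


def make_uniform_quantizer(n_cells: int):
    """Encoder maps x in [0, 1] to a cell index; decoder returns the cell midpoint."""

    def encode(x: float) -> int:
        return min(int(x * n_cells), n_cells - 1)

    def decode(i: int) -> float:
        return (i + 0.5) / n_cells

    return encode, decode


def mean_squared_distortion(n_cells: int, n_grid: int = 100_000) -> float:
    """Expected squared error for a uniform source, via a midpoint rule on [0, 1]."""
    encode, decode = make_uniform_quantizer(n_cells)
    total = 0.0
    for j in range(n_grid):
        x = (j + 0.5) / n_grid
        total += (x - decode(encode(x))) ** 2
    return total / n_grid


# Rate log(n) versus distortion; the closed form for this quantizer is 1 / (12 n^2).
for n in (2, 4, 8):
    print(n, math.log(n), mean_squared_distortion(n), 1.0 / (12 * n * n))
```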
Of particular interest is the fundamental limit of the rate-distortion tradeoff, i.e. the following quantity:
where the expectation is taken w.r.t. the random variable . In words, is the minimum achievable average distortion, for a given compression rate threshold . The function has an inverse, , which is the minimum compression rate, for a given maximum expected distortion threshold . The following result provides a fundamental lower bound on and .
Theorem 2.1 (Generalized Shannon lower bound [15, Thm. 3.1, simplified]).
Let be a finite union of compact, -dimensional, -manifolds, and . Assume that and that is measurable. Consider the Euclidean distortion function . Then
| (3) |
| (4) |
Proof Sketch.
This is the special case of [15, Thm. 3.1] for (finite unions of) compact, -manifolds and Euclidean distortion. ∎
2.3 Transition systems
Definition 2.2 (Transition system).
A transition system is a tuple , where is the state space and is a transition relation.
A transition system is deterministic if, for any , there exists at most one , such that . Given a transition system , its -length behavior is defined as . That is, the -length behavior is the set of -long trajectories. Notice that .
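For a finite transition system, the set of trajectories of a given length can be enumerated directly. The sketch below stores the transition relation as a dictionary of successor sets; the toy system `succ` and all function names are hypothetical and serve only to illustrate the definitions.

```python
def behavior(successors: dict, k: int) -> set:
    """All k-long trajectories (tuples of states) of a finite transition system."""
    trajs = [(s,) for s in successors]
    for _ in range(k - 1):
        # Extend every trajectory by each successor of its last state.
        trajs = [t + (nxt,) for t in trajs for nxt in successors[t[-1]]]
    return set(trajs)


def is_deterministic(successors: dict) -> bool:
    """Deterministic: every state has at most one successor."""
    return all(len(nxt) <= 1 for nxt in successors.values())


# Toy system: state "a" has two successors, "b" has one, "c" has none.
succ = {"a": {"a", "b"}, "b": {"c"}, "c": set()}
print(sorted(behavior(succ, 2)))
print(is_deterministic(succ))
```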
3 Abstractions and the curse of dimensionality
3.1 Finite abstractions of dynamical systems
Throughout this work, we consider deterministic dynamical systems , with . Dynamical systems obtain the transition-system representation , where . We make the following assumption.
Assumption 1 (The state space).
The set is -dimensional, connected and compact.
Under this assumption, is an -dimensional subset of .
Let us introduce abstractions of dynamical systems.
Definition 3.1 (Measurable Partition).
Given a set , a finite collection of measurable, disjoint sets , such that , is a measurable partition of .
Definition 3.2 (Abstraction).
Given a dynamical system with transition-system representation and a measurable partition of , a transition system is an abstraction of if, for any and , such that and , we have .
Although the dynamical system is deterministic, the abstraction is generally non-deterministic. With a slight abuse of formality, we often treat trajectories of the abstraction (with ) as subsets of , that is .
Theorem 3.3 (Behavioral inclusion [1, Theorem 4.18, simplified]).
For a system , a partition of and an abstraction of , the following holds for any : .
In fact, is -dimensional, and covers the -dimensional set of system trajectories . This observation is instrumental in this work. Through behavioral inclusion, abstractions encode information about the infinite, continuous system behavior into the finite abstraction behavior set . While this enables computational methods for verification problems for dynamical systems, it also generally entails information loss, as the following section explains.
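Behavioral inclusion can be illustrated on the doubling map x -> 2x mod 1 with a uniform partition of [0, 1) into `n_cells` intervals. In the sketch below, the abstraction has a transition from cell i to cell j whenever the image of cell i intersects cell j; for this map the image of cell i covers exactly the cells 2i and 2i+1 (modulo `n_cells`). The partition, the function names and the sampling-based check are our own illustrative choices; exact rational arithmetic is used to avoid rounding at cell boundaries.

```python
from fractions import Fraction


def doubling(x):
    """Doubling map x -> 2x mod 1 (works with Fraction for exact arithmetic)."""
    return (2 * x) % 1


def cell_of(x, n_cells: int) -> int:
    """Index of the cell [i/n, (i+1)/n) containing x."""
    return min(int(x * n_cells), n_cells - 1)


def abstraction_successors(n_cells: int) -> dict:
    """Transition relation of the abstraction induced by the uniform partition."""
    return {i: {(2 * i) % n_cells, (2 * i + 1) % n_cells} for i in range(n_cells)}


def inclusion_holds(n_cells: int, k: int, n_samples: int = 997) -> bool:
    """Check on sampled initial conditions that every k-long system trajectory
    maps to a sequence of cells that is a trajectory of the abstraction."""
    succ = abstraction_successors(n_cells)
    for j in range(n_samples):
        x = Fraction(j, n_samples)
        for _ in range(k - 1):
            nxt = doubling(x)
            if cell_of(nxt, n_cells) not in succ[cell_of(x, n_cells)]:
                return False
            x = nxt
    return True


print(abstraction_successors(4))
print(inclusion_holds(4, 5))
```

Although the doubling map is deterministic, every cell of this abstraction has two successors, which illustrates why abstractions are generally non-deterministic.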
3.2 Abstraction-based verification and information loss
In typical verification problems, we are given a set of initial conditions for the system and we have to check if the corresponding set of system trajectories satisfies a given property. For example, in the case of safety, we have to check if , where is an unsafe set. Computing the exact reachable set is generally impossible. Abstractions address this problem by computing the corresponding set of abstract state trajectories , which is tractable, as the abstraction is finite. Notice that, by behavioral inclusion, we have . Finally, for safety verification, if , then one may safely deduce that the system is safe.
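The safety check described above reduces to a reachability computation on a finite graph. The sketch below is a minimal illustration, assuming the abstraction is given as a dictionary of successor sets over cell indices and the unsafe set is a union of cells; the example abstraction `succ` and the function names are hypothetical.

```python
def abstract_reach(successors: dict, initial_cells, k: int) -> set:
    """Cells visited by any abstract trajectory of k states starting in initial_cells."""
    frontier = set(initial_cells)
    visited = set(frontier)
    for _ in range(k - 1):
        frontier = {nxt for cell in frontier for nxt in successors[cell]}
        visited |= frontier
    return visited


def is_certified_safe(successors: dict, initial_cells, unsafe_cells, k: int) -> bool:
    """True means the system is safe over k states, by behavioral inclusion.
    False only means the abstraction is inconclusive, not that the system is unsafe."""
    return abstract_reach(successors, initial_cells, k).isdisjoint(unsafe_cells)


# Hypothetical four-cell abstraction.
succ = {0: {0, 1}, 1: {2}, 2: {2}, 3: {3}}
print(abstract_reach(succ, {0}, 3))
print(is_certified_safe(succ, {0}, {3}, 3))
```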
As abstractions group system states in sets , information loss is inevitable. In general, the partition needs to have a relatively high resolution, to recover a meaningful verification answer. E.g., in the extreme case of , for any set of initial conditions , the abstraction returns , i.e. the whole ambient space of -length trajectories. As such, for small , the abstraction does not accurately represent the system . On the other hand, for large , where the abstraction is more accurate, the computations on the abstraction become heavier – even intractable. Thus, there is a trade-off between abstraction accuracy and partition size . In what follows, we provide a statistical, quantitative theory of the accuracy-size tradeoff, based on rate-distortion theory, and provide bounds on the accuracy-size tradeoff.
4 Information-theoretic framework for finite abstractions
In what follows, consider the dynamical system , with , under Assumption 1. The dynamical system admits the transition system representation . Towards deriving a statistical quantification on the accuracy-size tradeoff of abstractions, we impose a probability distribution on the system’s initial conditions. Verification, then, becomes: sampling an initial condition , with , and afterward employing the abstraction to give a verification answer. (To be mathematically precise, is a random variable over , where is the -dimensional Lebesgue measure, with probability measure such that .)
Let us show how an abstraction can be viewed as an encoder-decoder pair of system trajectories . For the following, we refer the reader to Figure 2. The system (source) samples an initial condition and generates the trajectory (the message). Abstractions only use (sets of) initial conditions for verification, as explained in Section 3. Nonetheless, from an information-theoretic perspective, and are equivalent, as is one-to-one; that is, and carry the exact same information when the dynamics is known. The encoder looks at the initial condition and returns the corresponding abstract initial condition:
| (5) |
The decoder , upon receiving the initial condition , outputs the set of all abstract state trajectories corresponding to . That is, for the decoder we have with
| (6) |
The compression rate, determined by the encoder’s size, is . Indeed, notice that the abstraction encodes the system’s trajectories into exactly outcomes, that is .
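The encoder-decoder view can be sketched in a few lines. Below, the encoder maps an initial condition in [0, 1) to the index of its cell in a uniform partition, and the decoder returns every abstract trajectory that starts from that cell. The successor map `SUCC` (a four-cell abstraction of the doubling map), the use of the natural logarithm for the rate, and all names are assumptions made for illustration.

```python
import math

# Hypothetical abstraction of the doubling map on four uniform cells.
SUCC = {0: {0, 1}, 1: {2, 3}, 2: {0, 1}, 3: {2, 3}}


def encoder(x0: float, n_cells: int) -> int:
    """Abstract initial condition: index of the cell containing x0."""
    return min(int(x0 * n_cells), n_cells - 1)


def decoder(q0: int, successors: dict, k: int) -> list:
    """All abstract trajectories of k cells that start from cell q0."""
    trajs = [(q0,)]
    for _ in range(k - 1):
        trajs = [t + (nxt,) for t in trajs for nxt in sorted(successors[t[-1]])]
    return trajs


n_cells = len(SUCC)
rate = math.log(n_cells)  # the encoder has exactly n_cells possible outputs
q0 = encoder(0.3, n_cells)
print(q0, decoder(q0, SUCC, 3), rate)
```

The decoder output is a set of cell sequences, i.e. a union of boxes in the ambient trajectory space, rather than a single point as in classical source coding.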
To capture the accuracy of the abstraction, and compare the message and output , we employ a distortion function defined by
| (7) |
In words, returns the worst possible distortion between system trajectories and abstract trajectories , averaged over the time horizon . This is in line with abstraction-based verification, where the worst-case outcome is considered.
Let us now explain what “expected (or average) distortion”, for a given abstraction , means in the context of verification. The expected distortion is taken w.r.t. the initial-condition distribution . Thus, for verification problems, where the initial condition , is the average distortion. As measures the distance between system trajectories and abstract state trajectories, the expected distortion is thus the spatial, statistical average of the deviation between system trajectories and abstract state trajectories, over initial conditions in with distribution .
Remark 1 (Initial-condition distribution).
The distribution weights how much each initial condition contributes to the average distortion . Arguably, the most suitable choice for is the uniform distribution, as, when constructing an abstraction, the initial condition is unknown and all initial conditions are considered equally likely.
Finally, the optimal abstraction accuracy-size tradeoff is captured by the following rate-distortion quantity:
That is, the minimum average deviation of abstract state trajectories and system trajectories, over all possible abstractions with a given upper bound on partition size. Likewise, we also consider the inverse , which is the (logarithm of the) minimum partition size for a given upper threshold on the average deviation of abstract state trajectories and system trajectories.
Remark 2 (Statistics of abstractions’ accuracy and size).
The proposed theory does not aim at providing (probabilistic) guarantees on the correctness of abstractions. These are a-priori provided by Definition 3.2, through behavioral inclusion or related properties. Instead, the theory developed here provides (guarantees on the) statistical quantification of abstractions’ accuracy and size.
Remark 3 (The message space is ).
Even though we have reduced everything thus far to the initial condition distribution , the message space is , i.e. the system trajectories. Indeed, although the expectation can be taken either w.r.t. or w.r.t. the random variable (as is one-to-one), the distortion considers the whole . As such, in the coming section, to derive bounds on and , employing the theory presented in Section 2.2, we reason about the random variable and its associated probability measure over , which is solely determined by the initial condition distribution and the system dynamics . Hence, we take expectations and interchangeably.
5 Rate-distortion theory and a fundamental limit for abstractions
5.1 A fundamental limit on abstracting dynamical systems
Having modeled the statistics of abstraction-based verification as a source coding problem, we now proceed to probing the fundamental limits of the abstraction accuracy-size tradeoff, by providing lower bounds on and .
Note that abstractions, given the message, output sets and the associated distortion (7) is set-based. This is in contrast to typical encoder-decoder pairs considered in Thm. 2.1, which output points and the distortion function is the Euclidean distance. Thus, the results from Section 2.2 do not straightforwardly apply, to derive bounds on and . In what follows, we derive said bounds, both employing Thm. 2.1 and quantifying the aforementioned distortion disparity. This enables a rate-distortion theory for abstractions. First, we present an intermediate, purely geometric result, providing a lower bound on the average distortion of a given abstraction.
Proposition 5.1 (Abstraction vs. encoder distortion).
Consider a dynamical system with transition system representation , and let Assumption 1 hold. Let be a trajectory of , with . Consider a measurable partition of and an associated abstraction , and let , where are given by (5) and (6). Consider an encoder-decoder pair , where and , where is the Chebyshev center of the set . Denote the Chebyshev radius of set , by . Let . The following lower bound holds for the average distortion of the abstraction:
| (8) |
where is the distortion function in (7).
Prop. 5.1 suggests that the average distortion of an abstraction is lower bounded by the expected distortion of a particular encoder-decoder pair (the one outputting the Chebyshev centers of the abstraction’s outputs) plus a term depending on the size of the abstraction’s outputs. Employing Prop. 5.1, in Theorem 5.2 below, we derive fundamental lower bounds on and , by lower-bounding each of the two terms in the right-hand side of (8) separately, over all abstractions with the same rate (or the same expected distortion). The first term in the right-hand side of (8) can be lower bounded as in Thm. 2.1, being the expected distortion of an encoder-decoder pair with the same rate as the abstraction. To bound the second term, we observe that the abstraction’s outputs define an -dimensional cover of (this cover is precisely ; note that takes values in the set , thus ), and the cover’s size is equal to the abstraction’s size; the bound is then obtained by lower-bounding over all possible -dimensional covers of , using geometric measure theory (see Lemma 8.1). For an illustrative example of the above, see Fig. 3.
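The geometric fact behind this decomposition can be sanity-checked numerically: for a point x inside a set S, the worst-case squared distance from x to S is at least the squared distance from x to the Chebyshev center of S plus the squared Chebyshev radius of S. The sketch below checks this for an axis-aligned box, whose Chebyshev center is the box center and whose Chebyshev radius is half its diagonal. The choice of a box, the unnormalized squared Euclidean distance (any averaging over the time horizon is omitted) and the function names are our own assumptions.

```python
import itertools
import random


def worst_case_sq_dist(x, lower, upper) -> float:
    """Largest squared distance from x to the box; attained at a vertex,
    since the squared distance is convex in the second argument."""
    vertices = itertools.product(*zip(lower, upper))
    return max(sum((xi - vi) ** 2 for xi, vi in zip(x, v)) for v in vertices)


def chebyshev_decomposition(x, lower, upper) -> float:
    """Squared distance to the box center plus the squared half-diagonal."""
    center = [(lo + up) / 2 for lo, up in zip(lower, upper)]
    radius_sq = sum(((up - lo) / 2) ** 2 for lo, up in zip(lower, upper))
    return sum((xi - ci) ** 2 for xi, ci in zip(x, center)) + radius_sq


def gap(x, lower, upper) -> float:
    """Worst-case distortion minus the two-term lower bound (should be >= 0)."""
    return worst_case_sq_dist(x, lower, upper) - chebyshev_decomposition(x, lower, upper)


rng = random.Random(0)
lower, upper = [0.0, -1.0, 2.0], [1.0, 0.5, 2.25]
samples = [[rng.uniform(lo, up) for lo, up in zip(lower, upper)] for _ in range(1000)]
print(min(gap(x, lower, upper) for x in samples))
```

The gap vanishes when x is the Chebyshev center itself and grows as x moves towards the boundary of the box.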
Theorem 5.2 (Shannon lower bound for abstractions).
Consider a dynamical system with transition system representation , and let Assumption 1 hold. Let be a trajectory of , with . Assume that:
1. is a finite union of bounded, -dimensional -manifolds,
2. , with .
The average distortion of any abstraction with partition size , where , is lower bounded as follows
| (9) |
where is defined by (1).
Notice that a valid lower bound is obtained for any value of in the right-hand side of (9); maximization over provides the tightest bound. In the numerical examples in Section 6, we compute the bound for multiple values of . Further, one may recover a lower bound on numerically, by fixing in the left-hand side of (9) and solving numerically for (as the right-hand side is a decreasing function of , this is trivially computed by, e.g., bisection methods). The above theorem, thus, provides fundamental limits on the accuracy-size tradeoff, or the scalability, of abstractions, for given dynamics .
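The numerical inversion mentioned above only needs a bisection routine for a decreasing function. The sketch below is generic: `toy_bound` is a hypothetical decreasing rate-to-distortion curve used purely as a stand-in and is not the right-hand side of (9); all names and the search interval are assumptions.

```python
import math


def invert_decreasing(bound, target: float, lo: float, hi: float, tol: float = 1e-10) -> float:
    """Smallest rate R in [lo, hi] such that bound(R) <= target,
    assuming bound is continuous and decreasing on [lo, hi]."""
    if bound(lo) <= target:
        return lo
    if bound(hi) > target:
        raise ValueError("target distortion is not reached on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bound(mid) > target:
            lo = mid  # distortion still too large: more rate is needed
        else:
            hi = mid
    return hi


def toy_bound(rate: float) -> float:
    """Hypothetical decreasing curve, used only to exercise the routine."""
    return math.exp(-2.0 * rate) / 12.0


rate_needed = invert_decreasing(toy_bound, target=0.01, lo=0.0, hi=20.0)
print(rate_needed, math.exp(rate_needed))  # rate and the implied partition size
```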
Remark 4 (On the assumptions of Thm. 5.2).
The first assumption of Thm. 5.2, requiring to be a union of smooth manifolds, is satisfied whenever the dynamics is piecewise continuously-differentiable. The second assumption is satisfied whenever is piecewise continuous and the initial condition distribution is such that , where is the Lebesgue measure.
5.2 Interpretation and calculus for Theorem 5.2
Before we proceed with the interpretation of Thm. 5.2, let us show how one may compute , and , which are required to compute the lower bound (9). Let us define the function by
| (10) |
which maps an initial state into its -long trajectory.
Proposition 5.3 (Computing and ).
Consider a system with measurable and piecewise Lipschitz and differentiable. (Piecewise Lipschitz means that is a countable union of Lebesgue-measurable sets such that the restriction of to each is Lipschitz. This condition may be relaxed to approximately Lipschitz, see [31, Thm. 3.1.8, Sec. 3.2.1], which also implies approximate differentiability.) Let Assumption 1 hold, and . The following expressions hold:
| (11) |
| (12) |
| (13) |
where denotes the Jacobian matrix of .
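Quantities of this kind can be evaluated numerically through the area formula of geometric measure theory, which we assume is the route behind the Jacobian-based expressions above. As an illustration on a system of our own choosing, the sketch below takes the logistic map f(x) = 4x(1 - x) on [0, 1], trajectories of two states (x, f(x)), and a uniform initial condition. The trajectory map has Jacobian factor sqrt(1 + f'(x)^2), the Hausdorff measure of the behavior is the integral of this factor, and the generalized entropy is the expectation of its logarithm. All names are hypothetical.

```python
import math


def jacobian_factor(x: float) -> float:
    """sqrt(det(J^T J)) for the map x -> (x, 4x(1 - x)); here J = (1, 4 - 8x)^T."""
    return math.sqrt(1.0 + (4.0 - 8.0 * x) ** 2)


def integrate(g, n: int = 20_000) -> float:
    """Midpoint rule for the integral of g over [0, 1]."""
    return sum(g((j + 0.5) / n) for j in range(n)) / n


# Length of the trajectory set (a parabolic arc in the plane).
hausdorff_length = integrate(jacobian_factor)

# Uniform initial condition: the trajectory density w.r.t. arc length is
# 1 / jacobian_factor, so the entropy is the mean of log(jacobian_factor).
entropy = integrate(lambda x: math.log(jacobian_factor(x)))

print(hausdorff_length, entropy, math.log(hausdorff_length))
```

Consistent with the uniform distribution maximizing the generalized entropy, the computed entropy stays below the logarithm of the arc length.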
Proposition 5.4 (Computing ).
Consider a system , with differentiable a.e., and let Assumption 1 hold. The following facts on hold:
1. , if is affine;
2. , if is piecewise affine with modes;
3. , if is Lipschitz continuous with constant .
Remark 5 ( at high rates).
As the partition size grows large, the Chebyshev balls of the abstraction outputs (cf. Prop. 5.1 and Lemma 8.1) become small. Hence, in the case of smooth , their intersection with the manifold approaches the case of an affine system, with . Similarly, in the piecewise affine case, for sufficiently small balls (at least ), these can be chosen to intersect with at most one piece each. Thus, to reduce conservatism of the bound in such high-rate cases, one can inspect the lower bound of Thm. 2.1 by using . We demonstrate this in the numerical examples in Section 6.
We proceed to discussing Thm. 5.2. First, inspecting (9), systems with more complex dynamics lead to bigger abstraction distortion, for fixed abstraction size, since the right-hand side is increasing w.r.t. and the Rényi entropy ; equivalently, more complex systems require bigger abstraction size for the same distortion.
Regarding the effect of the time-horizon on the bound (9), we have to inspect the effect that has on and . Let us first demonstrate that, for the “simple” dynamics of exponentially stable systems, the abstraction distortion converges to 0 for .
Example 5.1 (Exponentially stable systems).
Consider a system whose origin is exponentially stable on a given compact set in . Then, there is a Lyapunov function satisfying for a given and, for all s.t. (w.l.o.g.), with . This allows us to create an abstraction with the associated partition for and ; and transitions if and only if or . The abstraction encapsulates the fact that, after steps or fewer, all trajectories reach the sublevel set . We get that and are overapproximations of the diameters of and , respectively. Then, recalling the distortion (7), for any trajectory of the system,
which can be made arbitrarily small by suitable choice of .
Indeed, as the following example shows, for Schur LTI systems, the bound (9) converges to 0, for , which demonstrates the bound’s tightness.
Example 5.2 (Schur LTI systems).
For a Schur LTI system, we have and
Thus, both and are finite, for , and the bound (9) converges to 0.
Conversely, the example below shows that, even for marginally stable systems, the bound may not vanish with .
Example 5.3 (Marginally stable LTI system).
6 Numerical Examples
6.1 The doubling map
Consider the doubling map from Example 2.1. For any trajectory length , its behavior is composed of line segments in described by uniformly distributed with giving for all . Using Prop. 5.4 for piecewise affine systems, we obtain , enabling us to compute the lower bound in Theorem 5.2. In light of Remark 5, we also determine the high-rate lower bound by picking . The lower-bound curves can be seen in Fig. 4 for and . It is apparent that the tightest bound is obtained with and . In this case, the bound is consistently half of the optimal distortion , which is remarkably close. For comparison, the standard Shannon lower bound is 1.42 times smaller than the optimal quantizer distortion of a uniform random variable in , in the standard source-coding setting.
Let us explain how we computed the actual optimal achievable abstraction distortion. Following the reasoning in Section 5, we first build an optimal cover for (afterwards, we show that this optimal cover admits a distortion that is equal to that of a specific abstraction with the same rate, and thus its rate-distortion curve is optimal among all abstractions). For a given , consider , where is an arbitrary natural number. Since all segments are equiprobable and congruent, and probability is uniform among them, the optimal partition of is obtained by cutting each of the segments into equal pieces. The expected error between a trajectory and the Chebyshev center of its corresponding piece is , which is obtained by computing the squared length of each segment, followed by using the variance of the uniform distribution, giving . To determine the corresponding abstraction lower bound, we use the distortion in (7) on the aforementioned pieces. For one-dimensional line segments, . Its expected value is thus the second moment of a uniform random variable ranging from to . Using , we obtain , where we used , and is the optimal distortion among all -dimensional covers of .
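The two expectations used in this argument can be verified numerically for a single straight piece. The sketch below assumes a piece of length `ell`, a point uniformly distributed along it, and unnormalized squared distances (any averaging over the time horizon is left out). The expected squared error to the Chebyshev center (the midpoint) is the variance of the uniform distribution, ell^2 / 12, while the worst-case distance to the piece is |x - ell/2| + ell/2, which is uniform between ell/2 and ell and has second moment 7 ell^2 / 12. Function names are hypothetical.

```python
def integrate(g, a: float, b: float, n: int = 100_000) -> float:
    """Midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (j + 0.5) * h) for j in range(n))


def expected_center_error(ell: float) -> float:
    """E[(x - ell/2)^2] for x uniform on a piece of length ell."""
    return integrate(lambda x: (x - ell / 2) ** 2, 0.0, ell) / ell


def expected_worst_case_error(ell: float) -> float:
    """E[(|x - ell/2| + ell/2)^2]: squared distance to the farthest endpoint."""
    return integrate(lambda x: (abs(x - ell / 2) + ell / 2) ** 2, 0.0, ell) / ell


ell = 0.5
print(expected_center_error(ell), ell**2 / 12)
print(expected_worst_case_error(ell), 7 * ell**2 / 12)
```

For a straight piece, the worst-case (set-based) distortion is therefore seven times the center-based one, independently of the piece length.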
Finally, we show that
Notice that, in general, , as abstractions are covers. However, the optimal cover built above determines an abstraction that gives the same distortion, and thus we have . First, is the uniform grid with segments of length . Each trajectory of the abstraction is a sequence of segments of lengths , thus giving a box in containing any related trajectory . For each box, the set of related trajectories is precisely a diagonal of the box. As such, the furthest edge along the diagonal is again a solution to . Hence, the abstraction has the same distortion as the optimal cover. The above reasoning is illustrated in Fig. 5.
6.2 A 3D nonlinear system and abstractions with uniform grids
Consider the nonlinear system where
and which is forward invariant under . This system has multiple equilibria, hence the origin is not stable in . For each in we build abstractions by using uniform partitioning of with grids of size and determining the transition map using interval arithmetic. Then, we compute the distortion lower bound from Theorem 5.2 using Prop. 5.3 and Prop. 5.4, case 3. The entropies were computed using Monte-Carlo integration with 10000 samples, while Jacobians and the Lipschitz constant were determined using automatic differentiation. Furthermore, lower bounds were also computed by picking , in light of Remark 5. The resulting distortion lower bound curves can be seen in Fig. 6. In this case, as the abstraction we construct is not necessarily the optimal one, its expected distortion is generally about 100 times higher than the fundamental lower bound. Still, this demonstrates the validity of the lower bound, even in cases with nonlinear dynamics; even more importantly, it indicates how conservative standard abstractions with uniform grids might be.
7 Conclusion and Future Research: Towards Minimal Abstractions
We have developed a statistical, quantitative theory on the accuracy-size tradeoff of finite abstractions of dynamical systems. Through this theory, we have uncovered fundamental limits on their scalability: given the system dynamics, we have obtained a fundamental bound on the achievable abstraction accuracy, for a given abstraction size. To that end, we have established connections with rate-distortion theory. From an information-theoretic perspective, we have developed rate-distortion theory for the particular class of encoder-decoder pairs that abstractions constitute: set-based, with set-based distortion. Overall, this novel theory quantifies scalability limits of abstractions, and provides insights on how the complexity of the dynamics to be abstracted dictates these limits.
Most importantly, the developed theory may be employed to construct minimal abstractions, harnessing their full scalability potential. From this work, it becomes clear that, to construct minimal abstractions, one has to solve the problem of encoding trajectories of dynamical systems, through coverings in a high-dimensional, ambient space. In fact, this has already been demonstrated, in Section 6.1, where we construct a minimal abstraction of the doubling-map dynamics. Future research will thus focus on the general problem of constructing minimal abstractions. Towards that goal, information-theoretic algorithms optimizing the rate-distortion tradeoff, such as the information bottleneck method (see [32]), could be adapted for abstractions.
8 Technical Results and Proofs
Proof of Prop. 5.1.
For any given , we will prove that
where we note that and . Then, the proof is complete by applying the expectation operator to the above inequality.
Define . The function is convex, being the pointwise maximum of the convex quadratic maps . We have and .
Define the set of maximizers
The subdifferential of at is , where denotes the convex hull operator. Since minimizes , the optimality condition gives . Hence there exist finitely many points and coefficients , , such that
(14)
Towards proving Thm. 5.2, we introduce the following lemma.
Lemma 8.1.
Let be a finite union of bounded, disjoint, -dimensional -manifolds. Let be a random variable in with probability measure and density . Then, for any collection of measurable, -dimensional sets covering , the following holds for any :
(15)
where is defined by (1), is the indicator function of set , denotes the Chebyshev radius of , and is the volume of the unit ball in .
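Two of the quantities appearing in the lemma are easy to evaluate numerically. The sketch below computes the volume of the unit ball in n-dimensional Euclidean space via the standard formula π^(n/2) / Γ(n/2 + 1), and the Chebyshev radius of an axis-aligned box, which is half its diagonal. The Euclidean norm and the restriction to boxes (as would arise from a uniform grid) are assumptions made here for illustration, and the function names are hypothetical.

```python
import math


def unit_ball_volume(n):
    """Volume of the unit Euclidean ball in R^n: pi^(n/2) / Gamma(n/2 + 1)."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)


def box_chebyshev_radius(lower, upper):
    """Chebyshev radius of the box [lower, upper] under the Euclidean norm.

    The smallest enclosing ball of a box is centred at its midpoint,
    so the radius equals half the length of the diagonal.
    """
    return 0.5 * math.sqrt(sum((u - l) ** 2 for l, u in zip(lower, upper)))


volumes = [unit_ball_volume(n) for n in (1, 2, 3)]  # interval, disc, ball
radius = box_chebyshev_radius([0.0, 0.0], [3.0, 4.0])
```

For general (non-box) sets, the Chebyshev centre and radius require solving a smallest-enclosing-ball problem, which this sketch does not attempt.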
Proof.
Define . Then forms a measurable -dimensional cover of . Let and . Because then , giving
Hence it suffices to lower bound over collections .
For a given , by definition, for some Chebyshev center . For any , we have:
where in the third step we used Hölder’s inequality, and in the final step we used the inequality (1). Defining , from the inequality above we have:
where . Multiplying by gives
(16)
Our task now is to find a lower bound on over discrete probabilities . First, notice that since . Therefore, the map is convex in . Thus, by Jensen’s inequality,
Substituting in (16) gives
Now, by definition of the Rényi entropy,
which by the properties of the Radon–Nikodym derivative gives
And, using gives
Therefore, for any , we have
∎
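The Rényi entropy used in the proof above can be estimated by Monte-Carlo integration, in the spirit of the numerical experiments. The sketch below assumes the common differential form h_α(X) = (1/(1-α)) log ∫ f^α, which can be rewritten as (1/(1-α)) log E[f(X)^(α-1)] and estimated from samples of X. The paper's own definition (2) may be normalised differently, so this is an illustration under that assumption; the function names and the Gaussian test case are hypothetical.

```python
import math
import random


def renyi_entropy_mc(samples, density, alpha):
    """Monte-Carlo estimate of the order-alpha Renyi entropy of a density.

    Uses the identity: integral of f^alpha = E[f(X)^(alpha - 1)] for X ~ f,
    so the samples must be drawn from the same density that is passed in.
    """
    if alpha <= 0.0 or alpha == 1.0:
        raise ValueError("alpha must be positive and different from 1")
    mean_power = sum(density(x) ** (alpha - 1.0) for x in samples) / len(samples)
    return math.log(mean_power) / (1.0 - alpha)


def gaussian_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


rng = random.Random(0)  # fixed seed for reproducibility
samples = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
estimate = renyi_entropy_mc(samples, gaussian_pdf, alpha=2.0)

# Closed form for the standard normal: 0.5*log(2*pi) + log(alpha) / (2*(alpha - 1)).
exact = 0.5 * math.log(2.0 * math.pi) + math.log(2.0) / (2.0 * (2.0 - 1.0))
```

Comparing `estimate` with `exact` gives a quick sanity check of the estimator before it is applied to densities without a closed-form entropy.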
We proceed with the proof of Thm. 5.2.
Proof of Thm. 5.2.
We make use of Prop. 5.1. Take (8) and minimize both sides over all possible partitions with size and associated abstractions . We have
where we recall that , and that for a given abstraction with corresponding encoder-decoder pair , we have with and , where is the Chebyshev center of the set , and is the Chebyshev radius of . Thus, is the output of the encoder-decoder pair with rate and message . Hence, the first term on the left-hand side of the above inequality can be lower bounded by employing Thm. 2.1, to obtain:
To bound the second term, we employ Lemma 8.1. Notice that the abstraction’s outputs are -dimensional and define a cover of (which is -dimensional) with cardinality (the same as the state-space partition). [Footnote 9: This cover is precisely and note that takes values in the set . Thus .] Thus, the term can be lower bounded as in (15), where we replace by , by , by , and by . ∎
Proof of Prop. 5.3.
Fix any measurable subset . Because the definitions of and imply that
But also, since is injective, the area formula [31, Thm. 3.2.5] gives
implying that, for almost all
(17)
Then, (2) becomes
Likewise, the area formula gives
and, applying (17) gives (12). Finally, in the particular case of we have
which gives the desired result by exploiting the fact that is monotonically increasing.
∎
Lemma 8.2.
Let and , be a bi-Lipschitz function satisfying
for some . Then for every and ,
Proof.
Fix and and define and its pre-image . We start by finding a ball in bounding .
For any , we have , so
By the lower Lipschitz bound , it follows that This implies that is contained in some -dimensional ball of radius . Therefore,
Proof of Prop. 5.4.
We again use the function , defined by (10). Since by assumption is full dimensional in , the tightest value for is Now we look at each case.
Case (1) follows trivially by the observation that is an -dimensional affine subset of , and that the intersection of an -ball of radius and a plane of dimension is a ball of dimension and radius Hence, , for all .
Case (2): If is piecewise affine, so is , which has at most disjoint pieces. Denote by each such piece of , which is a bounded, connected -dimensional subset of some affine subspace of Thus, , with . Then, for all and
where in the last inequality we have used case (1) and the fact that .
Case (3): It is easy to see that is bi-Lipschitz with
Hence the result comes from applying Lemma 8.2. ∎
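The geometric fact behind Case (1) above, namely that a ball intersected with an affine subspace is again a ball of lower dimension and no larger radius, can be spelled out as follows. The notation is introduced here for illustration only and is not the paper's: B_r(c) is the closed Euclidean ball in R^n, A is an affine subspace of dimension m, and P_A is the orthogonal projection onto A.

```latex
% Assumed notation (not from the source): B_r(c), A, P_A, d as described in the lead-in.
\begin{align*}
  \text{For } x \in A:\quad
  \|x - c\|^2 &= \|x - P_A c\|^2 + \|P_A c - c\|^2
  && \text{(Pythagoras, since } x - P_A c \perp P_A c - c\text{)},\\
  B_r(c) \cap A &= \bigl\{\, x \in A : \|x - P_A c\|^2 \le r^2 - d^2 \,\bigr\},
  && d := \|c - P_A c\|.
\end{align*}
% Hence the intersection is empty if d > r, and otherwise it is an m-dimensional ball
% in A, centred at P_A c, with radius \sqrt{r^2 - d^2} \le r.
% When c itself lies in A, d = 0 and the radius is exactly r.
```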
References
- [1] P. Tabuada, Verification and control of hybrid systems: a symbolic approach. Springer Science & Business Media, 2009.
- [2] A. Lavaei, S. Soudjani, A. Abate, and M. Zamani, “Automated verification and synthesis of stochastic hybrid systems: A survey,” Automatica, vol. 146, p. 110617, 2022.
- [3] A. Girard, G. Pola, and P. Tabuada, “Approximately bisimilar symbolic models for incrementally stable switched systems,” IEEE Transactions on Automatic Control, vol. 55, no. 1, pp. 116–126, 2009.
- [4] M. Rungger and M. Zamani, “SCOTS: A tool for the synthesis of symbolic controllers,” in Proceedings of the 19th International Conference on Hybrid Systems: Computation and Control, 2016, pp. 99–104.
- [5] K. Mallik, A.-K. Schmuck, S. Soudjani, and R. Majumdar, “Compositional synthesis of finite-state abstractions,” IEEE Transactions on Automatic Control, vol. 64, no. 6, pp. 2629–2636, 2018.
- [6] M. Zamani, P. M. Esfahani, R. Majumdar, A. Abate, and J. Lygeros, “Symbolic control of stochastic systems via approximately bisimilar finite abstractions,” IEEE Transactions on Automatic Control, vol. 59, no. 12, pp. 3135–3150, 2014.
- [7] M. Lahijanian, S. B. Andersson, and C. Belta, “Formal verification and synthesis for discrete-time stochastic systems,” IEEE Transactions on Automatic Control, vol. 60, no. 8, pp. 2031–2045, 2015.
- [8] A. Abate, J.-P. Katoen, J. Lygeros, and M. Prandini, “Approximate model checking of stochastic hybrid systems,” European Journal of Control, vol. 16, no. 6, pp. 624–641, 2010.
- [9] R. Coppola, A. Peruffo, and M. Mazo, “Data-driven abstractions for verification of linear systems,” IEEE Control Systems Letters, vol. 7, pp. 2737–2742, 2023.
- [10] T. Badings, L. Romao, A. Abate, D. Parker, H. A. Poonawala, M. Stoelinga, and N. Jansen, “Robust control for dynamical systems with non-Gaussian noise via formal abstractions,” Journal of Artificial Intelligence Research, vol. 76, pp. 341–391, 2023.
- [11] A. Devonport, A. Saoud, and M. Arcak, “Symbolic abstractions from data: A PAC learning approach,” in 2021 60th IEEE Conference on Decision and Control (CDC). IEEE, 2021, pp. 599–604.
- [12] M. Kazemi, R. Majumdar, M. Salamati, S. Soudjani, and B. Wooding, “Data-driven abstraction-based control synthesis,” Nonlinear Analysis: Hybrid Systems, vol. 52, p. 101467, 2024.
- [13] T. M. Cover and J. A. Thomas, Elements of information theory. John Wiley & Sons, 1999.
- [14] E. Riegler, H. Bölcskei, and G. Koliander, “Rate-distortion theory for general sets and measures,” in 2018 IEEE International Symposium on Information Theory (ISIT). IEEE, 2018, pp. 101–105.
- [15] E. Riegler, G. Koliander, and H. Bölcskei, “Lossy compression of general random variables,” Information and Inference: A Journal of the IMA, vol. 12, no. 3, pp. 1759–1829, 2023.
- [16] S. Esmaeil Zadeh Soudjani and A. Abate, “Adaptive and sequential gridding procedures for the abstraction and verification of stochastic processes,” SIAM Journal on Applied Dynamical Systems, vol. 12, no. 2, pp. 921–956, 2013.
- [17] S. Adams, M. Lahijanian, and L. Laurenti, “Formal control synthesis for stochastic neural network dynamic models,” IEEE Control Systems Letters, vol. 6, pp. 2858–2863, 2022.
- [18] Y. Tazaki and J.-i. Imura, “Discrete-state abstractions of nonlinear systems using multi-resolution quantizer,” in International Workshop on Hybrid Systems: Computation and Control. Springer, 2009, pp. 351–365.
- [19] K. Hsu, R. Majumdar, K. Mallik, and A.-K. Schmuck, “Multi-layered abstraction-based controller synthesis for continuous-time systems,” in Proceedings of the 21st International Conference on Hybrid Systems: Computation and Control (part of CPS Week), 2018, pp. 120–129.
- [20] J. Calbert, L. N. Egidio, and R. M. Jungers, “Smart abstraction based on iterative cover and non-uniform cells,” IEEE Control Systems Letters, vol. 8, pp. 2301–2306, 2024.
- [21] A.-K. Schmuck and J. Raisch, “Asynchronous l-complete approximations,” Systems & Control Letters, vol. 73, pp. 67–75, 2014.
- [22] A. Banse, G. Delimpaltadakis, L. Laurenti, M. Mazo Jr, and R. M. Jungers, “Memory-dependent abstractions of stochastic systems through the lens of transfer operators,” in Proceedings of the 28th ACM International Conference on Hybrid Systems: Computation and Control, 2025, pp. 1–12.
- [23] G. A. Gleizer and M. Mazo Jr, “Chaos and order in event-triggered control,” IEEE Transactions on Automatic Control, vol. 68, no. 11, pp. 6541–6556, 2023.
- [24] A. Lavaei, S. Soudjani, and M. Zamani, “Compositional abstraction of large-scale stochastic systems: A relaxed dissipativity approach,” Nonlinear Analysis: Hybrid Systems, vol. 36, p. 100880, 2020.
- [25] G. Delimpaltadakis, M. Lahijanian, M. Mazo Jr, and L. Laurenti, “Interval Markov decision processes with continuous action-spaces,” in Proceedings of the 26th ACM International Conference on Hybrid Systems: Computation and Control, 2023, pp. 1–10.
- [26] D. Lind and B. Marcus, An introduction to symbolic dynamics and coding. Cambridge University Press, 2021.
- [27] E. Lindenstrauss and M. Tsukamoto, “From rate distortion theory to metric mean dimension: variational principle,” IEEE Transactions on Information Theory, vol. 64, no. 5, pp. 3590–3609, 2018.
- [28] D. Abel, D. Arumugam, K. Asadi, Y. Jinnai, M. L. Littman, and L. L. Wong, “State abstraction as compression in apprenticeship learning,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, no. 01, 2019, pp. 3134–3142.
- [29] O. Biza, R. Platt, J.-W. van de Meent, and L. L. Wong, “Learning discrete state abstractions with deep variational inference,” arXiv preprint arXiv:2003.04300, 2020.
- [30] D. T. Larsson, D. Maity, and P. Tsiotras, “A generalized information-theoretic framework for the emergence of hierarchical abstractions in resource-limited systems,” Entropy, vol. 24, no. 6, p. 809, 2022.
- [31] H. Federer, Geometric measure theory, ser. Grundlehren Math. Wiss. Springer, Cham, 1969, vol. 153.
- [32] N. Tishby, F. C. Pereira, and W. Bialek, “The information bottleneck method,” arXiv preprint physics/0004057, 2000.