An Information Theory of Finite Abstractions and their Fundamental Scalability Limits

Giannis Delimpaltadakis and Gabriel Gleizer

Giannis Delimpaltadakis is with the Control Systems Technology group, Mechanical Engineering, Eindhoven University of Technology. Gabriel Gleizer is with the Delft Center for Systems and Control, Mechanical Engineering, Delft University of Technology. Emails: [email protected], [email protected].
This research is partially supported by the project “Chaotic sampling for secure and sustainable networks of control systems” with file number 21937 of the research programme VENI AES 2024 which is (partly) financed by the Dutch Research Council (NWO) under the grant https://doi.org/10.61686/WZNAX74774.
Abstract

Finite abstractions are discrete approximations of dynamical systems, such that the set of abstraction trajectories contains, in a formal sense, all system trajectories. There is a consensus that abstractions suffer from the curse of dimensionality: for the same “accuracy” (how closely the abstraction represents the system), the abstraction size scales poorly with system dimensions. And, yet, after decades of research on abstractions, there are no formal results concerning their accuracy-size tradeoff. In this work, we derive a statistical, quantitative theory of abstractions’ accuracy-size tradeoff and uncover fundamental limits on their scalability, through rate-distortion theory – the branch of information theory studying lossy compression. Abstractions are viewed as encoder-decoder pairs, encoding trajectories of dynamical systems in a higher-dimensional ambient space. Rate represents abstraction size, while distortion describes abstraction accuracy, defined as the spatial average deviation between abstract trajectories and system ones. We obtain a fundamental lower bound on the minimum abstraction distortion, given the system dynamics and a threshold on abstraction size. The bound depends on the complexity of the dynamics, through generalized entropy. We demonstrate the bound’s tightness on certain dynamical systems. Finally, we showcase how the developed theory can be employed to construct optimal abstractions, in terms of the size-accuracy tradeoff, through an example on a chaotic system.

1 Introduction

Modern engineering systems are becoming more complex and must meet intricate specifications in safety-critical situations. For instance, a self-driving car must follow traffic rules, avoid collisions, and optimize speed and fuel consumption. Due to the complexity of these systems, traditional analytic methods for verification and control are intractable. For over two decades, abstraction-based methods have flourished as a way to address verification and control of complex dynamics and objectives [1, 2]. Given a dynamical system, these methods construct a finite system (the abstraction) by partitioning the state (and control) space of the original system, such that all trajectories of the original system are contained, in a formal sense, in the set of abstraction trajectories. Employing this property, one may solve an otherwise intractable verification or control problem for the original system over the finite abstraction, with formal guarantees of correctness. Over the years, research on abstractions has spanned deterministic systems [3, 4, 5], stochastic systems [6, 7, 8], and, lately, data-driven scenarios [9, 10, 11, 12].

Despite their immense success, there is a consensus that abstractions suffer from the curse of dimensionality, limiting their practical relevance; for a given accuracy (how closely the abstraction describes the true dynamics), the abstraction size scales poorly with system dimensions. And even though abstractions have received considerable interest in the past decades, there are still no formal results concerning their curse of dimensionality and accuracy-size tradeoff.

Contributions

In this work, we derive a statistical, quantitative theory of abstractions’ accuracy-size tradeoff and uncover fundamental limits on their scalability. To that end, we establish connections with rate-distortion theory – the branch of information theory studying lossy compression [13, Chapter 10]. The key observation for the whole theory is that abstractions are information-theoretic encoder-decoder pairs, encoding trajectories of dynamical systems, in a higher-dimensional, ambient space. Rate represents abstraction size, while distortion is defined as the spatial average deviation between abstract trajectories and system ones, thus capturing the average accuracy of an abstraction. Then, building on recent developments in rate-distortion theory for generalized measurable sets [14, 15], we derive fundamental limits of abstractions’ accuracy-size tradeoff: for given system dynamics, we obtain a fundamental lower bound on the minimum abstraction distortion, for a given threshold on abstraction size. The fundamental lower bound depends on the complexity of the system’s dynamics, through generalized entropy. We demonstrate the tightness of the bound on certain dynamical systems. Finally, we showcase how the developed theory can be employed to construct optimal abstractions, in terms of the size-accuracy tradeoff, through an example on a chaotic system, and we provide a discussion towards a general procedure for constructing optimal abstractions.

Related work

Through decades of research, there has been considerable effort to construct scalable abstractions. Indicatively, [16, 17, 18] adapt the partition’s resolution depending on the local uncertainty that a given state-space region induces in the abstraction. Further, [19] constructs multi-resolution abstractions, employing feedback-refinement relations. The work [20] employs optimal control, such that the generated trajectories result in smaller abstraction cells and only a portion of the state space needs to be partitioned. Although the above methods result in more scalable abstractions, they neither provide quantitative results on the accuracy-size tradeoff, nor optimize some metric describing it. Another approach to derive more accurate abstractions is introducing memory [21, 22], based on sequences of outputs. In [23], it is shown that the size of such memory-based abstractions increases exponentially with the sequence length for deterministic chaotic systems. Apart from adaptive-partitioning techniques, compositional methods [5, 24] decompose the system into smaller ones that are abstracted more efficiently. However, they do not address the scalability issues of abstracting each subsystem. Further, it is also worth mentioning [25], which, for a particular class of stochastic abstractions, demonstrates that partitioning the control space is unnecessary.

The connection between information theory and symbolic dynamics is well-known [26]; listing the whole literature on the topic is impossible. Worth mentioning is the work in [27], which employs rate-distortion theory to characterize complexity of dynamical systems and their relationship with so-called shifts (a class of discrete systems; abstractions can be cast as shifts). Nonetheless, this work does not consider the deviation between a shift and a dynamical system, but rather focuses on asymptotic results (arbitrarily large partition size, steady-state trajectories) and the qualitative question of whether a system can be embedded into a shift. Thus, it does not provide a quantitative theory of the accuracy-size tradeoff. Finally, the works [28, 29, 30] employ rate-distortion theory to compress models that are already discrete and do not focus on abstracting continuous dynamics with formal guarantees.

2 Preliminaries

2.1 Measure spaces, Hausdorff measure, generalized entropy

For our purposes, we make use of information theory over general measurable spaces, based on [14, 15]. Thus, we first recall some related notions. We denote the $m$-dimensional Hausdorff measure by $\mathcal{H}^{m}$. (The Hausdorff measure is a generalization of the Lebesgue measure, and measures the size of a given set. E.g., $\mathcal{H}^{1}(\mathcal{C})=2\pi$, where $\mathcal{C}$ is the unit circle embedded in $\mathbb{R}^{n}$.) Denote the restriction of $\mathcal{H}^{m}$ to the compact set $\mathcal{K}$ by $\mathcal{H}^{m}_{\mathcal{K}}$. Consider a measure $\mu$ over the measure space $(\mathcal{X},\Sigma_{\mathcal{X}},\nu)$, where $\Sigma_{\mathcal{X}}$ is the Borel $\sigma$-algebra of $\mathcal{X}\subseteq\mathbb{R}^{n}$. When $\mu$ is absolutely continuous w.r.t. $\nu$ (denoted by $\mu\ll\nu$), we denote the Radon–Nikodym derivative by $\frac{\mathrm{d}\mu}{\mathrm{d}\nu}$. When $\nu=\mathcal{H}^{m}_{\mathcal{X}}$ (assuming $\mathcal{X}$ is $m$-dimensional), then $\frac{\mathrm{d}\mu}{\mathrm{d}\mathcal{H}^{m}_{\mathcal{X}}}$ is the probability distribution associated to $\mu$. Absolute continuity $\mu\ll\mathcal{H}^{m}_{\mathcal{X}}$ suggests that $\mu$ is not concentrated in arbitrarily small balls in $\mathcal{X}$. We denote the volume of the unit ball in $\mathbb{R}^{n}$ as $v_{n}=\frac{\pi^{n/2}}{\Gamma(n/2+1)}$, where $\Gamma(a)=\int_{0}^{\infty}t^{a-1}e^{-t}\,\mathrm{d}t$ is the Gamma function.

Let $\mathcal{X}\subset\mathbb{R}^{n}$ be a finite union of compact, $m$-dimensional, $C^{1}$-manifolds. Denote by $c_{\mathcal{X}}>0$ the constant such that

$$\mathcal{H}^{m}_{\mathcal{X}}(B(\hat{x},\delta))\leq c_{\mathcal{X}}\delta^{m},\quad\text{for all }\hat{x}\in\mathbb{R}^{n}\ \text{and }\delta>0, \tag{1}$$

where $B(\hat{x},\delta)\coloneqq\{x\in\mathcal{X}:\ \|x-\hat{x}\|<\delta\}$. The constant $c_{\mathcal{X}}$ always exists and is finite, as per [14, Lemma 1].

Consider a random variable $x$, distributed over the measure space $(\mathcal{X},\Sigma_{\mathcal{X}},\mathcal{H}^{m}_{\mathcal{X}})$ with probability measure $\mu_{x}$. The generalized entropy of $x$ (w.r.t. the Hausdorff measure) is

$$h(x)\coloneqq-\mathbb{E}_{x}\!\left[\log\Big(\frac{\mathrm{d}\mu_{x}}{\mathrm{d}\mathcal{H}^{m}_{\mathcal{X}}}\Big)\right], \tag{2}$$

where $\mathbb{E}_{x}[\cdot]$ denotes the expectation operator w.r.t. the random variable $x$. The generalized entropy is the extension of the classical Shannon entropy to continuous spaces, and is a measure of uncertainty or complexity of a random variable. For a measure space $(\mathcal{X},\Sigma_{\mathcal{X}},\mathcal{H}^{m}_{\mathcal{X}})$ with $0<\mathcal{H}^{m}_{\mathcal{X}}(\mathcal{X})<\infty$: a) $h(x)$ is maximized for the uniform distribution $\mu_{x}=\mathcal{H}^{m}_{\mathcal{X}}/\mathcal{H}^{m}_{\mathcal{X}}(\mathcal{X})$, b) $h(x)$ is bounded, when $\mu_{x}\ll\mathcal{H}^{m}_{\mathcal{X}}$. Finally, employing a similar generalization as in (2), let us denote the generalized Rényi entropy with parameter $a\in\mathbb{R}_{+}\setminus\{1\}$ by $h_{a}(x):=\frac{1}{1-a}\log\mathbb{E}_{x}\big[\big(\frac{\mathrm{d}\mu_{x}}{\mathrm{d}\mathcal{H}_{\mathcal{X}}^{m}}\big)^{a-1}\big]$.

The example below shows how the above apply to computing the entropy of a dynamical system’s trajectories.

Example 2.1 (Entropy of trajectories of the doubling map).

Consider the dynamical system $x^{+}=f(x)$, where $f:[0,1]\to[0,1]:x\mapsto 2x\ \mathrm{mod}\ 1$ is the doubling map. Consider the set of $3$-length trajectories of the system, $\mathcal{B}:=\{(x_{0},f(x_{0}),f(f(x_{0}))):\ x_{0}\in[0,1]\}\subseteq[0,1]^{3}$. Notice that $\mathcal{B}$ is the union of 4 straight-line segments:

$$\begin{aligned}
\mathcal{B}={} & \{(x_{0},2x_{0},4x_{0}):x_{0}\in[0,.25]\}\ \cup \\
& \{(x_{0},2x_{0},4x_{0}-1):x_{0}\in[.25,.5]\}\ \cup \\
& \{(x_{0},2x_{0}-1,4x_{0}-2):x_{0}\in[.5,.75]\}\ \cup \\
& \{(x_{0},2x_{0}-1,4x_{0}-3):x_{0}\in[.75,1]\}.
\end{aligned}$$

Further, consider random initial conditions $x_{0}\sim U[0,1]$, where $U[0,1]$ is the uniform distribution over $[0,1]$. It is well known that $U[0,1]$ is invariant under the doubling map. Thus, the random variable $\xi(x_{0})=(x_{0},f(x_{0}),f(f(x_{0})))\in\mathcal{B}$, that is, the system trajectory, is uniformly distributed across $\mathcal{B}$; i.e., the probability measure is $\mu_{\xi}=\mathcal{H}^{1}_{\mathcal{B}}/\mathcal{H}^{1}_{\mathcal{B}}(\mathcal{B})$. Its generalized entropy is

$$\begin{aligned}
h(\xi)=-\mathbb{E}_{\xi}\!\left[\log\Big(\frac{\mathrm{d}\mu_{\xi}}{\mathrm{d}\mathcal{H}^{1}_{\mathcal{B}}}\Big)\right] &=-\int_{\mathcal{B}}\log\Big(\frac{\mathrm{d}\mu_{\xi}}{\mathrm{d}\mathcal{H}^{1}_{\mathcal{B}}}\Big)\mathrm{d}\mu_{\xi} \\
&=-\int_{\mathcal{B}}\log\Big(\frac{1}{\mathcal{H}^{1}_{\mathcal{B}}(\mathcal{B})}\Big)\mathrm{d}\mu_{\xi} \\
&=\log(\sqrt{21})\int_{\mathcal{B}}\mathrm{d}\mu_{\xi}=\log(\sqrt{21}),
\end{aligned}$$

where we have used that the total length of the line segments is $\mathcal{H}^{1}_{\mathcal{B}}(\mathcal{B})=\sqrt{21}$, and that $\int_{\mathcal{B}}\mathrm{d}\mu_{\xi}=1$, as $\mu_{\xi}$ is a probability measure.
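The computation in Example 2.1 can be checked numerically. The following is a minimal Python sketch, not part of the original development: the helper names `doubling`, `trajectory` and `behavior_length` are hypothetical, and the sketch assumes that each of the $2^{l-1}$ pieces of the trajectory set is a straight segment, so that its length can be recovered from a single chord between two interior points.

```python
import math

def doubling(x):
    """Doubling map f(x) = 2x mod 1 on [0, 1]."""
    return (2.0 * x) % 1.0

def trajectory(x0, steps=3):
    """Trajectory (x0, f(x0), ..., f^(steps-1)(x0)) as a list."""
    traj = [x0]
    for _ in range(steps - 1):
        traj.append(doubling(traj[-1]))
    return traj

def behavior_length(steps=3, eps=1e-6):
    """Total length of the set of `steps`-length trajectories.

    Assumption: on each interval [k/2^(steps-1), (k+1)/2^(steps-1)] the
    trajectory is affine in x0, so the piece is a straight segment.
    We measure a chord strictly inside the piece (to stay away from the
    jumps of the mod operation) and rescale it to the full piece.
    """
    pieces = 2 ** (steps - 1)
    total = 0.0
    for k in range(pieces):
        a, b = k / pieces, (k + 1) / pieces
        p = trajectory(a + eps, steps)
        q = trajectory(b - eps, steps)
        chord = math.dist(p, q)
        total += chord * (b - a) / (b - a - 2.0 * eps)
    return total

length = behavior_length()    # Hausdorff measure H^1 of the trajectory set
entropy = math.log(length)    # entropy of the uniform measure, in nats
```

Under these assumptions, `length` should agree with $\sqrt{21}$ and `entropy` with $\log(\sqrt{21})$ up to floating-point error.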

2.2 Rate-distortion theory on measurable spaces

Figure 1: The typical source coding setting.

A typical setting in information theory is source coding, see Fig. 1. A source emits a message $x\in\mathcal{X}\subseteq\mathbb{R}^{n}$, which is a random variable over the measure space $(\mathcal{X},\Sigma_{\mathcal{X}},\mathcal{H}^{m}_{\mathcal{X}})$, with associated probability measure $\mu_{x}$, where $\mathcal{X}$ is assumed to be $m$-dimensional. The encoder $s:\mathcal{X}\to\mathcal{Y}$, where $\mathcal{Y}$ is finite, outputs the coded message $s(x)=y\in\mathcal{Y}$. Finally, the decoder $g:\mathcal{Y}\to\hat{\mathcal{X}}$, upon receiving $y$, decodes it into $g(y)=\hat{x}$. Compression takes place by encoding the continuous message $x$ into a low-dimensional, finite coded message $y$. The encoder cardinality $|\mathcal{Y}|$ determines the compression, and the compression rate is defined by $\log|\mathcal{Y}|$. A distortion function $d:\mathcal{X}\times\hat{\mathcal{X}}\to\mathbb{R}_{+}$ measures the deviation of $\hat{x}$ from the original message $x$. A typical distortion function, when $\hat{\mathcal{X}}=\mathbb{R}^{n}$, is the squared error $d(x,\hat{x})=\|x-\hat{x}\|^{2}$. (Here, we present a simplified setting of source coding, where the encoder-decoder is deterministic, and the source emits a single message. For the general theory, see [13, 15].)

Of particular interest is the fundamental limit of the rate-distortion tradeoff, i.e. the following quantity:

$$\begin{aligned}
D(R):=\inf_{s,g}\ & \mathbb{E}_{x}[d(x,\hat{x})\mid s,g] \\
\mathrm{s.t.}\ & s:\mathcal{X}\to\mathcal{Y},\ g:\mathcal{Y}\to\hat{\mathcal{X}}, \\
& \log|\mathcal{Y}|\leq R,\ y=s(x),\ \hat{x}=g(y),
\end{aligned}$$

where the expectation is taken w.r.t. the random variable $x$. In words, $D(R)$ is the minimum achievable average distortion, for a given compression rate threshold $R$. The function $D(R)$ has an inverse, $R(D)$, which is the minimum compression rate, for a given maximum expected distortion threshold $D$. The following result provides a fundamental lower bound on $R(D)$ and $D(R)$.

Theorem 2.1 (Generalized Shannon lower bound [15, Thm. 3.1, simplified]).

Let $\mathcal{X}\subseteq\mathbb{R}^{n}$ be a finite union of compact, $m$-dimensional, $C^{1}$-manifolds, and $\mu_{x}\ll\mathcal{H}^{m}_{\mathcal{X}}$. Assume that $\hat{\mathcal{X}}\subseteq\mathbb{R}^{n}$ and that $(\hat{\mathcal{X}},\Sigma_{\hat{\mathcal{X}}})$ is measurable. Consider the Euclidean distortion function $d:\mathcal{X}\times\hat{\mathcal{X}}\to\mathbb{R}_{+}:(x,\hat{x})\mapsto\|x-\hat{x}\|^{2}$. Then

$$R(D)\geq R_{*}(D)\coloneqq h(x)-\frac{m}{2}-\log\Big(\frac{c_{\mathcal{X}}D^{m/2}\Gamma(1+m/2)}{(\frac{m}{2})^{m/2}}\Big), \tag{3}$$

$$D(R)\geq D_{*}(R)\coloneqq\frac{m}{2}\Big(\frac{e^{-R+h(x)-m/2}}{c_{\mathcal{X}}\Gamma(1+m/2)}\Big)^{2/m}. \tag{4}$$

Proof Sketch.

This is the special case of [15, Thm. 3.1] for (finite unions of) compact, $C^{1}$-manifolds and Euclidean distortion. ∎
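To make the bounds (3) and (4) concrete, the following Python sketch evaluates them directly. It is only an illustration under stated assumptions: the function names are hypothetical, logarithms are natural (rates in nats), and the example source is a uniform distribution on a unit-length straight segment, for which we take $m=1$, $h(x)=\log 1=0$ and $c_{\mathcal{X}}=2$ (a ball of radius $\delta$ meets a straight line in a set of length at most $2\delta$).

```python
import math

def shannon_rate_lower_bound(D, h, m, c):
    """R_*(D) from (3), in nats. Computed in the log domain for stability."""
    log_arg = (math.log(c) + (m / 2) * math.log(D)
               + math.lgamma(1 + m / 2) - (m / 2) * math.log(m / 2))
    return h - m / 2 - log_arg

def shannon_distortion_lower_bound(R, h, m, c):
    """D_*(R) from (4), for a rate threshold R in nats."""
    log_inner = -R + h - m / 2 - math.log(c) - math.lgamma(1 + m / 2)
    return (m / 2) * math.exp((2 / m) * log_inner)

# Assumed example: uniform source on a unit segment, N = 8 codewords.
N = 8
R = math.log(N)
D_low = shannon_distortion_lower_bound(R, h=0.0, m=1, c=2.0)

# Distortion of the uniform scalar quantizer with N equal cells,
# which is one admissible encoder-decoder pair at rate log N.
D_uniform_quantizer = 1.0 / (12 * N ** 2)
```

For this assumed source, (4) reduces to $e^{-2R}/(2\pi e)$, which stays below the distortion $1/(12N^{2})$ of the uniform quantizer, as a lower bound should. The two functions are inverses of each other in $R$ and $D$, which offers a quick consistency check between (3) and (4).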

2.3 Transition systems

Definition 2.2 (Transition system).

A transition system $S$ is a tuple $S=(\mathcal{X},\underset{S}{\rightarrow})$, where $\mathcal{X}$ is the state space and $\underset{S}{\rightarrow}\subseteq\mathcal{X}\times\mathcal{X}$ is a transition relation.

A transition system $S=(\mathcal{X},\underset{S}{\rightarrow})$ is deterministic if, for any $x\in\mathcal{X}$, there exists at most one $x^{\prime}\in\mathcal{X}$, such that $(x,x^{\prime})\in\underset{S}{\rightarrow}$. Given a transition system $S=(\mathcal{X},\underset{S}{\rightarrow})$, its $l$-length behavior $\mathcal{B}_{l}^{S}$ is defined as $\mathcal{B}_{l}^{S}:=\big\{\xi:\ \xi=\{x_{i}\}_{i=0}^{l-1},\ (x_{k},x_{k+1})\in\underset{S}{\rightarrow},\ k=0,1,\dots,l-2\big\}$. That is, the $l$-length behavior is the set of $l$-long trajectories. Notice that $\mathcal{B}_{l}^{S}\subseteq\mathcal{X}^{l}$.
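For a finite transition system, the $l$-length behavior can be enumerated directly. The sketch below is a hypothetical illustration (the state names and the functions `successors_map`, `is_deterministic` and `behavior` are not from the text); it treats an $l$-long trajectory as $l$ states joined by $l-1$ transitions.

```python
def successors_map(states, transitions):
    """Map each state to the list of its successors under the relation."""
    succ = {x: [] for x in states}
    for x, x_next in transitions:
        succ[x].append(x_next)
    return succ

def is_deterministic(states, transitions):
    """True if every state has at most one successor."""
    succ = successors_map(states, transitions)
    return all(len(nexts) <= 1 for nexts in succ.values())

def behavior(states, transitions, l):
    """All l-long trajectories, each represented as a tuple of l states."""
    succ = successors_map(states, transitions)
    trajectories = [(x,) for x in states]
    for _ in range(l - 1):
        # Extend every partial trajectory by each admissible transition.
        trajectories = [t + (y,) for t in trajectories for y in succ[t[-1]]]
    return trajectories

# Hypothetical three-state example with one non-deterministic state "a".
states = ["a", "b", "c"]
transitions = {("a", "b"), ("a", "c"), ("b", "c"), ("c", "c")}
B3 = behavior(states, transitions, 3)
```

Each element of `B3` is a tuple in the $3$-fold product of the state set, mirroring the inclusion of the behavior in $\mathcal{X}^{l}$.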

3 Abstractions and the curse of dimensionality

3.1 Finite abstractions of dynamical systems

Throughout this work, we consider deterministic dynamical systems $x^{+}=f(x)$, with $f:\mathcal{X}\to\mathcal{X}$. Such a dynamical system admits the transition-system representation $S=(\mathcal{X},\underset{S}{\rightarrow})$, where $\underset{S}{\rightarrow}:=\{(x,y):y=f(x),\ x\in\mathcal{X}\}$. We make the following assumption.

Assumption 1 (The state space).

The set $\mathcal{X}\subseteq\mathbb{R}^{n}$ is $n$-dimensional, connected and compact.

Under this assumption, $\mathcal{B}_{l}^{S}$ is an $n$-dimensional subset of $\mathcal{X}^{l}\subseteq\mathbb{R}^{nl}$.

Let us introduce abstractions of dynamical systems.

Definition 3.1 (Measurable Partition).

Given a set $\mathcal{X}$, a finite collection of measurable, disjoint sets $\mathcal{Y}=\{Y_{i}\}$, such that $\bigcup_{i}Y_{i}\supseteq\mathcal{X}$, is a measurable partition of $\mathcal{X}$.

Definition 3.2 (Abstraction).

Given a dynamical system with transition-system representation $S=(\mathcal{X},\underset{S}{\rightarrow})$ and a measurable partition $\mathcal{Y}=\{Y_{i}\}$ of $\mathcal{X}$, a transition system $A=(\mathcal{Y},\underset{A}{\rightarrow})$ is an abstraction of $S$ if, for any $x,x^{\prime}\in\mathcal{X}$ and $Y,Y^{\prime}\in\mathcal{Y}$, such that $x\in Y$ and $x^{\prime}\in Y^{\prime}$, we have $(x,x^{\prime})\in\underset{S}{\rightarrow}\implies(Y,Y^{\prime})\in\underset{A}{\rightarrow}$.

Although the dynamical system $S$ is deterministic, the abstraction $A$ is generally non-deterministic. With a slight abuse of formality, we often treat trajectories $\{\omega_{i}\}_{i=0}^{l-1}$ of the abstraction (with $\omega_{i}\in\mathcal{Y}$) as subsets of $\mathcal{X}^{l}$, that is $\{\omega_{i}\}_{i=0}^{l-1}\equiv\omega_{0}\times\omega_{1}\times\dots\times\omega_{l-1}\subseteq\mathcal{X}^{l}$.

Theorem 3.3 (Behavioral inclusion [1, Theorem 4.18, simplified]).

For a system $S=(\mathcal{X},\underset{S}{\rightarrow})$, a partition $\mathcal{Y}$ of $\mathcal{X}$ and an abstraction $A=(\mathcal{Y},\underset{A}{\rightarrow})$ of $S$, the following holds for any $l$: $\mathcal{B}_{l}^{S}\subseteq\mathcal{B}_{l}^{A}$.

In fact, $\mathcal{B}_{l}^{A}$ is $nl$-dimensional, and covers the $n$-dimensional set of system trajectories $\mathcal{B}_{l}^{S}$. This observation is instrumental in this work. Through behavioral inclusion, abstractions encode information about the infinite, continuous system behavior $\mathcal{B}_{l}^{S}$ into the finite abstraction behavior set $\mathcal{B}_{l}^{A}$. While this enables computational methods for verification problems for dynamical systems, it also generally entails information loss, as the following section explains.

3.2 Abstraction-based verification and information loss

In typical verification problems, we are given a set of initial conditions $\Xi_{0}\subseteq\mathcal{X}$ for the system $S$ and we have to check if the corresponding set of system trajectories $\Xi=\{\xi\in\mathcal{B}_{l}^{S}:\ \xi_{0}\in\Xi_{0}\}$ satisfies a given property. For example, in the case of safety, we have to check if $\Xi\cap\mathcal{U}^{l}=\emptyset$, where $\mathcal{U}\subseteq\mathcal{X}$ is an unsafe set. Computing the exact reachable set $\Xi$ is generally impossible. Abstractions $A$ address this problem by computing the corresponding set of abstract state trajectories $\Omega_{A}=\bigcup_{\omega\in\mathcal{B}_{l}^{A},\ \omega_{0}\cap\Xi_{0}\neq\emptyset}\omega$, which is tractable, as the abstraction is finite. Notice that, by behavioral inclusion, we have $\Xi\subseteq\Omega_{A}$. Finally, for safety verification, if $\Omega_{A}\cap\mathcal{U}^{l}=\emptyset$, then one may safely deduce that the system is safe.

As abstractions group system states $x\in\mathcal{X}$ in sets $Y\subseteq\mathcal{X}$, information loss is inevitable. In general, the partition $\mathcal{Y}$ needs to have a relatively high resolution, to recover a meaningful verification answer. E.g., in the extreme case of $|\mathcal{Y}|=1$, for any set of initial conditions $\Xi_{0}\subseteq\mathcal{X}$, the abstraction returns $\Omega_{A}=\mathcal{X}^{l}$, i.e. the whole ambient space of $l$-length trajectories. As such, for small $|\mathcal{Y}|$, the abstraction $A$ does not accurately represent the system $S$. On the other hand, for large $|\mathcal{Y}|$, where the abstraction is more accurate, the computations on the abstraction become heavier, even intractable. Thus, there is a trade-off between abstraction accuracy and partition size $|\mathcal{Y}|$. In what follows, we provide a statistical, quantitative theory of the accuracy-size tradeoff, based on rate-distortion theory, and derive bounds on this tradeoff.

4 Information-theoretic framework for finite abstractions

Figure 2: Abstractions as a source coding scheme. From the left: 1) a sample trajectory $\xi$ of system $S$ with its initial state $\xi_{0}$ highlighted in blue; 2) the state-space partition $\mathcal{Y}$, and the corresponding abstract initial condition in cyan; 3) the set of abstract trajectories $\Omega_{A}$, in red; 4) the specific trajectory $\xi^{\prime}_{*}\in\Omega_{A}$ that deviates the most from the true system trajectory $\xi$.

In what follows, consider the dynamical system $x^{+}=f(x)$, with $x\in\mathcal{X}\subseteq\mathbb{R}^{n}$, under Assumption 1. The dynamical system admits the transition system representation $S=(\mathcal{X},\underset{S}{\rightarrow})$. Towards deriving a statistical quantification of the accuracy-size tradeoff of abstractions, we impose a probability distribution $p_{\xi_{0}}:\mathcal{X}\to\mathbb{R}_{+}$ on the system’s initial conditions. Verification, then, becomes: sampling an initial condition $\xi_{0}\in\mathcal{X}$, with $\xi_{0}\sim p_{\xi_{0}}$, and afterward employing the abstraction to give a verification answer. (To be mathematically precise, $\xi_{0}$ is a random variable over $(\mathcal{X},\Sigma_{\mathcal{X}},\mathcal{L}^{n}_{\mathcal{X}})$, where $\mathcal{L}^{n}$ is the $n$-dimensional Lebesgue measure, with probability measure $\mu_{\xi_{0}}$ such that $\frac{\mathrm{d}\mu_{\xi_{0}}}{\mathrm{d}\mathcal{L}^{n}_{\mathcal{X}}}=p_{\xi_{0}}$.)

Let us show how an abstraction $A$ can be viewed as an encoder-decoder pair of system trajectories $\xi\in\mathcal{B}_{l}^{S}$. For the following, we refer the reader to Figure 2. The system (source) samples an initial condition $\xi_{0}\sim p_{\xi_{0}}$ and generates the trajectory $\xi=(\xi_{0},\dots,\xi_{l-1})\in\mathcal{B}_{l}^{S}$ (the message). Abstractions only use (sets of) initial conditions for verification, as explained in Section 3. Nonetheless, from an information-theoretic perspective, $\xi_{0}$ and $\xi$ are equivalent, as $\xi_{0}\mapsto\xi$ is one-to-one; that is, $\xi_{0}$ and $\xi$ carry the exact same information when the dynamics $f$ is known. Accordingly, the encoder $s_{A}:\mathcal{B}_{l}^{S}\to\mathcal{Y}$ looks at the initial condition $\xi_{0}$ and returns the corresponding abstract initial condition:

$$s_{A}(\xi):=Y,\ \text{s.t.}\ \xi_{0}\in Y. \tag{5}$$

The decoder $g_{A}$, upon receiving the initial condition $\omega_{A_{0}}=s_{A}(\xi)$, outputs the set of all abstract state trajectories corresponding to $\omega_{A_{0}}$. That is, for the decoder we have $g_{A}:\mathcal{Y}\to 2^{\mathcal{X}^{l}}$ with

$$g_{A}(y):=\bigcup_{\omega\in\mathcal{B}_{l}^{A},\ \omega_{0}=y}\omega. \tag{6}$$

The compression rate, determined by the encoder’s size, is $\log(|\mathcal{Y}|)$. Indeed, notice that the abstraction encodes the system’s trajectories $\mathcal{B}_{l}^{S}$ into exactly $|\mathcal{Y}|$ outcomes, that is $\{g_{A}(y):\ y\in\mathcal{Y}\}$.

To capture the accuracy of the abstraction, and compare the message $\xi$ and output $\Omega_{A}=g_{A}(s_{A}(\xi))$, we employ a distortion function $d:\mathcal{B}_{l}^{S}\times 2^{\mathcal{X}^{l}}\to\mathbb{R}_{+}$ defined by

$$d(\xi,\Omega_{A}):=\sup_{\xi^{\prime}\in\Omega_{A}}\frac{1}{l}\|\xi-\xi^{\prime}\|^{2}. \tag{7}$$

In words, $d(\xi,\Omega_{A})$ returns the worst possible distortion between the system trajectory $\xi$ and the abstract trajectories $\Omega_{A}$, averaged over the time horizon $l$. This is in line with abstraction-based verification, where the worst-case outcome is considered.
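As an illustration of the distortion function (7), the sketch below evaluates it for a scalar state space, where every abstract trajectory is a product of intervals, i.e. a box. This is a hedged example rather than the authors' implementation: the function names and the numerical data are hypothetical, and cells are treated as closed intervals, so the supremum over each box is attained coordinate-wise at the endpoint farthest from the trajectory.

```python
def sup_sq_dist_to_box(xi, box):
    """Supremum of ||xi - xi'||^2 over xi' in a box.

    `box` is a list of (lo, hi) intervals, one per time step. The squared
    distance is separable, so each coordinate is maximized independently
    at whichever endpoint is farther from xi.
    """
    return sum(max((x - lo) ** 2, (hi - x) ** 2) for x, (lo, hi) in zip(xi, box))

def distortion(xi, omega):
    """Distortion (7): worst-case squared deviation over a union of boxes,
    divided by the horizon l = len(xi)."""
    l = len(xi)
    return max(sup_sq_dist_to_box(xi, box) for box in omega) / l

# Hypothetical data: a 2-step trajectory and three abstract trajectories
# that share the same initial cell [0.6, 0.8].
xi = (0.7, 0.49)
omega = [
    [(0.6, 0.8), (0.2, 0.4)],
    [(0.6, 0.8), (0.4, 0.6)],
    [(0.6, 0.8), (0.6, 0.8)],
]
d_value = distortion(xi, omega)
```

Taking the maximum over boxes corresponds to the supremum over the union in (6), which is what makes the distortion a worst-case quantity.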

Let us now explain what “expected (or average) distortion”, for a given abstraction $A$, means in the context of verification. The expected distortion $\mathbb{E}_{\xi_{0}}[d(\xi,\Omega_{A})\mid A]$ is taken w.r.t. the initial-condition distribution $p_{\xi_{0}}$. Thus, over $N\to\infty$ verification problems, where the initial condition $\xi_{0}\sim p_{\xi_{0}}$, $\mathbb{E}_{\xi_{0}}[d(\xi,\Omega_{A})\mid A]$ is the average distortion. As $d$ measures the distance between system trajectories and abstract state trajectories, the expected distortion $\mathbb{E}_{\xi_{0}}[d(\xi,\Omega_{A})\mid A]$ is thus the spatial, statistical average of the deviation between system trajectories and abstract state trajectories, over initial conditions in $\mathcal{X}$ with distribution $p_{\xi_{0}}$.

Remark 1 (Initial-condition distribution).

The distribution $p_{\xi_{0}}$ weights how much each initial condition $\xi_{0}\in\mathcal{X}$ contributes to the average distortion $\mathbb{E}_{\xi_{0}}[d(\xi,\Omega_{A})\mid A]$. Arguably, the most suitable choice for $p_{\xi_{0}}$ is the uniform distribution, as, when constructing an abstraction, the initial condition is unknown and all initial conditions are considered equally likely.

Finally, the optimal abstraction accuracy-size tradeoff is captured by the following rate-distortion quantity:

$$\begin{aligned}
D_{abs}(R):=\inf_{A}\ & \mathbb{E}_{\xi_{0}}[d(\xi,\Omega_{A})\mid A] \\
\mathrm{s.t.}\ & A\text{ is an abstraction of }S, \\
& \text{(5), (6) hold}, \\
& \log|\mathcal{Y}|\leq R,\ \Omega_{A}=g_{A}(s_{A}(\xi)).
\end{aligned}$$

That is, the minimum average deviation of abstract state trajectories and system trajectories, over all possible abstractions with a given upper bound $e^{R}$ on partition size. Likewise, we also consider the inverse $R_{abs}(D)$, which is the ($\log$ of the) minimum partition size for a given upper threshold $D$ on the average deviation of abstract state trajectories and system trajectories.

Remark 2 (Statistics of abstractions’ accuracy and size).

The proposed theory does not aim at providing (probabilistic) guarantees on the correctness of abstractions. These are a-priori provided by Definition 3.2, through behavioral inclusion or related properties. Instead, the theory developed here provides (guarantees on the) statistical quantification of abstractions’ accuracy and size.

Remark 3 (The message space is $\mathcal{B}_{l}^{S}$).

Even though we have reduced everything thus far to the initial condition distribution $p_{\xi_{0}}$, the message space is $\mathcal{B}_{l}^{S}$, i.e. the system trajectories. Indeed, although the expectation $\mathbb{E}[d(\xi,\Omega_{A})\mid A]$ can be taken either w.r.t. $\xi_{0}\sim p_{\xi_{0}}$ or w.r.t. the random variable $\xi\in\mathcal{B}_{l}^{S}$ (as $\xi_{0}\mapsto\xi$ is one-to-one), the distortion $d$ considers the whole $\xi\in\mathcal{B}_{l}^{S}$. As such, in the coming section, to derive bounds on $D_{abs}(R)$ and $R_{abs}(D)$, employing the theory presented in Section 2.2, we reason about the random variable $\xi\in\mathcal{B}_{l}^{S}$ and its associated probability measure $\mu_{\xi}$ over $(\mathcal{B}_{l}^{S},\Sigma_{\mathcal{B}_{l}^{S}},\mathcal{H}^{n}_{\mathcal{B}_{l}^{S}})$, which is solely determined by the initial condition distribution $p_{\xi_{0}}:\mathcal{X}\to\mathbb{R}_{+}$ and the system dynamics $f:\mathcal{X}\to\mathcal{X}$. Hence, we take expectations $\mathbb{E}_{\xi}$ and $\mathbb{E}_{\xi_{0}}$ interchangeably.

5 Rate-distortion theory and a fundamental limit for abstractions

5.1 A fundamental limit on abstracting dynamical systems

Having modeled the statistics of abstraction-based verification as a source coding problem, we now proceed to probing the fundamental limits of the abstraction accuracy-size tradeoff, by providing lower bounds on $R_{abs}(D)$ and $D_{abs}(R)$.

Note that abstractions, given the message, output sets, and the associated distortion (7) is set-based. This is in contrast to the typical encoder-decoder pairs considered in Thm. 2.1, which output points and whose distortion function is the squared Euclidean distance. Thus, the results from Section 2.2 do not apply directly to derive bounds on $D_{abs}$ and $R_{abs}$. In what follows, we derive said bounds, both employing Thm. 2.1 and quantifying the aforementioned distortion disparity. This enables a rate-distortion theory for abstractions. First, we present an intermediate, purely geometric result, providing a lower bound on the average distortion of a given abstraction.

Proposition 5.1 (Abstraction vs. encoder distortion).

Consider a dynamical system $x^{+}=f(x)$ with transition system representation $S=(\mathcal{X},\underset{S}{\rightarrow})$, and let Assumption 1 hold. Let $\xi\in\mathcal{B}_{l}^{S}$ be a trajectory of $S$, with $\xi_{0}\sim p_{\xi_{0}}:\mathcal{X}\to\mathbb{R}_{+}$. Consider a measurable partition $\mathcal{Y}$ of $\mathcal{X}$ and an associated abstraction $A$, and let $\Omega_{A}=g_{A}(s_{A}(\xi))$, where $s_{A},g_{A}$ are given by (5) and (6). Consider an encoder-decoder pair $(s_{q_{A}},g_{q_{A}})$, where $s_{q_{A}}(\xi)=g_{A}(s_{A}(\xi))$ and $g_{q_{A}}(z)=x_{c}(z)$, where $x_{c}(z):=\operatorname*{arg\,min}_{y}\max_{y'\in z}\|y-y'\|^{2}$ is the Chebyshev center of the set $z$. Denote the Chebyshev radius of the set $z$ by $r_{c}(z):=\min_{y}\max_{y'\in z}\|y-y'\|$, so that $r_{c}^{2}(z)=\min_{y}\max_{y'\in z}\|y-y'\|^{2}$. Let $\xi_{q_{A}}=g_{q_{A}}(s_{q_{A}}(\xi))$. The following lower bound holds for the average distortion of the abstraction:

$$\mathbb{E}_{\xi_{0}}[d(\xi,\Omega_{A})]\geq\frac{1}{l}\mathbb{E}_{\xi_{0}}[\|\xi-\xi_{q_{A}}\|^{2}]+\frac{1}{l}\mathbb{E}_{\xi_{0}}[r_{c}^{2}(\Omega_{A})], \tag{8}$$

where $d$ is the distortion function in (7).
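As a sanity check of the pointwise inequality behind (8), the following minimal sketch evaluates both sides when the abstraction output is an axis-aligned box, for which the Chebyshev center is the box center and the Chebyshev radius is half the diagonal. The function names and the box used in the usage note are illustrative assumptions, not part of the paper; the quantity compared is $l\cdot d(\xi,\Omega_A)=\max_{y\in\Omega_A}\|\xi-y\|^2$.

```python
import numpy as np

def chebyshev_box(lo, hi):
    """Chebyshev center and radius of the axis-aligned box [lo, hi]."""
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    center = (lo + hi) / 2.0
    radius = np.linalg.norm(hi - lo) / 2.0  # half of the box diagonal
    return center, radius

def max_sq_dist_to_box(xi, lo, hi):
    """max over y in the box of ||xi - y||^2 (attained at a corner)."""
    xi, lo, hi = (np.asarray(a, float) for a in (xi, lo, hi))
    return float(np.sum(np.maximum((xi - lo) ** 2, (hi - xi) ** 2)))

def min_gap_pointwise_bound(lo, hi, n_samples=1000, seed=0):
    """Smallest value of  l*d(xi, box) - (||xi - x_c||^2 + r_c^2)  over
    random points xi in the box; Prop. 5.1 predicts it is nonnegative."""
    rng = np.random.default_rng(seed)
    center, radius = chebyshev_box(lo, hi)
    pts = rng.uniform(lo, hi, size=(n_samples, len(lo)))
    gaps = [
        max_sq_dist_to_box(p, lo, hi) - (np.sum((p - center) ** 2) + radius ** 2)
        for p in pts
    ]
    return min(gaps)
```

For instance, `min_gap_pointwise_bound([0.6, 0.2], [0.8, 0.8])` should return a nonnegative number (up to floating-point error), since for a box the gap equals $\sum_i |\xi_i - x_{c,i}|\,w_i$ with $w_i$ the side lengths.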

Prop. 5.1 shows that the average distortion of an abstraction is lower bounded by the expected distortion of a particular encoder-decoder pair (the one outputting the Chebyshev centers of the abstraction's outputs) plus a term depending on the size of the abstraction's outputs. Employing Prop. 5.1, in Theorem 5.2 below, we derive fundamental lower bounds on $D_{abs}(R)$ and $R_{abs}(D)$, by lower-bounding each of the two terms in the right-hand side of (8) separately, over all abstractions with the same rate (or the same expected distortion). The first term in the right-hand side of (8) can be lower bounded as in Thm. 2.1, being the expected distortion of an encoder-decoder pair with the same rate as the abstraction. To bound the second term, we observe that the abstraction's outputs $\Omega_{A}$ define an $nl$-dimensional cover of $\mathcal{B}_{l}^{S}$ (precisely, the cover $\mathcal{Z}:=\{Z:Z=g_{A}(s_{A}(x_{0})),\,x_{0}\in\mathcal{X}\}$; since $s_{A}(x)$ takes values in the set $\mathcal{Y}$, we have $|\mathcal{Z}|=|\mathcal{Y}|$), and the cover's size is equal to the abstraction's size; the bound is then obtained by lower-bounding over all possible $nl$-dimensional covers of $\mathcal{B}_{l}^{S}$, using geometric measure theory (see Lemma 8.1). For an illustrative example of the above, see Fig. 3.

Figure 3: Consider the dynamical system $x^{+}=x^{2}$ with state space $\mathcal{X}=[0,1]$. The parabola depicts the set of trajectories $\mathcal{B}_{2}^{S}$, embedded in $[0,1]^{2}$. Consider an abstraction $A$ with associated partition sets $Y_{i}=[0.2(i-1),\,0.2i)$ for $i=1,\dots,4$ and $Y_{5}=[0.8,1]$. The abstraction transitions, thus, are $\underset{A}{\rightarrow}=\{(Y_{1},Y_{1}),(Y_{2},Y_{1}),(Y_{3},Y_{1}),(Y_{3},Y_{2}),(Y_{4},Y_{2}),(Y_{4},Y_{3}),(Y_{4},Y_{4}),(Y_{5},Y_{4}),(Y_{5},Y_{5})\}$. The colored rectangles represent the abstraction outputs $\Omega_{A}$, depending on the initial condition $\xi_{0}$. For example, if $\xi_{0}\in Y_{4}$, then, by the transitions listed above, the abstraction output $\Omega_{A}$ is the green rectangle $\{(x_{0},x_{1}):\,x_{0}\in Y_{4},\,x_{1}\in Y_{2}\cup Y_{3}\cup Y_{4}\}$. Observe how the abstraction's outputs define a $2$-dimensional cover of the curve $\mathcal{B}_{2}^{S}$. The dots represent the Chebyshev centers for each different abstraction output, and the circles are the corresponding Chebyshev balls. E.g., when $\xi_{0}\in Y_{4}$, the rectangle is $[0.6,0.8]\times[0.2,0.8]$ up to boundary points, so $x_{c}(\Omega_{A})=(0.7,\ 0.5)$ and $r_{c}(\Omega_{A})=\sqrt{0.1^{2}+0.3^{2}}=\sqrt{0.1}\approx 0.316$. The abstraction's expected distortion is lower bounded as per (8).
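The transition set of this example can be reproduced mechanically. The sketch below is a minimal illustration, assuming a monotonically increasing scalar map and the half-open cells of the caption (the last cell being closed); the function name is hypothetical, and exact rational arithmetic is used only to avoid floating-point ambiguity at the cell boundaries.

```python
from fractions import Fraction

def transitions_increasing_map(f, edges):
    """Transitions (i, j), 1-indexed, of the abstraction of x+ = f(x) for an
    increasing map f, with cells [e_{i-1}, e_i) and a closed last cell."""
    n = len(edges) - 1
    trans = set()
    for i in range(1, n + 1):
        lo, hi = f(edges[i - 1]), f(edges[i])   # image of cell i
        src_closed = (i == n)                   # image is right-closed for the last cell
        for j in range(1, n + 1):
            c, d = edges[j - 1], edges[j]
            dst_closed = (j == n)
            left = max(lo, c)                   # both intervals are left-closed
            in_image = left < hi or (src_closed and left == hi)
            in_cell = left < d or (dst_closed and left == d)
            if in_image and in_cell:
                trans.add((i, j))
    return trans

edges = [Fraction(k, 5) for k in range(6)]      # 0, 0.2, ..., 1
trans = transitions_increasing_map(lambda x: x * x, edges)
successors_of_Y4 = sorted(j for (i, j) in trans if i == 4)
```

Here `trans` contains the nine pairs listed in the caption, and `successors_of_Y4` collects the cells reachable from $Y_4$ in one step.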
Theorem 5.2 (Shannon lower bound for abstractions).

Consider a dynamical system $x^{+}=f(x)$ with transition system representation $S=(\mathcal{X},\underset{S}{\rightarrow})$, and let Assumption 1 hold. Let $\xi\in\mathcal{B}_{l}^{S}$ be a trajectory of $S$, with $\xi_{0}\sim p_{\xi_{0}}:\mathcal{X}\to\mathbb{R}_{+}$. Assume that:

  1. $\mathcal{B}^{S}_{l}$ is a finite union of bounded, $n$-dimensional $C^{1}$-manifolds,

  2. $\mu_{\xi}\ll\mathcal{H}^{n}_{\mathcal{B}_{l}^{S}}$, with $p_{\xi}\coloneqq\frac{\mathrm{d}\mu_{\xi}}{\mathrm{d}\mathcal{H}^{n}_{\mathcal{B}_{l}^{S}}}$.

The average distortion of any abstraction $A$ with partition size $|\mathcal{Y}|\leq e^{R}$, where $R>0$, is lower bounded as follows:

$$D_{abs}(R)\geq\frac{n}{2l}\Big(\frac{e^{-R+h(\xi)-n/2}}{c_{\mathcal{B}_{l}^{S}}\Gamma(1+n/2)}\Big)^{2/n}+\frac{1}{l}c_{\mathcal{B}_{l}^{S}}^{-2/n}\max_{s\in(1,\infty]}e^{\frac{2}{n}(-\frac{s}{s-1}R+h_{s}(\xi))}, \tag{9}$$

where $c_{\mathcal{B}_{l}^{S}}$ is defined by (1) and, for $s=\infty$, the factor $\frac{s}{s-1}$ is read as $1$.

Notice that a valid lower bound is obtained for any value of $s\in(1,\infty]$ in the right-hand side of (9); maximization over $s$ provides the tightest bound. In the numerical examples in Section 6, we compute the bound for multiple values of $s$. Further, one may recover a lower bound on $R_{abs}(D)$ numerically, by fixing $D$ in the left-hand side of (9) and solving for $R$ (as the right-hand side is a decreasing function of $R$, this is readily computed by, e.g., bisection). The above theorem, thus, provides fundamental limits on the accuracy-size tradeoff, or the scalability, of abstractions, for given dynamics $x^{+}=f(x)$.
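The numerical recipe just described can be sketched as follows. This is a minimal illustration under stated assumptions: the entropies $h(\xi)$, $h_s(\xi)$ and the constant $c_{\mathcal{B}_l^S}$ are supplied by the user, the Rényi entropies are passed as a dictionary keyed by the order $s$ (with `math.inf` allowed), and the function names and the search interval `R_max` are hypothetical choices.

```python
import math

def distortion_lower_bound(R, n, l, h, h_s_by_order, c):
    """Right-hand side of (9) for rate R (in nats).

    h_s_by_order maps an order s > 1 (possibly math.inf) to h_s(xi);
    the maximum in (9) is taken over the supplied orders only."""
    first = n / (2 * l) * (math.exp(-R + h - n / 2) / (c * math.gamma(1 + n / 2))) ** (2 / n)
    best = 0.0
    for s, hs in h_s_by_order.items():
        factor = 1.0 if math.isinf(s) else s / (s - 1)   # s/(s-1) -> 1 as s -> inf
        best = max(best, math.exp(2 / n * (-factor * R + hs)))
    return first + c ** (-2 / n) * best / l

def rate_lower_bound(D, n, l, h, h_s_by_order, c, R_max=200.0, tol=1e-10):
    """Rate R* at which the bound (9) equals D, found by bisection.
    Any abstraction with distortion at most D needs rate at least R*."""
    bound = lambda R: distortion_lower_bound(R, n, l, h, h_s_by_order, c)
    if bound(0.0) <= D:
        return 0.0
    if bound(R_max) > D:
        raise ValueError("bound(R_max) still exceeds D; increase R_max")
    lo, hi = 0.0, R_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bound(mid) > D:   # the bound is decreasing in R
            lo = mid
        else:
            hi = mid
    return hi
```

Bisection is used only because the right-hand side of (9) is monotone in $R$; any scalar root finder would serve equally well.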

Remark 4 (On the assumptions of Thm. 5.2).

The first assumption of Thm. 5.2, requiring $\mathcal{B}^{S}_{l}$ to be a union of smooth manifolds, is satisfied whenever the dynamics $f$ is piecewise continuously differentiable. The second assumption is satisfied whenever $f$ is piecewise continuous and the initial condition distribution $p_{\xi_{0}}$ is such that $\mu_{\xi_{0}}\ll\mathcal{L}^{n}$, where $\mathcal{L}^{n}$ is the Lebesgue measure.

In the coming section, we provide: a) closed-form expressions for $h(\xi)$, $h_{s}(\xi)$ and $c_{\mathcal{B}_{l}^{S}}$, for certain classes of dynamics $x^{+}=f(x)$, and b) an interpretation of how the complexity of the dynamics and the time horizon $l$ affect the fundamental lower bound (9) in Thm. 5.2.

5.2 Interpretation and calculus for Theorem 5.2

Before we proceed with the interpretation of Thm. 5.2, let us show how one may compute $h(\xi)$, $h_{s}(\xi)$ and $c_{\mathcal{B}_{l}^{S}}$, which are required to compute the lower bound (9). Let us define the function $b_{l}:\mathcal{X}\to\mathcal{B}_{l}^{S}$ by

$$b_{l}(x)\coloneqq\begin{bmatrix}x^{\mathrm{T}}&f(x)^{\mathrm{T}}&f(f(x))^{\mathrm{T}}&\cdots&f^{(l-1)}(x)^{\mathrm{T}}\end{bmatrix}^{\mathrm{T}}, \tag{10}$$

which maps an initial state into its $l$-long trajectory.

Proposition 5.3 (Computing $h(\xi)$ and $h_{s}(\xi)$).

Consider a system $x^{+}=f(x)$ with $f:\mathcal{X}\to\mathcal{X}$ measurable, piecewise Lipschitz and differentiable. (Footnote 7: That is, $\mathcal{X}$ is a countable union $\cup_{i}\mathcal{X}_{i}$ of Lebesgue-measurable sets such that the restriction of $f$ to each $\mathcal{X}_{i}$ is Lipschitz. This condition may be relaxed to $f$ approximately Lipschitz, see [31, Thm. 3.1.8, Sec. 3.2.1], which also implies approximate differentiability.) Let Assumption 1 hold, and $\xi_{0}\sim p_{\xi_{0}}:\mathcal{X}\to\mathbb{R}_{+}$. The following expressions hold:

$$h(\xi)=h(\xi_{0})+\frac{1}{2}\int_{\mathcal{X}}p_{\xi_{0}}(x)\log\det(J_{b_{l}}(x)^{\mathrm{T}}J_{b_{l}}(x))\,\mathrm{d}x, \tag{11}$$

$$h_{s}(\xi)=\frac{1}{1-s}\log\int_{\mathbb{R}^{n}}\frac{p_{\xi_{0}}(x)^{s}}{\det(J_{b_{l}}(x)^{\mathrm{T}}J_{b_{l}}(x))^{\frac{s-1}{2}}}\,\mathrm{d}x,\quad s>1, \tag{12}$$

$$h_{\infty}(\xi)=\operatorname*{ess\,inf}_{x\in\mathcal{X}}\Big(\frac{1}{2}\log\det(J_{b_{l}}(x)^{\mathrm{T}}J_{b_{l}}(x))-\log p_{\xi_{0}}(x)\Big), \tag{13}$$

where $J_{b_{l}}$ denotes the Jacobian matrix of $b_{l}$. Expression (13) is the limit of (12) as $s\to\infty$, that is, $h_{\infty}(\xi)=-\log\operatorname{ess\,sup}p_{\xi}$.
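Expression (11) lends itself to Monte Carlo estimation. The following minimal sketch assumes that samples of $\xi_0$ can be drawn and that $h(\xi_0)$ is known; it approximates the Jacobian of $b_l$ by central differences rather than automatic differentiation, and all function names, the step `eps` and the sample sizes are illustrative assumptions. Finite differences are unreliable near discontinuities of $f$, so this is meant for piecewise-smooth maps sampled away from their breakpoints.

```python
import numpy as np

def trajectory_map(f, x, l):
    """b_l(x): the stacked trajectory (x, f(x), ..., f^(l-1)(x))."""
    states = [np.asarray(x, float)]
    for _ in range(l - 1):
        states.append(np.asarray(f(states[-1]), float))
    return np.concatenate(states)

def log_gram_det(f, x, l, eps=1e-6):
    """log det(J^T J), with J the Jacobian of b_l at x (central differences)."""
    x = np.asarray(x, float)
    n = x.size
    cols = []
    for k in range(n):
        e = np.zeros(n)
        e[k] = eps
        cols.append((trajectory_map(f, x + e, l) - trajectory_map(f, x - e, l)) / (2 * eps))
    J = np.stack(cols, axis=1)            # shape (n*l, n)
    _, logdet = np.linalg.slogdet(J.T @ J)
    return logdet

def shannon_entropy_of_trajectory(f, sampler, h0, l, n_samples=2000, seed=0):
    """Monte Carlo estimate of (11): h(xi) = h(xi_0) + 0.5 * E[log det(J^T J)].

    sampler(rng, m) must return m samples of xi_0 as an (m, n) array;
    h0 is the differential entropy h(xi_0) of the initial condition."""
    rng = np.random.default_rng(seed)
    xs = sampler(rng, n_samples)
    return h0 + 0.5 * np.mean([log_gram_det(f, x, l) for x in xs])
```

For the identity map on $[0,1]^2$ with uniform $\xi_0$ (so $h(\xi_0)=0$) and $l=3$, the estimate should equal $\frac{n}{2}\log l=\log 3$.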

Proposition 5.4 (Computing $c_{\mathcal{B}_{l}^{S}}$).

Consider a system $x^{+}=f(x)$, with $f:\mathcal{X}\to\mathcal{X}$ differentiable a.e., and let Assumption 1 hold. The following facts on $c_{\mathcal{B}_{l}^{S}}$ hold:

  1. $c_{\mathcal{B}_{l}^{S}}\leq v_{n}$, if $f$ is affine;

  2. $c_{\mathcal{B}_{l}^{S}}\leq M^{l}v_{n}$, if $f$ is piecewise affine with $M$ modes;

  3. $c_{\mathcal{B}_{l}^{S}}\leq v_{n}\big(\sum_{i=0}^{l-1}L^{2i}\big)^{n/2}$, if $f$ is Lipschitz continuous with constant $L$.
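The three cases translate directly into small helper functions. The sketch below is illustrative only; the function names are hypothetical, and $v_n=\pi^{n/2}/\Gamma(n/2+1)$ is the volume of the unit ball in $\mathbb{R}^n$.

```python
import math

def unit_ball_volume(n):
    """v_n = pi^(n/2) / Gamma(n/2 + 1)."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)

def c_bound_affine(n):
    """Case 1: f affine."""
    return unit_ball_volume(n)

def c_bound_piecewise_affine(n, M, l):
    """Case 2: f piecewise affine with M modes, horizon l."""
    return M ** l * unit_ball_volume(n)

def c_bound_lipschitz(n, L, l):
    """Case 3: f Lipschitz continuous with constant L, horizon l."""
    return unit_ball_volume(n) * sum(L ** (2 * i) for i in range(l)) ** (n / 2)
```

For example, `c_bound_lipschitz(1, 2.0, 5)` evaluates to $2\sqrt{(4^5-1)/3}=2\sqrt{341}$.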

Remark 5 ($c_{\mathcal{B}_{l}^{S}}$ at high rates).

As the partition size $|\mathcal{Y}|$ grows large, the Chebyshev balls of the abstraction outputs (cf. Prop. 5.1 and Lemma 8.1) become small. Hence, in the case of smooth $f$, their intersection with the manifold approaches the case of an affine system, with $c_{\mathcal{B}_{l}^{S}}\leq v_{n}$. Similarly, in the piecewise affine case, for sufficiently small balls (at least $|\mathcal{Y}|\geq M^{l}$), these can be chosen to intersect with at most one piece each. Thus, to reduce conservatism of the bound in such high-rate cases, one can inspect the lower bound of Thm. 2.1 by using $c_{\mathcal{B}_{l}^{S}}=v_{n}$. We demonstrate this in the numerical examples in Section 6.

We proceed to discussing Thm. 5.2. First, inspecting (9), systems with more complex dynamics lead to bigger abstraction distortion, for fixed abstraction size, since the right-hand side is increasing w.r.t. $h(\xi)$ and the Rényi entropy $h_{s}(\xi)$; equivalently, more complex systems require bigger abstraction size for the same distortion.

Regarding the effect of the time horizon $l$ on the bound (9), we have to inspect the effect that $l$ has on $h(\xi)$ and $h_{s}(\xi)$. Let us first demonstrate that, for the "simple" dynamics of exponentially stable systems, the abstraction distortion can be made arbitrarily small as $l\to\infty$.

Example 5.1 (Exponentially stable systems).

Consider a system $x^{+}=f(x)$ whose origin is exponentially stable on a given compact set in $\mathbb{R}^{n}$. Then, there is a Lyapunov function $V:\mathbb{R}^{n}\to\mathbb{R}_{+}$ satisfying $V(x)\geq\frac{1}{r}\|x\|^{2}$ for a given $r>0$ and, for all $x$ s.t. $V(x)\leq 1$ (w.l.o.g.), $V(f(x))\leq aV(x)$, with $a\in[0,1)$. This allows us to create an abstraction $A$ with the associated partition $Y_{i}=\{x\in\mathbb{R}^{n}\mid a^{i}<V(x)\leq a^{i-1}\}$ for $i=1,\dots,N$, and $Y_{N+1}=\{x\in\mathbb{R}^{n}\mid V(x)\leq a^{N}\}$; and transitions $Y_{i}\xrightarrow[A]{}Y_{j}$ if and only if $j>i$ or $j=i=N+1$ (indeed, $x\in Y_{i}$ implies $V(f(x))\leq a^{i}$). The abstraction encapsulates the fact that, after $N$ steps or less, all trajectories reach the sublevel set $V(x)\leq a^{N}$. We get that $2r$ and $2a^{N}r$ are overapproximations of the diameters of $Y_{i}$, $i\leq N$, and $Y_{N+1}$, respectively. Then, recalling the distortion (7), for any trajectory $\xi$ of the system and $l>N$,

$$d(\xi,\Omega_{A})\leq\frac{1}{l}\big(2Nr^{2}+2(l-N)a^{2N}r^{2}\big)\xrightarrow{l\to\infty}2a^{2N}r^{2},$$

which can be made arbitrarily small by a suitable choice of $N$.

Indeed, as the following example shows, for Schur LTI systems, the bound (9) converges to $0$ as $l\to\infty$, which demonstrates the bound's tightness.

Example 5.2 (Schur LTI systems).

For a Schur LTI system $x^{+}=Ax$, we have $c_{\mathcal{B}^{S}_{l}}\leq v_{n}$ and, since $J_{b_{l}}(x)=\begin{bmatrix}I&A^{\mathrm{T}}&\cdots&(A^{l-1})^{\mathrm{T}}\end{bmatrix}^{\mathrm{T}}$,

$$J_{b_{l}}(x)^{\mathrm{T}}J_{b_{l}}(x)=\sum_{i=0}^{l-1}(A^{i})^{\mathrm{T}}A^{i}\xrightarrow{l\to\infty}P,$$

where $P$ is the unique solution of the Lyapunov equation $A^{\mathrm{T}}PA-P=-I$. When $A$ is normal, the sum reduces to $\sum_{i=0}^{l-1}(A^{\mathrm{T}}A)^{i}=(I-A^{\mathrm{T}}A)^{-1}(I-(A^{\mathrm{T}}A)^{l})$, with limit $(I-A^{\mathrm{T}}A)^{-1}$.

Thus, both $h(\xi)$ and $h_{s}(\xi)$ remain finite as $l\to\infty$ and, owing to the $1/l$ factors, the bound (9) converges to $0$.
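The boundedness of the Gram matrix for a Schur matrix can be checked numerically. The sketch below is a minimal illustration that builds $J_{b_l}^{\mathrm{T}}J_{b_l}$ directly from the stacked Jacobian $[I;\,A;\,\dots;\,A^{l-1}]$ and compares it with the solution of the discrete Lyapunov equation $P=A^{\mathrm{T}}PA+I$, obtained here by plain vectorization; the function names and the test matrix are assumptions made for the example.

```python
import numpy as np

def trajectory_gramian(A, l):
    """J^T J = sum_{i=0}^{l-1} (A^i)^T A^i for the linear system x+ = A x."""
    n = A.shape[0]
    G = np.zeros((n, n))
    P = np.eye(n)            # holds A^i
    for _ in range(l):
        G += P.T @ P
        P = A @ P
    return G

def lyapunov_limit(A):
    """Solve P = A^T P A + I using vec(A^T P A) = (A^T kron A^T) vec(P)."""
    n = A.shape[0]
    K = np.eye(n * n) - np.kron(A.T, A.T)
    vec_p = np.linalg.solve(K, np.eye(n).reshape(-1))
    return vec_p.reshape(n, n)

A = np.array([[0.5, 1.0], [0.0, 0.5]])        # Schur, but not normal
G_long = trajectory_gramian(A, 200)
P_inf = lyapunov_limit(A)
# log det stays bounded in l, so the entropy correction in (11) does too
log_det_gap = np.linalg.slogdet(G_long)[1] - np.linalg.slogdet(P_inf)[1]
```

Since $\frac{1}{2}\log\det(J_{b_l}^{\mathrm{T}}J_{b_l})$ saturates, the only remaining dependence of (9) on $l$ is through the $1/l$ prefactors.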

Conversely, the example below shows that, even for marginally stable systems, the bound may not vanish as $l\to\infty$.

Example 5.3 (Marginally stable LTI system).

For the simple system $x^{+}=x$, we have that $\det(J_{b_{l}}(x)^{\mathrm{T}}J_{b_{l}}(x))=\det(lI)=l^{n}$ and, by Prop. 5.3, $h(\xi)=h(\xi_{0})+\frac{n\log l}{2}$ and

$$\begin{aligned}h_{s}(\xi)&=\frac{1}{1-s}\log\Big(\frac{1}{l^{n(s-1)/2}}\int_{\mathbb{R}^{n}}p_{\xi_{0}}^{s}\,\mathrm{d}x\Big)\\&=-\frac{n(s-1)\log l}{2(1-s)}+\frac{1}{1-s}\log\int_{\mathbb{R}^{n}}p_{\xi_{0}}^{s}\,\mathrm{d}x\\&=\frac{n\log l}{2}+h_{s}(\xi_{0}).\end{aligned}$$

When these expressions are substituted into the distortion bound (9), the $l$ in the denominator cancels out, yielding a positive lower bound for any $l$. This independence of $l$ is expected, as abstracting $x^{+}=x$ is the same as encoding the initial condition $\xi_{0}$.
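To make the cancellation explicit, the following worked substitution plugs $h(\xi)=h(\xi_0)+\frac{n}{2}\log l$ and $h_s(\xi)=h_s(\xi_0)+\frac{n}{2}\log l$ into the right-hand side of (9), term by term. It is a sketch that writes $c$ for $c_{\mathcal{B}_l^S}$ and uses only $\big(l^{n/2}\big)^{2/n}=l$.

```latex
\begin{align*}
\frac{n}{2l}\Big(\frac{e^{-R+h(\xi_0)+\frac{n}{2}\log l-\frac{n}{2}}}{c\,\Gamma(1+n/2)}\Big)^{2/n}
  &= \frac{n}{2l}\,l\,\Big(\frac{e^{-R+h(\xi_0)-\frac{n}{2}}}{c\,\Gamma(1+n/2)}\Big)^{2/n}
   = \frac{n}{2}\Big(\frac{e^{-R+h(\xi_0)-\frac{n}{2}}}{c\,\Gamma(1+n/2)}\Big)^{2/n},\\
\frac{1}{l}\,c^{-2/n}\,e^{\frac{2}{n}\left(-\frac{s}{s-1}R+h_s(\xi_0)+\frac{n}{2}\log l\right)}
  &= \frac{1}{l}\,l\,c^{-2/n}\,e^{\frac{2}{n}\left(-\frac{s}{s-1}R+h_s(\xi_0)\right)}
   = c^{-2/n}\,e^{\frac{2}{n}\left(-\frac{s}{s-1}R+h_s(\xi_0)\right)}.
\end{align*}
```

Both terms are therefore functions of $R$, $n$, $c$ and the entropies of $\xi_0$ only.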

6 Numerical Examples

6.1 The doubling map

Consider the doubling map from Example 2.1. For any trajectory length $l$, its behavior $\mathcal{B}_{l}^{S}$ is composed of $2^{l-1}$ line segments in $\mathbb{R}^{l}$ described by $(x_{0},2x_{0},4x_{0},\dots,2^{l-1}x_{0})\bmod 1$, uniformly distributed with $p_{\xi}(\xi)=1/\sqrt{1+4+\dots+4^{l-1}}=\sqrt{3/(4^{l}-1)}$, giving $h(\xi)=h_{s}(\xi)=\frac{1}{2}(\log(4^{l}-1)-\log 3)$ for all $s\in(1,\infty]$. Using Prop. 5.4 for piecewise affine systems, we obtain $c_{\mathcal{B}_{l}^{S}}\leq\sqrt{(4^{l}-1)/3}\,v_{n}$, enabling us to compute the lower bound in Theorem 5.2. In light of Remark 5, we also determine the high-rate lower bound by picking $c_{\mathcal{B}_{l}^{S}}=v_{1}=2$. The lower-bound curves can be seen in Fig. 4 for $s=2$ and $s=\infty$. It is apparent that the tightest bound is obtained with $s=\infty$ and $c_{\mathcal{B}_{l}^{S}}=v_{1}$. In this case, the bound is consistently about half of the optimal distortion $D_{abs}(R)$, which is remarkably close. As a comparison, the standard Shannon lower bound is approximately 1.42 times smaller than the optimal quantizer distortion of a uniform random variable in $\mathbb{R}^{1}$, in the standard source-coding setting.

Figure 4: Section 6.1: Comparison between $D_{abs}(R)$ and the fundamental lower bound from Theorem 5.2, for $l=5$.

Let us explain how we were able to compute the actual optimal achievable abstraction distortion. Following the reasoning in Section 5, we first build an optimal cover for $\mathcal{B}_{l}^{S}$ (afterwards, we show that this optimal cover admits a distortion that is equal to that of a specific abstraction with the same rate, and thus its rate-distortion curve is optimal among all abstractions). For a given $l$, consider $R=\log(k2^{l-1})$, where $k$ is an arbitrary natural number. Since all segments are equiprobable and congruent, and probability is uniform among them, the optimal partition of $\mathcal{B}_{l}^{S}$ is obtained by cutting each of the $2^{l-1}$ segments in $k$ equal pieces. The expected error between a trajectory $\xi$ and the Chebyshev center of its corresponding piece is $\mathbb{E}[\|\xi-\xi_{q_{A}}\|^{2}]=\frac{1-4^{-l}}{9k^{2}}$, which is obtained by computing the squared length of each piece, $L^{2}=\frac{1}{(2^{l-1}k)^{2}}(1+2^{2}+\dots+(2^{l-1})^{2})=\frac{4^{l}-1}{3k^{2}4^{l-1}}=\frac{4(1-4^{-l})}{3k^{2}}$, followed by using the variance of the uniform distribution, giving $L^{2}/12$. To determine the distortion of the corresponding cover, we use the distortion $d$ in (7) on the aforementioned pieces. For one-dimensional line segments, $\max_{\xi'\in\Omega_{A}(\xi)}\|\xi-\xi'\|^{2}=(\|\xi-\xi_{q_{A}}\|+L/2)^{2}$. Its expected value is thus the second moment of a random variable $X$ uniformly distributed on $[L/2,L]$. Using $\mathbb{E}[X^{2}]=\mathrm{Var}(X)+\mathbb{E}[X]^{2}=L^{2}/48+9L^{2}/16=7L^{2}/12$, dividing by $l$ as per (7), and substituting $k^{2}=4^{1-l}e^{2R}$ (from $R=\log(k2^{l-1})$), we obtain $D_{cover}(R)=\frac{7L^{2}}{12l}=\frac{7}{36l}(4^{l}-1)e^{-2R}$, where $D_{cover}(R)$ is the optimal distortion among all $l$-dimensional covers of $\mathcal{B}_{l}^{S}$.

Finally, we show that

$$D_{abs}(R)=D_{cover}(R)=\frac{7L^{2}}{12l}=\frac{7}{36l}(4^{l}-1)e^{-2R},$$

where $L^{2}=4(1-4^{-l})/(3k^{2})$ is the squared length of each piece and $R=\log(k2^{l-1})$.

Notice that, in general, $D_{abs}(R)\geq D_{cover}(R)$, as abstractions are covers. However, the optimal cover built above determines an abstraction $A$ that gives the same distortion, and thus we have $D_{abs}(R)=D_{cover}(R)$. First, $\mathcal{Y}$ is the uniform grid with segments of length $1/(k2^{l-1})$. Each trajectory of the abstraction is a sequence of segments of lengths $1/(k2^{l-1}),1/(k2^{l-2}),\dots,1/k$, thus giving a box in $\mathbb{R}^{l}$ containing any related trajectory $\xi$. For each box, the set of related trajectories is precisely a diagonal of the box. As such, the furthest edge along the diagonal is again a solution to $\arg\sup_{\xi'\in\Omega_{A}(\xi)}\|\xi-\xi'\|^{2}$. Hence, the abstraction has the same distortion as the optimal cover. The above reasoning is illustrated in Fig. 5.
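The comparison between the optimal distortion and the lower bound can be reproduced with a few lines. The sketch below is illustrative: it evaluates (9) with $n=1$, $s=\infty$ and $c_{\mathcal{B}_l^S}=v_1=2$, evaluates the distortion $\frac{7L^2}{12l}$ after substituting $k^2=4^{1-l}e^{2R}$, and cross-checks the latter by Monte Carlo sampling of $x_0$. Function names, the sample size and the seed are assumptions made for the example.

```python
import math
import random

def lower_bound_doubling(R, l):
    """Bound (9) for the doubling map with n = 1, s = inf and c = v_1 = 2."""
    h = 0.5 * math.log((4 ** l - 1) / 3)          # h(xi) = h_s(xi)
    c = 2.0
    first = 1 / (2 * l) * (math.exp(-R + h - 0.5) / (c * math.gamma(1.5))) ** 2
    second = math.exp(2 * (-R + h)) / (l * c ** 2)
    return first + second

def cover_distortion_closed_form(R, l):
    """7 L^2 / (12 l) with L^2 = 4 (1 - 4^-l) / (3 k^2) and k^2 = 4^(1-l) e^(2R)."""
    return 7 * (4 ** l - 1) / (36 * l) * math.exp(-2 * R)

def cover_distortion_mc(l, k, n_samples=20000, seed=0):
    """Monte Carlo estimate of E[d] when each of the 2^(l-1) segments of the
    behavior is cut into k equal pieces (x0 uniform on [0, 1])."""
    rng = random.Random(seed)
    w = 1.0 / (k * 2 ** (l - 1))                  # piece width along x0
    stretch = sum(4 ** i for i in range(l))       # ||b_l(x) - b_l(x')||^2 / (x - x')^2
    total = 0.0
    for _ in range(n_samples):
        off = rng.random() % w                    # position of x0 inside its piece
        far = max(off, w - off)                   # distance to the farther endpoint
        total += stretch * far ** 2
    return total / (n_samples * l)

l, k = 5, 2
R = math.log(k * 2 ** (l - 1))
ratio = lower_bound_doubling(R, l) / cover_distortion_closed_form(R, l)
```

The ratio does not depend on $R$ or $l$: it equals $\big(\frac{1}{6\pi e}+\frac{1}{12}\big)\big/\frac{7}{36}$, which is slightly above one half.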

Figure 5: Section 6.1: Optimal cover of $\mathcal{B}_{l}^{S}$, $l=3$, with $k=2$ (left) and corresponding abstraction trajectories (right, blue boxes). On the right, the lines inside the boxes represent $\mathcal{B}_{3}^{S}$, with their Chebyshev centers marked in red. The maximal distance between any point in $\mathcal{B}_{3}^{S}$ and its corresponding blue box is attained at one of the edges intersecting with the trajectory.

6.2 A 3D nonlinear system and abstractions with uniform grids

Consider the nonlinear system $f:\mathbb{R}^{3}\to\mathbb{R}^{3}$ where

$$f(x)=\begin{bmatrix}0.9x_{1}+0.1\sin{x_{2}}\\ 2x_{2}^{3}-x_{2}\\ 0.9x_{3}+0.1x_{1}x_{2}\end{bmatrix},$$

and $\mathcal{X}=[-1,1]^{3}$, which is forward invariant under $f$. This system has multiple equilibria, hence the origin is not stable in $\mathcal{X}$. For each $N$ in $\{10,20,50,100\}$, we build abstractions $A_{N}$ by using uniform partitioning of $\mathcal{X}$ with grids of size $N\times N\times N$ and determining the transition map using interval arithmetic. Then, we compute the distortion lower bound from Theorem 5.2 using Prop. 5.3 and Prop. 5.4, case 3. (Footnote 8: The entropies were computed using Monte Carlo integration with 10000 samples, while Jacobians and the Lipschitz constant were determined using automatic differentiation.) Furthermore, lower bounds were also computed by picking $c_{\mathcal{B}_{l}^{S}}=v_{3}$, in light of Remark 5. The resulting distortion lower bound curves can be seen in Fig. 6. In this case, as the abstraction we construct is not necessarily the optimal one, its expected distortion is generally about 100 times higher than the fundamental lower bound. Still, this demonstrates the validity of the lower bound, even in cases with nonlinear dynamics; even more importantly, it indicates how conservative standard abstractions with uniform grids might be.
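The construction of the transition map on a uniform grid can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses a hand-written interval enclosure of $f$ on boxes inside $[-1,1]^3$ (relying on $\sin$ and $x\mapsto x^3$ being increasing there), ignores outward rounding, and all function names are hypothetical.

```python
import itertools
import math

def f(x):
    """The 3D dynamics of this section."""
    x1, x2, x3 = x
    return (0.9 * x1 + 0.1 * math.sin(x2), 2 * x2 ** 3 - x2, 0.9 * x3 + 0.1 * x1 * x2)

def f_interval(box):
    """Interval enclosure of f over box = ((a1,b1),(a2,b2),(a3,b3)) in [-1,1]^3.
    Plain floating point, no outward rounding (an assumption of this sketch)."""
    (a1, b1), (a2, b2), (a3, b3) = box
    y1 = (0.9 * a1 + 0.1 * math.sin(a2), 0.9 * b1 + 0.1 * math.sin(b2))  # sin increasing on [-1,1]
    y2 = (2 * a2 ** 3 - b2, 2 * b2 ** 3 - a2)     # natural extension of 2x^3 - x
    prods = [a1 * a2, a1 * b2, b1 * a2, b1 * b2]  # interval product x1 * x2
    y3 = (0.9 * a3 + 0.1 * min(prods), 0.9 * b3 + 0.1 * max(prods))
    return (y1, y2, y3)

def cell_box(idx, N):
    """Box of the grid cell with integer index idx on [-1,1]^3 (N cells per axis)."""
    w = 2.0 / N
    return tuple((-1 + i * w, -1 + (i + 1) * w) for i in idx)

def axis_index(v, N):
    """Index of the grid interval containing the scalar v, clipped to the grid."""
    return min(N - 1, max(0, math.floor((v + 1) / (2.0 / N))))

def successors(idx, N):
    """Over-approximated set of successor cells of cell idx."""
    ranges = []
    for lo, hi in f_interval(cell_box(idx, N)):
        lo, hi = max(lo, -1.0), min(hi, 1.0)      # X is forward invariant
        ranges.append(range(axis_index(lo, N), axis_index(hi, N) + 1))
    return set(itertools.product(*ranges))
```

Because the enclosure of $2x_2^3-x_2$ treats the two occurrences of $x_2$ independently, the successor sets are conservative; tighter enclosures would yield fewer spurious transitions.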

Figure 6: Section 6.2: Comparison between expected distortion from the constructed abstractions and the fundamental lower bound from Theorem 5.2.

7 Conclusion and Future Research: Towards Minimal Abstractions

We have developed a statistical, quantitative theory on the accuracy-size tradeoff of finite abstractions of dynamical systems. Through this theory, we have uncovered fundamental limits on their scalability: given the system dynamics, we have obtained a fundamental bound on the achievable abstraction accuracy, for a given abstraction size. To that end, we have established connections with rate-distortion theory. From an information-theoretic perspective, we have developed rate-distortion theory for the particular class of encoder-decoder pairs that abstractions constitute: set-based, with set-based distortion. Overall, this novel theory quantifies scalability limits of abstractions, and provides insights on how the complexity of the dynamics to be abstracted dictates these limits.

Most importantly, the developed theory may be employed to construct minimal abstractions, harnessing their full scalability potential. From this work, it becomes clear that, to construct minimal abstractions, one has to solve the problem of encoding trajectories of dynamical systems, through coverings in a high-dimensional, ambient space. In fact, this has already been demonstrated, in Section 6.1, where we construct a minimal abstraction of the doubling-map dynamics. Future research will thus focus on the general problem of constructing minimal abstractions. Towards that goal, information-theoretic algorithms optimizing the rate-distortion tradeoff, such as the information bottleneck method (see [32]), could be adapted for abstractions.

8 Technical Results and Proofs

Proof of Prop. 5.1.

For any given $\xi\in\mathcal{B}_{l}^{S}$, we will prove that

$$d(\xi,\Omega_{A})\geq\frac{1}{l}\|\xi-\xi_{q_{A}}\|^{2}+\frac{1}{l}r_{c}^{2}(\Omega_{A}),$$

where we note that $\xi\in\Omega_{A}$ and $\xi_{q_{A}}=x_{c}(\Omega_{A})$. Then, the proof is completed by applying the expectation operator to the above inequality.

Define $w(x')=\max_{y\in\Omega_{A}}\|y-x'\|^{2}$. The function $w$ is convex, being the pointwise maximum of the convex quadratic maps $x'\mapsto\|y-x'\|^{2}$. We have $x_{c}(\Omega_{A})=\arg\min_{x'\in\mathbb{R}^{nl}}w(x')$ and $r_{c}^{2}(\Omega_{A})=w(x_{c}(\Omega_{A}))=\max_{y\in\Omega_{A}}\|y-x_{c}(\Omega_{A})\|^{2}$.

Define the set of maximizers

$$M:=\{y\in\Omega_{A}:\|y-x_{c}(\Omega_{A})\|=r_{c}(\Omega_{A})\}.$$

The subdifferential of $w$ at $x_{c}(\Omega_{A})$ is $\partial w(x_{c}(\Omega_{A}))=\operatorname{conv}\{2(x_{c}(\Omega_{A})-y):\ y\in M\}$, where $\operatorname{conv}$ denotes the convex hull operator. Since $x_{c}(\Omega_{A})$ minimizes $w$, the optimality condition $0\in\partial w(x_{c}(\Omega_{A}))$ gives $0\in\operatorname{conv}\{x_{c}(\Omega_{A})-y:y\in M\}$. Hence there exist finitely many points $y_{1},\dots,y_{m}\in M$ and coefficients $\lambda_{i}\geq 0$, $\sum_{i}\lambda_{i}=1$, such that

$$\sum_{i=1}^{m}\lambda_{i}(y_{i}-x_{c}(\Omega_{A}))=0. \tag{14}$$

Now, fix $\xi\in\mathcal{X}^{l}$ and let $\xi_{*}\in\arg\max_{y\in\Omega_{A}}\|y-\xi\|^{2}$. By definition of $\xi_{*}$, for every $y_{i}\in M\subseteq\Omega_{A}$ we have $\|y_{i}-\xi\|^{2}\leq\|\xi_{*}-\xi\|^{2}$. Taking the convex combination with the $\lambda_{i}$ gives

$$\sum_{i=1}^{m}\lambda_{i}\big(\|y_{i}-\xi\|^{2}-\|\xi_{*}-\xi\|^{2}\big)\leq 0.$$

Since $\|y_{i}-\xi\|^{2}=\|y_{i}-x_{c}(\Omega_{A})\|^{2}+\|x_{c}(\Omega_{A})-\xi\|^{2}+2(y_{i}-x_{c}(\Omega_{A}))^{\mathrm{T}}(x_{c}(\Omega_{A})-\xi)$ and $\|y_{i}-x_{c}(\Omega_{A})\|^{2}=r_{c}(\Omega_{A})^{2}$, the above inequality becomes

$$r_{c}(\Omega_{A})^{2}+\|x_{c}(\Omega_{A})-\xi\|^{2}-\|\xi_{*}-\xi\|^{2}\leq 0,$$

where, using (14), the cross term has vanished. Finally, using (7) and $\xi_{q_{A}}=x_{c}(\Omega_{A})$,

$$r_{c}(\Omega_{A})^{2}+\|\xi_{q_{A}}-\xi\|^{2}\leq\|\xi_{*}-\xi\|^{2}=l\cdot d(\xi,\Omega_{A}).$$

Towards proving Thm. 5.2, we introduce the following lemma.

Lemma 8.1.

Let $M\subset\mathbb{R}^{n}$ be a finite union of bounded, disjoint, $m$-dimensional $C^{1}$-manifolds. Let $X$ be a random variable in $M$ with probability measure $\mu_{X}\ll\mathcal{H}^{m}_{M}$ and density $p=\frac{\mathrm{d}\mu_{X}}{\mathrm{d}\mathcal{H}^{m}_{M}}$. Then, for any collection $\mathcal{Y}\coloneqq\{Y_{i}\}_{i=1}^{N}$ of $N$ measurable, $n$-dimensional sets $Y_{i}\subseteq\mathbb{R}^{n}$ covering $M$, the following holds:

$$\inf_{\mathcal{Y}}\mathbb{E}_{X}\Big[\sum_{i=1}^{N}\mathbf{1}_{Y_{i}}(X)\,r_{c}(Y_{i})^{2}\Big]\geq c_{M}^{-2/m}\max_{s\in(1,\infty]}e^{\frac{2}{m}h_{s}(X)}\,N^{-\frac{2}{m(1-1/s)}}, \tag{15}$$

where $c_{M}$ is defined by (1), $\mathbf{1}_{Y_{i}}(\cdot)$ is the indicator function of the set $Y_{i}$, $r_{c}(Y_{i})$ denotes the Chebyshev radius of $Y_{i}$, and $v_{m}=\frac{\pi^{m/2}}{\Gamma(m/2+1)}$ is the volume of the unit ball in $\mathbb{R}^{m}$.

Proof.

Define $S_{i}:=Y_{i}\cap M\subset M$. Then $\{S_{i}\}_{i=1}^{N}$ forms a measurable $m$-dimensional cover of $M$. Let $p_{i}:=\mu_{X}(S_{i})=\int_{S_{i}}p\,\mathrm{d}\mathcal{H}^{m}$ and $r_{i}:=r_{c}(S_{i})$. Because $S_{i}\subset Y_{i}$, we have $r_{i}\leq r_{c}(Y_{i})$, giving

$$\mathbb{E}_{X}\Big[\sum_{i=1}^{N}\mathbf{1}_{Y_{i}}(X)\,r_{c}(Y_{i})^{2}\Big]=\sum_{i=1}^{N}p_{i}\,r_{c}(Y_{i})^{2}\geq\sum_{i=1}^{N}p_{i}r_{i}^{2}.$$

Hence it suffices to lower bound $\sum_{i=1}^{N}p_{i}r_{i}^{2}$ over collections $\mathcal{Y}$.

For a given $i$, by definition, $S_{i}\subset B(c_{i},r_{i})\cap M$ for some Chebyshev center $c_{i}$. For any $s>1$, we have:

$$\begin{aligned}p_{i}&\leq\int_{B(c_{i},r_{i})\cap M}p\,\mathrm{d}\mathcal{H}^{m}\\&=\int_{M}p\,\mathbf{1}_{B(c_{i},r_{i})\cap M}\,\mathrm{d}\mathcal{H}^{m}\\&\leq\Big(\int_{M}p^{s}\,\mathrm{d}\mathcal{H}^{m}\Big)^{1/s}\Big(\int_{M}(\mathbf{1}_{B(c_{i},r_{i})\cap M})^{\frac{s}{s-1}}\,\mathrm{d}\mathcal{H}^{m}\Big)^{1-1/s}\\&\leq\|p\|_{L^{s}(M)}\,(\mathcal{H}^{m}(B(c_{i},r_{i})\cap M))^{1-1/s}\\&\leq\|p\|_{L^{s}(M)}\,(c_{M}\,r_{i}^{m})^{1-1/s},\end{aligned}$$

where $\|p\|_{L^{s}(M)}\coloneqq\big(\int_{M}p^{s}\,\mathrm{d}\mathcal{H}^{m}\big)^{1/s}$, in the third step we used Hölder's inequality, and in the final step we used the inequality (1). Defining $K_{s}\coloneqq\|p\|_{L^{s}(M)}\,c_{M}^{1-1/s}$, from the inequality above we have:

$$r_{i}\geq\left(\frac{p_{i}}{K_{s}}\right)^{\frac{1}{m(1-1/s)}}\implies r_{i}^{2}\geq\left(\frac{p_{i}}{K_{s}}\right)^{\alpha},$$

where $\alpha:=2/(m(1-1/s))=2s/(m(s-1))$. Multiplying by $p_{i}$ gives

$$p_{i}r_{i}^{2}\geq K_{s}^{-\alpha}\,p_{i}^{1+\alpha}\implies\sum_{i=1}^{N}p_{i}r_{i}^{2}\geq K_{s}^{-\alpha}\sum_{i=1}^{N}p_{i}^{1+\alpha}. \tag{16}$$

It remains to find a lower bound on $\sum_{i=1}^{N}p_{i}^{1+\alpha}$ over discrete probabilities $p_{i}$. First, notice that $\alpha>0$ since $s>1$. Therefore, the map $t\mapsto t^{1+\alpha}$ is convex in $t\in[0,+\infty)$. Thus, by Jensen's inequality,

$$\sum_{i=1}^{N}p_{i}^{1+\alpha}\geq N\bigg(\frac{1}{N}\sum_{i=1}^{N}p_{i}\bigg)^{1+\alpha}=N\left(\frac{1}{N}\right)^{1+\alpha}=N^{-\alpha}.$$

Substituting in (16), and using $\alpha(1-1/s)=2/m$, gives

$$\sum_{i=1}^{N}p_{i}r_{i}^{2}\geq(K_{s}N)^{-\alpha}=\Big(\int_{M}p^{s}\,\mathrm{d}\mathcal{H}^{m}\Big)^{-\alpha/s}N^{-\alpha}\,c_{M}^{-2/m}.$$

Now, by definition of the Rényi entropy,

$$h_{s}(X)=\frac{1}{1-s}\log\mathbb{E}\Big[p(X)^{s-1}\Big]=\frac{1}{1-s}\log\int\Big(\frac{\mathrm{d}\mu_{X}}{\mathrm{d}\mathcal{H}_{M}^{m}}\Big)^{s-1}\mathrm{d}\mu_{X},$$

which by the properties of the Radon–Nikodym derivative gives

$$\begin{aligned}h_{s}(X)&=\frac{1}{1-s}\log\int\Big(\frac{\mathrm{d}\mu_{X}}{\mathrm{d}\mathcal{H}_{M}^{m}}\Big)^{s}\mathrm{d}\mathcal{H}_{M}^{m}\\&=\frac{1}{1-s}\frac{s}{\alpha}\frac{\alpha}{s}\log\int p^{s}\,\mathrm{d}\mathcal{H}_{M}^{m}\\&=-\frac{s}{\alpha(1-s)}\log\Big(\int p^{s}\,\mathrm{d}\mathcal{H}_{M}^{m}\Big)^{-\alpha/s}.\end{aligned}$$

Using $\alpha(1-s)=-2s/m$ then gives

$$h_{s}(X)=\frac{m}{2}\log\Big(\int p^{s}\,\mathrm{d}\mathcal{H}_{M}^{m}\Big)^{-\alpha/s}\iff\Big(\int p^{s}\,\mathrm{d}\mathcal{H}_{M}^{m}\Big)^{-\alpha/s}=e^{\frac{2}{m}h_{s}(X)}.$$

Therefore, for any $s>1$, we have

$$\inf_{\mathcal{Y}}\mathbb{E}_{X}\Big[\sum_{i=1}^{N}\mathbf{1}_{Y_{i}}(X)\,r_{c}(Y_{i})^{2}\Big]\geq c_{M}^{-2/m}\,e^{\frac{2}{m}h_{s}(X)}\,N^{-\frac{2}{m(1-1/s)}}.$$
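A simple case in which this bound is attained can be checked numerically. The sketch below assumes $X$ uniform on a straight segment of length $\ell$ (so $m=1$, $h_s(X)=\log\ell$ for every $s$, and $c_M=v_1=2$) and compares the bound with the cost of cutting the segment into $N$ equal pieces, each of Chebyshev radius $\ell/(2N)$. The function names are hypothetical.

```python
import math

def covering_term_lower_bound(N, m, c_M, h_s, s):
    """Right-hand side of the lemma's inequality for one fixed order s > 1."""
    exponent = 2.0 / m if math.isinf(s) else 2.0 / (m * (1.0 - 1.0 / s))
    return c_M ** (-2.0 / m) * math.exp(2.0 / m * h_s) * N ** (-exponent)

def equal_pieces_cost(length, N):
    """E[r_c^2] when a segment is cut into N equal pieces: every piece has
    Chebyshev radius length / (2 N), whatever piece X falls in."""
    return (length / (2.0 * N)) ** 2

length, N = 3.0, 8
h = math.log(length)                         # Renyi entropy of any order
bound_inf = covering_term_lower_bound(N, 1, 2.0, h, math.inf)
bound_two = covering_term_lower_bound(N, 1, 2.0, h, 2.0)
cost = equal_pieces_cost(length, N)
```

In this example the $s=\infty$ bound coincides with the achieved cost, while the $s=2$ bound is looser by a factor $N^{2}$.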

We proceed with the proof of Thm. 5.2.

Proof of Thm. 5.2.

We make use of Prop. 5.1. Take (8) and minimize both sides over all possible partitions $\mathcal{Y}$ with size $|\mathcal{Y}|\leq e^{R}$ and associated abstractions $A$. We have

$$D_{abs}(R)\geq\inf_{A,|\mathcal{Y}|\leq e^{R}}\frac{1}{l}\mathbb{E}_{\xi}[\|\xi-\xi_{q_{A}}\|^{2}]+\frac{1}{l}\mathbb{E}_{\xi}[r_{c}^{2}(\Omega_{A})],$$

where recall that $\mathbb{E}_{\xi_{0}}[\cdot]=\mathbb{E}_{\xi}[\cdot]$, and that for a given abstraction $A$ with corresponding encoder-decoder pair $s_{A},g_{A}$, we have $\xi_{q_{A}}=g_{q_{A}}(s_{q_{A}}(\xi))$ with $s_{q_{A}}(\xi)=g_{A}(s_{A}(\xi))$ and $g_{q_{A}}(z)=x_{c}(z)$, where $x_{c}(z)$ is the Chebyshev center of the set $z$; and $r_{c}(\Omega_{A})$ is the Chebyshev radius of $\Omega_{A}$. Thus, $\xi_{q_{A}}$ is the output of the encoder-decoder pair $(s_{q_{A}},g_{q_{A}})$ with rate $R$ and message $\xi$. Hence, the first term in the right-hand side of the above inequality can be lower bounded by employing Thm. 2.1, to obtain:

$$D_{abs}(R)\geq\frac{n}{2l}\Big(\frac{e^{-R+h(\xi)-n/2}}{c_{\mathcal{B}_{l}^{S}}\Gamma(1+n/2)}\Big)^{2/n}+\frac{1}{l}\inf_{A,|\mathcal{Y}|\leq e^{R}}\mathbb{E}_{\xi}[r_{c}^{2}(\Omega_{A})].$$

To bound the second term, we employ Lemma 8.1. Notice that the abstraction's outputs $\Omega_{A}$ are $nl$-dimensional and define a cover of $\mathcal{B}_{l}^{S}$ (which is $n$-dimensional) with cardinality $|\mathcal{Y}|\leq e^{R}$, the same as the state-space partition. (Precisely, this cover is $\mathcal{Z}:=\{Z:Z=g_{A}(s_{A}(x_{0})),\,x_{0}\in\mathcal{X}\}$; since $s_{A}(x)$ takes values in the set $\mathcal{Y}$, we have $|\mathcal{Z}|=|\mathcal{Y}|$.) Thus, the term $\inf_{A,|\mathcal{Y}|\leq e^{R}}\mathbb{E}_{\xi}[r_{c}^{2}(\Omega_{A})]$ can be lower bounded as in (15), where we replace $m$ by $n$, $M$ by $\mathcal{B}_{l}^{S}$, $N$ by $e^{R}$, and $\mu_{X}$ by $\mu_{\xi}$. ∎

Proof of Prop. 5.3.

Fix any measurable subset $\mathcal{A}\subseteq\mathcal{X}$. Because $\mu_{\xi_{0}}(\mathcal{A})=\mu_{\xi}(b_{l}(\mathcal{A}))$, the definitions of $p_{\xi}$ and $p_{\xi_{0}}$ imply that

$$\int_{b_{l}(\mathcal{A})}p_{\xi}(y)\,\mathrm{d}\mathcal{H}^{n}_{\mathcal{B}^{S}_{l}}(y)=\int_{\mathcal{A}}p_{\xi_{0}}(x)\,\mathrm{d}\mathcal{L}^{n}(x).$$

But also, since $b_{l}$ is injective, the area formula [31, Thm. 3.2.5] gives

$$\int_{b_{l}(\mathcal{A})}p_{\xi}(y)\,\mathrm{d}\mathcal{H}^{n}_{\mathcal{B}^{S}_{l}}(y)=\int_{\mathcal{A}}p_{\xi}(b_{l}(x))\sqrt{\det(J_{b_{l}}(x)^{\mathrm{T}}J_{b_{l}}(x))}\,\mathrm{d}\mathcal{L}^{n}(x),$$

implying that, for almost all $x\in\mathcal{X}$,

$$p_{\xi}(b_{l}(x))=\frac{p_{\xi_{0}}(x)}{\sqrt{\det(J_{b_{l}}(x)^{\mathrm{T}}J_{b_{l}}(x))}}. \tag{17}$$

Then, (2) becomes

$$h(\xi)=-\int_{\mathbb{R}^{n}}p_{\xi_{0}}(x)\log(p_{\xi_{0}}(x))\,\mathrm{d}\mathcal{L}^{n}+\frac{1}{2}\int_{\mathbb{R}^{n}}\log\det(J_{b_{l}}(x)^{\mathrm{T}}J_{b_{l}}(x))\,p_{\xi_{0}}(x)\,\mathrm{d}\mathcal{L}^{n},$$

which is (11).

Likewise, the area formula gives

hs(ξ)=\displaystyle h_{s}(\xi)= 11sloglSpξsdn\displaystyle\frac{1}{1-s}\log\int_{{\mathcal{B}}_{l}^{S}}p_{\xi}^{s}\,\mathrm{d}\mathcal{H}^{n}
=\displaystyle= 11slognpξ(bl(x))sdet(Jbl(x)TJbl(x))12dn,\displaystyle\frac{1}{1-s}\log\int_{\mathbb{R}^{n}}p_{\xi}(b_{l}(x))^{s}\,\det(J_{b_{l}}(x)^{\mkern-1.5mu\mathrm{T}}\!J_{b_{l}}(x))^{\frac{1}{2}}\mathrm{d}{\mathcal{L}}^{n},

and applying (17) gives (12). Finally, in the particular case of $s=\infty$, we have

\[
\begin{aligned}
h_{s}(\xi)&=\frac{s}{1-s}\log\Big(\int_{\mathbb{R}^{n}}\frac{p_{\xi_{0}}(x)^{s}}{\det\big(J_{b_{l}}(x)^{\mathrm{T}}J_{b_{l}}(x)\big)^{\frac{s-1}{2}}}\,\mathrm{d}x\Big)^{1/s}\\
&\underset{s\to\infty}{=}-\log\operatorname*{ess\,sup}\frac{p_{\xi_{0}}(x)}{\sqrt{\det\big(J_{b_{l}}(x)^{\mathrm{T}}J_{b_{l}}(x)\big)}}=\log\operatorname*{ess\,inf}\frac{\sqrt{\det\big(J_{b_{l}}(x)^{\mathrm{T}}J_{b_{l}}(x)\big)}}{p_{\xi_{0}}(x)},
\end{aligned}
\]

where the limit uses $s/(1-s)\to-1$ and the convergence of the $L^{s}$ norm (with respect to the measure $\sqrt{\det(J_{b_{l}}^{\mathrm{T}}J_{b_{l}})}\,\mathrm{d}x$) to the essential supremum. This gives the desired result, since $\log$ is monotonically increasing. ∎
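The change of variables in this proof can be checked numerically on a toy case. The sketch below is only an illustration under stated assumptions: it takes a scalar linear system $x^{+}=ax$, assumes that $b_{l}$ stacks $l$ consecutive states, $b_{l}(x)=(x,ax,\dots,a^{l-1}x)$ (definition (10) is not reproduced here, so this form is an assumption), and picks the density $p_{\xi_{0}}(x)=2x$ on $[0,1]$. Then $\det(J_{b_{l}}^{\mathrm{T}}J_{b_{l}})=\sum_{i=0}^{l-1}a^{2i}$ is constant, and the integral expression for $h_{s}(\xi)$ can be compared with its closed form and with the $s\to\infty$ limit. All function names are hypothetical.

```python
import math

def gram_det(a, l):
    """det(J^T J) for the ASSUMED lift b_l(x) = (x, a x, ..., a^(l-1) x)."""
    return sum(a ** (2 * i) for i in range(l))

def renyi_lifted_numeric(s, a, l, n_grid=20000):
    """Midpoint-rule estimate of h_s(xi) for p_{xi_0}(x) = 2x on [0, 1].

    Integrand: p_{xi_0}(x)^s / det(J^T J)^((s-1)/2), as in the proof.
    """
    g = gram_det(a, l)
    total = 0.0
    for k in range(n_grid):
        x = (k + 0.5) / n_grid
        total += (2.0 * x) ** s / g ** ((s - 1) / 2)
    return math.log(total / n_grid) / (1.0 - s)

def renyi_lifted_closed_form(s, a, l):
    """h_s(xi_0) + 0.5 * log det(J^T J), using int_0^1 (2x)^s dx = 2^s / (s + 1)."""
    h_s_xi0 = (s * math.log(2.0) - math.log(s + 1.0)) / (1.0 - s)
    return h_s_xi0 + 0.5 * math.log(gram_det(a, l))

def renyi_lifted_limit(a, l):
    """Limit s -> infinity: -log ess sup of p_{xi_0} / sqrt(det(J^T J)); here ess sup p_{xi_0} = 2."""
    return -math.log(2.0) + 0.5 * math.log(gram_det(a, l))

if __name__ == "__main__":
    a, l = 1.5, 3  # illustrative values, not taken from the paper
    for s in (2, 10, 50, 200):
        print(s, renyi_lifted_numeric(s, a, l), renyi_lifted_closed_form(s, a, l))
    print("limit", renyi_lifted_limit(a, l))
```

In this example the Jacobian term only shifts every $h_{s}(\xi_{0})$ by $\tfrac{1}{2}\log\sum_{i=0}^{l-1}a^{2i}$, and the values decrease towards the limit as $s$ grows, as expected for Rényi-type entropies.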

Lemma 8.2.

Let $X\subset\mathbb{R}^{n}$, and let $f:X\to\mathbb{R}^{N}$, $N\geq n$, be a bi-Lipschitz function satisfying

\[
\|x-x^{\prime}\|\leq\|f(x)-f(x^{\prime})\|\leq L\|x-x^{\prime}\|,\quad\forall x,x^{\prime}\in X,
\]

for some $L\geq 1$. Then, for every $y\in\mathbb{R}^{N}$ and $\delta>0$,

\[
\mathcal{H}^{n}(f(X)\cap B(y,\delta))\leq L^{n}v_{n}\delta^{n}.
\]
Proof.

Fix $y\in\mathbb{R}^{N}$ and $\delta>0$, and define $Z\coloneqq f(X)\cap B(y,\delta)$ and its pre-image $E\coloneqq f^{-1}(Z)\subset\mathbb{R}^{n}$. We start by bounding the diameter of $E$.

For any $x_{1},x_{2}\in E$, we have $f(x_{1}),f(x_{2})\in B(y,\delta)$, so

\[
\|f(x_{1})-f(x_{2})\|\leq\|f(x_{1})-y\|+\|f(x_{2})-y\|<2\delta.
\]

By the lower Lipschitz bound $\|x_{1}-x_{2}\|\leq\|f(x_{1})-f(x_{2})\|$, it follows that $\|x_{1}-x_{2}\|\leq 2\delta$, i.e., $E$ has diameter at most $2\delta$. By the isodiametric inequality, a subset of $\mathbb{R}^{n}$ with diameter at most $2\delta$ has Lebesgue measure at most that of an $n$-dimensional ball of radius $\delta$. Since $\mathcal{H}^{n}$ coincides with $\mathcal{L}^{n}$ on $\mathbb{R}^{n}$, we obtain $\mathcal{H}^{n}(E)\leq v_{n}\delta^{n}$.

Since $f$ is $L$-Lipschitz, by fundamental properties of the Hausdorff measure [31, Sec. 2.10.11],

\[
\mathcal{H}^{n}(Z)=\mathcal{H}^{n}(f(E))\leq L^{n}\mathcal{H}^{n}(E)\leq L^{n}v_{n}\delta^{n}. \qquad ∎
\]
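As a sanity check of Lemma 8.2 in the simplest setting, the following sketch takes $n=1$, $N=2$, $X=[0,1]$ and the illustrative map $f(x)=(x,\sin x)$, which is an assumption made here and not an example from the paper. This map satisfies the two-sided bound with $L=\sqrt{2}$, and $v_{1}=2$, so the lemma predicts that the arc length of the curve inside any ball $B(y,\delta)$ is at most $2\sqrt{2}\,\delta$. The arc length is estimated with a fine polyline; function names are hypothetical.

```python
import math

def curve(x):
    """Illustrative map f(x) = (x, sin x): |x - x'| <= ||f(x) - f(x')|| <= sqrt(2) |x - x'|."""
    return (x, math.sin(x))

def arc_length_in_ball(y, delta, n_seg=20000):
    """Polyline estimate of H^1(f([0, 1]) ∩ B(y, delta)).

    A segment is counted when its midpoint lies in the open ball.
    """
    total = 0.0
    prev = curve(0.0)
    for k in range(1, n_seg + 1):
        cur = curve(k / n_seg)
        mid = (0.5 * (prev[0] + cur[0]), 0.5 * (prev[1] + cur[1]))
        if math.dist(mid, y) < delta:
            total += math.dist(prev, cur)
        prev = cur
    return total

def lemma_bound(delta, lip=math.sqrt(2.0), n=1, v_n=2.0):
    """Right-hand side L^n v_n delta^n of Lemma 8.2 (v_1 = 2 is the length of the unit 1-ball)."""
    return lip ** n * v_n * delta ** n

if __name__ == "__main__":
    center = curve(0.5)
    for delta in (0.05, 0.2, 0.5):
        print(delta, arc_length_in_ball(center, delta), lemma_bound(delta))
```

For a ball centred on the curve, the measured length is close to $2\delta$, so the bound holds with some slack; the slack reflects that the lemma only uses the worst-case constant $L$.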

Proof of Prop. 5.4.

We again use the function $b_{l}:\mathcal{X}\to\mathcal{B}_{l}^{S}$, defined by (10). Since by assumption $\mathcal{X}$ is full dimensional in $\mathbb{R}^{n}$, the tightest value for $c_{\mathcal{X}}$ is $c_{\mathbb{R}^{n}}=v_{n}$. Now we look at each case.

Case (1) follows from the observation that $\mathcal{B}_{l}^{S}$ is an $n$-dimensional affine subset of $\mathbb{R}^{nl}$, and that the intersection of an $nl$-dimensional ball of radius $r$ with a plane of dimension $n$ is an $n$-dimensional ball of radius at most $r$. Hence, $\mathcal{H}^{n}_{\mathcal{B}_{l}^{S}}(B(z,\delta))\leq v_{n}\delta^{n}$ for all $z\in\mathbb{R}^{nl}$.

Case (2): If $f$ is piecewise affine, so is $\mathcal{B}_{l}^{S}$, which has at most $M^{l}$ disjoint pieces. Denote by $Z_{i}$ each such piece of $\mathcal{B}_{l}^{S}$, which is a bounded, connected $n$-dimensional subset of some affine subspace of $\mathbb{R}^{nl}$. Thus, $\mathcal{B}_{l}^{S}=\bigcup_{i=1}^{N}Z_{i}$, with $N\leq M^{l}$. Then, for all $z\in\mathbb{R}^{nl}$ and $\delta>0$,

\[
\mathcal{H}^{n}\Big(\bigcup_{i}Z_{i}\cap B(z,\delta)\Big)=\sum_{i=1}^{N}\mathcal{H}^{n}(Z_{i}\cap B(z,\delta))\leq M^{l}v_{n}\delta^{n},
\]

where in the last inequality we have used case (1) and the fact that $N\leq M^{l}$.

Case (3): It is easy to see that $b_{l}$ is bi-Lipschitz with

\[
\|x-y\|\leq\|b_{l}(x)-b_{l}(y)\|\leq\Big(\sum_{i=0}^{l}L^{2i}\Big)^{1/2}\|x-y\|.
\]

Hence, the result follows from applying Lemma 8.2. ∎
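The bi-Lipschitz bound used in Case (3) can be probed empirically. The sketch below assumes that $b_{l}$ stacks the iterates $x,f(x),\dots,f^{l-1}(x)$ (definition (10) is not reproduced here, so this form is an assumption) and uses the illustrative scalar map $f(x)=1.5\sin x$, which is Lipschitz with $L=1.5$. It samples random pairs and checks both inequalities with the constant $\big(\sum_{i=0}^{l}L^{2i}\big)^{1/2}$ from the proof. Function names are hypothetical, and random sampling can only falsify the bound, not prove it.

```python
import math
import random

def lift(x, f, l):
    """ASSUMED form of b_l: the stacked iterates (x, f(x), ..., f^(l-1)(x))."""
    out = [x]
    for _ in range(l - 1):
        out.append(f(out[-1]))
    return out

def upper_constant(lip, l):
    """The constant (sum_{i=0}^{l} L^{2i})^{1/2} from the proof of Case (3)."""
    return math.sqrt(sum(lip ** (2 * i) for i in range(l + 1)))

def check_bilipschitz(f, lip, l, n_pairs=2000, seed=0, tol=1e-12):
    """Return True if every sampled pair satisfies both inequalities of Case (3)."""
    rng = random.Random(seed)
    upper = upper_constant(lip, l)
    for _ in range(n_pairs):
        x, y = rng.uniform(-3.0, 3.0), rng.uniform(-3.0, 3.0)
        if x == y:
            continue
        dx = abs(x - y)
        db = math.dist(lift(x, f, l), lift(y, f, l))
        # Lower bound holds because the first block of b_l is x itself.
        if not (dx * (1.0 - tol) <= db <= upper * dx * (1.0 + tol)):
            return False
    return True

if __name__ == "__main__":
    f = lambda x: 1.5 * math.sin(x)  # illustrative map with Lipschitz constant 1.5
    print(check_bilipschitz(f, lip=1.5, l=4))
```

Passing a Lipschitz constant that is too small (for instance `lip=0.1` for the same map) makes the check fail, which shows that the sampled pairs do exercise the upper bound.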

References

  • [1] P. Tabuada, Verification and control of hybrid systems: a symbolic approach. Springer Science & Business Media, 2009.
  • [2] A. Lavaei, S. Soudjani, A. Abate, and M. Zamani, “Automated verification and synthesis of stochastic hybrid systems: A survey,” Automatica, vol. 146, p. 110617, 2022.
  • [3] A. Girard, G. Pola, and P. Tabuada, “Approximately bisimilar symbolic models for incrementally stable switched systems,” IEEE Transactions on Automatic Control, vol. 55, no. 1, pp. 116–126, 2009.
  • [4] M. Rungger and M. Zamani, “Scots: A tool for the synthesis of symbolic controllers,” in Proceedings of the 19th international conference on hybrid systems: Computation and control, 2016, pp. 99–104.
  • [5] K. Mallik, A.-K. Schmuck, S. Soudjani, and R. Majumdar, “Compositional synthesis of finite-state abstractions,” IEEE Transactions on Automatic Control, vol. 64, no. 6, pp. 2629–2636, 2018.
  • [6] M. Zamani, P. M. Esfahani, R. Majumdar, A. Abate, and J. Lygeros, “Symbolic control of stochastic systems via approximately bisimilar finite abstractions,” IEEE Transactions on Automatic Control, vol. 59, no. 12, pp. 3135–3150, 2014.
  • [7] M. Lahijanian, S. B. Andersson, and C. Belta, “Formal verification and synthesis for discrete-time stochastic systems,” IEEE Transactions on Automatic Control, vol. 60, no. 8, pp. 2031–2045, 2015.
  • [8] A. Abate, J.-P. Katoen, J. Lygeros, and M. Prandini, “Approximate model checking of stochastic hybrid systems,” European Journal of Control, vol. 16, no. 6, pp. 624–641, 2010.
  • [9] R. Coppola, A. Peruffo, and M. Mazo, “Data-driven abstractions for verification of linear systems,” IEEE Control Systems Letters, vol. 7, pp. 2737–2742, 2023.
  • [10] T. Badings, L. Romao, A. Abate, D. Parker, H. A. Poonawala, M. Stoelinga, and N. Jansen, “Robust control for dynamical systems with non-gaussian noise via formal abstractions,” Journal of Artificial Intelligence Research, vol. 76, pp. 341–391, 2023.
  • [11] A. Devonport, A. Saoud, and M. Arcak, “Symbolic abstractions from data: A pac learning approach,” in 2021 60th IEEE Conference on Decision and Control (CDC). IEEE, 2021, pp. 599–604.
  • [12] M. Kazemi, R. Majumdar, M. Salamati, S. Soudjani, and B. Wooding, “Data-driven abstraction-based control synthesis,” Nonlinear Analysis: Hybrid Systems, vol. 52, p. 101467, 2024.
  • [13] T. M. Cover, Elements of information theory. John Wiley & Sons, 1999.
  • [14] E. Riegler, H. Bölcskei, and G. Koliander, “Rate-distortion theory for general sets and measures,” in 2018 IEEE International Symposium on Information Theory (ISIT). IEEE, 2018, pp. 101–105.
  • [15] E. Riegler, G. Koliander, and H. Bölcskei, “Lossy compression of general random variables,” Information and Inference: A Journal of the IMA, vol. 12, no. 3, pp. 1759–1829, 2023.
  • [16] S. Esmaeil Zadeh Soudjani and A. Abate, “Adaptive and sequential gridding procedures for the abstraction and verification of stochastic processes,” SIAM Journal on Applied Dynamical Systems, vol. 12, no. 2, pp. 921–956, 2013.
  • [17] S. Adams, M. Lahijanian, and L. Laurenti, “Formal control synthesis for stochastic neural network dynamic models,” IEEE Control Systems Letters, vol. 6, pp. 2858–2863, 2022.
  • [18] Y. Tazaki and J.-i. Imura, “Discrete-state abstractions of nonlinear systems using multi-resolution quantizer,” in International Workshop on Hybrid Systems: Computation and Control. Springer, 2009, pp. 351–365.
  • [19] K. Hsu, R. Majumdar, K. Mallik, and A.-K. Schmuck, “Multi-layered abstraction-based controller synthesis for continuous-time systems,” in Proceedings of the 21st International Conference on Hybrid Systems: Computation and Control (part of CPS Week), 2018, pp. 120–129.
  • [20] J. Calbert, L. N. Egidio, and R. M. Jungers, “Smart abstraction based on iterative cover and non-uniform cells,” IEEE Control Systems Letters, vol. 8, pp. 2301–2306, 2024.
  • [21] A.-K. Schmuck and J. Raisch, “Asynchronous l-complete approximations,” Systems & Control Letters, vol. 73, pp. 67–75, 2014.
  • [22] A. Banse, G. Delimpaltadakis, L. Laurenti, M. Mazo Jr, and R. M. Jungers, “Memory-dependent abstractions of stochastic systems through the lens of transfer operators,” in Proceedings of the 28th ACM International Conference on Hybrid Systems: Computation and Control, 2025, pp. 1–12.
  • [23] G. A. Gleizer and M. Mazo Jr, “Chaos and order in event-triggered control,” IEEE Transactions on Automatic Control, vol. 68, no. 11, pp. 6541–6556, 2023.
  • [24] A. Lavaei, S. Soudjani, and M. Zamani, “Compositional abstraction of large-scale stochastic systems: A relaxed dissipativity approach,” Nonlinear Analysis: Hybrid Systems, vol. 36, p. 100880, 2020.
  • [25] G. Delimpaltadakis, M. Lahijanian, M. Mazo Jr, and L. Laurenti, “Interval markov decision processes with continuous action-spaces,” in Proceedings of the 26th ACM International Conference on Hybrid Systems: Computation and Control, 2023, pp. 1–10.
  • [26] D. Lind and B. Marcus, An introduction to symbolic dynamics and coding. Cambridge university press, 2021.
  • [27] E. Lindenstrauss and M. Tsukamoto, “From rate distortion theory to metric mean dimension: variational principle,” IEEE Transactions on Information Theory, vol. 64, no. 5, pp. 3590–3609, 2018.
  • [28] D. Abel, D. Arumugam, K. Asadi, Y. Jinnai, M. L. Littman, and L. L. Wong, “State abstraction as compression in apprenticeship learning,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, no. 01, 2019, pp. 3134–3142.
  • [29] O. Biza, R. Platt, J.-W. van de Meent, and L. L. Wong, “Learning discrete state abstractions with deep variational inference,” arXiv preprint arXiv:2003.04300, 2020.
  • [30] D. T. Larsson, D. Maity, and P. Tsiotras, “A generalized information-theoretic framework for the emergence of hierarchical abstractions in resource-limited systems,” Entropy, vol. 24, no. 6, p. 809, 2022.
  • [31] H. Federer, Geometric measure theory, ser. Grundlehren Math. Wiss. Springer, Cham, 1969, vol. 153.
  • [32] N. Tishby, F. C. Pereira, and W. Bialek, “The information bottleneck method,” arXiv preprint physics/0004057, 2000.