An Information Theory of Finite Abstractions and their Fundamental Scalability Limits

Giannis Delimpaltadakis and Gabriel Gleizer

Giannis Delimpaltadakis is with the Control Systems Technology group, Mechanical Engineering, Eindhoven University of Technology. Gabriel Gleizer is with the Delft Center for Systems and Control, Mechanical Engineering, Delft University of Technology. Emails: [email protected], [email protected].
This research is partially supported by the project “Chaotic sampling for secure and sustainable networks of control systems” with file number 21937 of the research programme VENI AES 2024 which is (partly) financed by the Dutch Research Council (NWO) under the grant https://doi.org/10.61686/WZNAX74774.

Abstract

Finite abstractions are discrete approximations of dynamical systems, such that the set of abstraction trajectories contains, in a formal sense, all system trajectories. There is a consensus that abstractions suffer from the curse of dimensionality: for the same “accuracy” (how closely the abstraction represents the system), the abstraction size scales poorly with system dimensions. And, yet, after decades of research on abstractions, there are no formal results concerning their accuracy-size tradeoff. In this work, we derive a statistical, quantitative theory of abstractions’ accuracy-size tradeoff and uncover fundamental limits on their scalability, through rate-distortion theory – the branch of information theory studying lossy compression. Abstractions are viewed as encoder-decoder pairs, encoding trajectories of dynamical systems in a higher-dimensional ambient space. Rate represents abstraction size, while distortion describes abstraction accuracy, defined as the spatial average deviation between abstract trajectories and system ones. We obtain a fundamental lower bound on the minimum abstraction distortion, given the system dynamics and a threshold on abstraction size. The bound depends on the complexity of the dynamics, through generalized entropy. We demonstrate the bound’s tightness on certain dynamical systems. Finally, we showcase how the developed theory can be employed to construct optimal abstractions, in terms of the size-accuracy tradeoff, through an example on a chaotic system.

1 Introduction

Modern engineering systems are becoming more complex and must meet intricate specifications in safety-critical situations. For instance, a self-driving car must follow traffic rules, avoid collisions, and optimize speed and fuel consumption. Due to the complexity of these systems, traditional analytic methods for verification and control are intractable. For over two decades, to address verification and control of complex dynamics and objectives, abstraction-based methods have flourished [1, 2]. Given a dynamical system, these methods construct a finite system – the abstraction –, arising from partitioning the state (and control) space of the original system, such that all trajectories of the original system are contained, in a formal sense, in the set of abstraction trajectories. Employing this property, one may solve an intractable verification or control problem for the original system over the finite abstraction, with formal guarantees of correctness. Over the years, research on abstractions has spanned deterministic systems [3, 4, 5], stochastic systems [6, 7, 8], and, lately, data-driven scenarios [9, 10, 11, 12].

Despite their immense success, there is a consensus that abstractions suffer from the curse of dimensionality, limiting their practical relevance; for a given accuracy (how closely the abstraction describes the true dynamics), the abstraction size scales poorly with system dimensions. And even though abstractions have received considerable interest in the past decades, there are still no formal results concerning their curse of dimensionality and accuracy-size tradeoff.

Contributions

In this work, we derive a statistical, quantitative theory of abstractions’ accuracy-size tradeoff and uncover fundamental limits on their scalability. To that end, we establish connections with rate-distortion theory – the branch of information theory studying lossy compression [13, Chapter 10]. The key observation for the whole theory is that abstractions are information-theoretic encoder-decoder pairs, encoding trajectories of dynamical systems, in a higher-dimensional, ambient space. Rate represents abstraction size, while distortion is defined as the spatial average deviation between abstract trajectories and system ones, thus capturing the average accuracy of an abstraction. Then, building on recent developments in rate-distortion theory for generalized measurable sets [14, 15], we derive fundamental limits of abstractions’ accuracy-size tradeoff: for given system dynamics, we obtain a fundamental lower bound on the minimum abstraction distortion, for a given threshold on abstraction size. The fundamental lower bound depends on the complexity of the system’s dynamics, through generalized entropy. We demonstrate the tightness of the bound on certain dynamical systems. Finally, we showcase how the developed theory can be employed to construct optimal abstractions, in terms of the size-accuracy tradeoff, through an example on a chaotic system, and we provide a discussion towards a general procedure for constructing optimal abstractions.

Related work

Through decades of research, there has been considerable effort to construct scalable abstractions. Indicatively, [16, 17, 18] adapt the partition’s resolution depending on the local uncertainty a given state-space region induces to the abstraction. Further, [19] constructs multi-resolution abstractions, employing feedback-refinement relations. The work [20] employs optimal control, such that the generated trajectories result in smaller abstraction cells and only a portion of the state space needs to be partitioned. Although the above methods result in more scalable abstractions, they neither provide quantitative results on the accuracy-size tradeoff, nor optimize some metric describing it. Another approach to derive more accurate abstractions is introducing memory [21, 22], based on sequences of outputs. In [23], it is shown that the size of such memory-based abstractions increases exponentially with the sequence length for deterministic chaotic systems. Apart from adaptive-partitioning techniques, compositional methods [5, 24] decompose the system to smaller ones, that are abstracted more efficiently. However, they do not address scalability issues of abstracting each subsystem. Further, it is also worth mentioning [25], which, for a particular class of stochastic abstractions, demonstrates that partitioning the control space is unnecessary.

The connection between information theory and symbolic dynamics is well-known [26]; listing the whole literature on the topic is impossible. Worth mentioning is the work in [27], which employs rate-distortion theory to characterize complexity of dynamical systems and their relationship with so-called shifts (a class of discrete systems; abstractions can be cast as shifts). Nonetheless, this work does not consider the deviation between a shift and a dynamical system, but rather focuses on asymptotic results (arbitrarily large partition size, steady-state trajectories) and the qualitative question of whether a system can be embedded into a shift. Thus, it does not provide a quantitative theory of the accuracy-size tradeoff. Finally, the works [28, 29, 30] employ rate-distortion theory to compress models that are already discrete, and do not focus on abstracting continuous dynamics with formal guarantees.

2 Preliminaries

2.1 Measure spaces, Hausdorff measure, generalized entropy

For our purposes, we make use of information theory over general measurable spaces, based on [14, 15]. Thus, we first recall some related notions. We denote the $m$-dimensional Hausdorff measure by $\mathcal{H}^{m}$. (The Hausdorff measure is a generalization of the Lebesgue measure, and measures the size of a given set; e.g., $\mathcal{H}^{1}(\mathcal{C})=2\pi$, where ${\mathcal{C}}$ is the unit circle embedded in $\mathbb{R}^{n}$.) Denote the restriction of $\mathcal{H}^{m}$ to the compact set ${\mathcal{K}}$ by $\mathcal{H}^{m}_{{\mathcal{K}}}$. Consider a measure $\mu$ over the measure space $(\mathcal{X},\Sigma_{\mathcal{X}},\nu)$, where $\Sigma_{\mathcal{X}}$ is the Borel $\sigma$-algebra of $\mathcal{X}\subseteq\mathbb{R}^{n}$. When $\mu$ is absolutely continuous w.r.t. $\nu$ (denoted by $\mu\ll\nu$), we denote the Radon–Nikodym derivative by $\frac{\mathrm{d}\mu}{\mathrm{d}\nu}$. When $\nu=\mathcal{H}^{m}_{\mathcal{X}}$ (assuming $\mathcal{X}$ is $m$-dimensional), then $\frac{\mathrm{d}\mu}{\mathrm{d}\mathcal{H}^{m}_{\mathcal{X}}}$ is the probability distribution associated to $\mu$. Absolute continuity $\mu\ll\mathcal{H}^{m}_{\mathcal{X}}$ suggests that $\mu$ is not concentrated in arbitrarily small balls in $\mathcal{X}$. We denote the volume of the unit ball in $\mathbb{R}^{n}$ as $v_{n}=\frac{\pi^{n/2}}{\Gamma(n/2+1)}$, where the Gamma function is $\Gamma(a)=\int_{0}^{\infty}t^{a-1}e^{-t}\mathrm{d}t$.

Let $\mathcal{X}\subset\mathbb{R}^{n}$ be a finite union of compact, $m$ -dimensional, $C^{1}$ -manifolds. Denote by $c_{\mathcal{X}}>0$ the constant such that

\mathcal{H}^{m}_{\mathcal{X}}(B(\hat{x},\delta))\leq c_{\mathcal{X}}\delta^{m},\quad\text{for all }\hat{x}\in\mathbb{R}^{n}\ \text{and }\delta>0,

(1)

where $B(\hat{x},\delta)\coloneqq\{x\in\mathcal{X}:\ \|x-\hat{x}\|<\delta\}$ . The constant $c_{\mathcal{X}}$ always exists and is finite, as per [14, Lemma 1].

Consider a random variable $x$ , distributed over the measure space $(\mathcal{X},\Sigma_{\mathcal{X}},\mathcal{H}^{m}_{\mathcal{X}})$ with probability measure $\mu_{x}$ . The generalized entropy of $x$ (w.r.t the Hausdorff measure) is

h(x)\coloneqq-\mathbb{E}_{x}\!\left[\log\Big(\frac{\mathrm{d}\mu_{x}}{\mathrm{d}\mathcal{H}^{m}_{\mathcal{X}}}\Big)\right],

(2)

where $\mathbb{E}_{x}[\cdot]$ denotes the expectation operator w.r.t. the random variable $x$. The generalized entropy is the extension of the classical Shannon entropy to continuous spaces, and is a measure of uncertainty or complexity of a random variable. For a measure space $(\mathcal{X},\Sigma_{\mathcal{X}},\mathcal{H}^{m}_{\mathcal{X}})$ with $0<\mathcal{H}^{m}_{\mathcal{X}}(\mathcal{X})<\infty$: a) $h(x)$ is maximized for the uniform distribution $\mu_{x}=\mathcal{H}^{m}_{\mathcal{X}}/\mathcal{H}^{m}_{\mathcal{X}}(\mathcal{X})$, b) $h(x)$ is bounded, when $\mu_{x}\ll\mathcal{H}^{m}_{\mathcal{X}}$. Finally, employing a similar generalization as in (2), let us denote the generalized Rényi entropy with parameter $a\in\mathbb{R}_{+}\setminus\{1\}$ by $h_{a}(x):=\frac{1}{1-a}\log\mathbb{E}_{x}[(\frac{\mathrm{d}\mu_{x}}{\mathrm{d}\mathcal{H}_{\mathcal{X}}^{m}})^{a-1}]$.

The example below shows how the above apply to computing the entropy of a dynamical system’s trajectories.

Example 2.1 (Entropy of trajectories of the doubling map).

Consider the dynamical system $x^{+}=f(x)$ , where the doubling map $f:[0,1]\to[0,1]:x\mapsto 2x\ \mathrm{mod}\ 1$ . Consider the set of $3$ -length trajectories of the system ${\mathcal{B}}:=\{(x_{0},f(x_{0}),\allowbreak f(f(x_{0}))):\ x_{0}\in[0,1]\}\subseteq[0,1]^{3}$ . Notice that ${\mathcal{B}}$ is the union of 4 straight-line segments:

	$\displaystyle{\mathcal{B}}=$	$\displaystyle\{(x_{0},2x_{0},4x_{0}):x_{0}\in[0,.25]\}\cup$
		$\displaystyle\{(x_{0},2x_{0},4x_{0}-1):x_{0}\in[.25,.5]\}\cup$
		$\displaystyle\{(x_{0},2x_{0}-1,4x_{0}-2):x_{0}\in[.5,.75]\}\cup$
		$\displaystyle\{(x_{0},2x_{0}-1,4x_{0}-3):x_{0}\in[.75,1]\}.$

Further, consider random initial conditions $x_{0}\sim U[0,1]$ , where $U[0,1]$ is the uniform distribution over $[0,1]$ . It is well known that $U[0,1]$ is invariant under the doubling map. Thus, the random variable $\xi(x_{0})=(x_{0},f(x_{0}),\allowbreak f(f(x_{0})))\in{\mathcal{B}}$ , that is the system trajectories, is uniformly distributed across ${\mathcal{B}}$ ; i.e., the probability measure $\mu_{\xi}=\mathcal{H}^{1}_{{\mathcal{B}}}/\mathcal{H}^{1}_{{\mathcal{B}}}({\mathcal{B}})$ . Its generalized entropy is

	$\displaystyle h(\xi)=-\mathbb{E}_{\xi}\!\left[\log\Big(\frac{\mathrm{d}\mu_{\xi}}{\mathrm{d}\mathcal{H}^{1}_{{\mathcal{B}}}}\Big)\right]$	$\displaystyle=-\int_{{\mathcal{B}}}\log\Big(\frac{\mathrm{d}\mu_{\xi}}{\mathrm{d}\mathcal{H}^{1}_{{\mathcal{B}}}}\Big)\mathrm{d}\mu_{\xi}$
		$\displaystyle=-\int_{{\mathcal{B}}}\log(\frac{1}{\mathcal{H}^{1}_{{\mathcal{B}}}({\mathcal{B}})})\mathrm{d}\mu_{\xi}$
		$\displaystyle=\log(\sqrt{21})\int_{{\mathcal{B}}}\mathrm{d}\mu_{\xi}=\log(\sqrt{21}),$

where we have used that the total length of the line segments is $\mathcal{H}^{1}_{{\mathcal{B}}}({\mathcal{B}})=\sqrt{21}$ , and that $\int_{{\mathcal{B}}}\mathrm{d}\mu_{\xi}=1$ , as $\mu_{\xi}$ is a probability measure.
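The numbers in Example 2.1 can be checked numerically. The following is a minimal sketch, not part of the original derivation: it approximates $\mathcal{H}^{1}_{{\mathcal{B}}}({\mathcal{B}})$ by summing chord lengths along each of the four branches of ${\mathcal{B}}$ and then evaluates $h(\xi)=\log\mathcal{H}^{1}_{{\mathcal{B}}}({\mathcal{B}})$ for the uniform measure. The function names, the sample count and the small offset `eps` (used to stay strictly inside each branch) are illustrative assumptions.

```python
import math


def doubling_map(x: float) -> float:
    """Doubling map x -> 2x mod 1."""
    return (2.0 * x) % 1.0


def trajectory(x0: float, length: int = 3) -> tuple:
    """Return the trajectory (x0, f(x0), ..., f^(length-1)(x0))."""
    xs = [x0]
    for _ in range(length - 1):
        xs.append(doubling_map(xs[-1]))
    return tuple(xs)


def behavior_length(samples_per_branch: int = 1000, eps: float = 1e-12) -> float:
    """Approximate the total length of the 3-length behavior B.

    B consists of four straight segments, one per quarter of [0, 1].
    Each segment is sampled separately so that no chord crosses a jump.
    """
    num_branches = 4
    total = 0.0
    for k in range(num_branches):
        lo = k / num_branches
        hi = (k + 1) / num_branches - eps  # stay strictly inside the branch
        grid = [lo + (hi - lo) * i / samples_per_branch
                for i in range(samples_per_branch + 1)]
        points = [trajectory(x) for x in grid]
        total += sum(math.dist(p, q) for p, q in zip(points, points[1:]))
    return total


length = behavior_length()
entropy = math.log(length)  # uniform measure on B: h = log H^1(B)
print(length, math.sqrt(21))
print(entropy, 0.5 * math.log(21))
```

Each branch has direction $(1,2,4)$ and spans an $x_{0}$-interval of length $1/4$, so its length is $\sqrt{21}/4$; the sketch only confirms this arithmetic.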

2.2 Rate-distortion theory on measurable spaces

Figure 1: The typical source coding setting.

A typical setting in information theory is source coding, see Fig. 1. A source emits a message $x\in\mathcal{X}\subseteq\mathbb{R}^{n}$, which is a random variable over the measure space $(\mathcal{X},\Sigma_{\mathcal{X}},\mathcal{H}^{m}_{\mathcal{X}})$, with associated probability measure $\mu_{x}$, where $\mathcal{X}$ is assumed to be $m$-dimensional. The encoder $s:\mathcal{X}\to\mathcal{Y}$, where $\mathcal{Y}$ is finite, outputs the coded message $s(x)=y\in\mathcal{Y}$. Finally, the decoder $g:\mathcal{Y}\to\hat{\mathcal{X}}$, upon receiving $y$, decodes it into $g(y)=\hat{x}$. Compression takes place by encoding the continuous message $x$ into a low-dimensional, finite coded message $y$. The encoder cardinality $|\mathcal{Y}|$ determines the compression, and the compression rate is defined by $\log|\mathcal{Y}|$. A distortion function $d:\mathcal{X}\times\hat{\mathcal{X}}\to\mathbb{R}_{+}$ measures the deviation of $\hat{x}$ from the original message $x$. A typical distortion function, when $\hat{\mathcal{X}}=\mathbb{R}^{n}$, is the squared error $d(x,\hat{x})=\|x-\hat{x}\|^{2}$. (Here, we present a simplified setting of source coding, where the encoder-decoder is deterministic, and the source emits a single message. For the general theory, see [13, 15].)

Of particular interest is the fundamental limit of the rate-distortion tradeoff, i.e. the following quantity:

	$\displaystyle D(R)=\inf_{s,g}$	$\displaystyle\ \mathbb{E}_{x}[d(x,\hat{x})\mid s,g]$
	$\displaystyle\mathrm{s.t.}$	$\displaystyle\ s:\mathcal{X}\to\mathcal{Y},\ g:\mathcal{Y}\to\hat{\mathcal{X}},$
		$\displaystyle\ \log|\mathcal{Y}|\leq R,\ y=s(x),\ \hat{x}=g(y),$

where the expectation is taken w.r.t. the random variable $x$ . In words, $D(R)$ is the minimum achievable average distortion, for a given compression rate threshold $R$ . The function $D(R)$ has an inverse, $R(D)$ , which is the minimum compression rate, for a given maximum expected distortion threshold $D$ . The following result provides a fundamental lower bound on $R(D)$ and $D(R)$ .

Theorem 2.1 (Generalized Shannon lower bound [15, Thm. 3.1, simplified]).

Let $\mathcal{X}\subseteq\mathbb{R}^{n}$ be a finite union of compact, $m$-dimensional, $C^{1}$-manifolds, and $\mu_{x}\ll\mathcal{H}^{m}_{\mathcal{X}}$. Assume that $\hat{\mathcal{X}}\subseteq\mathbb{R}^{n}$ and that $(\hat{\mathcal{X}},\Sigma_{\hat{\mathcal{X}}})$ is measurable. Consider the squared-error (Euclidean) distortion function $d:\mathcal{X}\times\hat{\mathcal{X}}\to\mathbb{R}_{+}:(x,\hat{x})\mapsto\|x-\hat{x}\|^{2}$. Then

	$\displaystyle R(D)$	$\displaystyle\geq R_{*}(D)\coloneqq h(x)-\frac{m}{2}-\log\Big(\frac{c_{\mathcal{X}}D^{m/2}\Gamma(1+m/2)}{(\frac{m}{2})^{m/2}}\Big),$		(3)
	$\displaystyle D(R)$	$\displaystyle\geq D_{*}(R)\coloneqq\frac{m}{2}\Big(\frac{e^{-R+h(x)-m/2}}{c_{\mathcal{X}}\Gamma(1+m/2)}\Big)^{2/m}.$		(4)

Proof Sketch.

This is the special case of [15, Thm. 3.1] for (finite unions of) compact, $C^{1}$ -manifolds and Euclidean distortion. ∎
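To make the bounds (3) and (4) concrete, here is a minimal sketch that evaluates them as written and checks that they are inverses of each other. The function names are illustrative, and the value used for $c_{\mathcal{X}}$ in the usage lines is a placeholder assumption, not a constant computed for any system in this paper.

```python
import math


def rate_lower_bound(D: float, h: float, m: int, c: float) -> float:
    """R_*(D) from (3): lower bound on the rate for distortion threshold D."""
    inner = c * D ** (m / 2) * math.gamma(1 + m / 2) / (m / 2) ** (m / 2)
    return h - m / 2 - math.log(inner)


def distortion_lower_bound(R: float, h: float, m: int, c: float) -> float:
    """D_*(R) from (4): lower bound on the distortion for rate threshold R."""
    inner = math.exp(-R + h - m / 2) / (c * math.gamma(1 + m / 2))
    return (m / 2) * inner ** (2 / m)


# Usage with illustrative inputs: m = 1, h = log(sqrt(21)) as in Example 2.1,
# and a placeholder constant c (assumption, not derived here).
h_example = math.log(math.sqrt(21))
c_placeholder = 2.0
rates = [1.0, 2.0, 3.0]
bounds = [distortion_lower_bound(R, h_example, 1, c_placeholder) for R in rates]
recovered = [rate_lower_bound(D, h_example, 1, c_placeholder) for D in bounds]
print(bounds)
print(recovered)
```

Since $D_{*}(R)$ scales as $e^{-2R/m}$, each additional nat of rate reduces the bound by a factor $e^{2/m}$, which shrinks toward 1 as the dimension $m$ grows.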

2.3 Transition systems

Definition 2.2 (Transition system).

A transition system $S$ is a tuple $S=(\mathcal{X},\underset{S}{\rightarrow})$ , where $\mathcal{X}$ is the state space and $\underset{S}{\rightarrow}\subseteq\mathcal{X}\times\mathcal{X}$ is a transition relation.

A transition system $S=(\mathcal{X},\underset{S}{\rightarrow})$ is deterministic if, for any $x\in\mathcal{X}$, there exists at most one $x^{\prime}\in\mathcal{X}$, such that $(x,x^{\prime})\in\underset{S}{\rightarrow}$. Given a transition system $S=(\mathcal{X},\underset{S}{\rightarrow})$, its $l$-length behavior ${\mathcal{B}}_{l}^{S}$ is defined as ${\mathcal{B}}_{l}^{S}:=\Big\{\xi:\ \xi=\{x_{i}\}_{i=0}^{l-1},\ (x_{k},x_{k+1})\in\underset{S}{\rightarrow},\ k=0,1,\dots,l-2\Big\}$. That is, the $l$-length behavior is the set of $l$-long trajectories. Notice that ${\mathcal{B}}_{l}^{S}\subseteq\mathcal{X}^{l}$.

3 Abstractions and the curse of dimensionality

3.1 Finite abstractions of dynamical systems

Throughout this work, we consider deterministic dynamical systems $x^{+}=f(x)$ , with $f:\mathcal{X}\to\mathcal{X}$ . Dynamical systems obtain the transition-system representation $S=(\mathcal{X},\underset{S}{\rightarrow})$ , where $\underset{S}{\rightarrow}:=\{(x,y):y=f(x),\ x\in\mathcal{X}\}$ . We make the following assumption.

Assumption 1 (The state space).

The set $\mathcal{X}\subseteq\mathbb{R}^{n}$ is $n$ -dimensional, connected and compact.

Under this assumption, ${\mathcal{B}}_{l}^{S}$ is an $n$-dimensional subset of $\mathcal{X}^{l}\subseteq\mathbb{R}^{nl}$.

Let us introduce abstractions of dynamical systems.

Definition 3.1 (Measurable Partition).

Given a set $\mathcal{X}$ , a finite collection of measurable, disjoint sets $\mathcal{Y}=\{Y_{i}\}$ , such that $\bigcup_{i}Y_{i}\supseteq\mathcal{X}$ , is a measurable partition of $\mathcal{X}$ .

Definition 3.2 (Abstraction).

Given a dynamical system with transition-system representation $S=(\mathcal{X},\underset{S}{\rightarrow})$ and a measurable partition $\mathcal{Y}=\{Y_{i}\}$ of $\mathcal{X}$ , a transition system $A=(\mathcal{Y},\underset{A}{\rightarrow}\nolinebreak)$ is an abstraction of $S$ if, for any $x,x^{\prime}\in\mathcal{X}$ and $Y,Y^{\prime}\in\mathcal{Y}$ , such that $x\in Y$ and $x^{\prime}\in Y^{\prime}$ , we have $(x,x^{\prime})\in\underset{S}{\rightarrow}$ $\implies$ $(Y,Y^{\prime})\in\underset{A}{\rightarrow}$ .

Although the dynamical system $S$ is deterministic, the abstraction $A$ is generally non-deterministic. With a slight abuse of formality, we often treat trajectories $\{\omega_{i}\}_{i=0}^{l-1}$ of the abstraction (with $\omega_{i}\in\mathcal{Y}$) as subsets of $\mathcal{X}^{l}$, that is $\{\omega_{i}\}_{i=0}^{l-1}\equiv\omega_{0}\times\omega_{1}\times\dots\times\omega_{l-1}\subseteq\mathcal{X}^{l}$.

Theorem 3.3 (Behavioral inclusion [1, Theorem 4.18, simplified]).

For a system $S=(\mathcal{X},\underset{S}{\rightarrow})$ , a partition $\mathcal{Y}$ of $\mathcal{X}$ and an abstraction $A=(\mathcal{Y},\underset{A}{\rightarrow})$ of $S$ , the following holds for any $l$ : ${\mathcal{B}}_{l}^{S}\subseteq{\mathcal{B}}_{l}^{A}$ .

In fact, ${\mathcal{B}}_{l}^{A}$ is $nl$-dimensional, and covers the $n$-dimensional set of system trajectories ${\mathcal{B}}_{l}^{S}$. This observation is instrumental in this work. Through behavioral inclusion, abstractions encode information about the infinite, continuous system behavior ${\mathcal{B}}_{l}^{S}$ into the finite abstraction behavior set ${\mathcal{B}}_{l}^{A}$. While this enables computational methods for verification problems for dynamical systems, it also generally entails information loss, as the following section explains.
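As an illustration of Definition 3.2 and behavioral inclusion, the sketch below builds an abstraction of the doubling map from Example 2.1 over a uniform partition and checks on sampled initial conditions that every system trajectory maps to an abstract trajectory. The construction of the transition relation uses the specific structure of the doubling map, works on $[0,1)$ rather than $[0,1]$, and all names and parameters are illustrative assumptions.

```python
import random

N = 8  # number of cells; a power of two keeps the float arithmetic exact


def f(x: float) -> float:
    """Doubling map x -> 2x mod 1."""
    return (2.0 * x) % 1.0


def cell(x: float) -> int:
    """Index i of the partition cell [i/N, (i+1)/N) containing x."""
    return min(int(x * N), N - 1)


# For even N, the image of cell i under the doubling map is exactly the union
# of cells 2i and 2i+1 (mod N), so these are the required abstract transitions.
TRANSITIONS = {(i, (2 * i + k) % N) for i in range(N) for k in (0, 1)}


def system_trajectory(x0: float, l: int) -> list:
    """l-length trajectory of the system starting at x0."""
    xs = [x0]
    for _ in range(l - 1):
        xs.append(f(xs[-1]))
    return xs


def in_abstract_behavior(cells: list) -> bool:
    """True if consecutive cells are all related by the abstract transitions."""
    return all(pair in TRANSITIONS for pair in zip(cells, cells[1:]))


def check_inclusion(l: int = 6, num_samples: int = 10_000, seed: int = 0) -> bool:
    """Sample initial conditions and test behavioral inclusion on each."""
    rng = random.Random(seed)
    for _ in range(num_samples):
        xi = system_trajectory(rng.random(), l)
        if not in_abstract_behavior([cell(x) for x in xi]):
            return False
    return True


print(check_inclusion())
```

Every cell has two successors here, which shows concretely how a deterministic system yields a non-deterministic abstraction. A sampling check of this kind is only a sanity test, not a proof of inclusion.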

3.2 Abstraction-based verification and information loss

In typical verification problems, we are given a set of initial conditions $\Xi_{0}\subseteq\mathcal{X}$ for the system $S$ and we have to check if the corresponding set of system trajectories $\Xi=\{\xi\in{\mathcal{B}}_{l}^{S}:\ \xi_{0}\in\Xi_{0}\}$ satisfies a given property. For example, in the case of safety, we have to check if $\Xi\cap\mathcal{U}^{l}=\emptyset$ , where $\mathcal{U}\subseteq\mathcal{X}$ is an unsafe set. Computing the exact reachable set $\Xi$ is generally impossible. Abstractions $A$ address this problem by computing the corresponding set of abstract state trajectories $\Omega_{A}=\bigcup\limits_{\omega\in{\mathcal{B}}_{l}^{A},\ \omega_{0}\cap\Xi_{0}\neq\emptyset}\omega$ , which is tractable, as the abstraction is finite. Notice that, by behavioral inclusion, we have $\Xi\subseteq\Omega_{A}$ . Finally, for safety verification, if $\Omega_{A}\cap\mathcal{U}^{l}=\emptyset$ , then one may safely deduce that the system is safe.

As abstractions group system states $x\in\mathcal{X}$ in sets $Y\subseteq\mathcal{X}$ , information loss is inevitable. In general, the partition $\mathcal{Y}$ needs to have a relatively high resolution, to recover a meaningful verification answer. E.g., in the extreme case of $|\mathcal{Y}|=1$ , for any set of initial conditions $\Xi_{0}\subseteq\mathcal{X}$ , the abstraction returns $\Omega_{A}=\mathcal{X}^{l}$ , i.e. the whole ambient space of $l$ -length trajectories. As such, for small $|\mathcal{Y}|$ , the abstraction $A$ does not accurately represent the system $S$ . On the other hand, for large $|\mathcal{Y}|$ , where the abstraction is more accurate, the computations on the abstraction become heavier – even intractable. Thus, there is a trade-off between abstraction accuracy and partition size $|\mathcal{Y}|$ . In what follows, we provide a statistical, quantitative theory of the accuracy-size tradeoff, based on rate-distortion theory, and provide bounds on the accuracy-size tradeoff.

4 Information-theoretic framework for finite abstractions

In what follows, consider the dynamical system $x^{+}=f(x)$, with $x\in\mathcal{X}\subseteq\mathbb{R}^{n}$, under Assumption 1. The dynamical system admits the transition system representation $S=(\mathcal{X},\underset{S}{\rightarrow})$. Towards deriving a statistical quantification of the accuracy-size tradeoff of abstractions, we impose a probability distribution $p_{\xi_{0}}:\mathcal{X}\to\mathbb{R}_{+}$ on the system’s initial conditions. Verification, then, becomes: sampling an initial condition $\xi_{0}\in\mathcal{X}$, with $\xi_{0}\sim p_{\xi_{0}}$, and afterward employing the abstraction to give a verification answer. (To be mathematically precise, $\xi_{0}$ is a random variable over $(\mathcal{X},\Sigma_{\mathcal{X}},{\mathcal{L}}^{n}_{\mathcal{X}})$, where ${\mathcal{L}}^{n}$ is the $n$-dimensional Lebesgue measure, with probability measure $\mu_{\xi_{0}}$ such that $\frac{\mathrm{d}\mu_{\xi_{0}}}{\mathrm{d}{\mathcal{L}}^{n}_{\mathcal{X}}}=p_{\xi_{0}}$.)

Let us show how an abstraction $A$ can be viewed as an encoder-decoder pair of system trajectories $\xi\in{\mathcal{B}}_{l}^{S}$. For the following, we refer the reader to Figure 2. The system (source) samples an initial condition $\xi_{0}\sim p_{\xi_{0}}$ and generates the trajectory $\xi=(\xi_{0},\dots,\xi_{l-1})\in{\mathcal{B}}_{l}^{S}$ (the message). The encoder $s_{A}:{\mathcal{B}}_{l}^{S}\to\mathcal{Y}$ looks only at the initial condition $\xi_{0}$, since abstractions only use (sets of) initial conditions for verification, as explained in Section 3. Nonetheless, from an information-theoretic perspective, $\xi_{0}$ and $\xi$ are equivalent, as $\xi_{0}\mapsto\xi$ is one-to-one; that is, $\xi_{0}$ and $\xi$ carry the exact same information when the dynamics $f$ is known. The encoder returns the corresponding abstract initial condition:

s_{A}(\xi):=Y,\ \text{s.t.}\ \xi_{0}\in Y

(5)

The decoder $g_{A}$ , upon receiving the initial condition $\omega_{A_{0}}=s_{A}(\xi)$ , outputs the set of all abstract state trajectories corresponding to $\omega_{A_{0}}$ . That is, for the decoder we have $g_{A}:\mathcal{Y}\to 2^{\mathcal{X}^{l}}$ with

g_{A}(y):=\bigcup\limits_{\omega\in{\mathcal{B}}_{l}^{A},\ \omega_{0}=y}\omega.

(6)

The compression rate, determined by the encoder’s size, is $\log(|\mathcal{Y}|)$ . Indeed, notice that the abstraction encodes the system’s trajectories ${\mathcal{B}}_{l}^{S}$ into exactly $|\mathcal{Y}|$ outcomes, that is $\{g_{A}(z):\ z\in\mathcal{Y}\}$ .

To capture the accuracy of the abstraction, and compare the message $\xi$ and output $\Omega_{A}=g_{A}(s_{A}(\xi))$, we employ a distortion function $d:{\mathcal{B}}_{l}^{S}\times 2^{\mathcal{X}^{l}}\to\mathbb{R}_{+}$ defined by

d(\xi,\Omega_{A}):=\sup_{\xi^{\prime}\in\Omega_{A}}\frac{1}{l}\|\xi-\xi^{\prime}\|^{2}.

(7)

In words, $d(\xi,\Omega_{A})$ returns the worst possible distortion between system trajectories $\xi$ and abstract trajectories $\Omega_{A}$, averaged over the time horizon $l$. This is in line with abstraction-based verification, where the worst-case outcome is considered.

Let us now explain what “expected (or average) distortion”, for a given abstraction $A$, means in the context of verification. The expected distortion $\mathbb{E}_{\xi_{0}}[d(\xi,\Omega_{A})\mid A]$ is taken w.r.t. the initial-condition distribution $p_{\xi_{0}}$. Thus, over $N$ independent verification problems with initial conditions $\xi_{0}\sim p_{\xi_{0}}$, the empirical average of the distortion converges to $\mathbb{E}_{\xi_{0}}[d(\xi,\Omega_{A})\mid A]$ as $N\to\infty$. As $d$ measures the distance between system trajectories and abstract state trajectories, the expected distortion $\mathbb{E}_{\xi_{0}}[d(\xi,\Omega_{A})\mid A]$ is thus the spatial, statistical average of the deviation between system trajectories and abstract state trajectories, over initial conditions in $\mathcal{X}$ with distribution $p_{\xi_{0}}$.
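The encoder (5), decoder (6), distortion (7) and its expectation can be sketched for the doubling map of Example 2.1 with a uniform partition and uniform initial conditions. This is an illustrative Monte Carlo estimate under stated assumptions: the abstract transitions exploit the structure of the doubling map (even number of cells `N`), the state space is taken as $[0,1)$, and all function names, sample counts and seeds are hypothetical choices rather than the authors' procedure.

```python
import random


def f(x: float) -> float:
    """Doubling map x -> 2x mod 1."""
    return (2.0 * x) % 1.0


def successors(N: int) -> dict:
    """Abstract transitions for a uniform partition into N (even) cells."""
    assert N % 2 == 0, "this construction assumes an even number of cells"
    return {i: [(2 * i) % N, (2 * i + 1) % N] for i in range(N)}


def decoder(succ: dict, y0: int, l: int) -> list:
    """g_A(y0): all l-length abstract trajectories starting at cell y0."""
    trajs = [[y0]]
    for _ in range(l - 1):
        trajs = [t + [j] for t in trajs for j in succ[t[-1]]]
    return trajs


def distortion(xi: list, boxes: list, N: int) -> float:
    """d(xi, Omega) from (7): worst-case squared distance over the union of
    boxes omega_0 x ... x omega_{l-1}, divided by the horizon l."""
    worst = 0.0
    for omega in boxes:
        sq = 0.0
        for x, i in zip(xi, omega):
            lo, hi = i / N, (i + 1) / N
            sq += max((x - lo) ** 2, (hi - x) ** 2)  # farthest endpoint
        worst = max(worst, sq)
    return worst / len(xi)


def expected_distortion(N: int, l: int, num_samples: int = 20_000,
                        seed: int = 0) -> float:
    """Monte Carlo estimate of E[d(xi, Omega_A)] for xi_0 ~ U[0, 1)."""
    rng = random.Random(seed)
    succ = successors(N)
    outputs = {y: decoder(succ, y, l) for y in range(N)}
    total = 0.0
    for _ in range(num_samples):
        x0 = rng.random()
        xi = [x0]
        while len(xi) < l:
            xi.append(f(xi[-1]))
        y = min(int(x0 * N), N - 1)  # encoder s_A: cell of the initial state
        total += distortion(xi, outputs[y], N)
    return total / num_samples


estimates = {N: expected_distortion(N, l=3) for N in (2, 4, 16)}
print(estimates)
```

Larger partitions give a smaller estimated distortion at the cost of a higher rate $\log|\mathcal{Y}|$, which is the tradeoff quantified in the remainder of this section.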

Remark 1 (Initial-condition distribution).

The distribution $p_{\xi_{0}}$ weights how much each initial condition $\xi_{0}\in\mathcal{X}$ contributes to the average distortion $\mathbb{E}_{\xi_{0}}[d(\xi,\Omega_{A})\mid A]$ . Arguably, the most suitable choice for $p_{\xi_{0}}$ is the uniform distribution, as, when constructing an abstraction, the initial condition is unknown and all initial conditions are considered equally likely.

Finally, the optimal abstraction accuracy-size tradeoff is captured by the following rate-distortion quantity:

	$\displaystyle D_{abs}(R):=\inf_{A}$	$\displaystyle\ \mathbb{E}_{\xi_{0}}[d(\xi,\Omega_{A})\mid A]$
	$\displaystyle\mathrm{s.t.}$	$\displaystyle\ A\text{ is an abstraction of }S,$
		$\displaystyle\ \text{(5), (6) hold},$
		$\displaystyle\ \log|\mathcal{Y}|\leq R,\ \Omega_{A}=g_{A}(s_{A}(\xi)).$

That is, the minimum average deviation of abstract state trajectories and system trajectories, over all possible abstractions with a given upper-bound $e^{R}$ on partition size. Likewise, we also consider the inverse $R_{abs}(D)$ , which is the ( $\log$ of the) minimum partition size for a given upper threshold $D$ on the average deviation of abstract state trajectories and system trajectories.

Remark 2 (Statistics of abstractions’ accuracy and size).

The proposed theory does not aim at providing (probabilistic) guarantees on the correctness of abstractions. These are a-priori provided by Definition 3.2, through behavioral inclusion or related properties. Instead, the theory developed here provides (guarantees on the) statistical quantification of abstractions’ accuracy and size.

Remark 3 (The message space is ${\mathcal{B}}_{l}^{S}$ ).

Even though we have reduced everything thus far to the initial condition distribution $p_{\xi_{0}}$ , the message space is ${\mathcal{B}}_{l}^{S}$ , i.e. the system trajectories. Indeed, although the expectation $\mathbb{E}[d(\xi,\Omega_{A})\mid A]$ can be taken either w.r.t. $\xi_{0}\sim p_{\xi_{0}}$ or w.r.t. the random variable $\xi\in{\mathcal{B}}_{l}^{S}$ (as $\xi_{0}\mapsto\xi$ is one-to-one), the distortion $d$ considers the whole $\xi\in{\mathcal{B}}_{l}^{S}$ . As such, in the coming section, to derive bounds on $D_{abs}(R)$ and $R_{abs}(D)$ , employing the theory presented in Section 2.2, we reason about the random variable $\xi\in{\mathcal{B}}_{l}^{S}$ and its associated probability measure $\mu_{\xi}$ over $({\mathcal{B}}_{l}^{S},\Sigma_{{\mathcal{B}}_{l}^{S}},\mathcal{H}^{n}_{{\mathcal{B}}_{l}^{S}})$ , which is solely determined by the initial condition distribution $p_{\xi_{0}}:\mathcal{X}\to\mathbb{R}_{+}$ and the system dynamics $f:\mathcal{X}\to\mathcal{X}$ . Hence, we take expectations $\mathbb{E}_{\xi}$ and $\mathbb{E}_{\xi_{0}}$ interchangeably.

5 Rate-distortion theory and a fundamental limit for abstractions

5.1 A fundamental limit on abstracting dynamical systems

Having modeled the statistics of abstraction-based verification as a source coding problem, we now proceed to probing the fundamental limits of the abstraction accuracy-size tradeoff, by providing lower bounds on $R_{abs}(D)$ and $D_{abs}(R)$ .

Note that abstractions, given the message, output sets, and the associated distortion (7) is set-based. This is in contrast to the typical encoder-decoder pairs considered in Thm. 2.1, which output points and whose distortion function is the squared Euclidean distance. Thus, the results from Section 2.2 do not apply directly to derive bounds on $D_{abs}$ and $R_{abs}$. In what follows, we derive said bounds by both employing Thm. 2.1 and quantifying the aforementioned distortion disparity. This enables a rate-distortion theory for abstractions. First, we present an intermediate, purely geometric result, providing a lower bound on the average distortion of a given abstraction.

Proposition 5.1 (Abstraction vs. encoder distortion).

Consider a dynamical system $x^{+}=f(x)$ with transition system representation $S=(\mathcal{X},\underset{S}{\rightarrow})$, and let Assumption 1 hold. Let $\xi\in{\mathcal{B}}_{l}^{S}$ be a trajectory of $S$, with $\xi_{0}\sim p_{\xi_{0}}:\mathcal{X}\to\mathbb{R}_{+}$. Consider a measurable partition $\mathcal{Y}$ of $\mathcal{X}$ and an associated abstraction $A$, and let $\Omega_{A}=g_{A}(s_{A}(\xi))$, where $s_{A},g_{A}$ are given by (5) and (6). Consider an encoder-decoder pair $(s_{q_{A}},g_{q_{A}})$, where $s_{q_{A}}(\xi)=g_{A}(s_{A}(\xi))$ and $g_{q_{A}}(z)=x_{c}(z)$, where $x_{c}(z):=\operatorname*{arg\,min}_{y}\max_{y^{\prime}\in z}\|y-y^{\prime}\|^{2}$ is the Chebyshev center of the set $z$. Denote the Chebyshev radius of the set $z$ by $r_{c}(z):=\min_{y}\max_{y^{\prime}\in z}\|y-y^{\prime}\|$. Let $\xi_{q_{A}}=g_{q_{A}}(s_{q_{A}}(\xi))$. The following lower bound holds for the average distortion of the abstraction:

\mathbb{E}_{\xi_{0}}[d(\xi,\Omega_{A})]\geq\frac{1}{l}\mathbb{E}_{\xi_{0}}[\|\xi-\xi_{q_{A}}\|^{2}]+\frac{1}{l}\mathbb{E}_{\xi_{0}}[r_{c}^{2}(\Omega_{A})],

(8)

where $d$ is the distortion function in (7).

Prop. 5.1 suggests that the average distortion of an abstraction is lower bounded by the expected distortion of a particular encoder-decoder pair (the one outputting the Chebyshev centers of the abstraction’s outputs) plus a term depending on the size of the abstraction’s outputs. Employing Prop. 5.1, in Theorem 5.2 below, we derive fundamental lower bounds on $D_{abs}(R)$ and $R_{abs}(D)$, by lower-bounding each of the two terms in the right-hand side of (8) separately, over all abstractions with the same rate (or the same expected distortion). The first term in the right-hand side of (8) can be lower bounded as in Thm. 2.1, being the expected distortion of an encoder-decoder pair with the same rate as the abstraction. To bound the second term, we observe that the abstraction’s outputs $\Omega_{A}$ define an $nl$-dimensional cover of ${\mathcal{B}}_{l}^{S}$ (this cover is precisely $\mathcal{Z}:=\{Z:Z=g_{A}(s_{A}(x_{0})),x_{0}\in\mathcal{X}\}$; since $s_{A}$ takes values in the set $\mathcal{Y}$, we have $|\mathcal{Z}|=|\mathcal{Y}|$), and the cover’s size is equal to the abstraction’s size; the bound is then obtained by lower-bounding over all possible $nl$-dimensional covers of ${\mathcal{B}}_{l}^{S}$, using geometric measure theory (see Lemma 8.1). For an illustrative example of the above, see Fig. 3.
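The geometric inequality behind (8) can be checked pointwise in a simple special case. The sketch below assumes the abstraction output is a single axis-aligned box, for which the Chebyshev center is the box center and the squared Chebyshev radius is the squared half-diagonal; general outputs $\Omega_{A}$ are unions of boxes and are not covered by this shortcut. The function names and the sampled box are illustrative assumptions.

```python
import random


def chebyshev_center_and_radius_sq(lower: list, upper: list) -> tuple:
    """Chebyshev center and squared Chebyshev radius of an axis-aligned box."""
    center = [(a + b) / 2 for a, b in zip(lower, upper)]
    radius_sq = sum(((b - a) / 2) ** 2 for a, b in zip(lower, upper))
    return center, radius_sq


def worst_case_sq_distance(point: list, lower: list, upper: list) -> float:
    """sup over the box of the squared distance to `point` (farthest corner)."""
    return sum(max((x - a) ** 2, (b - x) ** 2)
               for x, a, b in zip(point, lower, upper))


def sq_distance(p: list, q: list) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def check_inequality(lower: list, upper: list, num_samples: int = 10_000,
                     seed: int = 0) -> bool:
    """Check sup ||xi - xi'||^2 >= ||xi - x_c||^2 + r_c^2 for sampled xi."""
    rng = random.Random(seed)
    center, radius_sq = chebyshev_center_and_radius_sq(lower, upper)
    for _ in range(num_samples):
        xi = [rng.uniform(a, b) for a, b in zip(lower, upper)]
        lhs = worst_case_sq_distance(xi, lower, upper)
        rhs = sq_distance(xi, center) + radius_sq
        if lhs < rhs - 1e-12:
            return False
    return True


# Hypothetical box in a 3-step trajectory space.
box_lower, box_upper = [0.0, 0.25, 0.5], [0.125, 0.5, 1.0]
print(check_inequality(box_lower, box_upper))
```

Per coordinate, the farthest-endpoint distance is $(|\xi_{i}-c_{i}|+w_{i})^{2}$ with half-width $w_{i}$, which exceeds $(\xi_{i}-c_{i})^{2}+w_{i}^{2}$ by the cross term $2|\xi_{i}-c_{i}|w_{i}\geq 0$; dividing by $l$ and taking expectations gives the form of (8) in this special case.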

Theorem 5.2 (Shannon lower bound for abstractions).

Assume that:

1. ${\mathcal{B}}^{S}_{l}$ is a finite union of bounded, $n$ -dimensional $C^{1}$ -manifolds,
2. $\mu_{\xi}\ll\mathcal{H}^{n}_{{\mathcal{B}}_{l}^{S}}$ , with $p_{\xi}\coloneqq\frac{\mathrm{d}\mu_{\xi}}{\mathrm{d}\mathcal{H}^{n}_{{\mathcal{B}}_{l}^{S}}}$ .

The average distortion of any abstraction $A$ with partition size $|\mathcal{Y}|\leq e^{R}$ , where $R>0$ , is lower bounded as follows

D_{abs}(R)\geq\frac{n}{2l}\Big(\frac{e^{-R+h(\xi)-n/2}}{c_{{\mathcal{B}}_{l}^{S}}\Gamma(1+n/2)}\Big)^{2/n}+\frac{1}{l}c_{{\mathcal{B}}_{l}^{S}}^{-2/n}\max_{s\in(1,\infty]}e^{\frac{2}{n}(-\frac{s}{s-1}R+h_{s}(\xi))},

(9)

where $c_{{\mathcal{B}}_{l}^{S}}$ is defined by (1).

Notice that a valid lower bound is obtained for any value of $s\in(1,\infty]$ in the right-hand side of (9); maximization over $s$ provides the tightest bound. In the numerical examples in Section 6, we compute the bound for multiple values of $s$ . Further, one may recover a lower bound on $R_{abs}(D)$ numerically, by fixing $D$ in the left-hand side of (9) and solving numerically for $R$ (as the right-hand side is a decreasing function of $R$ , this is trivially computed by, e.g., bisection methods). The above theorem, thus, provides fundamental limits on the accuracy-size tradeoff, or the scalability, of abstractions, for given dynamics $x^{+}=f(x)$ .
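The inversion of (9) described above can be sketched numerically as follows. This is a minimal illustration, assuming that $n$ , $l$ , $h(\xi)$ , $h_{s}(\xi)$ and $c_{{\mathcal{B}}_{l}^{S}}$ are already available; the function names `abstraction_bound` and `rate_lower_bound` and the toy inputs are hypothetical and not part of the original work. The code evaluates the right-hand side of (9) for one fixed $s$ and finds, by bisection, the smallest $R$ for which this value drops to a prescribed $D$ .

```python
import math

def abstraction_bound(R, n, l, h, h_s, c, s=math.inf):
    """Right-hand side of (9) for a single Renyi order s in (1, inf]."""
    # First term: Shannon-type bound on the Chebyshev-center encoder-decoder pair.
    term1 = (n / (2 * l)) * (
        math.exp(-R + h - n / 2) / (c * math.gamma(1 + n / 2))
    ) ** (2 / n)
    # Second term: covering bound; s / (s - 1) tends to 1 as s -> infinity.
    ratio = 1.0 if math.isinf(s) else s / (s - 1)
    term2 = (1 / l) * c ** (-2 / n) * math.exp((2 / n) * (-ratio * R + h_s))
    return term1 + term2

def rate_lower_bound(D, n, l, h, h_s, c, s=math.inf, tol=1e-10):
    """Smallest R >= 0 such that abstraction_bound(R) <= D (bisection)."""
    g = lambda R: abstraction_bound(R, n, l, h, h_s, c, s)
    if g(0.0) <= D:
        return 0.0
    lo, hi = 0.0, 1.0
    while g(hi) > D:          # grow the bracket until the bound falls below D
        lo, hi = hi, 2 * hi
    while hi - lo > tol:      # the bound is decreasing in R, so bisection applies
        mid = 0.5 * (lo + hi)
        if g(mid) > D:
            lo = mid
        else:
            hi = mid
    return hi

# Toy inputs (assumed for illustration): n = l = 1, h = h_s = 0, c = v_1 = 2.
K = abstraction_bound(0.0, 1, 1, 0.0, 0.0, 2.0)
R_min = rate_lower_bound(K * math.exp(-6.0), 1, 1, 0.0, 0.0, 2.0)
print(R_min)  # approximately 3, since the bound equals K * exp(-2R) here
```

Repeating the computation for several values of $s$ and keeping the largest resulting $R$ gives the tightest rate bound obtainable from (9).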

Remark 4 (On the assumptions of Thm. 5.2).

The first assumption of Thm. 5.2, requiring ${\mathcal{B}}^{S}_{l}$ to be a union of smooth manifolds, is satisfied whenever the dynamics $f$ is piecewise continuously differentiable. The second assumption is satisfied whenever $f$ is piecewise continuous and the initial condition distribution $p_{\xi_{0}}$ is such that $\mu_{\xi_{0}}\ll{\mathcal{L}}^{n}$ , where ${\mathcal{L}}^{n}$ is the Lebesgue measure.

In the following subsection, we provide: a) closed-form expressions for $h(\xi)$ , $h_{s}(\xi)$ and $c_{{\mathcal{B}}_{l}^{S}}$ , for certain classes of dynamics $x^{+}=f(x)$ , and b) an interpretation of how the complexity of the dynamics and the time-horizon $l$ affect the fundamental lower bound (9) in Thm. 5.2.

5.2 Interpretation and calculus for Theorem 5.2

Before we proceed with the interpretation of Thm. 5.2, let us show how one may compute $h(\xi)$ , $h_{s}(\xi)$ and $c_{{\mathcal{B}}_{l}^{S}}$ , which are required to compute the lower bound (9). Let us define the function $b_{l}:\mathcal{X}\to{\mathcal{B}}_{l}^{S}$ by

b_{l}(x)\coloneqq\big[\,\begin{matrix}x^{\mkern-1.5mu\mathrm{T}}\!&f(x)^{\mkern-1.5mu\mathrm{T}}\!&f(f(x))^{\mkern-1.5mu\mathrm{T}}\!&\cdots&f^{(l-1)}(x)^{\mkern-1.5mu\mathrm{T}}\!\end{matrix}\,\big]^{\mkern-1.5mu\mathrm{T}}\!,

(10)

which maps an initial state into its $l$ -long trajectory.

Proposition 5.3 (Computing $h(\xi)$ and $h_{s}(\xi)$ ).

Consider a system $x^{+}=f(x)$ with $f:\mathcal{X}\to\mathcal{X}$ measurable, differentiable, and piecewise Lipschitz, that is, $\mathcal{X}$ is a countable union $\cup_{i}\mathcal{X}_{i}$ of Lebesgue-measurable sets such that the restriction of $f$ to each $\mathcal{X}_{i}$ is Lipschitz (this condition may be relaxed to $f$ approximately Lipschitz, see [31, Thm. 3.1.8, Sec. 3.2.1], which also implies approximate differentiability). Let Assumption 1 hold, and $\xi_{0}\sim p_{\xi_{0}}:\mathcal{X}\to\mathbb{R}_{+}$ . The following expressions hold

h(\xi)=h(\xi_{0})+\frac{1}{2}\int_{\mathcal{X}}p_{\xi_{0}}(x)\log\det(J_{b_{l}}(x)^{\mkern-1.5mu\mathrm{T}}\!J_{b_{l}}(x))\mathrm{d}x,

(11)

h_{s}(\xi)=\frac{1}{1-s}\log\int_{\mathbb{R}^{n}}\frac{p_{\xi_{0}}(x)^{s}}{\det(J_{b_{l}}(x)^{\mkern-1.5mu\mathrm{T}}\!J_{b_{l}}(x))^{\frac{s-1}{2}}}\,\mathrm{d}x,\quad s>1,

(12)

h_{\infty}(\xi)=\mathrm{ess}\inf_{\mathcal{X}}\Big(\frac{1}{2}\log\det(J_{b_{l}}(x)^{\mkern-1.5mu\mathrm{T}}\!J_{b_{l}}(x))-\log p_{\xi_{0}}(x)\Big),

(13)

where $J_{b_{l}}$ denotes the Jacobian matrix of $b_{l}$ .

Proposition 5.4 (Computing $c_{{\mathcal{B}}_{l}^{S}}$ ).

Consider a system $x^{+}=f(x)$ , with $f:\mathcal{X}\to\mathcal{X}$ differentiable a.e., and let Assumption 1 hold. The following facts on $c_{{\mathcal{B}}_{l}^{S}}$ hold:

1. $c_{{\mathcal{B}}_{l}^{S}}\leq v_{n}$ , if $f$ is affine;
2. $c_{{\mathcal{B}}_{l}^{S}}\leq M^{l}v_{n}$ , if $f$ is piecewise affine with $M$ modes;
3. $c_{{\mathcal{B}}_{l}^{S}}\leq v_{n}\big(\sum_{i=0}^{l-1}L^{2i}\big)^{n/2}$ , if $f$ is Lipschitz continuous with constant $L$ .

Remark 5 ( $c_{{\mathcal{B}}_{l}^{S}}$ at high rates).

As the partition size $|\mathcal{Y}|$ grows large, the Chebyshev balls of the abstraction outputs (cf. Prop. 5.1 and Lemma 8.1) become small. Hence, in the case of smooth $f$ , their intersection with the manifold approaches the case of an affine system, with $c_{{\mathcal{B}}_{l}^{S}}\leq v_{n}.$ Similarly, in the piecewise affine case, sufficiently small balls (which requires at least $|\mathcal{Y}|\geq M^{l}$ ) can be chosen to intersect at most one piece each. Thus, to reduce the conservatism of the bound in such high-rate cases, one can evaluate the lower bound of Thm. 2.1 using $c_{{\mathcal{B}}_{l}^{S}}=v_{n}.$ We demonstrate this in the numerical examples in Section 6.

We proceed to discussing Thm. 5.2. First, inspecting (9), systems with more complex dynamics lead to bigger abstraction distortion, for fixed abstraction size, since the right-hand side is increasing w.r.t. $h(\xi)$ and the Rényi entropy $h_{s}(\xi)$ ; equivalently, more complex systems require bigger abstraction size for the same distortion.

Regarding the effect of the time-horizon $l$ on the bound (9), we have to inspect the effect that $l$ has on $h(\xi)$ and $h_{s}(\xi)$ . Let us first demonstrate that, for the “simple” dynamics of exponentially stable systems, the abstraction distortion converges to 0 for $l\to\infty$ .

Example 5.1 (Exponentially stable systems).

Consider a system $x^{+}=f(x)$ whose origin is exponentially stable on a given compact set in $\mathbb{R}^{n}$ . Then, there is a Lyapunov function $V:\mathbb{R}^{n}\to\mathbb{R}_{+}$ satisfying $V(x)\geq\frac{1}{r}\|x\|^{2}$ for a given $r>0$ and, for all $x$ s.t. $V(x)\leq 1$ (w.l.o.g.), $V(f(x))\leq aV(x),$ with $a\in[0,1).$ This allows us to create an abstraction $A$ with the associated partition $Y_{i}=\{x\in\mathbb{R}^{n}\mid a^{i}<V(x)\leq a^{i-1}\}$ for $i=1,...,N,$ and $Y_{N+1}=\{x\in\mathbb{R}^{n}\mid V(x)\leq a^{N}\}$ ; and transitions $Y_{i}\xrightarrow[A]{}Y_{j}$ if and only if $j>i$ or $j=i=N+1$ , since $V$ decreases by at least the factor $a$ at every step. The abstraction encapsulates the fact that, after at most $N$ steps, all trajectories reach the sublevel set $V(x)\leq a^{N}$ . We get that $2r$ and $2a^{N}r$ are overapproximations of the diameters of $Y_{i},~~i\leq N,$ and $Y_{N+1},$ respectively. Then, recalling the distortion (7), for any trajectory $\xi$ of the system and $l>N,$

d(\xi,\Omega_{A})\leq\frac{1}{l}(2Nr^{2}+2(l-N)a^{2N}r^{2})\underset{l\to\infty}{=}2a^{2N}r^{2},

which can be made arbitrarily small by suitable choice of $N$ .

Indeed, as the following example shows, for Schur LTI systems, the bound (9) converges to 0, for $l\to\infty$ , which demonstrates the bound’s tightness.

Example 5.2 (Schur LTI systems).

For a Schur LTI system $x^{+}=Ax$ , we have $c_{{\mathcal{B}}^{S}_{l}}\leq v_{n}$ and

J_{b_{l}}(x)^{\mkern-1.5mu\mathrm{T}}\!J_{b_{l}}(x)=\sum_{i=0}^{l-1}(A^{i})^{\mkern-1.5mu\mathrm{T}}\!A^{i}\underset{l\to\infty}{=}P,

where $P$ is the unique positive definite solution of the discrete-time Lyapunov equation $A^{\mkern-1.5mu\mathrm{T}}\!PA-P=-I$ . When $A$ is normal, the sum equals $\sum_{i=0}^{l-1}(A^{\mkern-1.5mu\mathrm{T}}\!A)^{i}=(I-A^{\mkern-1.5mu\mathrm{T}}\!A)^{-1}(I-(A^{\mkern-1.5mu\mathrm{T}}\!A)^{l})$ and $P=(I-A^{\mkern-1.5mu\mathrm{T}}\!A)^{-1}.$

Thus, both $h(\xi)$ and $h_{s}(\xi)$ remain finite, for $l\to\infty$ , and the bound (9) converges to 0.
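The boundedness of $J_{b_{l}}^{\mkern-1.5mu\mathrm{T}}\!J_{b_{l}}$ for Schur systems can be checked numerically. The sketch below builds the Gram matrix directly from the stacked blocks $I,A,A^{2},\dots,A^{l-1}$ of $J_{b_{l}}$ for $x^{+}=Ax$ , and verifies that it stops growing with $l$ and approximately satisfies the discrete-time Lyapunov equation $A^{\mkern-1.5mu\mathrm{T}}\!GA-G=-I$ . The matrix `A` is an arbitrary illustrative (non-normal, Schur) choice and the helper names are hypothetical.

```python
def matmul(X, Y):
    """Product of two small dense matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def gram(A, l):
    """J_{b_l}^T J_{b_l} = sum_{i=0}^{l-1} (A^i)^T A^i for x+ = A x."""
    n = len(A)
    P = [[float(i == j) for j in range(n)] for i in range(n)]  # A^0 = I
    G = [[0.0] * n for _ in range(n)]
    for _ in range(l):
        PtP = matmul(transpose(P), P)
        G = [[G[i][j] + PtP[i][j] for j in range(n)] for i in range(n)]
        P = matmul(A, P)                                        # next power of A
    return G

def det2(G):
    return G[0][0] * G[1][1] - G[0][1] * G[1][0]

# Illustrative Schur matrix (spectral radius 0.5) that is not normal.
A = [[0.5, 1.0], [0.0, 0.5]]
G = gram(A, 200)

# Residual of the Lyapunov equation A^T G A - G + I = 0.
AtGA = matmul(matmul(transpose(A), G), A)
residual = max(abs(AtGA[i][j] - G[i][j] + (i == j))
               for i in range(2) for j in range(2))
print(residual, det2(gram(A, 100)), det2(G))
```

Since $\det(J_{b_{l}}^{\mkern-1.5mu\mathrm{T}}\!J_{b_{l}})$ saturates, the entropy corrections in Prop. 5.3 stay bounded and the factor $1/l$ in (9) drives the bound to zero.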

Conversely, the example below shows that, even for marginally stable systems, the bound may not vanish with $l\to\infty$ .

Example 5.3 (Marginally stable LTI system).

For the simple system $x^{+}=x$ , we have that $\det(J_{b_{l}}(x)^{\mkern-1.5mu\mathrm{T}}\!J_{b_{l}}(x))=\det(lI)=l^{n}$ and, by Prop. 5.3, $h(\xi)=h(\xi_{0})+\frac{n\log l}{2}$ and

\begin{aligned}h_{s}(\xi)&=\frac{1}{1-s}\log\Big(\frac{1}{l^{n(s-1)/2}}\int_{\mathbb{R}^{n}}p_{\xi_{0}}^{s}\,\mathrm{d}x\Big)\\&=-\frac{n(s-1)\log l}{2(1-s)}+\frac{1}{1-s}\log\int_{\mathbb{R}^{n}}p_{\xi_{0}}^{s}\,\mathrm{d}x\\&=\frac{n\log l}{2}+h_{s}(\xi_{0}).\end{aligned}

Substituting these expressions into the distortion bound (9), the factor $l$ contributed by $e^{\frac{2}{n}h(\xi)}$ and $e^{\frac{2}{n}h_{s}(\xi)}$ cancels the $l$ in the denominator, yielding a positive lower bound that is the same for every $l$ . This independence of $l$ is expected, as abstracting $x^{+}=x$ is the same as encoding the initial condition $\xi_{0}$ .
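The contrast between Examples 5.2 and 5.3 can be illustrated on the scalar system $x^{+}=ax$ , for which $J_{b_{l}}^{\mkern-1.5mu\mathrm{T}}\!J_{b_{l}}=\sum_{i=0}^{l-1}a^{2i}$ is constant in $x$ . By Prop. 5.3, both terms of (9) then depend on $l$ only through the factor $\det(J_{b_{l}}^{\mkern-1.5mu\mathrm{T}}\!J_{b_{l}})^{1/n}/l$ . The sketch below, in which the name `horizon_factor` is hypothetical, evaluates this factor.

```python
def horizon_factor(a, l):
    """det(J^T J)^(1/n) / l for the scalar system x+ = a x (n = 1)."""
    return sum(a ** (2 * i) for i in range(l)) / l

for l in (1, 10, 100, 1000):
    # a = 1.0: marginally stable, factor stays constant in l.
    # a = 0.5: Schur, factor decays roughly like 1 / l.
    print(l, horizon_factor(1.0, l), horizon_factor(0.5, l))
```

For $a=1$ the factor is identically one, matching the $l$ -independent bound of Example 5.3, whereas for $|a|<1$ it vanishes as $l\to\infty$ , matching Example 5.2.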

6 Numerical Examples

6.1 The doubling map

Consider the doubling map from Example 2.1. For any trajectory length $l$ , its behavior ${\mathcal{B}}_{l}^{S}$ is composed of $2^{l-1}$ line segments in $\mathbb{R}^{l}$ described by $(x_{0},2x_{0},4x_{0},...,2^{l-1}x_{0})\bmod 1,$ uniformly distributed with $p_{\xi}(\xi)=1/\sqrt{1+4+...+4^{l-1}}=\sqrt{3/(4^{l}-1)},$ giving $h(\xi)=h_{s}(\xi)=\frac{1}{2}(\log(4^{l}-1)-\log{3})$ for all $s\in(1,\infty].$ Using Prop. 5.4 for piecewise affine systems, we obtain $c_{{\mathcal{B}}_{l}^{S}}\leq\sqrt{(4^{l}-1)/3}v_{n},$ enabling us to compute the lower bound in Theorem 5.2. In light of Remark 5, we also determine the high-rate lower bound by picking $c_{{\mathcal{B}}_{l}^{S}}=v_{1}=2.$ The lower-bound curves can be seen in Fig. 4 for $s=2$ and $s=\infty$ . It is apparent that the tightest bound is obtained with $s=\infty$ and $c_{{\mathcal{B}}_{l}^{S}}=v_{1}.$ In this case, the bound is consistently about half the optimal distortion $D_{abs}(R)$ , which is remarkably close. As a comparison, the standard Shannon lower bound is 1.42 times smaller than the optimal quantizer distortion of a uniform random variable in $\mathbb{R}^{1}$ , in the standard source-coding setting.

Let us explain how we computed the optimal achievable abstraction distortion. Following the reasoning in Section 5, we first build an optimal cover for ${\mathcal{B}}_{l}^{S}$ (afterwards, we show that this optimal cover admits a distortion that is equal to that of a specific abstraction with the same rate, and thus its rate-distortion curve is optimal among all abstractions). For a given $l$ , consider $R=\log(k2^{l-1})$ , where $k$ is an arbitrary natural number. Since all segments are congruent and equiprobable, and the probability is uniform along each of them, the optimal partition of ${\mathcal{B}}_{l}^{S}$ is obtained by cutting each of the $2^{l-1}$ segments into $k$ equal pieces. The expected squared error between a trajectory $\xi$ and the Chebyshev center of its corresponding piece is $\mathbb{E}[\|\xi-\xi_{q_{A}}\|^{2}]=\frac{1-4^{-l}}{9k^{2}},$ which is obtained by computing the squared length of each piece, $L^{2}=\frac{1}{(2^{l-1}k)^{2}}(1+2^{2}+...+(2^{l-1})^{2})=\frac{4^{l}-1}{3k^{2}4^{l-1}}=\frac{4(1-4^{-l})}{3k^{2}},$ followed by using the variance of the uniform distribution, giving $L^{2}/12$ . To determine the corresponding set-based distortion, we use the distortion $d$ in (7) on the aforementioned pieces. For one-dimensional line segments, $\max_{\xi^{\prime}\in\Omega_{A}(\xi)}\|\xi-\xi^{\prime}\|^{2}=(\|\xi-\xi_{q_{A}}\|+L/2)^{2}$ . Since $\|\xi-\xi_{q_{A}}\|$ is uniform on $[0,L/2]$ , the expected value of this maximum is the second moment of a uniform random variable on $[L/2,L]$ . Using $\mathbb{E}[X^{2}]=\mathrm{Var}(X)+\mathbb{E}[X]^{2}=L^{2}/48+9L^{2}/16=7L^{2}/12$ and dividing by $l$ as in (7), we obtain $D_{cover}(R)=\frac{7L^{2}}{12l}=\frac{7(4^{l}-1)}{36\,l}e^{-2R},$ where we used $k=e^{R}/2^{l-1}$ , and $D_{cover}(R)$ is the optimal distortion among all $l$ -dimensional covers of ${\mathcal{B}}_{l}^{S}$ .

Finally, we show that

D_{abs}(R)=D_{cover}(R)=\frac{7L^{2}}{12l}=\frac{7(4^{l}-1)}{36\,l}e^{-2R}.

Notice that, in general, $D_{abs}(R)\geq D_{cover}(R)$ , as abstractions are covers. However, the optimal cover built above determines an abstraction $A$ that gives the same distortion, and thus we have $D_{abs}(R)=D_{cover}(R)$ . First, $\mathcal{Y}$ is the uniform grid with segments of length $1/(k2^{l-1})$ . Each trajectory of the abstraction is a sequence of segments of lengths $1/(k2^{l-1}),1/(k2^{l-2}),...,1/k$ , thus giving a box in $\mathbb{R}^{l}$ containing any related trajectory $\xi$ . For each box, the set of related trajectories is precisely a diagonal of the box. As such, the farthest endpoint of the diagonal is again a solution to $\arg\sup_{\xi^{\prime}\in\Omega_{A}(\xi)}\|\xi-\xi^{\prime}\|^{2}$ . Hence, the abstraction has the same distortion as the optimal cover. The above reasoning is illustrated in Fig. 5.
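The gap between the lower bound and the optimal distortion for the doubling map can be reproduced with a few lines of code. The sketch below, with the hypothetical function name `doubling_map_curves`, evaluates the cover distortion $7L^{2}/(12l)$ from the piece length $L$ derived above, and the right-hand side of (9) with $s=\infty$ and the high-rate choice $c_{{\mathcal{B}}_{l}^{S}}=v_{1}=2$ , at the rates $R=\log(k2^{l-1})$ .

```python
import math

def doubling_map_curves(l, k):
    """Return (R, optimal cover distortion, lower bound (9)) for the doubling map."""
    R = math.log(k * 2 ** (l - 1))
    # Squared length of each of the k * 2^(l-1) pieces of the behavior.
    L2 = 4 * (1 - 4.0 ** (-l)) / (3 * k ** 2)
    d_cover = 7 * L2 / (12 * l)
    # Entropies of the uniform distribution on the behavior (equal for all s).
    h = 0.5 * (math.log(4 ** l - 1) - math.log(3))
    c = 2.0  # v_1, the high-rate choice suggested by Remark 5
    term1 = (1 / (2 * l)) * (math.exp(-R + h - 0.5) / (c * math.gamma(1.5))) ** 2
    term2 = (1 / l) * c ** (-2) * math.exp(2 * (-R + h))  # s = infinity
    return R, d_cover, term1 + term2

for l, k in [(1, 1), (3, 5), (8, 2)]:
    R, d_cover, bound = doubling_map_curves(l, k)
    print(l, k, bound / d_cover)  # ratio is about 0.53 in every case
```

The ratio does not depend on $l$ or $k$ , because both curves are proportional to $L^{2}/l$ at these rates.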

6.2 A 3D nonlinear system and abstractions with uniform grids

Consider the nonlinear system $f:\mathbb{R}^{3}\to\mathbb{R}^{3}$ where

f(x)=\begin{bmatrix}0.9x_{1}+0.1\sin{x_{2}}\\ 2x_{2}^{3}-x_{2}\\ 0.9x_{3}+0.1x_{1}x_{2}\end{bmatrix},

and $\mathcal{X}=[-1,1]^{3},$ which is forward invariant under $f$ . This system has multiple equilibria, hence the origin is not asymptotically stable on $\mathcal{X}$ . For each $N$ in $\{10,20,50,100\},$ we build abstractions $A_{N}$ by using uniform partitioning of $\mathcal{X}$ with grids of size $N\times N\times N$ and determining the transition map using interval arithmetic. Then, we compute the distortion lower bound from Theorem 5.2 using Prop. 5.3 and Prop. 5.4, case 3 (the entropies were computed using Monte-Carlo integration with 10000 samples, while Jacobians and the Lipschitz constant were determined using automatic differentiation). Furthermore, lower bounds were also computed by picking $c_{{\mathcal{B}}_{l}^{S}}=v_{3}$ , in light of Remark 5. The resulting distortion lower bound curves can be seen in Fig. 6. In this case, as the abstraction we construct is not necessarily the optimal one, its expected distortion is generally about 100 times higher than the fundamental lower bound. Still, this demonstrates the validity of the lower bound, even in cases with nonlinear dynamics; even more importantly, it indicates how conservative standard abstractions with uniform grids might be.
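A Monte-Carlo evaluation of $h(\xi)$ through (11) for this system can be sketched as follows. This is not the authors' implementation: the sketch assumes a uniform initial distribution on $\mathcal{X}=[-1,1]^{3}$ (so that $h(\xi_{0})=\log 8$ ), uses a hand-written Jacobian of $f$ in place of automatic differentiation, and all function names are hypothetical. The Gram matrix $J_{b_{l}}^{\mkern-1.5mu\mathrm{T}}\!J_{b_{l}}$ is accumulated along each sampled trajectory via the chain rule.

```python
import math
import random

def f(x):
    x1, x2, x3 = x
    return (0.9 * x1 + 0.1 * math.sin(x2), 2 * x2 ** 3 - x2, 0.9 * x3 + 0.1 * x1 * x2)

def jac_f(x):
    """Hand-derived Jacobian of f at x."""
    x1, x2, x3 = x
    return [[0.9, 0.1 * math.cos(x2), 0.0],
            [0.0, 6 * x2 ** 2 - 1, 0.0],
            [0.1 * x2, 0.1 * x1, 0.9]]

def matmul3(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def det3(G):
    return (G[0][0] * (G[1][1] * G[2][2] - G[1][2] * G[2][1])
            - G[0][1] * (G[1][0] * G[2][2] - G[1][2] * G[2][0])
            + G[0][2] * (G[1][0] * G[2][1] - G[1][1] * G[2][0]))

def log_det_gram(x, l):
    """log det(J_{b_l}(x)^T J_{b_l}(x)), with J_{b_l} stacking D(f^i)(x), i < l."""
    P = [[float(i == j) for j in range(3)] for i in range(3)]  # D(f^0) = I
    G = [[0.0] * 3 for _ in range(3)]
    for _ in range(l):
        for i in range(3):
            for j in range(3):
                G[i][j] += sum(P[k][i] * P[k][j] for k in range(3))  # add P^T P
        P = matmul3(jac_f(x), P)   # chain rule: D(f^{i+1}) = Df(f^i(x)) D(f^i)
        x = f(x)
    return math.log(det3(G))

def entropy_mc(l, samples=10000, seed=0):
    """Monte-Carlo estimate of h(xi) in (11), assuming xi_0 uniform on [-1, 1]^3."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(samples):
        x0 = tuple(rng.uniform(-1.0, 1.0) for _ in range(3))
        acc += 0.5 * log_det_gram(x0, l)
    return math.log(8.0) + acc / samples

print(entropy_mc(l=3, samples=2000))
```

Because the first block of $J_{b_{l}}$ is the identity, $\det(J_{b_{l}}^{\mkern-1.5mu\mathrm{T}}\!J_{b_{l}})\geq 1$ , so the estimate never falls below $h(\xi_{0})$ ; the Rényi entropies in (12) can be estimated analogously by averaging the corresponding integrand.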

7 Conclusion and Future Research: Towards Minimal Abstractions

We have developed a statistical, quantitative theory on the accuracy-size tradeoff of finite abstractions of dynamical systems. Through this theory, we have uncovered fundamental limits on their scalability: given the system dynamics, we have obtained a fundamental bound on the achievable abstraction accuracy, for a given abstraction size. To that end, we have established connections with rate-distortion theory. From an information-theoretic perspective, we have developed rate-distortion theory for the particular class of encoder-decoder pairs that abstractions constitute: set-based, with set-based distortion. Overall, this novel theory quantifies scalability limits of abstractions, and provides insights on how the complexity of the dynamics to be abstracted dictates these limits.

Most importantly, the developed theory may be employed to construct minimal abstractions, harnessing their full scalability potential. From this work, it becomes clear that, to construct minimal abstractions, one has to solve the problem of encoding trajectories of dynamical systems, through coverings in a high-dimensional, ambient space. In fact, this has already been demonstrated, in Section 6.1, where we construct a minimal abstraction of the doubling-map dynamics. Future research will thus focus on the general problem of constructing minimal abstractions. Towards that goal, information-theoretic algorithms optimizing the rate-distortion tradeoff, such as the information bottleneck method (see [32]), could be adapted for abstractions.

8 Technical Results and Proofs

Proof of Prop. 5.1.

For any given $\xi\in{\mathcal{B}}_{l}^{S}$ , we will prove that

d(\xi,\Omega_{A})\geq\frac{1}{l}\|\xi-\xi_{q_{A}}\|^{2}+\frac{1}{l}r_{c}^{2}(\Omega_{A}),

where we note that $\xi\in\Omega_{A}$ and $\xi_{q_{A}}=x_{c}(\Omega_{A})$ . The proof is then completed by applying the expectation operator to the above inequality.

Define $w(x^{\prime})=\max_{y\in\Omega_{A}}\|y-x^{\prime}\|^{2}$ . The function $w$ is convex, being the pointwise maximum of the convex quadratic maps $x^{\prime}\mapsto\|y-x^{\prime}\|^{2}$ . We have $x_{c}(\Omega_{A})=\arg\min_{x^{\prime}\in\mathbb{R}^{nl}}w(x^{\prime})$ and $r_{c}^{2}(\Omega_{A})=w(x_{c}(\Omega_{A}))=\max_{y\in\Omega_{A}}\|y-x_{c}(\Omega_{A})\|^{2}$ .

Define the set of maximizers

M:=\{y\in\Omega_{A}:\|y-x_{c}(\Omega_{A})\|=r_{c}(\Omega_{A})\}.

The subdifferential of $w$ at $x_{c}(\Omega_{A})$ is $\partial w(x_{c}(\Omega_{A}))=\operatorname{conv}\{2(x_{c}(\Omega_{A})-y):\ y\in M\}$ , where $\operatorname{conv}$ denotes the convex hull operator. Since $x_{c}(\Omega_{A})$ minimizes $w$ , the optimality condition $0\in\partial w(x_{c}(\Omega_{A}))$ gives $0\in\operatorname{conv}\{\,x_{c}(\Omega_{A})-y:y\in M\,\}$ . Hence there exist finitely many points $y_{1},\dots,y_{m}\in M$ and coefficients $\lambda_{i}\geq 0$ , $\sum_{i}\lambda_{i}=1$ , such that

\sum_{i=1}^{m}\lambda_{i}(y_{i}-x_{c}(\Omega_{A}))=0.

(14)

Now, fix $\xi\in{\mathcal{B}}_{l}^{S}$ and let $\xi_{*}\in\arg\max_{y\in\Omega_{A}}\|y-\xi\|^{2}$ . By definition of $\xi_{*}$ , for every $y_{i}\in M$ we have $\|y_{i}-\xi\|^{2}\leq\|\xi_{*}-\xi\|^{2}$ . Taking the convex combination with the $\lambda_{i}$ gives

\sum_{i=1}^{m}\lambda_{i}\big(\|y_{i}-\xi\|^{2}-\|\xi_{*}-\xi\|^{2}\big)\leq 0.

Since $\|y_{i}-\xi\|^{2}=\|y_{i}-x_{c}(\Omega_{A})\|^{2}+\|x_{c}(\Omega_{A})-\xi\|^{2}+2(y_{i}-x_{c}(\Omega_{A}))^{\mkern-1.5mu\mathrm{T}}\!(x_{c}(\Omega_{A})-\xi)$ and $\|y_{i}-x_{c}(\Omega_{A})\|^{2}=r_{c}(\Omega_{A})^{2}$ , the above inequality becomes

r_{c}(\Omega_{A})^{2}+\|x_{c}(\Omega_{A})-\xi\|^{2}-\|\xi_{*}-\xi\|^{2}\leq 0,

where, using (14), the cross term has vanished. Finally, using (7) and $\xi_{q_{A}}=x_{c}(\Omega_{A})$ ,

r_{c}(\Omega_{A})^{2}+\|\xi_{q_{A}}-\xi\|^{2}\leq\|\xi_{*}-\xi\|^{2}=l\cdot d(\xi,\Omega_{A}).

∎
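The pointwise inequality proved above can be sanity-checked in the simplest setting $n=l=1$ , where the abstraction output is a finite set of reals, its Chebyshev center is the midpoint of its range and its Chebyshev radius is half the range. The sketch below uses randomly generated point sets; the function name `prop_5_1_gap` and the data are illustrative assumptions only.

```python
import random

def prop_5_1_gap(points):
    """Smallest value of l*d(xi, Omega) - (|xi - x_c|^2 + r_c^2) over xi in Omega, for l = 1."""
    lo, hi = min(points), max(points)
    center = 0.5 * (lo + hi)   # Chebyshev center of a bounded set on the real line
    radius = 0.5 * (hi - lo)   # Chebyshev radius
    gaps = []
    for xi in points:
        worst = max((y - xi) ** 2 for y in points)  # set-based distortion (7), l = 1
        gaps.append(worst - ((xi - center) ** 2 + radius ** 2))
    return min(gaps)

rng = random.Random(1)
worst_gap = min(
    prop_5_1_gap([rng.uniform(-1.0, 1.0) for _ in range(rng.randint(1, 20))])
    for _ in range(200)
)
print(worst_gap)  # nonnegative up to floating-point rounding
```

In one dimension the gap equals $2\,|\xi-x_{c}|\,r_{c}$ , which vanishes exactly when $\xi$ coincides with the Chebyshev center or the set is a single point.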

Towards proving Thm. 5.2, we introduce the following lemma.

Lemma 8.1.

Let $M\subset\mathbb{R}^{n}$ be a finite union of bounded, disjoint, $m$ -dimensional $C^{1}$ -manifolds. Let $X$ be a random variable in $M$ with probability measure $\mu_{X}\ll\mathcal{H}^{m}_{M}$ and density $p=\frac{\mathrm{d}\mu_{X}}{\mathrm{d}\mathcal{H}^{m}_{M}}$ . Then, over all collections $\mathcal{Y}\coloneqq\{Y_{i}\}_{i=1}^{N}$ of $N$ measurable, $n$ -dimensional sets $Y_{i}\subseteq\mathbb{R}^{n}$ covering $M$ , the following holds:

\inf_{\mathcal{Y}}\mathbb{E}_{X}\Big[\sum_{i=1}^{N}\mathbf{1}_{Y_{i}}(X)\,r_{c}(Y_{i})^{2}\Big]\geq c_{M}^{-2/m}\max_{s\in(1,\infty]}e^{\frac{2}{m}h_{s}(X)}\,N^{-\frac{2}{m(1-1/s)}},

(15)

where $c_{M}$ is defined by (1), $\mathbf{1}_{Y_{i}}(\cdot)$ is the indicator function of set $Y_{i}$ , $r_{c}(Y_{i})$ denotes the Chebyshev radius of $Y_{i}$ , and $v_{m}=\frac{\pi^{m/2}}{\Gamma(m/2+1)}$ is the volume of the unit ball in $\mathbb{R}^{m}$ .

Proof.

Define $S_{i}:=Y_{i}\cap M\subset M$ . Then $\{S_{i}\}_{i=1}^{N}$ forms a measurable $m$ -dimensional cover of $M$ . Let $p_{i}:=\mu_{X}(S_{i})=\int_{S_{i}}p\,\mathrm{d}\mathcal{H}^{m}$ and $r_{i}:=r_{c}(S_{i})$ . Because $S_{i}\subset Y_{i},$ we have $r_{i}\leq r_{c}(Y_{i})$ , giving

\mathbb{E}_{X}\Big[\sum_{i=1}^{N}\mathbf{1}_{Y_{i}}(X)r_{c}(Y_{i})^{2}\Big]=\sum_{i=1}^{N}p_{i}r_{c}(Y_{i})^{2}\geq\sum_{i=1}^{N}p_{i}r_{i}^{2}.

Hence it suffices to lower bound $\sum_{i=1}^{N}p_{i}r_{i}^{2}$ over collections $\mathcal{Y}$ .

For a given $i$ , by definition, $S_{i}\subset B(c_{i},r_{i})\cap M$ for some Chebyshev center $c_{i}$ . For any $s>1$ , we have:

\begin{aligned}p_{i}&\leq\int_{B(c_{i},r_{i})\cap M}p\,\mathrm{d}\mathcal{H}^{m}\\&=\int_{M}p\,\mathbf{1}_{B(c_{i},r_{i})\cap M}\,\mathrm{d}\mathcal{H}^{m}\\&\leq\Big(\int_{M}p^{s}\mathrm{d}\mathcal{H}^{m}\Big)^{1/s}\Big(\int_{M}(\mathbf{1}_{B(c_{i},r_{i})\cap M})^{\frac{s}{s-1}}\,\mathrm{d}\mathcal{H}^{m}\Big)^{1-1/s}\\&\leq\|p\|_{L^{s}(M)}\,(\mathcal{H}^{m}(B(c_{i},r_{i})\cap M))^{1-1/s}\\&\leq\|p\|_{L^{s}(M)}\,(c_{M}\,r_{i}^{m})^{1-1/s},\end{aligned}

where $\|p\|_{L^{s}(M)}\coloneqq\Big(\int_{M}p^{s}d\mathcal{H}^{m}\Big)^{1/s},$ in the third step we used Hölder’s inequality, and in the final step we used the inequality (1). Defining $K_{s}\coloneqq\|p\|_{L^{s}(M)}\,c_{M}^{1-1/s}$ , from the inequality above we have:

r_{i}\geq\left(\frac{p_{i}}{K_{s}}\right)^{\frac{1}{m(1-1/s)}}\implies r_{i}^{2}\geq\left(\frac{p_{i}}{K_{s}}\right)^{\alpha},

where $\alpha:=2/(m(1-1/s))=2s/(m(s-1))$ . Multiplying by $p_{i}$ gives

p_{i}r_{i}^{2}\geq K_{s}^{-\alpha}\,p_{i}^{1+\alpha}\implies\sum_{i=1}^{N}p_{i}r_{i}^{2}\geq K_{s}^{-\alpha}\sum_{i=1}^{N}p_{i}^{1+\alpha}.

(16)

It remains to find a lower bound on $\sum_{i=1}^{N}p_{i}^{1+\alpha}$ . First, notice that $\alpha>0$ since $s>1$ . Therefore, the map $t\mapsto t^{1+\alpha}$ is convex and increasing on $[0,+\infty)$ . Moreover, since the sets $S_{i}$ cover $M$ , we have $\sum_{i=1}^{N}p_{i}\geq 1$ , with equality when the $S_{i}$ are disjoint up to null sets. Thus, by Jensen’s inequality,

\sum_{i=1}^{N}p_{i}^{1+\alpha}\geq N\bigg(\frac{1}{N}\sum_{i=1}^{N}p_{i}\bigg)^{1+\alpha}\geq N\left(\frac{1}{N}\right)^{1+\alpha}=N^{-\alpha}.

Substituting in (16) gives

\sum_{i=1}^{N}p_{i}r_{i}^{2}\geq(K_{s}N)^{-\alpha}=\Big(\int_{M}p^{s}d\mathcal{H}^{m}\Big)^{-\alpha/s}N^{-\alpha}\,c_{M}^{-2/m}.

Now, by definition of the Rényi entropy,

h_{s}(X)=\frac{1}{1-s}\log\mathbb{E}\Big[p(X)^{s-1}\Big]=\frac{1}{1-s}\log\int\Big(\frac{\mathrm{d}\mu_{X}}{\mathrm{d}\mathcal{H}_{M}^{m}}\Big)^{s-1}\mathrm{d}\mu_{X},

which by the properties of the Radon–Nikodym derivative gives

\begin{aligned}h_{s}(X)&=\frac{1}{1-s}\log\int\Big(\frac{\mathrm{d}\mu_{X}}{\mathrm{d}\mathcal{H}_{M}^{m}}\Big)^{s}\mathrm{d}\mathcal{H}_{M}^{m}\\&=\frac{1}{1-s}\frac{s}{\alpha}\frac{\alpha}{s}\log\int p^{s}\mathrm{d}\mathcal{H}_{M}^{m}\\&=-\frac{s}{\alpha(1-s)}\log\Big(\int p^{s}\mathrm{d}\mathcal{H}_{M}^{m}\Big)^{-\alpha/s}.\end{aligned}

And, using $\alpha(1-s)=-2s/m$ gives

h_{s}(X)=\frac{m}{2}\log\Big(\int p^{s}\mathrm{d}\mathcal{H}_{M}^{m}\Big)^{-\alpha/s}\iff\Big(\int p^{s}\mathrm{d}\mathcal{H}_{M}^{m}\Big)^{-\alpha/s}=\mathrm{e}^{\frac{2}{m}h_{s}(X)}.

Therefore, for any $s>1$ , we have

\inf_{\mathcal{Y}}\mathbb{E}_{X}\Big[\sum_{i=1}^{N}\mathbf{1}_{Y_{i}}(X)\,r_{c}(Y_{i})^{2}\Big]\geq c_{M}^{-2/m}\,e^{\frac{2}{m}h_{s}(X)}\,N^{-\frac{2}{m(1-1/s)}}.

∎
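A simple instance in which the bound of Lemma 8.1 is attained is $X$ uniform on $M=[0,1]$ , so that $m=1$ , $h_{\infty}(X)=0$ , and $c_{M}=v_{1}=2$ (taking (1) to be the inequality $\mathcal{H}^{m}(B(c,r)\cap M)\leq c_{M}r^{m}$ used in the proof). With $s=\infty$ , the right-hand side of (15) becomes $1/(4N^{2})$ . The sketch below, with hypothetical helper names, compares this value with the cost $\sum_{i}p_{i}r_{i}^{2}$ of interval partitions.

```python
import random

def covering_cost(breakpoints):
    """sum_i p_i * r_i^2 for X ~ U[0, 1] and the interval partition with these breakpoints."""
    edges = [0.0] + sorted(breakpoints) + [1.0]
    # An interval of width w has probability w and Chebyshev radius w / 2.
    return sum((b - a) * ((b - a) / 2) ** 2 for a, b in zip(edges, edges[1:]))

def lemma_bound(N):
    """Right-hand side of (15) with m = 1, c_M = 2, h_inf(X) = 0 and s = infinity."""
    return 1.0 / (4 * N ** 2)

N = 8
uniform_cost = covering_cost([i / N for i in range(1, N)])
rng = random.Random(0)
random_costs = [covering_cost([rng.random() for _ in range(N - 1)]) for _ in range(100)]
print(uniform_cost, lemma_bound(N), min(random_costs))
```

The uniform partition meets the bound with equality, while non-uniform partitions of the same size incur a larger cost, in line with the Jensen step of the proof.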

We proceed with the proof of Thm. 5.2.

Proof of Thm. 5.2.

We make use of Prop. 5.1. Take (8) and minimize both sides over all possible partitions $\mathcal{Y}$ with size $|\mathcal{Y}|\leq e^{R}$ and associated abstractions $A$ . We have

D_{abs}(R)\geq\inf_{A,|\mathcal{Y}|\leq e^{R}}\frac{1}{l}\mathbb{E}_{\xi}[\|\xi-\xi_{q_{A}}\|^{2}]+\frac{1}{l}\mathbb{E}_{\xi}[r_{c}^{2}(\Omega_{A})],

where we recall that $\mathbb{E}_{\xi_{0}}[\cdot]=\mathbb{E}_{\xi}[\cdot]$ , and that for a given abstraction $A$ with corresponding encoder-decoder pair $(s_{A},g_{A})$ , we have $\xi_{q_{A}}=g_{q_{A}}(s_{q_{A}}(\xi))$ with $s_{q_{A}}(\xi)=g_{A}(s_{A}(\xi))$ and $g_{q_{A}}(z)=x_{c}(z)$ , where $x_{c}(z)$ is the Chebyshev center of the set $z$ ; and $r_{c}(\Omega_{A})$ is the Chebyshev radius of $\Omega_{A}$ . Thus, $\xi_{q_{A}}$ is the output of the encoder-decoder pair $(s_{q_{A}},g_{q_{A}})$ with rate $R$ and message $\xi$ . Hence, since the infimum of a sum is at least the sum of the infima, the first term in the right-hand side of the above inequality can be lower bounded by employing Thm. 2.1, to obtain:

D_{abs}(R)\geq\frac{n}{2l}\Big(\frac{e^{-R+h(\xi)-n/2}}{c_{{\mathcal{B}}_{l}^{S}}\Gamma(1+n/2)}\Big)^{2/n}+\frac{1}{l}\inf_{A,|\mathcal{Y}|\leq e^{R}}\mathbb{E}_{\xi}[r_{c}^{2}(\Omega_{A})].

To bound the second term, we employ Lemma 8.1. Notice that the abstraction’s outputs $\Omega_{A}$ are $nl$ -dimensional and define a cover of ${\mathcal{B}}_{l}^{S}$ (which is $n$ -dimensional) with cardinality $|\mathcal{Y}|\leq e^{R}$ , the same as the state-space partition. This cover is precisely $\mathcal{Z}:=\{Z:Z=g_{A}(s_{A}(x_{0})),x_{0}\in\mathcal{X}\}$ ; since $s_{A}(x)$ takes values in the set $\mathcal{Y}$ , we have $|\mathcal{Z}|=|\mathcal{Y}|$ . Thus, the term $\inf_{A,|\mathcal{Y}|\leq e^{R}}\mathbb{E}[r_{c}^{2}(\Omega_{A})]$ can be lower bounded as in (15), where we replace $m$ by $n$ , $M$ by ${\mathcal{B}}_{l}^{S}$ , $N$ by $e^{R}$ , and $\mu_{X}$ by $\mu_{\xi}$ . ∎

Proof of Prop. 5.3.

Fix any measurable subset ${\mathcal{A}}\subseteq\mathcal{X}$ . Because $\mu_{\xi_{0}}({\mathcal{A}})=\mu_{\xi}(b_{l}({\mathcal{A}})),$ the definitions of $p_{\xi}$ and $p_{\xi_{0}}$ imply that

\int_{b_{l}({\mathcal{A}})}p_{\xi}(y)\mathrm{d}\mathcal{H}^{n}_{{\mathcal{B}}^{S}_{l}}(y)=\int_{{\mathcal{A}}}p_{\xi_{0}}(x)\mathrm{d}{\mathcal{L}}^{n}(x).

But also, since $b_{l}$ is injective, the area formula [31, Thm. 3.2.5] gives

\int_{b_{l}({\mathcal{A}})}p_{\xi}(y)\mathrm{d}\mathcal{H}^{n}_{{\mathcal{B}}^{S}_{l}}(y)=\int_{\mathcal{A}}p_{\xi}(b_{l}(x))\sqrt{\det(J_{b_{l}}(x)^{\mkern-1.5mu\mathrm{T}}\!J_{b_{l}}(x))}\,\mathrm{d}{\mathcal{L}}^{n}(x),

implying that, for almost all $x\in\mathcal{X},$

p_{\xi}(b_{l}(x))=\frac{p_{\xi_{0}}(x)}{\sqrt{\det(J_{b_{l}}(x)^{\mkern-1.5mu\mathrm{T}}\!J_{b_{l}}(x))}}.

(17)

Then, (2) becomes

\begin{aligned}h(\xi)=&-\int_{\mathbb{R}^{n}}p_{\xi_{0}}(x)\log(p_{\xi_{0}}(x))\mathrm{d}{\mathcal{L}}^{n}\\&+\frac{1}{2}\int_{\mathbb{R}^{n}}\log\det(J_{b_{l}}(x)^{\mkern-1.5mu\mathrm{T}}\!J_{b_{l}}(x))p_{\xi_{0}}(x)\mathrm{d}{\mathcal{L}}^{n}.\end{aligned}

Likewise, the area formula gives

\begin{aligned}h_{s}(\xi)&=\frac{1}{1-s}\log\int_{{\mathcal{B}}_{l}^{S}}p_{\xi}^{s}\,\mathrm{d}\mathcal{H}^{n}\\&=\frac{1}{1-s}\log\int_{\mathbb{R}^{n}}p_{\xi}(b_{l}(x))^{s}\,\det(J_{b_{l}}(x)^{\mkern-1.5mu\mathrm{T}}\!J_{b_{l}}(x))^{\frac{1}{2}}\mathrm{d}{\mathcal{L}}^{n},\end{aligned}

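and, applying (17) gives (12). Finally, in the particular case of $s=\infty,$ we have

\begin{aligned}h_{\infty}(\xi)&=\lim_{s\to\infty}\frac{s}{1-s}\log\Big(\int_{\mathbb{R}^{n}}\frac{p_{\xi_{0}}(x)^{s}}{\det(J_{b_{l}}(x)^{\mkern-1.5mu\mathrm{T}}\!J_{b_{l}}(x))^{\frac{s-1}{2}}}\,\mathrm{d}x\Big)^{1/s}\\&=-\log\mathrm{ess}\sup\frac{p_{\xi_{0}}(x)}{\sqrt{\det(J_{b_{l}}(x)^{\mkern-1.5mu\mathrm{T}}\!J_{b_{l}}(x))}}=\log\mathrm{ess}\inf\frac{\sqrt{\det(J_{b_{l}}(x)^{\mkern-1.5mu\mathrm{T}}\!J_{b_{l}}(x))}}{p_{\xi_{0}}(x)},\end{aligned}

where we used that $s/(1-s)\to-1$ and that the $L^{s}$ norm converges to the essential supremum. This gives the expression for $h_{\infty}(\xi)$ as the essential infimum of $\frac{1}{2}\log\det(J_{b_{l}}(x)^{\mkern-1.5mu\mathrm{T}}\!J_{b_{l}}(x))-\log p_{\xi_{0}}(x)$ , by exploiting the fact that $\log$ is monotonically increasing.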

∎

Lemma 8.2.

Let $X\subset\mathbb{R}^{n},$ and $f:X\to\mathbb{R}^{N}$ , $N\geq n,$ be a bi-Lipschitz function satisfying

\|x-x^{\prime}\|\leq\|f(x)-f(x^{\prime})\|\leq L\|x-x^{\prime}\|,\quad\forall x,x^{\prime}\in X,

for some $L\geq 1$ . Then for every $y\in\mathbb{R}^{N}$ and $\delta>0$ ,

\mathcal{H}^{n}(f(X)\cap B(y,\delta))\leq L^{n}v_{n}\delta^{n}.

Proof.

Fix $y\in\mathbb{R}^{N}$ and $\delta>0$ and define $Z\coloneqq f(X)\cap B(y,\delta)$ and its pre-image $E\coloneqq f^{-1}(Z)\subset\mathbb{R}^{n}$ . We start by finding a ball in $\mathbb{R}^{n}$ bounding $E$ .

For any $x_{1},x_{2}\in E$ , we have $f(x_{1}),f(x_{2})\in B(y,\delta)$ , so

\|f(x_{1})-f(x_{2})\|\leq\|f(x_{1})-y\|+\|f(x_{2})-y\|<2\delta.

By the lower Lipschitz bound $\|x_{1}-x_{2}\|\leq\|f(x_{1})-f(x_{2})\|$ , it follows that $\|x_{1}-x_{2}\|\leq 2\delta,$ that is, $\operatorname{diam}(E)\leq 2\delta$ . By the isodiametric inequality, a set of diameter at most $2\delta$ has measure no larger than that of an $n$ -dimensional ball of radius $\delta$ . Therefore, $\mathcal{H}^{n}(E)\leq v_{n}\delta^{n}.$

Since $f$ is $L$ -Lipschitz, by fundamental properties of the Hausdorff measure [31, Sec. 2.10.11]

\mathcal{H}^{n}(Z)=\mathcal{H}^{n}(f(E))\leq L^{n}\mathcal{H}^{n}(E)\leq L^{n}v_{n}\delta^{n}.

∎

Proof of Prop. 5.4.

We again use the function $b_{l}:\mathcal{X}\to{\mathcal{B}}_{l}^{S}$ , defined by (10). Since by assumption $\mathcal{X}$ is full dimensional in $\mathbb{R}^{n}$ , the tightest value for $c_{\mathcal{X}}$ is $c_{\mathbb{R}^{n}}=v_{n}.$ Now we look at each case.

Case (1) follows from the observation that ${\mathcal{B}}_{l}^{S}$ is a subset of an $n$ -dimensional affine subspace of $\mathbb{R}^{nl}$ , and that the intersection of an $nl$ -dimensional ball of radius $r$ with an affine subspace of dimension $n$ is a ball of dimension $n$ and radius $\leq r.$ Hence, $\mathcal{H}^{n}_{{\mathcal{B}}_{l}^{S}}(B(z,\delta))\leq v_{n}\delta^{n}$ , for all $z\in\mathbb{R}^{nl}$ .

Case (2): If $f$ is piecewise affine, so is ${\mathcal{B}}_{l}^{S}$ , which has at most $M^{l}$ disjoint pieces. Denote by $Z_{i}$ each such piece of ${\mathcal{B}}_{l}^{S}$ , which is a bounded, connected $n$ -dimensional subset of some affine subspace of $\mathbb{R}^{nl}.$ Thus, ${\mathcal{B}}_{l}^{S}=\bigcup_{i=1}^{N}Z_{i}$ , with $N\leq M^{l}$ . Then, for all $z\in\mathbb{R}^{nl}$ and $\delta>0,$

\mathcal{H}^{n}\Big(\bigcup_{i}Z_{i}\cap B(z,\delta)\Big)=\sum_{i=1}^{N}\mathcal{H}^{n}(Z_{i}\cap B(z,\delta))\leq M^{l}v_{n}\delta^{n},

where in the last inequality we have used case (1) and the fact that $N\leq M^{l}$ .

Case (3): It is easy to see that $b_{l}$ is bi-Lipschitz with

\|x-y\|\leq\|b_{l}(x)-b_{l}(y)\|\leq\Big(\sum_{i=0}^{l-1}L^{2i}\Big)^{1/2}\|x-y\|.

Hence the result comes from applying Lemma 8.2. ∎

References

[1] P. Tabuada, Verification and control of hybrid systems: a symbolic approach. Springer Science & Business Media, 2009.
[2] A. Lavaei, S. Soudjani, A. Abate, and M. Zamani, “Automated verification and synthesis of stochastic hybrid systems: A survey,” Automatica, vol. 146, p. 110617, 2022.
[3] A. Girard, G. Pola, and P. Tabuada, “Approximately bisimilar symbolic models for incrementally stable switched systems,” IEEE Transactions on Automatic Control, vol. 55, no. 1, pp. 116–126, 2009.
[4] M. Rungger and M. Zamani, “SCOTS: A tool for the synthesis of symbolic controllers,” in Proceedings of the 19th International Conference on Hybrid Systems: Computation and Control, 2016, pp. 99–104.
[5] K. Mallik, A.-K. Schmuck, S. Soudjani, and R. Majumdar, “Compositional synthesis of finite-state abstractions,” IEEE Transactions on Automatic Control, vol. 64, no. 6, pp. 2629–2636, 2018.
[6] M. Zamani, P. M. Esfahani, R. Majumdar, A. Abate, and J. Lygeros, “Symbolic control of stochastic systems via approximately bisimilar finite abstractions,” IEEE Transactions on Automatic Control, vol. 59, no. 12, pp. 3135–3150, 2014.
[7] M. Lahijanian, S. B. Andersson, and C. Belta, “Formal verification and synthesis for discrete-time stochastic systems,” IEEE Transactions on Automatic Control, vol. 60, no. 8, pp. 2031–2045, 2015.
[8] A. Abate, J.-P. Katoen, J. Lygeros, and M. Prandini, “Approximate model checking of stochastic hybrid systems,” European Journal of Control, vol. 16, no. 6, pp. 624–641, 2010.
[9] R. Coppola, A. Peruffo, and M. Mazo, “Data-driven abstractions for verification of linear systems,” IEEE Control Systems Letters, vol. 7, pp. 2737–2742, 2023.
[10] T. Badings, L. Romao, A. Abate, D. Parker, H. A. Poonawala, M. Stoelinga, and N. Jansen, “Robust control for dynamical systems with non-Gaussian noise via formal abstractions,” Journal of Artificial Intelligence Research, vol. 76, pp. 341–391, 2023.
[11] A. Devonport, A. Saoud, and M. Arcak, “Symbolic abstractions from data: A PAC learning approach,” in 2021 60th IEEE Conference on Decision and Control (CDC). IEEE, 2021, pp. 599–604.
[12] M. Kazemi, R. Majumdar, M. Salamati, S. Soudjani, and B. Wooding, “Data-driven abstraction-based control synthesis,” Nonlinear Analysis: Hybrid Systems, vol. 52, p. 101467, 2024.
[13] T. M. Cover, Elements of information theory. John Wiley & Sons, 1999.
[14] E. Riegler, H. Bölcskei, and G. Koliander, “Rate-distortion theory for general sets and measures,” in 2018 IEEE International Symposium on Information Theory (ISIT). IEEE, 2018, pp. 101–105.
[15] E. Riegler, G. Koliander, and H. Bölcskei, “Lossy compression of general random variables,” Information and Inference: A Journal of the IMA, vol. 12, no. 3, pp. 1759–1829, 2023.
[16] S. Esmaeil Zadeh Soudjani and A. Abate, “Adaptive and sequential gridding procedures for the abstraction and verification of stochastic processes,” SIAM Journal on Applied Dynamical Systems, vol. 12, no. 2, pp. 921–956, 2013.
[17] S. Adams, M. Lahijanian, and L. Laurenti, “Formal control synthesis for stochastic neural network dynamic models,” IEEE Control Systems Letters, vol. 6, pp. 2858–2863, 2022.
[18] Y. Tazaki and J.-i. Imura, “Discrete-state abstractions of nonlinear systems using multi-resolution quantizer,” in International Workshop on Hybrid Systems: Computation and Control. Springer, 2009, pp. 351–365.
[19] K. Hsu, R. Majumdar, K. Mallik, and A.-K. Schmuck, “Multi-layered abstraction-based controller synthesis for continuous-time systems,” in Proceedings of the 21st International Conference on Hybrid Systems: Computation and Control (part of CPS Week), 2018, pp. 120–129.
[20] J. Calbert, L. N. Egidio, and R. M. Jungers, “Smart abstraction based on iterative cover and non-uniform cells,” IEEE Control Systems Letters, vol. 8, pp. 2301–2306, 2024.
[21] A.-K. Schmuck and J. Raisch, “Asynchronous l-complete approximations,” Systems & Control Letters, vol. 73, pp. 67–75, 2014.
[22] A. Banse, G. Delimpaltadakis, L. Laurenti, M. Mazo Jr, and R. M. Jungers, “Memory-dependent abstractions of stochastic systems through the lens of transfer operators,” in Proceedings of the 28th ACM International Conference on Hybrid Systems: Computation and Control, 2025, pp. 1–12.
[23] G. A. Gleizer and M. Mazo Jr, “Chaos and order in event-triggered control,” IEEE Transactions on Automatic Control, vol. 68, no. 11, pp. 6541–6556, 2023.
[24] A. Lavaei, S. Soudjani, and M. Zamani, “Compositional abstraction of large-scale stochastic systems: A relaxed dissipativity approach,” Nonlinear Analysis: Hybrid Systems, vol. 36, p. 100880, 2020.
[25] G. Delimpaltadakis, M. Lahijanian, M. Mazo Jr, and L. Laurenti, “Interval Markov decision processes with continuous action-spaces,” in Proceedings of the 26th ACM International Conference on Hybrid Systems: Computation and Control, 2023, pp. 1–10.
[26] D. Lind and B. Marcus, An introduction to symbolic dynamics and coding. Cambridge University Press, 2021.
[27] E. Lindenstrauss and M. Tsukamoto, “From rate distortion theory to metric mean dimension: variational principle,” IEEE Transactions on Information Theory, vol. 64, no. 5, pp. 3590–3609, 2018.
[28] D. Abel, D. Arumugam, K. Asadi, Y. Jinnai, M. L. Littman, and L. L. Wong, “State abstraction as compression in apprenticeship learning,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, no. 01, 2019, pp. 3134–3142.
[29] O. Biza, R. Platt, J.-W. van de Meent, and L. L. Wong, “Learning discrete state abstractions with deep variational inference,” arXiv preprint arXiv:2003.04300, 2020.
[30] D. T. Larsson, D. Maity, and P. Tsiotras, “A generalized information-theoretic framework for the emergence of hierarchical abstractions in resource-limited systems,” Entropy, vol. 24, no. 6, p. 809, 2022.
[31] H. Federer, Geometric measure theory, ser. Grundlehren Math. Wiss. Springer, Cham, 1969, vol. 153.
[32] N. Tishby, F. C. Pereira, and W. Bialek, “The information bottleneck method,” arXiv preprint physics/0004057, 2000.
