Python 3 Integer Storage: A Summary

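This article looks at how CPython stores integer objects, covering the preallocated small-integer pool and the variable-length storage used for long integers, and what these choices mean for memory management.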

Python 3 Integer Objects: A Summary

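This post starts from how integer data is stored and then analyzes the integer object in Python.
If you spot a mistake, please leave a comment. Thanks.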

1 Small-integer storage

For small integers, Python builds a small-integer pool in which these values are preallocated. The relevant code is analyzed below:

1.1 Preliminaries

#ifndef NSMALLPOSINTS
#define NSMALLPOSINTS           257
#endif
#ifndef NSMALLNEGINTS
#define NSMALLNEGINTS           5
#endif
static PyLongObject small_ints[NSMALLNEGINTS + NSMALLPOSINTS];
#ifdef COUNT_ALLOCS
Py_ssize_t quick_int_allocs, quick_neg_int_allocs;
#endif

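Here NSMALLPOSINTS is 257, the exclusive upper bound of the small-integer range, and NSMALLNEGINTS is 5, which makes the lower bound -5. Note that the range is half-open: closed on the left and open on the right. The file also defines a static array of PyLongObject with 5 + 257 = 262 elements, one for every integer from -5 to 256.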

/* Small integers are preallocated in this array so that they
   can be shared.
   The integers that are preallocated are those in the range
   -NSMALLNEGINTS (inclusive) to NSMALLPOSINTS (not inclusive).
*/

This is the source comment describing the small-integer pool, which I have copied here as well.

1.2 Related functions
static PyObject* get_small_int(sdigit ival)

static PyObject *
get_small_int(sdigit ival)
{
    PyObject *v;
    assert(-NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS);
    v = (PyObject *)&small_ints[ival + NSMALLNEGINTS];
    Py_INCREF(v);
#ifdef COUNT_ALLOCS
    if (ival >= 0)
        quick_int_allocs++;
    else
        quick_neg_int_allocs++;
#endif
    return v;
}

Given an sdigit value, the function returns a new reference to the corresponding preallocated object in the small-integer pool.
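The index arithmetic in get_small_int can be modelled in a few lines of Python. This is only an illustrative sketch: the list `small_ints` and the Python function `get_small_int` below are stand-ins for the C array and the C function, not CPython APIs.

```python
# Illustrative model of the small-integer pool (not CPython's real code).
NSMALLPOSINTS = 257  # covers the values 0 .. 256
NSMALLNEGINTS = 5    # covers the values -5 .. -1

# One slot per value in the half-open range [-5, 257).
small_ints = [value for value in range(-NSMALLNEGINTS, NSMALLPOSINTS)]


def get_small_int(ival):
    """Return the pooled entry for ival, mirroring the C index arithmetic."""
    assert -NSMALLNEGINTS <= ival < NSMALLPOSINTS
    return small_ints[ival + NSMALLNEGINTS]


print(len(small_ints))      # 262 slots in total
print(get_small_int(-5))    # stored at index 0
print(get_small_int(256))   # stored at index 261
```

Shifting by NSMALLNEGINTS is what lets a negative value index into an ordinary zero-based array.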

static PyLongObject *maybe_small_long(PyLongObject *v)

static PyLongObject *
maybe_small_long(PyLongObject *v)
{
    if (v && Py_ABS(Py_SIZE(v)) <= 1) {
        sdigit ival = MEDIUM_VALUE(v);
        if (-NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS) {
            Py_DECREF(v);
            return (PyLongObject *)get_small_int(ival);
        }
    }
    return v;
}

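maybe_small_long inspects a newly created long integer. If its value falls inside the small-integer range, the function calls get_small_int(sdigit) to obtain the pooled object and drops its reference to the newly created object with Py_DECREF. Otherwise it returns the long integer object unchanged.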
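The effect of this substitution can be observed from Python by comparing object identity. The sketch below assumes CPython and relies on an implementation detail, so it should not be used in real program logic; `is_pooled` is a hypothetical helper introduced only for this illustration.

```python
import platform


def is_pooled(value):
    """Report whether two separately built copies of value are the same object."""
    # Build the value twice at run time so the compiler cannot merge constants.
    first = int(str(value))
    second = int(str(value))
    return first is second


if platform.python_implementation() == "CPython":
    print(is_pooled(-5), is_pooled(256))   # values inside the pool range
    print(is_pooled(-6), is_pooled(257))   # values just outside the pool range
```

Values inside [-5, 257) should come back as the shared pooled object, while values outside the range are separate objects each time.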

2 How long integers are stored

2.1 Preliminaries
Next, we use the header file longintrepr.h to analyze how the PyLongObject type is stored.

Parameters of the long integer representation.  There are two different
   sets of parameters: one set for 30-bit digits, stored in an unsigned 32-bit
   integer type, and one set for 15-bit digits with each digit stored in an
   unsigned short.  The value of PYLONG_BITS_IN_DIGIT, defined either at
   configure time or in pyport.h, is used to decide which digit size to use.

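This comment describes how long integers are represented. CPython records an integer in one of two modes: 30-bit digits (each stored in an unsigned 32-bit integer) or 15-bit digits (each stored in an unsigned short). Which one is used depends on the value of the macro PYLONG_BITS_IN_DIGIT, which is defined either at configure time or in pyport.h.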

/* If PYLONG_BITS_IN_DIGIT is not defined then we'll use 30-bit digits if all
   the necessary integer types are available, and we're on a 64-bit platform
   (as determined by SIZEOF_VOID_P); otherwise we use 15-bit digits. */

#ifndef PYLONG_BITS_IN_DIGIT
#if (defined HAVE_UINT64_T && defined HAVE_INT64_T && \
     defined HAVE_UINT32_T && defined HAVE_INT32_T && SIZEOF_VOID_P >= 8)
#define PYLONG_BITS_IN_DIGIT 30
#else
#define PYLONG_BITS_IN_DIGIT 15
#endif
#endif

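As shown above, 30-bit digits are the default on 64-bit platforms where the required integer types exist, and 15-bit digits are used otherwise.
The values of the parameters that follow depend on PYLONG_BITS_IN_DIGIT, so they are not listed here.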
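The digit size chosen for a given interpreter can be checked at run time through `sys.int_info`, which is part of the standard library. A short sketch:

```python
import sys

# sys.int_info describes the digit layout of the running interpreter.
bits_per_digit = sys.int_info.bits_per_digit   # 30 or 15
sizeof_digit = sys.int_info.sizeof_digit       # bytes used to store one digit

print("bits per digit:", bits_per_digit)
print("bytes per digit:", sizeof_digit)
```

On a typical 64-bit build this is expected to report 30-bit digits stored in 4 bytes.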

The core code:

struct _longobject {
	PyObject_VAR_HEAD
	digit ob_digit[1];
};

The explanation given in the Python source is as follows:

   The absolute value of a number is equal to
   	SUM(for i=0 through abs(ob_size)-1) ob_digit[i] * 2**(SHIFT*i)
   Negative numbers are represented with ob_size < 0;
   zero is represented by ob_size == 0.

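The value is accumulated digit by digit, with abs(ob_size) giving the number of digits. The SHIFT in the formula is the digit width, that is, the PYLONG_BITS_IN_DIGIT mentioned above.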
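The formula above can be made concrete with a small Python model. This is a sketch, not CPython's code: `to_digits` and `from_digits` are hypothetical helper names, and the default `shift=30` assumes the 30-bit digit configuration.

```python
def to_digits(value, shift=30):
    """Split abs(value) into little-endian digits of `shift` bits each.

    Returns (ob_size, digits) in the spirit of PyLongObject: the sign of
    ob_size carries the sign of the number, and zero has no digits.
    """
    mask = (1 << shift) - 1
    magnitude = abs(value)
    digits = []
    while magnitude:
        digits.append(magnitude & mask)
        magnitude >>= shift
    ob_size = -len(digits) if value < 0 else len(digits)
    return ob_size, digits


def from_digits(ob_size, digits, shift=30):
    """Rebuild the integer using SUM(ob_digit[i] * 2**(shift*i))."""
    magnitude = sum(d << (shift * i) for i, d in enumerate(digits[:abs(ob_size)]))
    return -magnitude if ob_size < 0 else magnitude


print(to_digits(0))              # (0, [])
print(to_digits(2**30))          # (2, [0, 1])
print(to_digits(-(2**30 + 7)))   # (-2, [7, 1])
```

The least significant digit comes first, which is why 2**30 is stored as a zero digit followed by a one.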

#define PyObject_VAR_HEAD \
	PyObject_HEAD \
	int ob_size;

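This is the definition of PyObject_VAR_HEAD, shown in simplified form (current CPython 3 sources declare ob_size as Py_ssize_t). PyObject_HEAD is the header macro for fixed-size objects, while PyObject_VAR_HEAD is the header macro for variable-size objects.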

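typedef struct _object{
	PyObject_HEAD
} PyObject;

Expanding PyObject_HEAD gives the following simplified layout (current CPython 3 sources declare ob_refcnt as Py_ssize_t rather than int):

typedef struct _object{
	int ob_refcnt;
	struct _typeobject *ob_type;
} PyObject;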

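ob_refcnt: the object's reference count. It belongs to Python's memory management and is the basis of its reference-counting garbage collection.
ob_type: a pointer to the type object that describes the Python object's type.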

2.2 Related functions (the full versions are long, so only the key parts are shown here)
PyLongObject *_PyLong_New(Py_ssize_t size)

PyLongObject *
_PyLong_New(Py_ssize_t size)
{
    PyLongObject *result;
    if (size > (Py_ssize_t)MAX_LONG_DIGITS) {
        PyErr_SetString(PyExc_OverflowError,
                        "too many digits in integer");
        return NULL;
    }
    result = PyObject_MALLOC(offsetof(PyLongObject, ob_digit) +
                             size*sizeof(digit));
    if (!result) {
        PyErr_NoMemory();
        return NULL;
    }
    return (PyLongObject*)PyObject_INIT_VAR(result, &PyLong_Type, size);
}

Additional notes:
#define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)
As for PyObject_MALLOC(), I could not locate its source, so the description below is taken from the official documentation:
void* PyObject_Malloc(size_t n)
Allocates n bytes and returns a pointer of type void* to the allocated memory, or NULL if the request fails.
Requesting zero bytes returns a distinct non-NULL pointer if possible, as if PyObject_Malloc(1) had been called instead. The memory will not have been initialized in any way.
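The memory is not initialized at allocation time. It is initialized as a variable-size object by PyObject_INIT_VAR(result, &PyLong_Type, size), and the result is then cast to a long integer object with (PyLongObject*).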
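The allocation size used by _PyLong_New, a fixed header plus size*sizeof(digit), can be observed indirectly with `sys.getsizeof`. The sketch below assumes CPython, where `sys.getsizeof` on an int reflects the number of digits in use; `size_step` is a hypothetical helper name.

```python
import sys


def size_step(num_digits):
    """Return how many bytes an int grows by when it needs one more digit.

    Compares the largest value that fits in `num_digits` digits with the
    smallest value that needs `num_digits + 1` digits.
    """
    bits = sys.int_info.bits_per_digit
    boundary = 1 << (bits * num_digits)
    return sys.getsizeof(boundary) - sys.getsizeof(boundary - 1)


print(size_step(2), size_step(3), sys.int_info.sizeof_digit)
```

Each extra digit should cost exactly sizeof(digit) bytes, matching the size*sizeof(digit) term in the allocation.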

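Note that MAX_LONG_DIGITS is by no means a small number. Its macro definition is
#define MAX_LONG_DIGITS ((PY_SSIZE_T_MAX - offsetof(PyLongObject, ob_digit))/sizeof(digit))
PY_SSIZE_T_MAX is also the value that the sys module exposes as maxsize:
SET_SYS_FROM_STRING("maxsize", PyLong_FromSsize_t(PY_SSIZE_T_MAX));
It is the largest value a Py_ssize_t can hold, not an amount of memory allocated by the system. It can be inspected through sys.maxsize, which is an attribute rather than a function.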
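To get a feel for the magnitude, the limit can be estimated from Python. This is a rough sketch under a stated assumption: `int.__basicsize__` is used as a stand-in for offsetof(PyLongObject, ob_digit), which may not hold exactly on every build.

```python
import struct
import sys

# sys.maxsize is an attribute holding PY_SSIZE_T_MAX.
ssize_t_bits = struct.calcsize("n") * 8          # width of the C ssize_t type
print(sys.maxsize == 2 ** (ssize_t_bits - 1) - 1)

# Rough estimate of MAX_LONG_DIGITS. Assumption: int.__basicsize__ stands in
# for offsetof(PyLongObject, ob_digit), which may not hold on every build.
header_bytes = int.__basicsize__
max_long_digits = (sys.maxsize - header_bytes) // sys.int_info.sizeof_digit
print(max_long_digits)
```

In practice available memory runs out long before an integer reaches this many digits.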