Several Things Worth Noting When Measuring Memory in Python
The following article is from Python猫. Author: 豌豆花下猫
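sys.getsizeof() can be used to measure memory, but it can produce unexpected results. The function returns the size of an object in bytes, yet it counts only the memory the object occupies directly, not the memory of the objects it references.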
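A minimal sketch of that behaviour, assuming a standard CPython build (the variable names are only illustrative): a list that holds one large string reports a size far smaller than the string it references.

```python
import sys

big_text = "x" * 100_000   # one large string
holder = [big_text]        # a list that merely references it

size_of_text = sys.getsizeof(big_text)
size_of_holder = sys.getsizeof(holder)

# The list's size covers its own header and one pointer slot,
# not the 100,000 characters it points to.
print("string:", size_of_text)
print("list holding the string:", size_of_holder)
```

The exact numbers depend on the interpreter version and platform, which is why none are quoted here.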
"Shallow calculation" and related questions
How is the "shallow" calculation implemented under the hood?
Why does getsizeof() use a "shallow" calculation?
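getsizeof() calls the object's __sizeof__() magic method (and adds garbage-collector overhead when the object is managed by the GC). For built-in objects, this method is implemented inside the CPython interpreter. For int, it looks like this:

static Py_ssize_t
int___sizeof___impl(PyObject *self)
{
    Py_ssize_t res;

    /* object header up to ob_digit, plus one digit per unit of |ob_size| */
    res = offsetof(PyLongObject, ob_digit) + Py_ABS(Py_SIZE(self))*sizeof(digit);
    return res;
}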
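The formula above adds one digit per unit of magnitude, so a larger integer should report a larger size. A small sketch to observe this from Python (the values chosen are arbitrary):

```python
import sys

small = 1
large = 2 ** 1000   # needs many internal digits

small_bytes = small.__sizeof__()
large_bytes = large.__sizeof__()

print("int 1:", small_bytes)
print("int 2**1000:", large_bytes)
# getsizeof() goes through __sizeof__(), so it grows the same way.
print("getsizeof(2**1000):", sys.getsizeof(large))
```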
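Larger than expected: an int in C typically occupies only 4 bytes, but in Python an int is wrapped in an object, so its size includes the object struct. In a 32-bit interpreter, getsizeof(1) returns 14 bytes (12 bytes of object header plus one 2-byte digit), more than the 4 bytes of the number itself.
Smaller than expected: for more complex objects such as lists and dicts, this mechanism does not add up the memory used by the elements inside, so the result is smaller than the memory actually in use.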
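Both effects can be checked with a short sketch. It assumes CPython; struct.calcsize("i") is used here as a stand-in for the size of a native C int, and the two lists are built the same way so that only their contents differ.

```python
import struct
import sys

# Larger than expected: a Python int carries object overhead.
c_int_bytes = struct.calcsize("i")   # size of a native C int
py_int_bytes = sys.getsizeof(1)
print("C int:", c_int_bytes, "Python int:", py_int_bytes)

# Smaller than expected: a list is sized by its slots, not by what they hold.
payload = "x" * 10_000
empty_slots = [None] * 1000
full_slots = [payload] * 1000        # every slot references the large string
print("list of None:", sys.getsizeof(empty_slots))
print("list of large strings:", sys.getsizeof(full_slots))
```

The two lists report the same size because getsizeof() sees only 1000 pointer slots in each case.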
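"Deep calculation" and related questions
Is there a method or implementation for "deep" calculation?
What should you watch out for when implementing a "deep" calculation?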
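Two projects are worth a look: pympler and pysize. The first is published on PyPI and can be installed with "pip install pympler". The second was abandoned and its author never published it to PyPI (note: PyPI already has an unrelated library named pysize that does format conversion, so don't confuse the two), but its source code is available on GitHub.
118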
190
206
300281
30281

The core code of pysize:

import inspect
import sys


def get_size(obj, seen=None):
    """Recursively finds size of objects in bytes"""
    size = sys.getsizeof(obj)
    if seen is None:
        seen = set()
    obj_id = id(obj)
    if obj_id in seen:
        return 0
    # Important mark as seen *before* entering recursion to gracefully handle
    # self-referential objects
    seen.add(obj_id)
    if hasattr(obj, '__dict__'):
        for cls in obj.__class__.__mro__:
            if '__dict__' in cls.__dict__:
                d = cls.__dict__['__dict__']
                if inspect.isgetsetdescriptor(d) or inspect.ismemberdescriptor(d):
                    size += get_size(obj.__dict__, seen)
                break
    if isinstance(obj, dict):
        size += sum((get_size(v, seen) for v in obj.values()))
        size += sum((get_size(k, seen) for k in obj.keys()))
    elif hasattr(obj, '__iter__') and not isinstance(obj, (str, bytes, bytearray)):
        size += sum((get_size(i, seen) for i in obj))
    if hasattr(obj, '__slots__'):  # can have __slots__ with __dict__
        size += sum(get_size(getattr(obj, s), seen) for s in obj.__slots__ if hasattr(obj, s))
    return size
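To see why the "seen" set matters, here is a deliberately simplified sketch of the same idea. deep_size is a hypothetical helper written for this illustration, not part of pysize; it handles only dicts and the common built-in containers.

```python
import sys


def deep_size(obj, seen=None):
    """Hypothetical helper: shallow size of obj plus everything it contains."""
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0              # already counted (shared or cyclic reference)
    seen.add(id(obj))         # mark before recursing so cycles terminate
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_size(k, seen) + deep_size(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_size(item, seen) for item in obj)
    return size


text = "x" * 1000
pair = [text, text]           # the same string referenced twice
loop = []
loop.append(loop)             # a list that contains itself

print(deep_size(pair))        # the string is counted once, not twice
print(deep_size(loop))        # terminates despite the cycle
```

Without the id-based bookkeeping, the shared string would be double counted and the self-referential list would recurse until the interpreter raised an error.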
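Apart from the parts that check the __dict__ and __slots__ attributes (which handle class instances), get_size mainly recurses into dicts and iterables (other than str, bytes and bytearray). The logic is not complicated.

The corresponding code in pympler:

def asizeof(self, *objs, **opts):
    '''Return the combined size of the given objects
       (with modified options, see method **set**).
    '''
    if opts:
        self.set(**opts)
    self.exclude_refs(*objs)  # skip refs to objs
    return sum(self._sizer(o, 0, 0, None) for o in objs)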
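For reference, calling pympler from user code usually goes through the module-level asizeof.asizeof() function. The sketch below assumes pympler has been installed with pip; the import is guarded so the script still runs without it, and the sample data is made up.

```python
import sys

try:
    from pympler import asizeof   # third-party: pip install pympler
except ImportError:
    asizeof = None                # keep the script runnable without pympler

data = {"numbers": list(range(1000)), "text": "x" * 10_000}

shallow = sys.getsizeof(data)
deep = asizeof.asizeof(data) if asizeof is not None else None

print("shallow:", shallow)
print("deep:", deep if deep is not None else "pympler not installed")
```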
def _sizer(self, obj, pid, deep, sized):
    '''Size an object, recursively.
    '''
    s, f, i = 0, 0, id(obj)
    if i not in self._seen:
        self._seen[i] = 1
    elif deep or self._seen[i]:
        # skip obj if seen before
        # or if ref of a given obj
        self._seen.again(i)
        if sized:
            s = sized(s, f, name=self._nameof(obj))
            self.exclude_objs(s)
        return s  # zero
    else:  # deep == seen[i] == 0
        self._seen.again(i)
    try:
        k, rs = _objkey(obj), []
        if k in self._excl_d:
            self._excl_d[k] += 1
        else:
            v = _typedefs.get(k, None)
            if not v:  # new typedef
                _typedefs[k] = v = _typedef(obj, derive=self._derive_,
                                            frames=self._frames_,
                                            infer=self._infer_)
            if (v.both or self._code_) and v.kind is not self._ign_d:
                # Author's note: the flat size is computed here
                s = f = v.flat(obj, self._mask)  # flat size
                if self._profile:
                    # profile based on *flat* size
                    self._prof(k).update(obj, s)
                # recurse, but not for nested modules
                if v.refs and deep < self._limit_ \
                          and not (deep and ismodule(obj)):
                    # add sizes of referents
                    z, d = self._sizer, deep + 1
                    if sized and deep < self._detail_:
                        # use named referents
                        self.exclude_objs(rs)
                        for o in v.refs(obj, True):
                            if isinstance(o, _NamedRef):
                                r = z(o.ref, i, d, sized)
                                r.name = o.name
                            else:
                                r = z(o, i, d, sized)
                                r.name = self._nameof(o)
                            rs.append(r)
                            s += r.size
                    else:  # just size and accumulate
                        for o in v.refs(obj, False):
                            # Author's note: item sizes are computed recursively here
                            s += z(o, i, d, None)
                    # deepest recursion reached
                    if self._depth < d:
                        self._depth = d
            if self._stats_ and s > self._above_ > 0:
                # rank based on *total* size
                self._rank(k, obj, s, deep, pid)
    except RuntimeError:  # XXX RecursionLimitExceeded:
        self._missed += 1
    if not deep:
        self._total += s  # accumulate
    if sized:
        s = sized(s, f, name=self._nameof(obj), refs=rs)
        self.exclude_objs(s)
    return s
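The two ideas at the heart of this routine, a flat size per object plus recursion into its referents up to a depth limit, can be sketched with the standard library alone. This is not pympler's code: total_size and its limit parameter are hypothetical, and gc.get_referents() is swapped in for pympler's per-type referent lookup.

```python
import gc
import sys
from types import FunctionType, ModuleType


def total_size(obj, limit=100):
    """Hypothetical helper: flat size plus referents, down to `limit` levels."""
    seen = set()

    def sizer(o, deep):
        # Skip objects already counted, and don't wander into
        # classes, modules or functions.
        if id(o) in seen or isinstance(o, (type, ModuleType, FunctionType)):
            return 0
        seen.add(id(o))
        size = sys.getsizeof(o)           # flat size of this object
        if deep < limit:                  # recurse only while under the limit
            for ref in gc.get_referents(o):
                size += sizer(ref, deep + 1)
        return size

    return sizer(obj, 0)


sample = {"a": [1, 2, 3], "b": "x" * 1000}
print(total_size(sample, limit=0))   # flat size only
print(total_size(sample, limit=1))   # plus keys and values
print(total_size(sample))            # plus the list's elements
```

With limit=0 the result equals sys.getsizeof(sample), which mirrors how the recursion in the code above is gated on the current depth.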
def flat(self, obj, mask=0):
    '''Return the aligned flat size.
    '''
    s = self.base
    if self.leng and self.item > 0:  # include items
        s += self.leng(obj) * self.item
    # workaround sys.getsizeof (and numpy?) bug ... some
    # types are incorrectly sized in some Python versions
    # (note, isinstance(obj, ()) == False)
    # Author's note: types that sys.getsizeof cannot size use the logic above;
    # those it can size use the logic below
    if not isinstance(obj, _getsizeof_excls):
        s = _getsizeof(obj, s)
    if mask:  # align
        s = (s + mask) & ~mask
    return s
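The last step, (s + mask) & ~mask, rounds a size up to the next multiple of mask + 1, provided mask + 1 is a power of two. A tiny sketch of that arithmetic (align is a hypothetical name; a mask of 7 corresponds to 8-byte alignment):

```python
def align(size, mask):
    """Round size up to a multiple of mask + 1 (mask + 1 must be a power of two)."""
    return (size + mask) & ~mask


for raw in (1, 41, 48):
    print(raw, "->", align(raw, 7))
# 1 -> 8
# 41 -> 48
# 48 -> 48
```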
Summary