The Problem
It seems I have way too much stuff in the portal history for objects that don't longer exits. What can I do? And how does it all work?
After deleting a lot of objects from a site I was surprised to see that even after packing, the Data.fs was still quite bloated: around 4GB for some hundred folders. That can't possibly be.
Tracing
I found the following page http://plone.org/documentation/kb/debug-zodb-bloat
That was all quite interesting, and I followed the steps through. I got the impression that I had lots of OFS.Image.Pdata and also a bit too much P.A.Field.Image for my taste:
Class Name Count TBytes Pct AvgSize
---------------------------------------------- ------- --------- ----- -------
...
<class 'OFS.Image.Pdata'> 47133 3840900148 86.8% 81490.68
...
<class 'Products.Archetypes.Field.Image'> 25326 265376666 6.0% 10478.43
So now I folloed 'Step 2' and ran
./bin/zopepy parts/zope2/lib/python/ZODB/scripts/dump.py var/filestorage/Data.fs > dump.txt
and looked at it. Especially I searched for the Pdata. One interesting (of a lot of them) had the oid "0x00000000002263b8".
To help my further research I created a little script that would allow me to do the 'Step 3' in the debug-zodb-bloat a bit better, by building the refmap only once, and storing it for later use:
import sys
sys.path.append('.')
from inspectZodbUtils import buildRefmap, doSearch
from ZODB.FileStorage import FileStorage
import cPickle
if len(sys.argv)>1:
filename = sys.argv[1]
storage = FileStorage(filename)
refmap = buildRefmap(storage)
cPickle.dump(refmap,open('refmap.pickle','w'))
else:
refmap = cPickle.load(open('refmap.pickle'))
import ipdb; ipdb.set_trace()
print 'fini'
So the first I would run it as
../bin/instance run q2.py ../var/filestorage/Data.fs
and later on only as
../bin/instance run q2.py
Maybe this is the point in time to praise a) cPickle and b) ipdb. Actually I started to read the pickle code as well, to kind of get an understanding of what was going on.
Anyhow, at the ipdb prompt I could now look at the oid mentioned above:
from ZODB.utils import p64
i = p64(0x00000000002263b8)
[app._p_jar[y] for y in path]
which gave the following output:
[<Image at image_mini>,
<BTrees._OOBTree.OOBTree object at 0x1172a89c>,
<ATImage at debug_configurations.PNG>,
<Products.CMFEditions.ArchivistTool.ObjectData object at 0x11722dac>,
<Products.CMFEditions.ZVCStorageTool.ZVCAwareWrapper object at 0x1172296c>,
<ZopeVersion at 7>,
<BTrees._OOBTree.OOBTree object at 0x1172a8e4>,
<ZopeVersionHistory at 695836488>,
BTrees._OOBTree.OOBucket([('6851117129', <ZopeVersionHistory at 6851117129>), ('6851967824', <ZopeVersionHistory at 6851967824>), ('6865017397', <ZopeVersionHistory at 6865017397>), ('6865947487', <ZopeVersionHistory at 6865947487>), ('6869947712', <ZopeVersionHistory at 6869947712>), ('6876654125', <ZopeVersionHistory at 6876654125>), ('6885760954', <ZopeVersionHistory at 6885760954>), ('6911165914', <ZopeVersionHistory at 6911165914>), ('6921026365', <ZopeVersionHistory at 6921026365>), ('6924684527', <ZopeVersionHistory at 6924684527>), ('6928625820', <ZopeVersionHistory at 6928625820>), ('6930059518', <ZopeVersionHistory at 6930059518>), ('6931991625', <ZopeVersionHistory at 6931991625>), ('6937499171', <ZopeVersionHistory at 6937499171>), ('6942075215', <ZopeVersionHistory at 6942075215>), ('694421509', <ZopeVersionHistory at 694421509>), ('6958252627', <ZopeVersionHistory at 6958252627>), ('695836488', <ZopeVersionHistory at 695836488>), ('6962374688', <ZopeVersionHistory at 6962374688>), ('6969874691', <ZopeVersionHistory at 6969874691>), ('6970090560', <ZopeVersionHistory at 6970090560>), ('697019247', <ZopeVersionHistory at 697019247>), ('6972404889', <ZopeVersionHistory at 6972404889>), ('6972858120', <ZopeVersionHistory at 6972858120>), ('6973604267', <ZopeVersionHistory at 6973604267>)]),
BTrees._OOBTree.OOBucket([('6773749252', <ZopeVersionHistory at 6773749252>), ('6776428556', <ZopeVersionHistory at 6776428556>), ('6777239827', <ZopeVersionHistory at 6777239827>), ('6777885573', <ZopeVersionHistory at 6777885573>), ('6784418225', <ZopeVersionHistory at 6784418225>), ('6785227879', <ZopeVersionHistory at 6785227879>), ('6788995801', <ZopeVersionHistory at 6788995801>), ('6791790198', <ZopeVersionHistory at 6791790198>), ('679295379', <ZopeVersionHistory at 679295379>), ('6795881181', <ZopeVersionHistory at 6795881181>), ('6801955479', <ZopeVersionHistory at 6801955479>), ('6802303487', <ZopeVersionHistory at 6802303487>), ('681163508', <ZopeVersionHistory at 681163508>), ('6816736841', <ZopeVersionHistory at 6816736841>), ('6821976622', <ZopeVersionHistory at 6821976622>), ('6822305396', <ZopeVersionHistory at 6822305396>), ('6826208798', <ZopeVersionHistory at 6826208798>), ('6829700253', <ZopeVersionHistory at 6829700253>), ('683348517', <ZopeVersionHistory at 683348517>), ('6834359534', <ZopeVersionHistory at 6834359534>), ('68404997', <ZopeVersionHistory at 68404997>), ('6841189737', <ZopeVersionHistory at 6841189737>), ('6847502691', <ZopeVersionHistory at 6847502691>)]),
BTrees._OOBTree.OOBucket([('6666052774', <ZopeVersionHistory at 6666052774>), ('6672478127', <ZopeVersionHistory at 6672478127>), ('6675922832', <ZopeVersionHistory at 6675922832>), ('6676319233', <ZopeVersionHistory at 6676319233>), ('6681315686', <ZopeVersionHistory at 6681315686>), ('6682030763', <ZopeVersionHistory at 6682030763>), ('6684334569', <ZopeVersionHistory at 6684334569>), ('668754745', <ZopeVersionHistory at 668754745>), ('6694231287', <ZopeVersionHistory at 6694231287>), ('6698458229', <ZopeVersionHistory at 6698458229>), ('6699928945', <ZopeVersionHistory at 6699928945>), ('6699959128', <ZopeVersionHistory at 6699959128>), ('6717912996', <ZopeVersionHistory at 6717912996>), ('6725309273', <ZopeVersionHistory at 6725309273>), ('6732365671', <ZopeVersionHistory at 6732365671>), ('6732441266', <ZopeVersionHistory at 6732441266>), ('6745243275', <ZopeVersionHistory at 6745243275>), ('6755179016', <ZopeVersionHistory at 6755179016>), ('6755687148', <ZopeVersionHistory at 6755687148>), ('6757413776', <ZopeVersionHistory at 6757413776>), ('6761760255', <ZopeVersionHistory at 6761760255>), ('6762277917', <ZopeVersionHistory at 6762277917>), ('6767266586', <ZopeVersionHistory at 6767266586>), ('6768276066', <ZopeVersionHistory at 6768276066>)]),
<BTrees._OOBTree.OOBTree object at 0x1172aa04>,
<ZopeRepository at repo>,
<ZVCStorageTool at portal_historiesstorage>,
<PloneSite at Plone>,
<Application at >,
{'Application': <Application at >, 'ZGlobals': <BTrees._OOBTree.OOBTree object at 0xc2ee074>}]
What does that tell me:
- it seems that the portal_historiesstorage is involved.
- there are lots of classes and objects involved that don't make too much sense to me.
- there are lots of ids like '695836488' involved, which are not the normal UID - what are they?
I poked a bit around in portal_historiesstorage, and then a bit more. This is really confusing land. What is this all about?
A bit of a hint is in Products/CMFEditions/doc, and there also in presentations. There are several tools involved, policies and whatever. I also had a short look at interfaces, and also at the tests, but they didn't make too much sense to me.
So, after poking around some more, and just not getting it, I decided to stick the beloved 'import ipdb; ipdb.set_trace' into line 285, that in the 'save' method of the CopyModifyMergeRepositoryTool. This was to see what would happen if I created a new object. This is a version of my notes, following through the code. Its not 100% complete. I tried to note down all the calls that seemed to be somewhat interesting, and tried to keep the nesting intact. Sometimes I also wrote down comments, variable values etc.
/home/joerg/projects/myproject/development/eggs/Products.CMFEditions-1.2.4-py2.4.egg/Products/CMFEditions/CopyModifyMergeRepositoryTool.py(285)
/home/joerg/projects/myproject/development/eggs/Products.CMFEditions-1.2.4-py2.4.egg/Products/CMFEditions/CopyModifyMergeRepositoryTool.py(421)_prepareSysMetadata()
not too interesting
/home/joerg/projects/myproject/development/eggs/Products.CMFEditions-1.2.4-py2.4.egg/Products/CMFEditions/CopyModifyMergeRepositoryTool.py(433)_recursiveSave()
/home/joerg/projects/myproject/development/eggs/Products.CMFEditions-1.2.4-py2.4.egg/Products/CMFEditions/ArchivistTool.py(210)prepare()
/home/joerg/projects/myproject/development/eggs/Products.CMFEditions-1.2.4-py2.4.egg/Products/CMFEditions/utilities.py(43)dereference()
/home/joerg/projects/myproject/development/eggs/Products.CMFUid-2.1.2-py2.4.egg/Products/CMFUid/UniqueIdHandlerTool.py(129)queryUid()
returns the default, which is None.
returns obj, history_id, here (obj,None). history_id here referes to cmf_uid
/home/joerg/projects/myproject/development/eggs/Products.CMFEditions-1.2.4-py2.4.egg/Products/CMFEditions/ZVCStorageTool.py(187)isRegistered()
/home/joerg/projects/myproject/development/eggs/Products.CMFEditions-1.2.4-py2.4.egg/Products/CMFEditions/ZVCStorageTool.py(408)_getShadowStorage()
return self._shadowStorage
wants to return shadow_storage.isRegistered(history_id), but goes to
/home/joerg/projects/myproject/development/parts/zope2/lib/python/ZODB/Connection.py(749)setstate()
This is just loading its ghost, np
/home/joerg/projects/myproject/development/eggs/Products.CMFEditions-1.2.4-py2.4.egg/Products/CMFEditions/ZVCStorageTool.py(760)isRegistered()
return history_id in self._storage
return False
/home/joerg/projects/myproject/development/eggs/Products.CMFUid-2.1.2-py2.4.egg/Products/CMFUid/UniqueIdHandlerTool.py(103)register()
/home/joerg/projects/myproject/development/eggs/Products.CMFUid-2.1.2-py2.4.egg/Products/CMFUid/UniqueIdHandlerTool.py(129)queryUid()
returns the default, which is None. Same as above
/home/joerg/projects/myproject/development/eggs/Products.CMFUid-2.1.2-py2.4.egg/Products/CMFUid/UniqueIdGeneratorTool.py(61)__call__()
self._uid_counter += 1
return self._uid_counter (8552)
/home/joerg/projects/myproject/development/eggs/Products.CMFUid-2.1.2-py2.4.egg/Products/CMFUid/UniqueIdHandlerTool.py(91)_setUid()
annotation = anno_tool(obj, self.UID_ATTRIBUTE_NAME)
annotation.setUid(uid)
/home/joerg/projects/myproject/development/eggs/Products.CMFUid-2.1.2-py2.4.egg/Products/CMFUid/UniqueIdAnnotationTool.py(65)setUid()
self._uid = uid
obj.cmf_uid == annotation
return uid (8552)
history_id == 8552
/home/joerg/projects/myproject/development/eggs/Products.CMFEditions-1.2.4-py2.4.egg/Products/CMFEditions/ModifierRegistryTool.py(139)getReferencedAttributes()
return {} //keine Referenzen
/home/joerg/projects/myproject/development/eggs/Products.CMFEditions-1.2.4-py2.4.egg/Products/CMFEditions/ArchivistTool.py(182)_cloneByPickle()
//pickles, gets size of pickle
//creates clone by unpickling
return approxSize, u.load(), inside_orefs, outside_orefs // return approxSize, u.load(), (), ()
/home/joerg/projects/myproject/development/eggs/Products.CMFEditions-1.2.4-py2.4.egg/Products/CMFEditions/ModifierRegistryTool.py(220)beforeSaveModifier()
for ignored_id, mod in self._collectModifiers(obj, ISaveRetrieveModifier):
mdata, icrefs, ocrefs = mod.beforeSaveModifier(obj, obj_clone)
Products.CMFEditions.StandardModifiers.RetainUIDs -> return {}, [], []
Products.CMFEditions.StandardModifiers.NotRetainATRefs -> return {}, [], []
Products.CMFEditions.StandardModifiers.RetainWorkflowStateAndHistory -> return {"review_state": review_state}, [], []
Products.CMFEditions.StandardModifiers.RetainPermissionsSettings -> return {}, [], []
return metadata, inside_crefs, outside_crefs // ({'review_state': 'published'}, [], [])
sys_metadata.update(metadata) // now: {'comment': '', 'originator': None, 'review_state': 'published', 'timestamp': 1341068928.3322461}
clone_info = ObjectData(clone, inside_crefs, outside_crefs)
obj_info = ObjectData(obj, inside_orefs, outside_orefs)
// o = ObjectData(foo,bar,baz) -> o.foo, o.bar, ob.baz
return PreparedObject(history_id, obj_info, clone_info,
referenced_data, app_metadata,
sys_metadata, is_registered, approxSize)
/home/joerg/projects/myproject/development/eggs/Products.CMFEditions-1.2.4-py2.4.egg/Products/CMFEditions/ArchivistTool.py(397)__init__()
//Basically another wrapper object.
modifies sys_metadata for lots of values. Interesting:
sys_metadata['parent'] = {
'history_id': portal_uidhandler.register(parent), ....
and:
metadata = {
'sys_metadata': sys_metadata,
'app_metadata': app_metadata,
}
...
self.metadata = metadata
...
sys_metadata['originator'] = "%s.%s.%s" % (prep.history_id,
clone.version_id,
clone.location_id, ) #'8552.0.0'
//saving references. None here
/home/joerg/projects/myproject/development/eggs/Products.CMFEditions-1.2.4-py2.4.egg/Products/CMFEditions/ArchivistTool.py(275)save()
/home/joerg/projects/myproject/development/eggs/Products.CMFEditions-1.2.4-py2.4.egg/Products/CMFEditions/ArchivistTool.py(263)register()
#def register(self, prepared_obj)
return storage.register(prepared_obj.history_id,
prepared_obj.clone,
prepared_obj.referenced_data,
prepared_obj.metadata)
/home/joerg/projects/myproject/development/eggs/Products.CMFEditions-1.2.4-py2.4.egg/Products/CMFEditions/ZVCStorageTool.py(198)register()
zvc_method = self._getZVCRepo().applyVersionControl # /Plone/portal_historiesstorage/repo
/home/joerg/projects/myproject/development/eggs/Products.CMFEditions-1.2.4-py2.4.egg/Products/CMFEditions/ZVCStorageTool.py(367)_applyOrCheckin()
#def _applyOrCheckin(self, zvc_method, history_id, shadowInfo,
object, referenced_data, metadata):
#zvc_method == ZopeRepository.applyVersionControl
#history_id == 8552
#shadowInfo == {'vc_info': None}
#object == prepared_obj.clone -- see above
#referenced_odata == {}
#metadata -- see above :516
checking size -> pickling if in doubt
vc_info = self._getVcInfo(object, shadowInfo)
/home/joerg/projects/myproject/development/eggs/Products.CMFEditions-1.2.4-py2.4.egg/Products/CMFEditions/ZVCStorageTool.py(467)_getVcInfo()
return None (in this case)
zvc_obj = ZVCAwareWrapper(object, referenced_data, metadata,
vc_info)
/home/joerg/projects/myproject/development/eggs/Products.CMFEditions-1.2.4-py2.4.egg/Products/CMFEditions/ZVCStorageTool.py(941)__init__()
#a wrapper object. Interesting in 945
#self._physicalPath = metadata['sys_metadata'].get('physicalPath', ())[:] # copy
# here: ()
/home/joerg/projects/myproject/development/eggs/Products.CMFEditions-1.2.4-py2.4.egg/Products/CMFEditions/ZVCStorageTool.py(512)_encodeMetadata()
basically comment+pickle of metadata
/home/joerg/projects/myproject/development/eggs/Products.ZopeVersionControl-1.0a1-py2.4.egg/Products/ZopeVersionControl/Repository.py(146)applyVersionControl()
#this is zvc_method
#now for the real stuff
...
branch = 'mainline'
...
history = self.createVersionHistory(object)
/home/joerg/projects/myproject/development/eggs/Products.ZopeVersionControl-1.0a1-py2.4.egg/Products/ZopeVersionControl/Repository.py(51)createVersionHistory()
history_id = str(randint(1, 9999999999))
/home/joerg/projects/myproject/development/eggs/Products.ZopeVersionControl-1.0a1-py2.4.egg/Products/ZopeVersionControl/VersionHistory.py(32)__init__()
#history_id now e.g. '7417491791'
mainline = self.createBranch('mainline', None)
/home/joerg/projects/myproject/development/eggs/Products.ZopeVersionControl-1.0a1-py2.4.egg/Products/ZopeVersionControl/VersionHistory.py(84)createBranch()
branch = BranchInfo(branch_id, version_id)
/home/joerg/projects/myproject/development/eggs/Products.ZopeVersionControl-1.0a1-py2.4.egg/Products/ZopeVersionControl/VersionHistory.py(230)__init__()
#just a mapping object.
self._branches[branch_id] = branch
self.id = history_id
self._histories[history_id] = history
return history.__of__(self)
version = history.createVersion(object, branch)
/home/joerg/projects/myproject/development/eggs/Products.ZopeVersionControl-1.0a1-py2.4.egg/Products/ZopeVersionControl/VersionHistory.py(96)createVersion()
branch = self._branches.get(branch_id) #branch_id == 'mainline'
...
version_id = '%d' % (len(branch) + 1) #'1'
version = ZopeVersion(version_id, object)
/home/joerg/projects/myproject/development/eggs/Products.ZopeVersionControl-1.0a1-py2.4.egg/Products/ZopeVersionControl/Version.py(72)__init__()
#Another mapper object
#does _not_ contain the object (which is version, a ZVCAwareWrapper)
...
branch.append(version)
/home/joerg/projects/myproject/development/eggs/Products.ZopeVersionControl-1.0a1-py2.4.egg/Products/ZopeVersionControl/VersionHistory.py(242)append()
#just metadata storage, not the actual object
#the ids in self.m_order count down, and only point to version id [(2147483647, '1')]
self._versions[version_id] = version #still no object
version.saveState(object)
/home/joerg/projects/myproject/development/eggs/Products.ZopeVersionControl-1.0a1-py2.4.egg/Products/ZopeVersionControl/Version.py(93)saveState()
self._data = self.stateCopy(obj, self) #becomes effectively the clonebypickled clone of the clone?
/home/joerg/projects/myproject/development/eggs/Products.ZopeVersionControl-1.0a1-py2.4.egg/Products/ZopeVersionControl/Version.py(105)stateCopy()
ignore = listNonVersionedObjects(obj)
/home/joerg/projects/myproject/development/eggs/Products.ZopeVersionControl-1.0a1-py2.4.egg/Products/ZopeVersionControl/nonversioned.py(64)listNonVersionedObjects(
return getNonVersionedDataAdapter(obj).listNonVersionedObjects()
/home/joerg/projects/myproject/development/eggs/Products.ZopeVersionControl-1.0a1-py2.4.egg/Products/ZopeVersionControl/nonversioned.py(40)getNonVersionedDataAdapter()
return StandardNonVersionedDataAdapter(obj)
/home/joerg/projects/myproject/development/eggs/Products.ZopeVersionControl-1.0a1-py2.4.egg/Products/ZopeVersionControl/nonversioned.py(92)listNonVersionedObjects()
return ()
res = cloneByPickle(aq_base(obj), ignore)
removeNonVersionedData(res)
getNonVersionedDataAdapter(obj).removeNonVersionedData()
/home/joerg/projects/myproject/development/eggs/Products.ZopeVersionControl-1.0a1-py2.4.egg/Products/ZopeVersionControl/nonversioned.py(40)getNonVersionedDataAdapter()
return StandardNonVersionedDataAdapter(obj)
/home/joerg/projects/myproject/development/eggs/Products.ZopeVersionControl-1.0a1-py2.4.egg/Products/ZopeVersionControl/nonversioned.py(94)removeNonVersionedData()
#did not do anything
return res
return version.__of__(self)
history_id = history.getId() # '7417491791'
version_id = version.getId() # '1'
info = VersionInfo(history_id, version_id, VersionInfo.CHECKED_IN)
/home/joerg/projects/myproject/development/eggs/Products.ZopeVersionControl-1.0a1-py2.4.egg/Products/ZopeVersionControl/Utility.py(51)__init__()
#a wrapper object around the given data, add logged in user
object.__vc_info__ = info
history.addLogEntry(version_id,LogEntry.ACTION_CHECKIN,_findPath(object),message is None and 'Initial checkin.' or message)
/home/joerg/projects/myproject/development/eggs/Products.ZopeVersionControl-1.0a1-py2.4.egg/Products/ZopeVersionControl/VersionHistory.py(51)addLogEntry()
#creates a wrapper object, adds it under a count down key in self._eventLog._data[key]
return object
shadowInfo = {"vc_info": zvc_obj.__vc_info__,
"metadata": metadata,}
history = self._getShadowHistory(history_id, autoAdd=True)
#boils down to: self._shadowStorage._storage[history_id]=ShadowHistory()
return history.save(shadowInfo) #just the metadata, returns the version_id in the _shadowStorage_
#
#prep.original.object.version_id == 0
And now, what have I learned? I was most interested (and confused) by the use of history_id - what are they, and where do they come from?
The numbers here are from another run, with a new object at a higher cmf_uid of 8553:
#obj is the freshly created object
cmfuid = obj.cmf_uid() # 8553
ph = getToolByName(context.portal_url.getPortalObject(),'portal_historiesstorage')
hid = ph._shadowStorage._storage[cmfuid]._full[0]['vc_info'].history_id #'7496765414'
zvh = ph.zvc_repo._histories[hid]
vid = zvh._branches['mainline'].m_order[2147483647] #'1'
zv = zvh._versions[vid]
zv._data._object.object.title == obj.title
So, I got the feeling that there is a zvc_repo that contains the actual version information, and the _shadowStorage, which has some meta information. zvc_repo seems to work with the 'funny' numbers, which are randomly chosen longer ones, while _shadowStorage seems to use rather numbered short numbers.
This reminded me also of another post that by now seemed to be quite related to my question of understanding cmfeditions, the bloat and the repo:
http://stackoverflow.com/questions/9683466/purging-all-old-cmfeditions-versions
I got the feeling that some information in there is maybe a bit wrong, especially in regards to just deleting the _shadowStorage.
How could I find out? Also by now, I wanted to see what was really going on. Looking at the data would be nice. But pprint doesn't work here (it doesn't show you the contents of classes), the collective.zodbbrowser only works on plone4, and did not seem to be quite suited.
Structure
So I hacked a really ugly tool together, that can show me the full tree below a certain object. The tool is called rprint (not published yet because its real ugly code). It walks the tree of the objects, shows me the content of its subelements (only stored data). If it encounters an object like a Btree which has an items method, it either only prints and follows the first item, or items that match a list of ids that I have specified.
Using the cmf_uid and history_id from above, I got the following:
ipdb> reload(rprint); rprint.rprint(portal.portal_historiesstorage,[8553,'7496765414'])
<Products.CMFEditions.ZVCStorageTool.portal_historiesstorage>
_shadowStorage: <Products.CMFEditions.ZVCStorageTool.ShadowStorage>
_storage: <BTrees._OOBTree.OOBTree>
"8553": <Products.CMFEditions.ZVCStorageTool.ShadowHistory>
_full: <BTrees._IOBTree.IOBTree>
"0": {
"metadata": {
"sys_metadata": {
"comment": ''
"originator": '8553.0.0'
"parent": {
"history_id": 2
"location_id": None
"version_id": None
}
"timestamp": 1341238710.606384
"review_state": 'published'
"approxSize": 3041
"principal": 'myadmin'
}
"app_metadata": { }
}
"vc_info": <Products.ZopeVersionControl.Utility.VersionInfo>
status: 1
timestamp: 1341238712.795213
history_id: '7496765414'
user_id: 'myadmin'
version_id: '1'
}
...
_available: [ 0]
nextVersionId: 1
_sizeInaccurate: False
_approxSize: 3041
...
zvc_repo: <Products.ZopeVersionControl.ZopeRepository.ZopeRepository>
_labels: <BTrees._OIBTree.OIBTree>
{}
_branches: <BTrees._OIBTree.OIBTree>
"mainline": 1
...
title: 'ZVC Storage'
_created: 1292350937.0929461
id: 'repo'
_histories: <BTrees._OOBTree.OOBTree>
"7496765414": <Products.ZopeVersionControl.ZopeVersionHistory.ZopeVersionHistory>
_labels: <BTrees._OOBTree.OOBTree>
{}
_branches: <BTrees._OOBTree.OOBTree>
"mainline": <Products.ZopeVersionControl.VersionHistory.BranchInfo>
date_created: 1341238712.7773039
m_date: <BTrees._IIBTree.IIBTree>
"22353978": 2147483647
...
root: None
name: 'mainline'
m_order: <BTrees._IOBTree.IOBTree>
"2147483647": '1'
...
...
_eventLog: <Products.ZopeVersionControl.EventLog.EventLog>
_data: <BTrees._IOBTree.IOBTree>
"2147483647": <Products.ZopeVersionControl.EventLog.LogEntry>
user_id: 'myadmin'
path: ''
action: 1
timestamp: 1341238712.7953091
message: "S''\n.\x00\n\x80\x02}q\x01(U\x0csys_metadataq\x02}q\x03(U\x07commentq\x04U\x00U\noriginatorq\x05U\x088553.0.0q\x06U\x06parentq\x07}q\x08(U\nhistory_idq\tK\x02U\x0blocation_idq\nNU\nversion_idq\x0bNuU\ttimestampq\x0cGA\xd3\xfckm\xa6\xce\xffU\x0creview_stateq\rU\tpublishedq\x0eU\napproxSizeq\x0fM\xe1\x0bU\tprincipalq\x10U\x07myadminq\x11uU\x0capp_metadataq\x12}q\x13u."
version_id: '1'
...
_versions: <BTrees._OOBTree.OOBTree>
"1": <Products.ZopeVersionControl.ZopeVersion.ZopeVersion>
date_created: 1341238712.777617
_data: <Products.CMFEditions.ZVCStorageTool.ZVCAwareWrapper>
_physicalPath: ( )
_object: <Products.CMFEditions.ArchivistTool.ObjectData>
outside_refs: [ ]
object: <Products.ATContentTypes.content.document.ATDocument>
_Sharing_page__Delegate_Contributor_role_Permission: ( )
_signature: "FL\x06\xbf\x95\xedQ/\x96Xl\x89\x1a'b\xea"
creation_date: DateTime('2012/07/02 16:18:03.004 GMT+2')
marshall_hook: None
_Access_contents_information_Permission: ( 'Editor', 'Manager', 'Owner', 'Reader')
location_id: 0
id: 'test3-title'
tableContents: False
_at_creation_flag: False
cmf_uid: <Products.CMFUid.UniqueIdAnnotationTool.UniqueIdAnnotation>
_uid: 8553
id: 'cmf_uid'
modification_date: DateTime('2012/07/02 16:18:03.097 GMT+2')
title: u'test3 title'
demarshall_hook: None
workflow_history: <Persistence.mapping.PersistentMapping>
data: {
"baworkflow": (
{
"action": None
"review_state": 'published'
"actor": 'myadmin'
"comments": ''
"time": DateTime('2012/07/02 16:18:03.009 GMT+2')
}
)
}
__provides__: <zope.interface.Provides object at 0xff17dcc>
_Change_portal_events_Permission: ( 'Editor', 'Manager', 'Owner')
_EtagSupport__etag: 'ts41238683.1'
presentation: False
__annotations__: <BTrees._OOBTree.OOBTree>
"Archetypes.storage.AnnotationStorage-text": <Products.Archetypes.BaseUnit.BaseUnit>
mimetype: 'text/html'
binary: 0
filename: ''
raw: u'<p>test3 maintext</p>'
__name__: 'text'
id: 'text'
original_encoding: 'utf-8'
size: 21
...
portal_type: 'Document'
version_id: 0
_at_uid: '3bdb26d9de1c4a1f3765d89dc8b1611a'
_Sharing_page__Delegate_Reader_role_Permission: ( 'Editor', 'Manager', 'Owner')
at_references: <OFS.Folder.Folder>
id: 'at_references'
_Modify_portal_content_Permission: ( 'Editor', 'Manager', 'Owner')
_View_Permission: ( 'Editor', 'Manager', 'Owner', 'Reader')
_Sharing_page__Delegate_Editor_role_Permission: ( 'Editor', 'Manager', 'Owner')
_List_folder_contents_Permission: ( 'Editor', 'Manager', 'Owner', 'Reader')
_owner: (
[ 'acl_users']
'myadmin',
)
_md: <Persistence.mapping.PersistentMapping>
data: {
"creators": ( u'myadmin',)
"description": <Products.Archetypes.BaseUnit.BaseUnit>
mimetype: 'text/plain'
binary: 0
filename: ''
raw: u'test3 description'
__name__: 'description'
id: 'description'
original_encoding: 'utf-8'
size: 17
"effectiveDate": None
"rights": <Products.Archetypes.BaseUnit.BaseUnit>
mimetype: 'text/html'
binary: 0
filename: ''
raw: u''
__name__: 'rights'
id: 'rights'
original_encoding: 'ascii'
size: 0
"subject": ( )
"language": u''
"location": u''
"contributors": ( )
"allowDiscussion": None
"excludeFromNav": False
"expirationDate": None
}
_Sharing_page__Delegate_Reviewer_role_Permission: ( )
__ac_local_roles__: {
"myadmin": [ 'Owner']
}
inside_refs: [ ]
_referenced_data: { }
id: '1'
...
id: '7496765414'
...
__ac_local_roles__: {
"myadmin": [ 'Owner']
}
(the script can actually supress following down certain classes. I have surpressed DateTime, but left in the Products.ATContentTypes.content. Makes this output a bit longer to read, but this way I can see the cmf_uid annotation of the cloned object).
And this now really helped me to get the picture of whats going on.
- The portal_historiesstorage is the plone container of the history
- it stores plone related information in the _shadowStorage._storage
- information in the _storage uses the cmf_uid as the key
- under _shadowStorage._storage._full\'0\' lies the information for the first version (a.k.a revision) of the object. Note that here its 0, not 1.
- in vc_info is the internal history_id, which is the longer number. Please note that this history_id is not the one used in methods of portal_repository, there they actually mean the cmf_uid.
- the zvc_repo contains the actual object related data, not only meta information.
- there is an assumption of branches, but only 'mainline' seems to be used
- there is one ZopeVersionHistory Object per plone object, and that is filed in _histories under longer random history_id, e.g. 7496765414
- The ZVH object has information about branches, labels, that are not really used, the eventLog (which contains the message upon saving a new revision) and then the actual version, starting to count at '1'.
- under _data._object.object is the actual revision data. This is a clone of the original object, with all references to other objects replaced by 'placeholder' objects. Haven't looked at that those references, because its quite a lot already. Only interesting to see how cPickle is used a lot in the whole process to create clones of objects, estimate the sizes on the disk etc.
- the clone contains again the cmf_uid, which is used as the key in the _shadowStorage._storage.
With this in mind, was ready again to look at the 'documentation' of the highest level object in cmfedition land, the portal_repository (CopyModifyMergeRepositoryTool). The interface documentation is in CMFEditions/interfaces/IRepository. It seems that there history_id is always the cmf_uid. Selector can be one of two things, either a counter of versions, or the key in _versions in the zvc_repo. Also very interesting is the test06_retrieveWithNoMoreExistentObject in tests/test_CopyModifyMergeRepository. Here its interesting how a temporary object is used as a carrier for the cmf_uid we are really interested in.
Results
So, it seems that reason for the bloat in the Data.fs is lots of old revision of images, where the plone object no longer exists. By creating temporary Images with cmf_uids of the 'lost' Images, and then calling
portal_repository.purge(temp_obj, selector_aka_version_id_or_counter)
I should be able to purge really all old version, or run a modified version of the script that zupo posted in
http://stackoverflow.com/questions/9683466/purging-all-old-cmfeditions-versions
What I realy learned though are two things: documenting research steps rocks. Without notes I would have been completely lost in this maze. Being able to see the data structure that code operates on seems to be a really important tool to understand plone land. Before python, I used sql or phpmyadmin a lot to understand the data structures. This gave me a mental picture of the data the code in the app worked on. Data structures told me sooo much more then the code about the application. In zope land it used to be the zmi, where one could poke around the data structures. It helped a lot. plone land however the zmi gets more and more abandoned, and its always a huge pile of hidden layers of adapters and annotations. Even though there might be interfaces, tests and source - the overall picture for me is in the data structures, and that is (or was) missing.
Update 17.7.2012
Here is a script to delete old revisions from the histories. After this, the Data.fs packs down to 270 MB
import os
from os import environ
from StringIO import StringIO
import logging
from AccessControl.SecurityManagement import newSecurityManager
from AccessControl.SecurityManager import setSecurityPolicy
from Testing.makerequest import makerequest
from Products.CMFCore.tests.base.security import PermissiveSecurityPolicy, OmnipotentUser
# Force application logging level to DEBUG and log output to stdout for all loggers
import sys, logging
root_logger = logging.getLogger()
root_logger.setLevel(logging.DEBUG)
handler = logging.StreamHandler(sys.stdout)
formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
handler.setFormatter(formatter)
root_logger.addHandler(handler)
def spoofRequest(app):
"""
Make REQUEST variable to be available on the Zope application server.
This allows acquisition to work properly
"""
_policy=PermissiveSecurityPolicy()
_oldpolicy=setSecurityPolicy(_policy)
newSecurityManager(None, OmnipotentUser().__of__(app.acl_users))
return makerequest(app)
# Enable Faux HTTP request object
app = spoofRequest(app)
portal = app.Plone
hs = portal.portal_historiesstorage
ppp = portal.portal_purgepolicy
shadowstorage = hs._shadowStorage._storage
keys = sorted(shadowstorage.keys())
result = portal.portal_catalog()
cataloged = set()
for r in result:
cataloged.add(r.cmf_uid)
extra = [k for k in keys if k not in cataloged]
#lets clean the historiesstorage
for key in extra:
print 'purging %s (%s)' % (key,len(hs.getHistory(key,countPurged=False)))
while len(hs.getHistory(key,countPurged=False)):
hs.purge(key,0,metadata=dict(sys_metadata=dict(comment='clear by jhb')),countPurged=False)
import transaction; transaction.commit()
print 'fini'