
Filesystem changes in linux

by Jörg Baach last modified Dec 06, 2012 12:55 PM
I would like to have a program that monitors any changes to the filesystem, across all users. How can it be done?

I want to build a metadata server that does fulltext search and stores links between files, ideally across users. How can this be done? Especially: how can the server be notified of all the important changes in the filesystem in real time?

Results

Across the different technical approaches, one of the most important factors turned out to be startup time. For example, inotify is a really nice system for getting notified, but every time I reboot the machine (or my laptop), minutes of idling are required to start up the system.

In the end it seems that right now different solutions are suitable for different scenarios:

  • fanotify: sounds like the best approach, but we have to wait until it's available.
  • inotify: if the machine is very rarely restarted, and neither startup time nor memory consumption matters.
  • samba audit vfs: when startup time is relevant, and at the same time it can be ensured that all file access goes through samba only.
  • python-fuse: when startup time is crucial, and only one user accesses the directories (maybe this can be fixed?).

Update December 6th, 2012

Having spent some more time on this issue, I found the following:

  • On Ubuntu 12.04, fanotify works.
  • There is at least one Python binding that seems to work: https://bitbucket.org/mjs0/pyfanotify
  • A forum entry confirms it: fanotify does not monitor deletes. I will never know when files are removed from the filesystem, so there is no way to remove them from, e.g., my fulltext search engine. A bug entry seems to confirm this as well.
  • That would lead back to inotify, which still takes a long time to set up recursive watches on a directory tree.
  • Or back to using fuse to write a virtual filesystem (layer) that would notify me of all changes. A nice Python binding is fusepy. This works, but is somewhat slow (at least in Python).

So the choice is either a long setup time but no big performance impact by using inotify, or a very short setup time but a performance hit. Great.

Research

inotify

http://inotify.aiken.cz/

This seems to be the standard in modern kernels. One needs to add watches for everything to be monitored, and then gets notified. The problem seems to be the number of watches: inotify is not recursive, so it's at least one watch per directory. If the system restarts, the watches need to be set again, and a stat is done on each of the directories. That takes a rather long time.
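
A minimal sketch of where that setup time goes, assuming Linux with glibc and Python 3, calling the C functions inotify_init1 and inotify_add_watch directly through ctypes (so no third-party binding is needed). The helper names watch_tree and read_events are hypothetical. Because inotify is not recursive, the os.walk loop has to visit and watch every single directory, and this is the walk that has to be repeated after every reboot. The number of watches per user is also capped by /proc/sys/fs/inotify/max_user_watches.

```python
import ctypes
import ctypes.util
import os
import struct

# Event bits from <sys/inotify.h>
IN_MODIFY = 0x00000002
IN_CLOSE_WRITE = 0x00000008
IN_MOVED_FROM = 0x00000040
IN_MOVED_TO = 0x00000080
IN_CREATE = 0x00000100
IN_DELETE = 0x00000200
WATCH_MASK = (IN_MODIFY | IN_CLOSE_WRITE | IN_MOVED_FROM
              | IN_MOVED_TO | IN_CREATE | IN_DELETE)

# struct inotify_event header: int wd; uint32 mask, cookie, len; then name[len]
_EVENT = struct.Struct("iIII")

_libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
_libc.inotify_init1.argtypes = [ctypes.c_int]
_libc.inotify_init1.restype = ctypes.c_int
_libc.inotify_add_watch.argtypes = [ctypes.c_int, ctypes.c_char_p, ctypes.c_uint32]
_libc.inotify_add_watch.restype = ctypes.c_int


def _check(ret, *info):
    """Raise OSError built from errno if a libc call returned -1."""
    if ret < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), *info)
    return ret


def watch_tree(root):
    """Add one watch per directory under root; return (fd, {wd: dirpath})."""
    fd = _check(_libc.inotify_init1(0))
    watches = {}
    try:
        # inotify is not recursive: every directory needs its own watch,
        # so this walk is what dominates startup time on large trees.
        for dirpath, _dirs, _files in os.walk(root):
            wd = _check(_libc.inotify_add_watch(fd, os.fsencode(dirpath), WATCH_MASK),
                        dirpath)
            watches[wd] = dirpath
    except OSError:
        os.close(fd)
        raise
    return fd, watches


def read_events(fd, watches):
    """Block until events arrive; return a list of (path, mask) tuples."""
    buf = os.read(fd, 64 * 1024)
    events = []
    offset = 0
    while offset < len(buf):
        wd, mask, _cookie, name_len = _EVENT.unpack_from(buf, offset)
        offset += _EVENT.size
        # The name is NUL-padded to an aligned length.
        name = buf[offset:offset + name_len].rstrip(b"\0")
        offset += name_len
        events.append((os.path.join(watches.get(wd, "?"), os.fsdecode(name)), mask))
    return events
```

A real monitor would also have to add a new watch whenever an IN_CREATE event carries the IN_ISDIR bit (0x40000000), otherwise newly created subdirectories go unnoticed.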

dnotify

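The predecessor of inotify. It seemed to have the issue of blocking filesystems: it keeps an open file descriptor on every watched directory, which prevents the filesystem from being unmounted.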

fam

http://oss.sgi.com/projects/fam/

http://oss.sgi.com/projects/fam/news.html

File Alteration Monitor (FAM): doesn't seem to be in use any more.

Fanotify

http://lwn.net/Articles/339253/

Via http://stackoverflow.com/questions/1835947/how-do-i-program-for-linuxs-new-fanotify-file-system-monitoring-feature:

"fanotify, built on top of fsnotify, is supposed to replace intofiy which replaced dnotify".

"fanotify has two basic 'modes' directed and global. fanotify directed works much like inotify in that userspace marks inodes it is interested in and gets events from those inodes. fanotify global instead indicates that it wants everything on the system and then individually marks inodes that it doesn't care about."

This is exactly what I would want to use, only it's not there yet.

 

tripwire

http://tripwire.org/

Used for security audits to see whether files have changed. This is more a part of intrusion detection than a filesystem change monitor, and it needs to be run regularly to scan the filesystem.
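
The scan-and-compare idea behind such tools can be sketched in a few lines. This is not Tripwire's implementation, only a minimal illustration assuming that hashing every regular file on each run is acceptable; the function names snapshot and diff are hypothetical. Comparing two snapshots does reveal deleted files, at the cost of a full walk (and full read) per run.

```python
import hashlib
import os


def snapshot(root):
    """Map each regular file under root to the SHA-256 of its contents."""
    result = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path) or os.path.islink(path):
                continue  # skip symlinks, sockets, FIFOs, etc.
            digest = hashlib.sha256()
            try:
                with open(path, "rb") as f:
                    for chunk in iter(lambda: f.read(1 << 16), b""):
                        digest.update(chunk)
            except OSError:
                continue  # vanished or unreadable since the walk listed it
            result[path] = digest.hexdigest()
    return result


def diff(old, new):
    """Compare two snapshots; return (added, removed, changed) as sorted lists."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(p for p in old.keys() & new.keys() if old[p] != new[p])
    return added, removed, changed
```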

samba vfs audit

http://www.samba.org/samba/docs/man/Samba-HOWTO-Collection/VFS.html#id2650781

A module for the samba server. It can be configured (it seems) to monitor the use and modification of files. Requires access to the files through samba, of course.

Systemtap

http://sourceware.org/systemtap/

This could be a nice approach, but I haven't managed to write a script that handles the case where files are changed via a relative path, e.g. 'touch foobar' instead of 'touch /home/joerg/tmp/foobar'. So far I only get notified that a 'foobar' was accessed, but which one?
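
One possible workaround, sketched in Python as user-space post-processing rather than inside SystemTap itself: if the probe also reports the process ID, a relative name can be anchored to that process's working directory through the /proc/<pid>/cwd symlink. This assumes the process still exists and has not changed directory since the event, so it is racy; it also does not cover openat()-style calls relative to a directory file descriptor, and reading another user's /proc/<pid>/cwd normally requires root. The function name absolute_path is hypothetical.

```python
import os


def absolute_path(pid, name):
    """Resolve a possibly relative file name reported for process `pid`."""
    if os.path.isabs(name):
        return os.path.normpath(name)
    # /proc/<pid>/cwd is a symlink to the process's current directory
    # (racy: the process may have called chdir() since the event).
    cwd = os.readlink("/proc/%d/cwd" % pid)
    return os.path.normpath(os.path.join(cwd, name))
```

For 'touch foobar' run from /home/joerg/tmp, absolute_path(pid_of_touch, "foobar") would give /home/joerg/tmp/foobar, provided the lookup happens while touch is still running.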

Python-fuse

https://sourceforge.net/apps/mediawiki/fuse/index.php?title=Main_Page

The idea is to put a small layer on top of the real filesystem, something along the lines of http://esteve.tizos.net/archives/searchable-filesystem-with-fuse-python/. I modified his script a bit so that it does no indexing but logs to a file: my proof of concept

llfuse (python)

(update April 8th, 2011)

http://code.google.com/p/python-llfuse/

This seems to be a better fuse binding, one that actually supports proper release calls. That means we could act once a file has been written and closed.
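
A minimal sketch of what acting on release could look like, independent of any particular binding: the layer's write handler only marks a path as dirty, and the release handler reports it once, after the last open handle is gone. The class ReleaseTracker and its methods are hypothetical and would be called from the binding's open, write and release callbacks. Counting handles matters because release is called once per open file handle, so a file opened twice gets two release calls.

```python
from collections import Counter


class ReleaseTracker:
    """Collapse many write() calls into one 'changed' notification per file."""

    def __init__(self):
        self._open = Counter()  # path -> number of currently open handles
        self._dirty = set()     # paths written since they were last reported

    def opened(self, path):
        self._open[path] += 1

    def written(self, path):
        self._dirty.add(path)

    def released(self, path):
        """Return True if path should be (re)indexed now."""
        self._open[path] -= 1
        if self._open[path] > 0:
            return False  # another handle still has the file open
        del self._open[path]
        if path in self._dirty:
            self._dirty.discard(path)
            return True
        return False
```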

There is an Ubuntu .deb at:

http://ppa.launchpad.net/nikratio/s3ql/ubuntu/pool/main/p/python-llfuse/

Research links

http://www.little-idiot.de/linuxsolutionguide/notify.htm

An older German page, which points to changedfiles.

http://www.bangstate.com/changedfiles/

Exactly what I would need, but it requires a 2.4 kernel.

http://www.linux.com/archive/feature/150200

http://projects.l3ib.org/trac/fsniper

Fsniper allows watching directories and files.

http://www.pubbs.net/kernel/200905/109416/

Links to fsnotify/fanotify. From what I see, that would be exactly what's needed, but it does not seem to be available (yet).

http://esteve.tizos.net/archives/searchable-filesystem-with-fuse-python/

A python-fuse driven filesystem with a hook for indexing. Maybe a good starting point?
