Saturday, August 11, 2007

Resolving Sabayon and Gentoo Performance Issues

Over the last few days, I've taken a good hard look at performance issues related to Sabayon and Gentoo. Several people have noted that Sabayon seems slow, so I tested out 3.4e. I noticed that Portage was horrendously slow, and discovered the first culprit - Beagle, a search tool that is enabled by default under Sabayon. I talked to Fabio (the Sabayon guy) about this and he is going to look into disabling it by default in the next release, since it can potentially have such a negative impact on system performance, particularly Portage performance.

However, even without Beagle, it still appeared that Portage was having severe performance issues with emerge -up world. I was using Portage 2.1.3.3 and tried downgrading to 2.1.2.11, and the performance issues went away - the older 2.1.2 version of Portage was 2x faster on emerge -up world! Something was definitely going on here.

On IRC, I visited #gentoo-portage and Ferringb (pkgcore lead) suggested I do some performance profiling of both 2.1.3.3 and 2.1.2.11, which I did, and I found that 2.1.3.3 was making 3x the function calls of 2.1.2.11. I tracked down zmedico (main Portage developer) online, who requested my profile data, and he was able to quickly track down the problem, which was related to some code breakage in the 2.1.3 release. He sent a patch over to me, which I tried, and it fixed the problem - Portage 2.1.3.3 with the patch was now running at normal speed again.
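For anyone who wants to try this kind of comparison on their own code, here is a minimal sketch using Python's standard cProfile and pstats modules. It is not the profiling run I did on Portage, and it does not touch Portage at all: normalize, resolve_once and resolve_redundantly are hypothetical stand-ins that only illustrate how redundant work shows up as a higher total function call count.

```python
import cProfile
import pstats


def count_calls(func, *args, **kwargs):
    """Run func under cProfile and return the total number of function calls."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args, **kwargs)
    return pstats.Stats(profiler).total_calls


def normalize(name):
    # Hypothetical helper: tidy up a package name.
    return name.strip().lower()


def resolve_once(packages):
    # Hypothetical "fast" code path: one normalize() call per package.
    return [normalize(p) for p in packages]


def resolve_redundantly(packages):
    # Hypothetical "regressed" code path: same result, but the work is
    # repeated, which inflates the call count the profiler reports.
    result = []
    for p in packages:
        normalize(p)
        normalize(p)
        result.append(normalize(p))
    return result


if __name__ == "__main__":
    pkgs = ["  Sys-Apps/Portage ", "APP-misc/beagle"] * 50
    fast = count_calls(resolve_once, pkgs)
    slow = count_calls(resolve_redundantly, pkgs)
    print("fast path calls:", fast)
    print("slow path calls:", slow)
    print("ratio: %.1fx" % (slow / fast))
```

The call count is a useful first signal because it stays fairly stable between runs on the same interpreter, whereas wall-clock time is noisy. Once the counts diverge between two versions, the per-function breakdown from pstats (for example print_stats sorted by call count) helps narrow down where the extra calls come from.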

I also had an opportunity to chat with zmedico about the future direction of Portage and the challenges that Portage is trying to solve for users and developers. This gave me some additional insight into what approaches could be used to help improve Gentoo in the future. From my perspective, one challenge that Portage is facing is that it is essentially trying to achieve several divergent goals - be a ports system for a meta-distribution and also provide a good and safe user experience for Gentoo users. In some cases, Portage can't really do a good job in both areas at the same time. 

Here's why. As a meta-distribution, Gentoo can have very complex dependency chains. However, as a user-focused distribution, you kind of want the dependency chains in Gentoo to be as straightforward and elegant as possible, without any weird conflicts - in other words, have developers do a lot of the heavy lifting to make dependencies less fine-grained and eliminate strange corner cases and blockers. Yet this hard work impacts the ability of Gentoo developers to keep the Portage tree up-to-date.

So you basically end up in a situation where some developers push toward the goal of Gentoo the ever-capable meta-distribution, full of possibilities, causing dependency chains to get more complex and sophisticated, resulting in associated sophisticated problems. Meanwhile, Portage is pursuing a goal of protecting users from broken systems to compensate for the increased sophistication in the Portage tree. In many cases this means that Portage is starting to emphasize more internal checks, more involved dependency calculations, and the use of blockers to prevent emerges that could potentially break things.

But many Gentoo users want things to just work and don't want to deal with blockers. Essentially, they want to have the developers do the heavy lifting to make things "just work." They don't necessarily want to deal with the complex issues of the Portage tree.

My suggestion to address this problem is to have a larger Gentoo ecosystem that contains several independent projects consisting of small development teams, all collaborating and each focusing on delivering value to a specific target audience. Frankly, some teams need to be able to add stuff to their internal Portage tree without worrying about QA, red tape and policies so they can be productive. For a developer-facing team, you want to maximize innovation and rapid development, and sometimes creating a masterpiece involves making a bit of a mess first.

In contrast, user-facing teams need to focus on making sure that their Portage tree, which is delivered directly to users, is always 100% consistent, tested and sane, and that any major changes are carefully planned and deployed.

Basically, developers focusing on different problems need to have the freedom to use different approaches in their development style to remain productive and happy. And similarly, different target audiences have different needs from their package management tools.

I don't think it's possible to have Portage be a great ports system and also have it be a great general package manager and preserver of long-term system integrity. At the very least, I think it is monumentally difficult to design one tool to do everything for everybody, because too many compromises need to be made. I think a far better approach is to encourage specialized tools and approaches where it makes sense. It's far better to try to do one thing well than try to do hundreds of things in a mediocre way.

So here are the basic steps: Keep the tools focused on doing a specific thing really well. Transition development teams into small independent groups with their own local policies and approaches, so they can better focus on delivering value to a specific target audience the way that only THEY know how. Then create a larger collaborative ecosystem to tie everything together.

Not too hard, eh? :)

27 comments:

Eitan said...

IMHO disabling Beagle (or Tracker) is not a simple choice. It should be thought about since search is a powerful feature to have on the desktop. The next release of Gutsy should have Tracker on by default.

Anonymous said...

Is it really all that useful? I had it on my box (running mepis), and then got rid of it because I never used it. Ubuntu is not Gentoo, and what they do should not really affect what gentoo does. Furthermore it seems like a pretty straightforward decision to me, seeing as it is affecting a rather essential program to the system in such a big way.

To me all this fuss about a desktop search is unwarranted...I never really found it that useful

Wolfger said...

I've never felt the need for Beagle at all. Not sure why other people do, but if it's a performance drain on a critical application (like, say, *portage*), then it should definitely be off by default rather than on by default.

Eitan said...

Ubuntu might not be Gentoo, but if Sabayon aims to be a polished desktop distribution it should consider users' needs.

Sure, if I am searching for something I'll use grep just like the next geek. But that is not who Sabayon is targeting; their first point in "What makes us different" on their website is "Our decisions are driven by you, our users". If Sabayon's users are also its developers, like Gentoo, it's fine. But if there is any aspiration to target the common end user, they should listen to real users.

Real users want a way to search photos, movies, IM conversations, e-mails, documents, web browsing history, etc. On windows they have that Vista thing, or Google desktop, on Mac OS X they have spotlight, and on Linux they have Beagle or Tracker.

Anonymous said...

Beagle was such a hog it made my system unusable. It should never be enabled by default.

If users just want a standard distribution then they should use Ubuntu or a binary Sabayon Distro. I like the way portage or more precisely pkgcore is going. The focus on speed is just wrong. (Speed is important but should not be the central requirement.) The fact that I can install packages with complex dependencies is one of Gentoo's strengths. Gentoo should not try to become just another Distro......

Fabio Erculiani said...

I think that Beagle should just fire up on demand. That's the best choice for both points of view

Arne said...

Yup, emerge -up world also just sped up quite a bit on my system after I upgraded from 2.1.3.3 to 2.1.3.5. Except for this blog, this is the first positive change that I feel and know of that resulted from Daniel being back with Gentoo, nice :)

I look forward to "good old drobbins" helping many other positive changes in my favorite (and only OS) in the future.

eris23 said...

Any potential resource hog (e.g. Beagle) should default to asking the user for approval before running. I run Sabayon, have multiple 300GB external USB drives formatted NTFS, run KTorrent, and like to have Rhythmbox keep itself up-to-date. It mostly works. Throwing Beagle in as well is the proverbial straw that broke the camel's back. Windows was more problematic.

Anonymous said...

Daniel Robbins is the man!!

Daniel I'm so glad you're getting fueled once again about Gentoo. You should be leading this monster, I've always thought so.

Anonymous said...

Amen (to subprojects). I've been waiting for this for years: Gentoo is a meta-distribution sorely in need of some distributions. Portage opens up the space to do exciting things, but leaving the rest to the user is far from ideal: developer teams focused around specific package-sets and/or system roles are what's needed, so that users can test these specific distributions (and so contribute to their development via bug-testing etc.) rather than testing their own unique setups and hoping that portage can somehow cope with an infinite variety of situations.

tekwyzrd said...

I can't help wondering how ubuntu got dragged into this matter. Ubuntu has nothing to do with portage or Sabayon.

As for Beagle, with every linux install I do the first thing I do after the system is running is get rid of Beagle. In openSUSE it made my computer so slow it was nearly useless.

I'm typing this on a computer running Sabayon 3.4e. Beagle has been removed. Don't use it - don't want it. My files are well organized. I remember where I put things. If I need to search for a file I use Tools - Find File in Konqueror.

Desktop search utilities like Beagle are a waste of resources, a useless duplication of functions, and promote disorganized use of your computer.

That's my opinion. I realize that others will disagree. For this reason I like Fabio's suggestion.

Anonymous said...

Hello.

As a user of both Gentoo and FreeBSD/OpenBSD, I have to ask: why is it that portage is more troublesome than, say, the freebsd ports system?

Anonymous said...

I agree, if Beagle is a resource drain it shouldn't be enabled by default.

I like SL linux and where it's going - Compiz fusion set up and running OTB Wow! It would be nice if the upcoming SL mini would ship without Beagle ;-) Hint.

Kartesus said...

Yes... Gentoo must focus on being a meta-distro with specialized teams doing the heavy lifting for the user!

Anonymous said...

Kat and its equivalent Beagle can drain your CPU.
I remove both, but especially Beagle which runs on MONO!

Nuf said.

Pete Woods

Anonymous said...

I've been wondering for a while now why we can't move past user modified Portage configuration and simply use Git for configuration at the user level.

Like what I do? Then Git my config. Want a desktop with KDE for x86_64 and multilib? Then Git a known good (and tested) set of configuration files (package.mask, make.conf, etc) for a desktop.

This lets developers choose to ignore it in the interest of their development while giving the true "users" a one-stop-shopping experience for getting up and running.

Ivan said...

"I've been wondering for a while now why we can't move past user modified Portage configuration and simply use Git for configuration at the user level."

So a gentoo based distro could be defined in terms of installed packages and a patch for vanilla configs?
I kinda like the idea :)

Timothy Redaelli said...

IMHO portage should die and we should use a different package manager or totally rewrite portage

Daniel Robbins said...

Timothy, I happen to totally agree with you. There are several paradigms in Portage that need to evolve quite a bit to make it more flexible and powerful.

Nathan Powell said...

Hey Robbins. I was wondering if you could provide that patch to us Sabayon Users. Is it an overall portage problem or is it a problem affecting only us Sabayon Users? And yes I do agree with disabling Beagle. It runs at 50% CPU on both cores even when it isn't indexing and I really dislike it.

Enderandrew said...

Daniel,

How do you feel about pkgcore and paludis? Personally, I'm more of a fan of paludis, even if it means a bit of a learning curve for Gentoo users. In my opinion, we shouldn't constantly be thinking of how to fix portage, but should consider whether portage is the right tool for the task.

And not only is Beagle a huge performance drain, it also eats up gigs of storage. On a laptop especially you don't want the constant thrashing of your HDD. If you need a good integrated desktop search, I can't recommend Strigi enough. The performance is quite nice.

kerneloftruth said...

recoll, based on xapian also isn't that bad, unfortunately it doesn't seem to index the files that well

in all cases we need a desktop searchtool whose index-building can be started on-demand - in contrast to beagle

pkgcore is a great tool - I prefer it over paludis, it however (still) doesn't seem to work as well as portage in terms of 'emerge -e system' and especially 'emerge -e world'

Onlooker said...

portage not being "all things to all people" is a very abstract idea.

emerge checks USE flags, deps, compiles the source and installs the binaries into /, so I think it does its job well.

problem is in how dependencies are handled, but no other distro or OS seem to have done it perfectly either. portage is as good as apt-get/pacman and years ahead of yum.

Now is it practical to embark on a wholesale rewrite of portage, throwing out hundreds of man-hours of work? (it was not implied, but one can easily assume)
One breakage is not reason enough to do so, when your own patch seems to have fixed the problem.

Among the hundreds of packages in gentoo, a handful of packages have large/convoluted deps such as gnome or kde. perhaps it is better to start thinking how to simplify those.

kerneloftruth said...

Hi Daniel,

it's been some time since you wrote your last blog entry

everything all right?

please keep us updated (as your time allows to)

Mat

Daniel Robbins said...

Hey KernelOfTruth,

Yep, everything is all right. I am working on several new blog posts after neglecting my blog for a while. Will have some new posts soon! :)

Christopher Friedt said...

Hi Daniel,

I've been a Gentoo user for a long time and I've never had a chance to say thanks :) ... so Thanks!!

Anyway, I was just reading about your suggestions for the future of Portage, and I thought I'd suggest a couple of my own.

I'm quite often tempted to try a set of ~x86 marked packages (>=gnome-2.20 and all of its dependencies for example, or something like a new ATI driver). Sometimes the upgrades work, and sometimes several things are completely broken.

What I think could be much smoother would be having 'restore points' built into portage if one has FEATURES="buildpkg" set.

Aside from that, I think that more should be done with binary distributions. From a green perspective, why should tens of thousands of users be compiling packages from scratch all the time? We're just wasting precious watts of likely non-renewable-sourced electricity.

Then I thought - well... take the event-space consisting of all of the permutations of use-flags, versions, and arches, and then make a DB with a list of links to matching, pre-compiled binaries.

Of course it's completely massive, but for several servers it shouldn't be too big of a load to distribute between all of them. Especially considering that the table would be relatively sparse due to dependency ranges.

Why not have a google-esque distributed filesystem for all of them... even a P2P binary package system :)

earthshdw said...

Great to see you back on the gentoo front again. Doubly pleased to see that you're willing to look at other gentoo-based distros instead of giving them a snobby purist eye. This is greatly appreciated by those of us who are using Sabayon. I hope you're a role-model for other gentoo people in the future, because we haven't been getting the warmest reception in freenode's #gentoo, yet many of us in freenode's #sabayon channel are giving help to straight gentoo users because they've found us to be more helpful. I hope to see your continued cooperation with other gentoo forks in the future. :)