Category: Alfresco

Alfresco Community Edition

How does one eat a pizza? Most people take a slice in their hand and start biting from the narrow tip. Some use a knife and fork to neatly cut the slice into thin pieces. There are children who scrape off the cheesy bits, eat them with a fork and discard the base entirely. There are probably numerous other ways of eating the same pizza, and cataloguing them would make a good research topic.

In my opinion, the Alfresco Community Edition is like that ill-fated pizza slice. People use it in innumerable ways. Many use the Community Edition out of the box. A large section use Alfresco Share or Workdesk as the UI on top of the out-of-the-box Community Edition server. Some deep dive into the code and make the changes they need (with or without contributing them back to the community). Still others build their own applications and use Alfresco only as the repository.

Alfresco is the numero uno open source ECM platform out there. Most customers who think of climbing the ECM tree would have downloaded and played with the Alfresco Community Edition. We did the same long ago and decided to use the Alfresco Community Edition as yet another supported repository for our ECM UI framework product.

Having worked with ECM products such as FileNet, we were always apprehensive about the scalability of Alfresco, the Community Edition to be precise. We have seen trillions of documents going into and coming back out of FileNet repositories seamlessly, and thousands of users working with their documents and tasks in FileNet-based applications. FileNet, however, runs on high-horsepower servers in clustered or farmed environments to scale. Alfresco's hardware requirements, on the other hand, are minimal: I can easily run Alfresco on my 32-bit laptop. Naturally, we sell FileNet-based solutions to customers who handle high volumes or have many users, while our Alfresco Community Edition offerings are typically aimed at customers with lower volumes and fewer users.

Recently, one of our customers in India reported a performance issue with their Alfresco Community Edition installation. They have fewer than 10 users but large volumes of documents. The customer uses our capture solution as well as our document management application, both of which use the Alfresco Community Edition as the underlying repository. The problem was that the system was very slow. At first, we felt vindicated in our assumption that the Alfresco Community Edition cannot scale beyond a point.

A closer look at the issue suggested there might be a way out. The customer uses our capture product to ingest anywhere between 15,000 and 25,000 documents a day into the repository. All the documents for a day went into a single folder created specifically for that date. Further analysis led us to suspect that too many documents in one folder could be what was hindering performance. So we changed the capture export configuration to create sub-folders within the day folder and to keep each sub-folder below 2,000 documents. SharePoint used to have performance issues when a folder held more than 2,000 documents, and that awareness may have prompted us to try this. In any case, the change worked like a charm and the repository absorbed the pending documents in a jiffy.
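The sub-folder scheme can be sketched as a simple bucketing rule. This is a minimal illustration, not our capture product's actual export configuration: the root path `/Capture`, the `batch-NNN` naming, the cap of 1,999 (the post only says "less than 2000") and the function names are all hypothetical. It computes only the target path for the n-th document of a day and makes no calls to any Alfresco API.

```python
import math
from datetime import date

# Hypothetical cap: the post only says each sub-folder holds fewer than 2,000 documents.
MAX_DOCS_PER_SUBFOLDER = 1999

# Hypothetical repository root for captured documents.
CAPTURE_ROOT = "/Capture"


def subfolder_path(day: date, doc_index: int, cap: int = MAX_DOCS_PER_SUBFOLDER) -> str:
    """Return the day/sub-folder path for the doc_index-th (0-based) document of a day."""
    if doc_index < 0:
        raise ValueError("doc_index must be non-negative")
    batch = doc_index // cap + 1  # 1-based sub-folder number; each holds at most `cap` docs
    return f"{CAPTURE_ROOT}/{day.isoformat()}/batch-{batch:03d}"


def subfolders_needed(docs_per_day: int, cap: int = MAX_DOCS_PER_SUBFOLDER) -> int:
    """Number of sub-folders a day's intake needs under the cap."""
    return math.ceil(docs_per_day / cap)


if __name__ == "__main__":
    example_day = date(2024, 1, 15)  # arbitrary example date
    for i in (0, 1998, 1999):
        print(i, subfolder_path(example_day, i))
    print(subfolders_needed(15000), subfolders_needed(25000))
```

With this rule, the first 1,999 documents of a day land in `batch-001`, the next 1,999 in `batch-002`, and so on, so a daily intake of 15,000 to 25,000 documents spreads across 8 to 13 sub-folders instead of one very large folder.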

The customer is using the system well, and we are delighted too. The repository now holds more than 4 million documents, and the entire ECM infrastructure runs on a single lower-end server. The return on investment on this solution has been tremendous. It might not be a bad idea to build an ECM setup on the Alfresco Community Edition after all!

ECM 101

This blog is all about ECM (Enterprise Content Management). I got into this domain accidentally about 15 years ago and have stuck with it ever since. Like any other technology vertical, ECM has far too many facets. My comfort zones are content and process management, case management, capture, records management, and forms management.

AIIM, the premier ECM industry association, defines Enterprise Content Management (ECM) as the technologies used to capture, manage, store, preserve, and deliver content and documents related to organizational processes. ECM tools and strategies allow the management of an organization’s unstructured information, wherever that information exists.

ECM, in my view, is all about managing unstructured information and making it available to business transactions. Structured information is data that is defined by attributes and kept in transactional information systems. Put plainly, any data you can put into database tables and search for is structured information. Almost everything else falls into the unstructured category: paper documents, office documents, emails, faxes, images, audio, video, etc. Statistically, only about 20% of the information an organization deals with is structured. This means the bulk of its information, roughly 80%, is unstructured, and that information is hard to search for and retrieve. That is exactly where ECM fits.

ECM is a mature industry with thousands of players across market bands. While the small and medium segments are crowded with product vendors, the enterprise segment has seen major consolidation in the past couple of years. The discussions in this forum are aimed primarily at the enterprise segment, which is dominated by IBM FileNet, EMC Documentum, OpenText, Oracle, and of course Microsoft SharePoint. Other interesting options are Alfresco and SpringCM.

ECM technologies and products are commonly complemented by capture, data warehousing (DW) and business intelligence (BI), and digital rights management (DRM). It would be interesting to analyze how these technologies can co-exist with ECM.

The blog, however, will lean heavily towards FileNet, SharePoint, and Alfresco, since the current authors have considerable experience with these technologies. We plan to bring in more authors with diverse platform backgrounds so that #ECM eventually becomes a blog site with a comprehensive view of ECM.