Tuning Docuware 6.X on a larger network

We are defining a LARGE network as a BUSY Docuware network. It may only have 5 people adding content and 10 people using that content, but the people adding content are killing the system with 10K to 100K images each day. Even worse, they may scan in color, and the bigger the images, the greater the impact on the entire system.

A LARGE network could also be a busy network with many users editing and changing items all day long in file cabinets enabled for full-text indexing. This can keep the system busy all day and all night.

IS may think they can just throw more memory and more CPU at the problem, but that normally does not solve it. This is a discussion of the different parts of the architectural foundation and how to make the most of them.

The “Thumbnail service” not only eats CPU time but can interrupt and tremendously slow down a system, directly affecting the user. Our advice: TURN IT OFF! On smaller systems, even on local systems, the change may not be dramatic, but on larger segmented systems we have seen the service add 500% more time to a simple process like unstapling documents. You can test this very easily. Open a tray, drop a large number of files into it and staple them together. Now, with the Thumbnail service turned OFF, unstaple them; it seems instantaneous. Staple them back together, turn the service back on and unstaple them again. You will probably see the busy circle spinning over each image outline for seconds per image, instead of the nearly instant display you had before. I have seen systems that took 5 minutes to display the images with the service on; with it off, the display is as quick as a user expects. So TURN IT OFF.


The “Workflow server” can be another serious CPU waster. Workflows that bulk update a large portion of a system can eat away at CPU time and slow a system down to a crawl. Users often have auto imports that run every few minutes or even every few seconds, and all of them are big resource users. You can spread this load across multiple workflow engines on multiple machines, but the smart first step is to move Workflow off the main server onto a separate server dedicated to it. Adjusting your workflow filters can also really help. If you only need to update the records where ‘name your field’ is blank, filter for just those records. Then make certain that you work from the smallest database (or data set) and look up the matches in the largest one. This will improve your speed and reduce the time it takes the workflow to complete.
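The filtering advice above can be sketched in plain Python. This is not Docuware's workflow engine or API. The record layout, the field names and the in-memory dictionaries are hypothetical stand-ins for two file cabinets or tables. The sketch does two things. First, it filters down to the records whose target field is blank. Second, it drives the loop from that small filtered set and probes the large data set through a pre-built index (a dict here, standing in for an indexed SQL table), instead of scanning the large set once per record.

```python
from typing import Dict, List, Optional

# Hypothetical record shape: field name -> value (None or "" means blank).
Record = Dict[str, Optional[str]]


def fill_blank_field(
    to_update: List[Record],
    reference_index: Dict[str, Record],
    key_field: str,
    target_field: str,
    source_field: str,
) -> int:
    """Fill target_field only on records where it is blank.

    to_update:       the records the workflow runs against.
    reference_index: the large reference data, already keyed by key_field,
                     standing in for an indexed table in the big database.
    Returns the number of records actually updated.
    """
    # Step 1: filter first, so records that already have a value are never touched.
    pending = [r for r in to_update if not r.get(target_field)]

    updated = 0
    # Step 2: loop over the small, filtered set and probe the large set
    # through its index (one dict lookup per record, no rescans).
    for record in pending:
        match = reference_index.get(record.get(key_field) or "")
        if match is not None and match.get(source_field):
            record[target_field] = match[source_field]
            updated += 1
    return updated


if __name__ == "__main__":
    documents = [
        {"customer_id": "C1", "company_name": None},
        {"customer_id": "C2", "company_name": "Already Set"},
        {"customer_id": "C3", "company_name": ""},
    ]
    customers = {"C1": {"name": "Acme"}, "C3": {"name": "Globex"}}
    print(fill_blank_field(documents, customers, "customer_id", "company_name", "name"))
```

Skipping the already-filled record (C2) before the loop is the same idea as adding the "field is blank" condition to a workflow filter. The work is done only where it is needed.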

“LOCAL Full Text Services/Reading” I have yet to see an easy way to tune the full-text engines, so the most important thing is to be aware of them and turn them off or move them when you can. For example, when workstation users start to complain that it takes too long to process an import or a scan into the basket, you have 2 things you can do: remove the thumbnail service and turn off full-text indexing on the LOCAL machine. The LOCAL machine is being used to OCR the records before they are sent to the server. Unfortunately, Docuware does not take into account the impact this has on the local machine at the time, and it can be devastating.

“Full Text Services on Server” Full text as a service is more of an ON or OFF service. There are things you can do to adjust the way it runs when doing background work, but the issue is that it often runs on every image that is displayed. When the engine reads (OCRs) an image, the text is stored for indexing but the LOCATIONS of that text are NOT stored. To display the image and highlight the selected full-text hits, Docuware must re-OCR every image as it is displayed and then highlight the text based on that read. LOTS of horsepower is used to do this. You can manage this by moving the service to another machine. It helps more than you think.

“SQL” There is a big difference between MySQL on the local server and MSSQL on a separate server. It would take a very long time to describe all of the benefits of moving the database to a separate server, so I will just say it this way: DO IT! Docuware agrees.

“STORAGE” Storage would not seem to be a thing to move off a server, since having what you need close by sounds like what we want. But when you think about it, who uses storage more, the users or the services? The answer is simple: the services, even if your user base searches and retrieves every image every day. If a cabinet has full text enabled, the FULLTEXT engine will have seen a document at least 3 times before you do! When you store the record, it is read for indexing EVEN IF YOU DON’T USE point and shoot. After the image is stored, the full-text engine will read and re-index images as needed, sometimes overnight and sometimes during the day, whenever there are records to process. You want the full-text engine to access the images away from the users so the users’ experience is the best it can be. Even after all the reading, re-indexing and storing, the engine will read the image once again every time you display it. That is a great deal of repeat business, which makes it prudent to keep full text close to storage and away from the users’ experience.

Docuware Recommends:
Docuware recommends that busier systems move SQL Server, storage and Full Text services to a server separate from the other Docuware services. I agree with Docuware. There are times when you can build the SQL box with Full Text and attach storage to it, but if your system is really laid out for a large number of users, you probably already have a diverse system. Most clients have a SQL server or server farm and separate storage attached by iSCSI, Network Attached Storage (NAS) or CAS. In these cases you will NOT want to add the full-text service and the NAS connection to the SQL Server.

Many of the larger systems have SQL farms, where multiple Microsoft SQL Servers are teamed up to provide fast service. You are not going to be able to move storage or full text onto those systems. Many large users, including my company, store on Network Attached Storage devices (NAS) rather than file servers. NAS and full text do not go together at all. You want an application server separate from the rest, with its own memory and the ability to respond very quickly. Solr has good documentation on how to make the system more responsive and better suited to reading documents.

What we recommend to larger clients is to move the busiest service(s) off the main Docuware server.
The current large format layout for one of the largest systems we have contains 4 servers plus support:

DATABASE:          On the SQL Server/SQL Farm
STORAGE:           On NAS
PROCESS SERVICES:  WORKFLOW Server
                   FULLTEXT Server
USER SERVICES:     ALL OTHER DOCUWARE SUPPORT SERVICES on 1 or 2 matched servers

Watching the network and the servers, you can see it is a very good balance for the users and the services.
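If you keep an inventory of which roles live on which box, the layout above is easy to check before adding a new server. This is a minimal sketch, assuming a hypothetical inventory of host names and role labels that I made up to mirror the layout. It is not a Docuware configuration format. It flags any machine where a heavy role (database, storage, workflow or full text) shares a box with the user-facing services.

```python
from typing import Dict, List, Set

# Hypothetical host names; the roles mirror the large-format layout above.
LAYOUT: Dict[str, Set[str]] = {
    "sql-farm": {"DATABASE"},
    "nas-01": {"STORAGE"},
    "dw-workflow": {"WORKFLOW"},
    "dw-fulltext": {"FULLTEXT"},
    "dw-users-1": {"USER SERVICES"},
    "dw-users-2": {"USER SERVICES"},
}

# Roles this article recommends keeping off the user-facing servers.
HEAVY_ROLES = {"DATABASE", "STORAGE", "WORKFLOW", "FULLTEXT"}


def crowded_hosts(layout: Dict[str, Set[str]]) -> List[str]:
    """Return hosts where a heavy role shares the machine with USER SERVICES."""
    return sorted(
        host
        for host, roles in layout.items()
        if "USER SERVICES" in roles and roles & HEAVY_ROLES
    )


if __name__ == "__main__":
    print(crowded_hosts(LAYOUT))
    # An everything-on-one-box install gets flagged.
    print(crowded_hosts({"dw-main": {"USER SERVICES", "WORKFLOW", "FULLTEXT", "DATABASE"}}))
```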

Building a Docuware Foundation:
I once heard a fellow in his martial arts class say, “I’ve got the pajamas, when are we going to kick some heads?” Your system may look ready to go, but it may lack the basic structure needed when building a bigger Docuware system.

Trays:
One issue can be heavy scanning/importing users. Trays are no longer just a space on your local machine that temporarily stores the images. They are now a combination of storage space and a SQL table in the database. Since Docuware comes with 1 database set up for everything, it can be a real issue when this database is trying to manage the users, workflow, full text, uploads and your trays. On top of that, storage is being hit by all of the above to present, OCR and update the index when auto-index hits the documents. We manage this issue by adding a database for every common group of users who are high-volume scanners. This new database houses all the tray tables within a department, with each department having one high-volume tray location and one low-volume tray location.
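The per-department tray layout above can be written down as a small planning helper. This is a minimal sketch under stated assumptions. The `DW_Trays_<department>` database naming convention, the folder names and the `TrayLocation` type are hypothetical, chosen only to show the shape: one dedicated database per high-volume department, holding one high-volume and one low-volume tray location. Creating the databases and tray locations themselves is still done in SQL Server and Docuware's own administration tools.

```python
import posixpath
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TrayLocation:
    department: str
    volume: str        # "high" or "low"
    database: str      # hypothetical SQL database holding this department's tray tables
    storage_path: str  # hypothetical folder for this location's tray files


def tray_locations(department: str, storage_root: str) -> List[TrayLocation]:
    """Plan one dedicated tray database plus a high- and low-volume location."""
    # Hypothetical naming convention: one tray database per department.
    database = f"DW_Trays_{department}"
    return [
        TrayLocation(
            department=department,
            volume=volume,
            database=database,
            storage_path=posixpath.join(storage_root, department, f"{volume}-volume"),
        )
        for volume in ("high", "low")
    ]


if __name__ == "__main__":
    for loc in tray_locations("AccountsPayable", "/trays"):
        print(loc)
```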

More Databases=More diversity:
More databases mean more resources for individuals and processes. If you review how SQL Server lays out its data, you will see that each database has its own data and log files. More databases mean smaller files, and smaller files are faster to access and easier to maintain. So by dividing the data among multiple databases, you divide up the work. This is really a multifold benefit, as it is easier to manage a single department’s needs through its own database than to manage everyone in a single database/storage location.

Clients often have requirements revolving around some common concerns. Some of these concern mixing documents from different departments in the same database. Since out of the box Docuware puts all of the file cabinet tables in the same database, you have to know a little about SQL to meet this requirement. One client uses Local, State and Federal grants to fund its operation. Each of these may have rules that forbid intermingling its data, images and resources with other processes. We set up a separate storage unit for the Federal documents, a separate database for the Federal file cabinets, and so on. Docuware can handle this, because you can define each of these storage locations and databases and use them as you see fit. The system is now faster, more reliable and easier to manage and move.
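The grant-separation rule above boils down to a routing table in which each funding source gets its own database and storage location, and anything unclassified is rejected instead of landing in a shared default. This is a minimal sketch with hypothetical database names and UNC paths. In practice the mapping is configured in Docuware's file cabinet and storage settings, not in a script.

```python
from typing import Dict, Tuple

# Hypothetical routing table: funding source -> (SQL database, storage location).
ROUTES: Dict[str, Tuple[str, str]] = {
    "local": ("DW_Local", r"\\nas-01\docuware\local"),
    "state": ("DW_State", r"\\nas-01\docuware\state"),
    "federal": ("DW_Federal", r"\\nas-fed\docuware\federal"),
}


def route(funding_source: str) -> Tuple[str, str]:
    """Return the (database, storage) pair for a document's funding source.

    Unknown sources raise instead of falling back to a shared location,
    so documents from different grants are never intermingled by accident.
    """
    key = funding_source.strip().lower()
    if key not in ROUTES:
        raise ValueError(f"No dedicated database/storage for funding source {funding_source!r}")
    return ROUTES[key]


if __name__ == "__main__":
    print(route("Federal"))
```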

Indexes:
Indexes make the records in the database more easily accessible. Indexes mean speed, but they also take up space, and every index has to be updated whenever a record is written. Be certain to index the main fields in the database, the key fields people use to find things, and NOT the other fields people rarely look at. Example: if you were storing invoices, you would index the invoice number and the name of the company/person on the invoice. You WOULD NOT index the amount of the invoice or the address, because RARELY will anyone look an invoice up using that information.
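You can see the effect of the invoice example with any SQL engine's query plan. This sketch uses SQLite from Python's standard library because it is self-contained. MSSQL shows the same idea in its execution plans, but SQLite's `EXPLAIN QUERY PLAN` output is specific to SQLite. The table and index names are hypothetical. The query on the indexed `invoice_number` should report a search using `idx_invoice_number`. The query on the unindexed `amount` should report a full table scan.

```python
import sqlite3
from typing import List, Sequence


def plan(conn: sqlite3.Connection, sql: str, params: Sequence = ()) -> List[str]:
    """Return the detail strings from SQLite's EXPLAIN QUERY PLAN for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # last column is the human-readable detail


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices (invoice_number TEXT, company TEXT, amount REAL, address TEXT)"
)
# Index only the fields people actually search on.
conn.execute("CREATE INDEX idx_invoice_number ON invoices (invoice_number)")
conn.execute("CREATE INDEX idx_company ON invoices (company)")

by_number = plan(conn, "SELECT * FROM invoices WHERE invoice_number = ?", ("INV-1001",))
by_amount = plan(conn, "SELECT * FROM invoices WHERE amount = ?", (125.0,))

if __name__ == "__main__":
    print(by_number)  # should mention idx_invoice_number (an index search)
    print(by_amount)  # should be a SCAN: no index on amount, so every row is read
```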

File cabinet Design:
When you build a file cabinet, you can choose to allow the cabinet to have satellites. If you really are going to build satellites, then great, do it! If you are only thinking about building a satellite sometime in the future, then DON’T DO IT! Satellites generate overhead for the database to manage. The database has to note which records are new and have not yet been added to the satellites, and the longer you go without a satellite, the more those records build up. If you are going to use satellites, GREAT; if not, keep the box unchecked!
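The satellite bookkeeping described above can be illustrated with a toy model. To be clear, this is not how Docuware stores sync state internally. The class and its fields are hypothetical, and the model only captures the point made here. With satellites enabled, every stored record is flagged as pending, and only an actual satellite ever clears those flags. A cabinet with the box checked but no satellite accumulates a backlog that grows with every document.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class CabinetModel:
    """Toy model of satellite sync bookkeeping (hypothetical, not Docuware internals)."""

    satellites_enabled: bool
    satellite_count: int = 0
    pending_sync: Set[int] = field(default_factory=set)
    next_id: int = 1

    def store(self) -> int:
        doc_id = self.next_id
        self.next_id += 1
        if self.satellites_enabled:
            # Extra bookkeeping the database must keep until a satellite picks it up.
            self.pending_sync.add(doc_id)
        return doc_id

    def sync(self) -> None:
        # Only a real satellite ever drains the backlog.
        if self.satellite_count > 0:
            self.pending_sync.clear()


if __name__ == "__main__":
    checked_but_unused = CabinetModel(satellites_enabled=True)
    unchecked = CabinetModel(satellites_enabled=False)
    for _ in range(1000):
        checked_but_unused.store()
        unchecked.store()
    checked_but_unused.sync()  # no satellite, so nothing is cleared
    print(len(checked_but_unused.pending_sync), len(unchecked.pending_sync))
```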

Storage:
Is storage an issue? It may or may not be. Storage in Docuware is truly unique, because you can use different storage locations, and each of those locations can be dedicated to a single task if you want. If one department used the lion’s share of space, you might put it on its own storage server and move everyone else to a more common server. This would give you 2 servers for storage and more horsepower for Docuware to use. This may not sound like a big benefit, but it can be if you run a great deal of imports and auto-indexes. Auto-index pulls the XML file, updates that XML every time you change something and updates the logs accordingly. Importing can drain a system’s access by tying up a resource that is vital to the rest of the users. More is not always better, but it can be.

From a management point of view, in a larger system, different storage locations for different office groups can be easier to manage and back up. By separating each department’s work into its own storage location, you can better track how much space each department is using, and with auditing techniques you can see which ones are the busiest. Some organizations bill back the departments for the space they use, which makes this technique a major part of their process.
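Once each department has its own storage location, measuring space per department is a simple walk over each location. This is a minimal sketch using only Python's standard library. The department names, paths and the per-GiB bill-back rate are hypothetical. It sums file sizes on disk and does not read any Docuware metadata.

```python
import os
from typing import Dict


def usage_by_department(locations: Dict[str, str]) -> Dict[str, int]:
    """Sum file sizes (bytes) under each department's storage location."""
    totals: Dict[str, int] = {}
    for department, root in locations.items():
        total = 0
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                try:
                    total += os.path.getsize(os.path.join(dirpath, name))
                except OSError:
                    # File removed or unreadable mid-walk; skip it rather than abort.
                    continue
        totals[department] = total
    return totals


def bill_back(totals: Dict[str, int], rate_per_gib: float) -> Dict[str, float]:
    """Convert bytes to a charge per department at a hypothetical rate per GiB."""
    return {dept: round(size / 2**30 * rate_per_gib, 2) for dept, size in totals.items()}


if __name__ == "__main__":
    # Hypothetical per-department storage locations.
    usage = usage_by_department({
        "HR": r"\\nas-01\docuware\hr",
        "Finance": r"\\nas-02\docuware\finance",
    })
    print(usage)
    print(bill_back(usage, rate_per_gib=0.25))
```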

Enterprise systems:
If you are on an Enterprise system, chances are you already know some of these things. Enterprise already provides 2 of everything, but they do not have to be on the same machines. Let’s say we already follow the suggestions above and have a SQL Server, a NAS and 2 full installs of Docuware on 2 separate servers. They can fail over to each other, but you can also load balance across them if you desire. You can expand the system by adding 2 more servers: take WORKFLOW and FULLTEXT off the 2 Docuware servers, and make 1 of the new boxes a FULLTEXT ONLY server and the other a WORKFLOW ONLY server. Or, if you do not have file cabinets using full text, you can make 2 PROCESS servers with Workflow and Full Text on each of them.

This layout has a good side and a bad side, and it applies only to the workflow servers: they never fail over to each other. Workflow is a defined service and MUST run on the server it is designed for. You can only assign 1 workflow server to a process. So having 2 is both good and bad. It forces you to load balance based on experience.
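Because each workflow is pinned to exactly one workflow server, "load balancing on experience" really means deciding up front which workflows go where, using the load you have observed. One simple way to do that is the longest-processing-time-first greedy heuristic. It is a standard scheduling technique that I am substituting here for illustration, not a Docuware feature. It places the heaviest workflows first, each onto whichever server currently has the least assigned load. The workflow names, server names and load figures are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def assign_workflows(loads: Dict[str, float], servers: List[str]) -> Dict[str, List[str]]:
    """Greedy longest-first assignment of workflows to workflow servers.

    loads maps a workflow name to an estimated cost (for example, minutes of
    processing per day observed in production). Each workflow lands on exactly
    one server, matching the rule that a process has only one workflow server.
    """
    heap: List[Tuple[float, str]] = [(0.0, s) for s in servers]
    heapq.heapify(heap)
    plan: Dict[str, List[str]] = {s: [] for s in servers}
    # Heaviest first, each onto the currently least-loaded server.
    for name, cost in sorted(loads.items(), key=lambda kv: kv[1], reverse=True):
        total, server = heapq.heappop(heap)
        plan[server].append(name)
        heapq.heappush(heap, (total + cost, server))
    return plan


if __name__ == "__main__":
    observed = {"InvoiceImport": 10, "HRAutoIndex": 8, "ContractReview": 5, "Cleanup": 3}
    print(assign_workflows(observed, ["wf-1", "wf-2"]))
```

Since the servers never fail over to each other, it is worth rerunning the assignment as the observed loads change.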

You would now have 6 machines powering Docuware. You would think that is enough. But perhaps you have very little to store and your user group is really BIG! Then you might flip this scenario. If the users need more access, then I would keep ALL of the Docuware services on the main server and move the web server.

In the worst case, you can add more servers and give every group its own server. Users and Web go on their own server; Authentication, Content and User Support on their own servers; Workflow and Full Text on their own servers. Add your SQL boxes and storage boxes, and your system is well diversified.

The main point is not to look to the software for the solution to every problem. Many are simply architectural issues and can be solved another way.

Conclusion:

Although Docuware runs very well out of the box for many small or even medium-sized companies, that may not be the best configuration for a larger, more diverse organization. Every design must consider the available hardware and the load on the network, as well as the obvious things like the number of images and users. The obvious factors are the least of your worries. The foundation of the system is really what determines how everything works together.

We have seen large companies with many users run very low CPU loads with little diversity, and we have seen medium-sized organizations with 4 or 5 departments bury the computers in so much work that they barely meet the need. Throwing more memory and more processors at the system seems like the place to start, yet it may not work at all. Looking at the big picture (where the data is going, how it gets there and what happens to it as it is processed) shows it is more than a low-horsepower issue. It could be that the CAS is too slow to meet the demand, or that the NAS is not diversified enough, or sometimes simply that full text and storage need to move off the main server to their own box.

Docuware has recommendations for building and using its system, but remember that the system is built on other systems Docuware has no control over. Storage, the network and even the Solr full-text engine can all be optimized by you for a very fast and efficient system.

In the end, Docuware is very fast and capable of meeting any size company’s needs.