Are you ready to begin archiving your organization’s email and other business communication data? Before you install Retain Unified Archiving, it is important to plan the implementation process. We have put together a list of best practices for completing the task. Some of the highlights are outlined here; the full article is on the GWAVA support site, support.GWAVA.com, at http://support.gwava.com/kb/?View=entry&EntryID=2434
In planning your Retain archiving implementation, it is important to remember that no two systems are the same. Many variables affect performance and system design. No single document could adequately cover those variances, so we have narrowed the focus of this post to the main concepts you need to consider when planning your Retain installation:
- Retain Architecture
- Archive Job Flow
- Design Considerations
Retain Architecture
Retain can run on a bare-metal server or on a virtual machine (VM) running Windows Server or SUSE Linux. For backup purposes and flexibility, we recommend running it on a VM.
Retain consists of four major components:
- Server – where the archive system is configured and maintained. It is responsible for storing, indexing, searching, and reading archived items.
- Worker – the component that interfaces with the messaging host/mail server containing the messages you are archiving. It retrieves the messages and passes them onto the Retain Server.
- Indexing Engine – scans the messages and attachments to make them searchable, indexing each word. When a user performs a search in his/her Retain mailbox, the list of messages returned is coming from the indexer, not the database. Understanding that difference helps you make decisions on memory configuration choices for Tomcat (which powers the indexer) and for the database.
- Database – stores most of the Retain configuration as well as the message metadata, which is information about the messages being stored (subject, sender, recipients, links to attachments, indexed state, etc.). When a user logs in to his/her Retain mailbox, the list of folders and their messages is retrieved from the database.
Archive Job Flow
Design Considerations
There are five major considerations you need to take into account when designing a Retain system:
- Bandwidth
- Storage size
- Storage configuration
- RAM
- Retain configuration
Bandwidth
The Worker queries your messaging system for messages to archive and receives every one of those items; however, not all of those items are sent on to the Server. If the link between the Worker and the messaging system is slow, you should consider placing the Worker on the messaging system’s server or on a server that has a fast link to the messaging system.
The only downside to this strategy is software updates. When upgrading Retain software, you will have to go to each server hosting a Worker and upgrade its software. If the Worker is running on the Retain Server itself, then the installer automatically upgrades every Worker on the Server.
Storage Size
Disk storage keeps getting less expensive. At GWAVA, we cannot tell you how much storage space you will need, because it is almost impossible to predict with any degree of accuracy. A general rule of thumb is current size + rate of yearly growth.
Understanding the trends of your email growth, the current space consumed by it, and how Retain stores those messages can help you in making your plan. If anything, you’ll want at least enough storage space to handle your initial archive job of all your email to date and up to the next year or two. You can always add more disk space and Retain storage partitions after that.
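The rule of thumb above can be turned into a quick back-of-the-envelope calculation. The sketch below is only an illustration: the function name and the sample numbers are made up, and it does not account for how Retain actually stores messages, so treat the result as a rough starting figure rather than a sizing guarantee.

```python
def estimate_storage_gb(current_gb: float, yearly_growth_gb: float, years: int = 2) -> float:
    """Rule of thumb: current mail store size plus expected yearly growth
    over the planning horizon (the post suggests a year or two)."""
    if current_gb < 0 or yearly_growth_gb < 0 or years < 0:
        raise ValueError("inputs must be non-negative")
    return current_gb + yearly_growth_gb * years


if __name__ == "__main__":
    # Hypothetical example: 500 GB of mail today, growing by about 120 GB per year.
    print(estimate_storage_gb(500, 120, years=2))
```

If the estimate turns out to be low, you can still add disk space and Retain storage partitions later, so it is reasonable to plan for a year or two rather than a decade.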
Storage Configuration
When selecting the type of file system on Linux, we recommend going with XFS because it creates inodes dynamically, whereas ext3 forces you to configure them up front. Once you run out of inodes, nothing can be written to that disk, even if you have plenty of disk space left.
Since Retain is an archiving solution, it will happily fill up your storage system. While it will warn you of impending doom, it is up to you to keep bad things from happening like the hard drive filling up completely. It is difficult to recover from that since not even the OS works very effectively under those conditions. You need to design in the ability to add extra storage easily.
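Because both free space and free inodes can run out, it can help to watch them together on a Linux host. The following is a minimal sketch using only the Python standard library; it is not part of Retain, the function name is our own, and the path you pass in is whatever mount point holds your archive.

```python
import os
import shutil


def storage_headroom(path: str) -> dict:
    """Return the percentage of free disk space and free inodes for the
    file system that contains `path` (Linux/Unix only)."""
    usage = shutil.disk_usage(path)   # total, used, free in bytes
    stats = os.statvfs(path)          # f_files = total inodes, f_favail = free inodes
    space_pct = 100.0 * usage.free / usage.total if usage.total else 0.0
    # Some file systems report 0 total inodes because they do not use a fixed table.
    inode_pct = 100.0 * stats.f_favail / stats.f_files if stats.f_files else None
    return {"space_free_pct": space_pct, "inodes_free_pct": inode_pct}


if __name__ == "__main__":
    # "/" is only a placeholder; point this at your archive partition.
    print(storage_headroom("/"))
```

A scheduled job could call this and alert when either percentage drops below a threshold you choose, which gives you time to add storage before the disk fills completely.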
Other Disk Partitions
You’ll want to separate your archive files from your indexes and from your database, which means two to three other partitions on your Retain Server in addition to your OS partition. If your database is on a separate server from Retain, then only two other partitions are needed; otherwise, you’ll want three additional partitions.
Disk Performance
This comes up a lot. A customer will claim that they are on a fast SAN. Often, it is not a SAN but a NAS and there are many considerations that go into performance. We’ll just touch on a few.
We often see network links to the NAS at 1 Gbps. This is actually slower than a SATA 2 (a.k.a. “SATA 3 Gbps”) or SATA 3 (a.k.a. “SATA 6 Gbps”) connection. A SATA 2 connection (which is now getting to be a pretty old standard) is 3x faster than a 1 Gbps network connection. A fast single HDD can saturate a 1 Gbps connection, but not quite a 3 Gbps connection, with a sequential read/write. 7,200 RPM platter drives usually top out around 160-170 MB/s (or 1.28-1.36 Gbps).
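The arithmetic behind those numbers is simple to check. The sketch below converts a drive's sequential throughput from MB/s to Gbps and lists which links it would saturate. It uses nominal line rates only and ignores protocol and encoding overhead, so real-world figures will be somewhat lower; the function and table names are our own.

```python
# Nominal line rates in Gbps (protocol overhead is ignored in this sketch).
LINKS_GBPS = {"1 Gbps network": 1.0, "SATA 2": 3.0, "SATA 3": 6.0}


def mb_per_s_to_gbps(mb_per_s: float) -> float:
    """Convert megabytes per second to gigabits per second (8 bits per byte)."""
    return mb_per_s * 8 / 1000


def saturated_links(drive_mb_per_s: float) -> list:
    """Return the links whose nominal rate the drive can meet or exceed."""
    drive_gbps = mb_per_s_to_gbps(drive_mb_per_s)
    return [name for name, rate in LINKS_GBPS.items() if drive_gbps >= rate]


if __name__ == "__main__":
    for speed in (160, 170):
        print(speed, "MB/s =", mb_per_s_to_gbps(speed), "Gbps ->", saturated_links(speed))
```

For a 160-170 MB/s platter drive, only the 1 Gbps network link shows up as saturated, which is why a NAS behind a 1 Gbps link can be the bottleneck even when the disks themselves are fast.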
So, knowing that disk I/O is the top issue with archive job performance, it is best to plan out your disk storage accordingly.
In addition to partition considerations, make sure that your storage is reliable. NFS mounts have almost always had problems, so you should shy away from those. NSS volumes are not supported, so do not use them. There are numerous other partition options you can choose, but it is important that you avoid these two in your implementation of Retain.
RAM
The amount of memory depends on the number of active mailboxes you are archiving, the mail volume, your underlying hardware, and how your Retain system will be used.
For small systems (1 – 250 mailboxes), 8GB of RAM might be OK if that’s all you can afford to allocate. It is true that a small Retain system can run on 4GB, but performance will be awful in most cases. If even decent performance matters, you really should not go lower than 8GB unless you are a very small business and have 0 – 50 mailboxes. You might even want to consider trying 12 – 16GB to see what difference that makes for you. For some, it will make a big difference. For others, it may make no difference, because a bottleneck exists elsewhere.
For medium sized systems (250 – 750 mailboxes), 12 – 16GB of RAM should be considered.
For larger systems, 16GB should be considered a minimum. Many large systems require 24 – 48GB of RAM. The more mailboxes and mail volume, the more RAM you should consider giving your Retain server. But, again, we have to emphasize that every system is unique and RAM may not be the biggest performance factor.
Case in point: We have a customer with 700 users that found allocating 24GB of RAM made a big difference. In another case, a customer that had 1,500 users needed only 12GB. We have systems with thousands of mailboxes and those systems do benefit from increased memory allocation, but their needs vary.
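As a starting point only, the tiers described above can be written down as a small lookup. This sketch simply encodes the ranges from this post (the function name is ours, and the boundary at exactly 250 mailboxes is assigned to the small tier as an assumption). As the customer examples show, real needs vary, so measure before and after changing memory.

```python
from typing import Tuple


def suggested_ram_gb(mailboxes: int) -> Tuple[int, int]:
    """Return a (low, high) RAM range in GB to consider, based on the
    mailbox-count tiers in this post. A starting point, not a guarantee."""
    if mailboxes < 0:
        raise ValueError("mailboxes must be non-negative")
    if mailboxes <= 250:
        # Small systems: 8 GB is workable; 12-16 GB may be worth trying.
        return (8, 16)
    if mailboxes <= 750:
        # Medium systems: 12-16 GB should be considered.
        return (12, 16)
    # Larger systems: 16 GB minimum; many require 24-48 GB.
    return (16, 48)


if __name__ == "__main__":
    for count in (40, 500, 3000):
        print(count, "mailboxes ->", suggested_ram_gb(count), "GB")
```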
Retain Configuration
Worker Port to Server
Ensure that the “Server Port” is set to 48080. The use of ports 80 or 443 dramatically slows down the speed of communication between the Worker and the Server since all communication has to pass through the web server. Going direct to the Server’s port of 48080 is recommended unless security requirements mandate otherwise. You can change the port setting at Data Collection | Workers | [select the Worker object] | Connection. If you do need to change it, you’ll need to give the Worker a new bootstrap file.
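Before pointing a Worker at port 48080, it can be useful to confirm that the port is reachable from the Worker's host, since a firewall between the two would otherwise surface later as a failed job. The sketch below is a generic TCP reachability check written with the Python standard library; it is not a Retain tool, and the host name shown is a placeholder.

```python
import socket


def can_reach(host: str, port: int = 48080, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    # "retain.example.com" is a hypothetical host name; use your Retain Server's address.
    print(can_reach("retain.example.com", 48080))
```

A successful connection only shows that something is listening on that port; it does not verify the Worker's bootstrap file or credentials.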
Indexing All Content
It is recommended that you index all available content types and configure Retain to index all of a file’s content. These settings are found under Configuration | Server Configuration | Index. Mark all the extension types and set the “Stream Size” and “File Size” to “am”.
For more information about these settings see the knowledgebase article
Indexing: Stream Size and File Size Settings Explained
Other design considerations:
Multiple Workers
For larger systems, you can divide up your jobs and assign each job to its own Worker. Only one job can run at a time for a given Worker; thus, if you install multiple Workers, you can now have those jobs run simultaneously. We recommend that each Worker object (created in the Retain Server administration interface) be given a name that describes its location and – if on the Retain server with other Workers – the directory to which it was installed. For multiple Workers on the same server, each Worker program directory will be named RetainWorker (the first installed Worker) and RetainWorkerX (where “X” is the Worker number assigned during the installation).
For installing multiple Workers on Linux, see
Installing Multiple Workers On the Same Server (Linux)
For installing multiple Workers on Windows, see
Installing Multiple Workers On the Same Server (Windows)
For Office 365 and Gmail, you will definitely want multiple Workers, because archiving over the Internet from those systems is very slow, especially with Gmail. In those cases, create multiple distribution lists within those systems and create a separate job for each distribution list.
Profiles
One of the main points of having Retain is to keep a complete archive of the messages in your system, as required by the data retention policies under which your organization must operate. If your email system’s retention functionality is not understood or properly configured, users could purge items from their trash folder before Retain gets a chance to archive them. For more information, see the knowledgebase article “How Retention Services and Item Store Flags Work”.
Archiving your business communication (email, social, instant messages, web searches and mobile device data) is not only essential for compliance and eDiscovery, but it also provides important big data insights into how your company is operating. By following this guide, you will start archiving with the proper foundation for a successful installation and implementation. If you have any questions, you can contact us at questions@GWAVA.com or visit our support site, https://support.gwava.com/.
So…what are the next steps you should take?
- Follow this guide to start archiving your organization’s communication with Retain!
- If you are not currently using Retain, please feel free to try it out. You can download a free trial here.
- If you are already using Retain, please read the complete best practices knowledge base article here http://support.gwava.com/kb/?View=entry&EntryID=2434
- Whether or not you are using Retain, be sure to view our administration and end user videos to see Retain in action!
About Retain
Retain is a unified archiving suite focused primarily on email archiving for Office 365 and Microsoft Exchange, with comprehensive native support for archiving Gmail and GroupWise email as well.
In addition to email archiving, Retain supports the archiving of all electronic business communication, including data created on mobile devices and social media, and provides built-in content filtering and URL blocking. For mobile devices, Retain captures business communication data on Android, BlackBerry, and iOS, including SMS/text messages, BBM Messages, BBM Enterprise, phone call logs, and PIN Messages. Retain archives social media communication for Facebook, Twitter, and LinkedIn.
Retain Insights, introduced last year, is the underlying eDiscovery technology that allows Retain to search and take action on outside data sources, as well as conduct eDiscovery activities across any connected dataset. It is receiving high marks from customers. Retain also includes an improved search engine, which provides search suggestions, keyword tagging, a Boolean search wizard and Regex functions.
Retain is offered as an on-premises or cloud solution.