State of Operations and Support Points from Vamsee


As described here, we use Vagrant to make sure that everyone has access to the right database and system-level configuration. This is not completely streamlined: some of our Vagrant files and configuration are broken. We need a review of the process and suggestions for improvement.


  1. Review the way we currently do deployment

There is no specific deployment strategy. Apart from the EMS, the apps that we manage and deploy are fairly small and usually take little time to deploy.
In general, though, the approach has been to do whatever works, with whoever is available. We hope this will change with the new apps in the pipeline. We are open to hipster tools that make life easier.

  1. Figure out whether we need (or can use) Puppet, Chef or any other system for deployment
  2. Define best practices here

Monitoring and Alerting

We use New Relic to monitor the apps and all the frequently used boxes. Over the last few months, all of us have been too busy to check whether this monitoring setup is the best approach or whether there are things we can improve.
It would be nice to get a review of what we are monitoring: what is right and what is wrong.

Security and Access Control

After the rather nasty break-in on the development box a few months ago, we have been quite careful on the security front. Most of the steps we applied are documented here.
Also, a list of all the boxes and their configuration is here.
There's a Trello board on Security which was used to track the progress.

Broadly, our access control strategy is very loose: everyone has access everywhere, and we understand that this is not entirely right. We need to review developer and operations roles and set up access control mechanisms that respect them.
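One way to make developer and operations roles concrete is a deny-by-default role matrix that maps each role to the boxes it may reach, from which per-box user lists (for example, SSH accounts or authorized keys) can be generated. The following is a minimal sketch under that assumption; the role names, box names and users are all hypothetical, and the code only computes who should have access, it does not enforce anything by itself.

```python
# Hypothetical roles and box names for illustration; replace them with the
# real inventory once the roles are agreed.
ROLE_HOSTS = {
    "developer": {"klp-dev", "klp-staging"},
    "operations": {"klp-dev", "klp-staging", "klp-production", "ems-production"},
}

# Hypothetical users and the roles assigned to them.
USER_ROLES = {
    "alice": {"developer"},
    "bob": {"operations"},
    "carol": {"developer", "operations"},
}


def can_access(roles, host):
    """Deny by default: True only if some role explicitly grants this host."""
    return any(host in ROLE_HOSTS.get(role, set()) for role in roles)


def users_for_host(user_roles, host):
    """Users who should get an account (or SSH key) on the given host."""
    return sorted(user for user, roles in user_roles.items() if can_access(roles, host))


if __name__ == "__main__":
    for host in ("klp-dev", "klp-production"):
        print(host, users_for_host(USER_ROLES, host))
```

With these sample tables, every user reaches `klp-dev`, while only `bob` and `carol` reach `klp-production`. Keeping the matrix in one place means a role change is a one-line edit that can then be pushed to every box.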


  1. Review our backup policy
  2. Review our disaster recovery policy


  1. The KLP production server, the staging server and the EMS production servers are backed up daily by E2E themselves, using one of their CDP backup plans. These are full-system backups.
  2. The KLP dev server has no file backups. It has daily and weekly database backups, which are moved to one of the backup servers.
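Daily and weekly database dumps imply a retention rule: which dumps to keep on the backup server and which to prune. The following is a sketch of one possible rule, assuming we keep the last seven daily dumps plus the Sunday dump of each of the last four weeks. The window sizes, the choice of Sunday and the function names are hypothetical and are exactly what the backup review should settle. The code only decides which dates to keep; copying and deleting the actual dump files is out of scope.

```python
from datetime import date, timedelta

# Hypothetical retention windows; the real values are for the backup review.
DAILY_KEEP_DAYS = 7     # keep every dump younger than this many days
WEEKLY_KEEP_WEEKS = 4   # keep one dump per week for this many weeks
WEEKLY_DAY = 6          # date.weekday(): Monday is 0, so 6 means Sunday


def backups_to_keep(backup_dates, today):
    """Return the set of dump dates the retention rule keeps."""
    keep = set()
    for d in backup_dates:
        age = (today - d).days
        if 0 <= age < DAILY_KEEP_DAYS:
            keep.add(d)  # recent daily dump
        elif d.weekday() == WEEKLY_DAY and 0 <= age < WEEKLY_KEEP_WEEKS * 7:
            keep.add(d)  # older dump retained as that week's copy
    return keep


def backups_to_prune(backup_dates, today):
    """Dump dates that fall outside both windows, oldest first."""
    return sorted(set(backup_dates) - backups_to_keep(backup_dates, today))


if __name__ == "__main__":
    today = date(2014, 3, 11)
    dumps = [today - timedelta(days=n) for n in range(30)]
    for d in sorted(backups_to_keep(dumps, today)):
        print(d.isoformat(), d.strftime("%a"))
```

Run against 30 consecutive daily dumps, this keeps ten: the seven most recent plus three older Sundays. The rest are candidates for deletion.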

System tuning and Capacity planning

  1. Django + PostgreSQL is the common stack for all our apps. See whether we can tune it better, and set up boilerplates and documentation for new machines and new apps.
  2. Help with capacity planning as we roll out the API and get more data into the system.
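As a possible starting point for a Django + PostgreSQL boilerplate, a shared settings fragment can read per-machine values from the environment, so the same file works on every box. This is a sketch, not our current configuration: the `DB_*` environment variable names and their defaults are assumptions. `CONN_MAX_AGE` (Django 1.6+) keeps database connections open between requests instead of opening one per request, which is usually a cheap first tuning knob. Django releases before 1.9 spell the engine `django.db.backends.postgresql_psycopg2`.

```python
import os

# Shared database settings for a new app. Hypothetical DB_* variables let each
# machine supply its own values without editing the file.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": os.environ.get("DB_NAME", "app"),
        "USER": os.environ.get("DB_USER", "app"),
        "PASSWORD": os.environ.get("DB_PASSWORD", ""),
        "HOST": os.environ.get("DB_HOST", "localhost"),
        "PORT": os.environ.get("DB_PORT", "5432"),
        # Seconds to keep a connection open for reuse; 0 closes it after
        # every request, None keeps it open indefinitely.
        "CONN_MAX_AGE": int(os.environ.get("DB_CONN_MAX_AGE", "60")),
    }
}
```

PostgreSQL-side settings (memory, connection limits) live in `postgresql.conf` and would need their own per-machine boilerplate.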


  1. We dump whatever documentation we can on Trac. Review whether this scales and works well, and suggest improvements.
Last modified on 03/11/14 20:13:12