This blog outlines our experience moving one of our projects to Continuous delivery Model using GoCD on AWS. Prior to this implementation our code deployments were manual and on demand. We were looking for automated way of deploying code to various environments with minimal manual intervention. GoCD has continuous delivery as a first-class concept and provides an intuitive interface to start building CD pipelines. We started off with a quick PoC to validate some of our understanding and after initial success, we now use GoCD to define all of our deployment/delivery pipelines. This move forced us to have comprehensive test suite and work flow defined to set the criteria to promote code in different environments. The change also increased our ability to push more but smaller changes frequently. Deployment vs Delivery It is fairly often to see the terms Continuous Delivery and Continuous Deployment used interchangeably. For some this is a huge distinction…
Category: Software Development Methodologies
Understanding Resource Allocation configurations for a Spark application

Resource Allocation is an important aspect during the execution of any spark job. If not configured correctly, a spark job can consume entire cluster resources and make other applications starve for resources. This blog helps to understand the basic flow in a Spark Application and then how to configure the number of executors, memory settings of each executors and the number of cores for a Spark Job. There are a few factors that we need to consider to decide the optimum numbers for the above three, like: The amount of data The time in which a job has to complete Static or dynamic allocation of resources Upstream or downstream application Introduction Let’s start with some basic definitions of the terms used in handling Spark applications. Partitions : A partition is a small chunk of a large distributed data set. Spark manages data using partitions that helps parallelize data processing with minimal data…