Class ResizeJobFlowStep


  • public class ResizeJobFlowStep
    extends Object
    This class provides helper methods for creating a Resize Job Flow step as part of your job flow. The resize step adjusts the composition of your cluster while it is running. For example, if a large workflow has steps with different compute requirements, you can use this step to add a task instance group just before your most compute-intensive step.
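     // Imports used by this example (AWS SDK for Java 1.x).
     import com.amazonaws.auth.AWSCredentials;
     import com.amazonaws.auth.BasicAWSCredentials;
     import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduce;
     import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduceClient;
     import com.amazonaws.services.elasticmapreduce.model.HadoopJarStepConfig;
     import com.amazonaws.services.elasticmapreduce.model.JobFlowInstancesConfig;
     import com.amazonaws.services.elasticmapreduce.model.RunJobFlowRequest;
     import com.amazonaws.services.elasticmapreduce.model.RunJobFlowResult;
     import com.amazonaws.services.elasticmapreduce.model.StepConfig;
     import com.amazonaws.services.elasticmapreduce.util.ResizeJobFlowStep;
     import com.amazonaws.services.elasticmapreduce.util.ResizeJobFlowStep.AddInstanceGroup;
     import com.amazonaws.services.elasticmapreduce.util.ResizeJobFlowStep.ModifyInstanceGroup;
     import com.amazonaws.services.elasticmapreduce.util.ResizeJobFlowStep.OnArrested;
     import com.amazonaws.services.elasticmapreduce.util.ResizeJobFlowStep.OnFailure;

     // accessKey and secretKey are placeholders for your own AWS credentials.
     AWSCredentials credentials = new BasicAWSCredentials(accessKey, secretKey);
     AmazonElasticMapReduce emr = new AmazonElasticMapReduceClient(credentials);

     // Resize the core group to 10 instances, then add a 10-instance task group.
     // Keep going even if a group becomes arrested or a modification fails.
     HadoopJarStepConfig config = new ResizeJobFlowStep()
         .withResizeAction(new ModifyInstanceGroup()
             .withInstanceGroup("core")
             .withInstanceCount(10))
         .withResizeAction(new AddInstanceGroup()
             .withInstanceGroup("task")
             .withInstanceCount(10)
             .withInstanceType("m1.small"))
         .withOnArrested(OnArrested.Continue)
         .withOnFailure(OnFailure.Continue)
         .toHadoopJarStepConfig();

     // Wrap the resize configuration in an ordinary job flow step.
     StepConfig resizeJobFlow = new StepConfig()
         .withName("Resize job flow")
         .withActionOnFailure("TERMINATE_JOB_FLOW")
         .withHadoopJarStep(config);

     // Launch a 5-instance job flow that stays alive after its steps finish.
     RunJobFlowRequest request = new RunJobFlowRequest()
         .withName("Resize job flow")
         .withSteps(resizeJobFlow)
         .withLogUri("s3://log-bucket/")
         .withInstances(new JobFlowInstancesConfig()
             .withEc2KeyName("keypair")
             .withHadoopVersion("0.20")
             .withInstanceCount(5)
             .withKeepJobFlowAliveWhenNoSteps(true)
             .withMasterInstanceType("m1.small")
             .withSlaveInstanceType("m1.small"));

     RunJobFlowResult result = emr.runJobFlow(request);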
     
    • Constructor Detail

      • ResizeJobFlowStep

        public ResizeJobFlowStep()
        Creates a new ResizeJobFlowStep that loads resources from the default Amazon Elastic MapReduce bucket (us-east-1.elasticmapreduce) for the default region (us-east-1).
      • ResizeJobFlowStep

        public ResizeJobFlowStep(String bucket)
        Creates a new ResizeJobFlowStep using the specified Amazon S3 bucket to load resources.

        The official bucket name format is "<region>.elasticmapreduce". For example, if you're using the us-east-1 region, use the bucket "us-east-1.elasticmapreduce".

        Parameters:
        bucket - The Amazon S3 bucket from which to load resources.
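The "<region>.elasticmapreduce" naming convention can be captured in a small helper so the bucket argument is not hand-typed. This is a sketch only: `EmrBuckets` and `forRegion` are hypothetical names, not part of the AWS SDK, and the helper does nothing more than apply the format described above with basic input checking.

```java
import java.util.Locale;

/** Hypothetical helper (not part of the AWS SDK) applying the "<region>.elasticmapreduce" convention. */
final class EmrBuckets {
    private EmrBuckets() {}

    /** Returns the conventional resource bucket name for a region ID such as "us-east-1". */
    static String forRegion(String region) {
        if (region == null || region.trim().isEmpty()) {
            throw new IllegalArgumentException("region must be non-empty");
        }
        // Region IDs are lowercase; normalize so " US-WEST-2 " and "us-west-2" map to the same bucket.
        return region.trim().toLowerCase(Locale.ROOT) + ".elasticmapreduce";
    }
}
```

Following the convention above, `new ResizeJobFlowStep(EmrBuckets.forRegion("us-west-2"))` would pass the bucket name "us-west-2.elasticmapreduce" to the constructor.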
    • Method Detail

      • withResizeAction

        public ResizeJobFlowStep withResizeAction(ResizeJobFlowStep.ResizeAction resizeAction)
        Adds a new action for this step to perform. An action either modifies an existing instance group or adds a new one. The step supports multiple actions but requires at least one.
        Parameters:
        resizeAction - An instance of ResizeAction defining the change.
        Returns:
        A reference to this updated object so that method calls can be chained together.
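The chaining and "at least one action" rule can be pictured with a small, SDK-independent model. Everything here is hypothetical (`ResizePlan`, `Action`, `withAction`, `build`); it mirrors the documented behavior of collecting modify/add actions in order, and it assumes an empty plan should be rejected at build time. It does not claim to reproduce how ResizeJobFlowStep itself validates its actions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical model (not SDK code): an ordered list of resize actions that must not be empty. */
final class ResizePlan {
    /** One requested change; instanceType is only set for actions that add a group. */
    static final class Action {
        final boolean addsGroup;
        final String instanceGroup;
        final int instanceCount;
        final String instanceType;

        private Action(boolean addsGroup, String instanceGroup, int instanceCount, String instanceType) {
            this.addsGroup = addsGroup;
            this.instanceGroup = instanceGroup;
            this.instanceCount = instanceCount;
            this.instanceType = instanceType;
        }

        /** Change the size of an existing group, e.g. "core". */
        static Action modify(String group, int count) {
            return new Action(false, group, count, null);
        }

        /** Create a new group, e.g. a "task" group of a given instance type. */
        static Action add(String group, int count, String type) {
            return new Action(true, group, count, type);
        }
    }

    private final List<Action> actions = new ArrayList<>();

    /** Appends an action and returns this plan so calls can be chained. */
    ResizePlan withAction(Action action) {
        actions.add(action);
        return this;
    }

    /** Returns an immutable copy of the actions, rejecting an empty plan. */
    List<Action> build() {
        if (actions.isEmpty()) {
            throw new IllegalStateException("at least one resize action is required");
        }
        return Collections.unmodifiableList(new ArrayList<>(actions));
    }
}
```

Returning `this` from `withAction` is the same fluent pattern the real `withResizeAction` documents, which is what lets the class-level example chain several actions in one expression.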
      • withWait

        public ResizeJobFlowStep withWait(boolean wait)
        Specifies whether the step waits for the modification to complete, or continues on to the next step as soon as the modification request is received. Defaults to true.
        Parameters:
        wait - Whether this step should wait for the modification to complete.
        Returns:
        A reference to this updated object so that method calls can be chained together.
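Waiting for a modification to complete amounts to polling until the instance group reaches its requested size. The sketch below is hypothetical and SDK-independent: `ResizeWaiter`, `waitForSize`, and the `IntSupplier` that stands in for a cluster status query are illustrative assumptions, not how ResizeJobFlowStep is implemented. With wait set to true the step behaves roughly like this loop; with false it would move on right after the request is accepted.

```java
import java.util.function.IntSupplier;

/** Hypothetical illustration (not SDK code) of waiting for a resize to finish. */
final class ResizeWaiter {
    private ResizeWaiter() {}

    /**
     * Polls runningCount up to maxPolls times, sleeping pollMillis between polls.
     * Returns true once the group reports exactly the target size, false otherwise.
     */
    static boolean waitForSize(IntSupplier runningCount, int target, int maxPolls, long pollMillis) {
        for (int i = 0; i < maxPolls; i++) {
            if (runningCount.getAsInt() == target) {
                return true;
            }
            if (i + 1 < maxPolls) {
                try {
                    Thread.sleep(pollMillis);
                } catch (InterruptedException e) {
                    // Preserve the interrupt and give up waiting.
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }
}
```

Comparing with `==` rather than `>=` keeps the check correct for groups that are being shrunk as well as grown.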
      • withOnArrested

        public ResizeJobFlowStep withOnArrested(ResizeJobFlowStep.OnArrested onArrested)
        Specifies the action this step takes if any instance group modification causes the instance group to enter the ARRESTED state. This can happen when bootstrap actions on the newly launched instances fail repeatedly.
        Parameters:
        onArrested - Enum specifying which action to take.
        Returns:
        A reference to this updated object so that method calls can be chained together.
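How an on-arrested setting shapes the step's result can be sketched as a small decision function. This is a hypothetical model, not SDK code: `ArrestedHandling` and its enums are illustrative, and it assumes that an EXIT-style policy fails the step while a CONTINUE-style policy lets it complete. The real ResizeJobFlowStep.OnArrested values and their exact semantics may differ.

```java
import java.util.List;

/** Hypothetical model (not SDK code) of an on-arrested policy deciding a resize step's outcome. */
final class ArrestedHandling {
    /** Simplified settled states of a modified instance group. */
    enum GroupState { RUNNING, ARRESTED }

    /** Assumed policies: EXIT fails the step, CONTINUE lets it complete. */
    enum Policy { EXIT, CONTINUE }

    enum StepOutcome { COMPLETED, FAILED }

    private ArrestedHandling() {}

    /** Returns the step outcome given the final states of all instance groups the step modified. */
    static StepOutcome outcome(List<GroupState> groupStates, Policy onArrested) {
        boolean anyArrested = groupStates.contains(GroupState.ARRESTED);
        if (anyArrested && onArrested == Policy.EXIT) {
            return StepOutcome.FAILED;
        }
        // No arrested group, or the policy says to carry on regardless.
        return StepOutcome.COMPLETED;
    }
}
```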
      • withOnFailure

        public ResizeJobFlowStep withOnFailure(ResizeJobFlowStep.OnFailure onFailure)
        Specifies the action this step takes if the modification fails. This can happen when you request an invalid action, such as shrinking a core instance group.
        Parameters:
        onFailure - Enum specifying which action to take.
        Returns:
        A reference to this updated object so that method calls can be chained together.
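The invalid-action example above (shrinking a core instance group) can be turned into a client-side pre-check combined with an on-failure policy. This is a hypothetical sketch: `CoreResizeCheck`, its `Policy` enum, and the returned strings are illustrative only, and it assumes EXIT fails the step while CONTINUE skips the rejected change. It is not a description of how the SDK or EMR actually reports the failure.

```java
/** Hypothetical pre-check (not SDK code) for the documented invalid request of shrinking a core group. */
final class CoreResizeCheck {
    /** Assumed policies: EXIT fails the step, CONTINUE skips the rejected change. */
    enum Policy { EXIT, CONTINUE }

    private CoreResizeCheck() {}

    /** True when the request would shrink the "core" group, which the text calls invalid. */
    static boolean isInvalid(String instanceGroup, int currentCount, int requestedCount) {
        return "core".equals(instanceGroup) && requestedCount < currentCount;
    }

    /** Describes what the step would do with the request under the given on-failure policy. */
    static String resolve(String instanceGroup, int currentCount, int requestedCount, Policy onFailure) {
        if (!isInvalid(instanceGroup, currentCount, requestedCount)) {
            return "apply";
        }
        return onFailure == Policy.EXIT ? "fail-step" : "skip-and-continue";
    }
}
```

Task groups are not covered by the check, since the text only names core-group shrinking as an invalid action.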
      • toHadoopJarStepConfig

        public HadoopJarStepConfig toHadoopJarStepConfig()
        Creates the final HadoopJarStepConfig once you are done configuring the step. You can use this as you would any other HadoopJarStepConfig.
        Returns:
        HadoopJarStepConfig configured to perform the specified actions.