[AWS] Step Functions

“Step Functions” is a fully managed, serverless workflow service that provides state machines for long-running processes. It provides a visual interface for building and running serverless applications as a series of steps.


Features

AWS Step Functions is a fully managed service that makes it easy to coordinate the components of distributed applications and microservices using visual workflows. It helps you build applications from individual components that each perform a discrete function, which lets you scale the applications easily.

  • Step Functions is a reliable way to coordinate components and step through the functions of your application. Step Functions provides a graphical console to arrange and visualize the components of your application as a series of steps.
  • Step Functions logs the state of each step, so when things do go wrong, you can diagnose and debug problems quickly.

Using Step Functions

Step Functions automatically triggers and tracks each step, and retries when there are errors, so your application runs in order and as expected.

  • Each step in your application executes in order, as defined by your business logic.
  • Each function can be implemented as a Lambda function.
  • The output of one step may act as an input to the next.
  • Step Functions ensures that your application executes in order.
    • The state of each step is logged, so you can track what went wrong and where.
  • Benefits
    • Coordination of distributed components
    • Built-in error handling, retries, and fault tolerance
  • Common Use Cases
    • Orchestration of micro-services
    • Security and IT automation
    • Data Processing and ETL (Extract, Transform, Load)
    • Machine Learning pipeline orchestration
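
The idea that steps run in order, that each output feeds the next input, and that the state of every step is recorded can be sketched in a few lines of plain Python. This is a toy model rather than how the service works internally; run_steps and the two sample steps are hypothetical names chosen for illustration.

```python
def run_steps(steps, data):
    """Run (name, function) pairs in order, feeding each output into the next step."""
    history = []
    for name, step in steps:
        try:
            data = step(data)
            history.append({"step": name, "status": "SUCCEEDED", "output": data})
        except Exception as exc:
            # Record the failure and stop, so the log shows where things went wrong.
            history.append({"step": name, "status": "FAILED", "error": repr(exc)})
            break
    return data, history


# Two stand-in steps; in Step Functions each could be a Lambda function.
result, history = run_steps(
    [
        ("Validate", lambda order: {**order, "valid": True}),
        ("Charge", lambda order: {**order, "charged": order["amount"]}),
    ],
    {"amount": 25},
)
```

Here history plays the role of the per-step log: if "Charge" raised an exception, the last entry would name that step and carry the error.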

Workflows

Workflow Types

  • Standard
    • Can run for up to one year
    • Asynchronous processing
      • Exactly-once execution (non-idempotent)
      • Execution state internally persisted on every state transition
    • Useful for long-running work that needs to have an audit history
  • Express
    • Can run for up to only five minutes
    • Synchronous
      • At-most-once execution
      • Begins a workflow, waits until it completes, and returns a result
    • Asynchronous
      • No internally persisted state
      • At-least-once execution (should be idempotent)
      • Begins a workflow and confirms it has started; the result can be found in CloudWatch Logs
    • Useful for high-event-rate workloads such as IoT data streaming
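
Because an asynchronous Express workflow may run a step more than once, the work it triggers should be idempotent. A minimal sketch of one common approach is to remember which event IDs have already been handled. The processed dictionary and handle_event function are hypothetical; a real system would keep this record in a durable store rather than in memory.

```python
processed = {}  # hypothetical in-memory record of handled events


def handle_event(event_id, payload):
    """Apply an event once, even if it is delivered more than once."""
    if event_id in processed:
        # Duplicate delivery: return the earlier result instead of redoing the work.
        return processed[event_id]
    result = {"total": sum(payload)}
    processed[event_id] = result
    return result


first = handle_event("evt-1", [1, 2, 3])
second = handle_event("evt-1", [1, 2, 3])  # simulated redelivery of the same event
```

The second call does no new work and hands back the stored result, which is the property at-least-once execution relies on.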

State Machine

Workflow steps are known as states, and they can perform work via tasks.

A state machine can be defined using the JSON-based Amazon States Language (ASL). State machines maintain state (whereas a Lambda function is short-running and stateless) and allow longer-running processes.

  • Tasks perform actions.
  • An activity is program code that interacts with Step Functions using API actions such as:
    • CreateStateMachine
    • StartExecution
    • StopExecution
    • GetActivityTask
    • SendTaskSuccess
    • SendTaskFailure
  • A Lambda function can respond to state machine tasks.
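
As a rough illustration of what an ASL definition looks like, the sketch below builds a two-state machine as a Python dictionary and serializes it to JSON. The Lambda ARN and the state names are made up for the example; the resulting text is the kind of definition string you would supply when creating a state machine.

```python
import json

# Hypothetical Lambda ARN, used only for illustration.
HELLO_LAMBDA_ARN = "arn:aws:lambda:us-east-1:123456789012:function:HelloFunction"

definition = {
    "Comment": "A minimal two-state workflow",
    "StartAt": "Say Hello",  # name of the first state to run
    "States": {
        "Say Hello": {
            "Type": "Task",  # a Task state performs work, here via Lambda
            "Resource": HELLO_LAMBDA_ARN,
            "Next": "Done",
        },
        "Done": {"Type": "Succeed"},  # ends the execution successfully
    },
}

# Serialize to the JSON text that Amazon States Language expects.
asl_text = json.dumps(definition, indent=2)
print(asl_text)
```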

Service Integration

  • Request Response (asynchronous integration)
    • Call a service and let Step Functions progress to the next state immediately after it gets an HTTP response.
    • After making the call, Step Functions will wait to receive an HTTP response before continuing the workflow execution. It will not wait for a notification that the job is complete.

"Send message to SNS":{  
   "Type":"Task",
   "Resource":"arn:aws:states:::sns:publish",
   "Parameters":{  
      "TopicArn":"arn:aws:sns:us-east-1:123456789012:helloTopic",
      "Message":"Hello!"
   },
   "Next":"NEXT_STATE"
}
  • Run a Job (synchronous integration)
    • Call a service, and have Step Functions wait for a job to complete before progressing.
    • After making the call, Step Functions will wait until the job completes before continuing with the workflow.
"Submit Batch Job": {
  "Type": "Task",
  "Resource": "arn:aws:states:::batch:submitJob.sync",
  "Parameters": {
    "JobDefinition": "arn:aws:batch:us-east-1:123456789012:job-definition/myJobDefinition",
    "JobName": "myJob",
    "JobQueue": "arn:aws:batch:us-east-1:123456789012:job-queue/myQueue"
  },
  "Next": "NEXT_STATE"
}

To have Step Functions wait, specify the “Resource” field in your task state definition with the .sync suffix.

  • Wait for a Callback with the Task Token (callback)
    • Call a service with a task token and have Step Functions wait until that token is returned with a payload before progressing.
    • A common use case for this callback integration pattern is a task that requires human intervention.
"Send message to SQS": {
  "Type": "Task",
  "Resource": "arn:aws:states:::sqs:sendMessage.waitForTaskToken",
  "Parameters": {
    "QueueUrl": "https://sqs.us-east-1.amazonaws.com/123456789012/testQueue",
    "MessageBody": {
        "Message": "Hello!",
        "TaskToken.$": "$$.Task.Token"
     }
  },
  "Next": "NEXT_STATE"
}

Amazon SQS is specified as the service in the “Resource” string with the .waitForTaskToken suffix. Step Functions then pauses the workflow until the consumer of the message returns the task token with either a SendTaskSuccess or a SendTaskFailure call. Without a timeout on the task, the wait is bounded only by the execution's own time limit.
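
The other half of the callback pattern is whatever consumes the message and hands the token back. Here is a minimal Python sketch, assuming sfn_client is a boto3 Step Functions client (or anything exposing the same send_task_success and send_task_failure methods) and that the message body has the shape shown in the state above. The complete_task function, the approved flag, and the "Rejected" error name are hypothetical.

```python
import json


def complete_task(sfn_client, message_body, approved):
    """Report the outcome of a callback task using the token carried in the message."""
    body = json.loads(message_body)
    token = body["TaskToken"]  # injected by Step Functions via $$.Task.Token
    if approved:
        sfn_client.send_task_success(
            taskToken=token,
            output=json.dumps({"Message": body["Message"], "approved": True}),
        )
    else:
        sfn_client.send_task_failure(
            taskToken=token,
            error="Rejected",  # hypothetical error name
            cause="The request was not approved",
        )
```

A human approval step fits this shape: the approved flag would come from whatever the reviewer decided, and the workflow resumes only once one of the two calls is made.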

Refer to the AWS documentation to check which service supports which service integration pattern.
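
The suffix convention used in the three examples above can be summarized with a small helper. This is only a sketch: integration_pattern is a hypothetical function, and it looks solely at the suffix of the “Resource” string without checking whether the service actually supports that pattern.

```python
def integration_pattern(resource):
    """Classify a Task state's Resource ARN by its integration-pattern suffix."""
    if resource.endswith(".waitForTaskToken"):
        return "Wait for a Callback with the Task Token"
    # Some integrations use a versioned ".sync:2" form of the suffix.
    if resource.endswith((".sync", ".sync:2")):
        return "Run a Job"
    # No suffix: Step Functions moves on once it gets the HTTP response.
    return "Request Response"


for arn in (
    "arn:aws:states:::sns:publish",
    "arn:aws:states:::batch:submitJob.sync",
    "arn:aws:states:::sqs:sendMessage.waitForTaskToken",
):
    print(arn, "->", integration_pattern(arn))
```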


States

Each state in a state machine makes decisions based on its input, performs actions, and passes output to the next state.

State Types

  • Pass
    • passes any input directly to its output with no work done
  • Task
    • represents a single unit of work performed, such as Lambda or SNS
  • Choice
    • adds a branching decision logic
  • Wait
    • creates a specified time delay by pausing the process
    • use the “Wait” state instead of a Lambda function when you need to wait for work to complete
  • Succeed
    • completes the execution successfully
  • Fail
    • stops the execution and marks it as failed
  • Parallel
    • runs parallel branches of executions
    • waits until all branches terminate
  • Map
    • runs a set of steps based on elements of an input array
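
Several of these state types often appear together in a job-polling loop: a Task submits work, a Wait state pauses, another Task checks the status, and a Choice state decides whether to loop, succeed, or fail. The sketch below writes such a definition as a Python dictionary and checks that every transition points at a state that exists. The ARNs, state names, and the $.status field are assumptions made for the example, and dangling_targets is a hypothetical helper, not part of any AWS SDK.

```python
# Hypothetical Lambda ARNs, for illustration only.
SUBMIT_ARN = "arn:aws:lambda:us-east-1:123456789012:function:SubmitJob"
STATUS_ARN = "arn:aws:lambda:us-east-1:123456789012:function:CheckJob"

definition = {
    "StartAt": "Submit Job",
    "States": {
        "Submit Job": {"Type": "Task", "Resource": SUBMIT_ARN, "Next": "Wait 30 Seconds"},
        "Wait 30 Seconds": {"Type": "Wait", "Seconds": 30, "Next": "Get Job Status"},
        "Get Job Status": {"Type": "Task", "Resource": STATUS_ARN, "Next": "Job Complete?"},
        "Job Complete?": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.status", "StringEquals": "SUCCEEDED", "Next": "Job Succeeded"},
                {"Variable": "$.status", "StringEquals": "FAILED", "Next": "Job Failed"},
            ],
            "Default": "Wait 30 Seconds",  # still running: loop back and wait again
        },
        "Job Succeeded": {"Type": "Succeed"},
        "Job Failed": {"Type": "Fail", "Error": "JobFailed", "Cause": "The job reported FAILED"},
    },
}


def dangling_targets(definition):
    """Return names that are referenced as transitions but not defined as states."""
    states = definition["States"]
    referenced = [definition["StartAt"]]
    for state in states.values():
        referenced += [state[key] for key in ("Next", "Default") if key in state]
        referenced += [choice["Next"] for choice in state.get("Choices", [])]
    return sorted(set(referenced) - set(states))


print(dangling_targets(definition))
```

Using a Wait state here means nothing is billed for idling inside a Lambda function while the job runs.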

You can retry a failed task by specifying the “Retry” configuration of the state.

"Get Job Status": {
  "Type": "Task",
  ...
  "Retry": [
    {
      "ErrorEquals": ["States.ALL"],
      "IntervalSeconds": 1,
      "MaxAttempts": 3,
      "BackoffRate": 2
    }
  ]
},

Input & Output Processing

  • A state machine execution receives JSON text as input and passes that input to the first state in the workflow.
  • Each state receives JSON as input and passes JSON as output to the next state.
  • You can use filters to control the flow of JSON.
    1. State Input
      • InputPath
        • Select the part of a state input and pass it to a task
      • Parameters
        • Create a collection of key-value pairs that are passed as an input to a task
        • If the value is selected with a JSONPath expression, the key should end in “.$”
          • “Parameters” : { “DBName.$”: “$.DatabaseName” }
    2. Task
      • ResultSelector
        • Manipulate a task’s result using a JSONPath expression before the ResultPath is applied
      • ResultPath
        • Select the result to pass to the output of a state
        • If the ResultPath is unspecified or $:
          • the task result becomes the output
          • the state input is discarded
        • If the ResultPath is a JSONPath expression:
          • the selected task result will be inserted into the state input
      • OutputPath
        • Select the portion of the result to pass on as the JSON output (filtering)
    3. State Output
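
The order in which these filters apply can be sketched in plain Python. This is a toy model, not the service's implementation: it supports only simple dotted paths such as $.a.b, it skips ResultSelector, and the function names (get_path, set_path, apply_parameters, run_state) are made up for illustration.

```python
import copy


def get_path(doc, path):
    """Resolve a simple path like "$" or "$.a.b" against a JSON-like dict."""
    node = doc
    if path != "$":
        for key in path[2:].split("."):
            node = node[key]
    return node


def set_path(doc, path, value):
    """Return a copy of doc with value stored at path; "$" replaces the whole doc."""
    if path == "$":
        return value
    out = copy.deepcopy(doc)
    node = out
    keys = path[2:].split(".")
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return out


def apply_parameters(template, data):
    """Build the task input; keys ending in ".$" take their value from a path."""
    built = {}
    for key, value in template.items():
        if key.endswith(".$"):
            built[key[:-2]] = get_path(data, value)
        elif isinstance(value, dict):
            built[key] = apply_parameters(value, data)
        else:
            built[key] = value
    return built


def run_state(state_input, task, input_path="$", parameters=None,
              result_path="$", output_path="$"):
    """Apply InputPath, Parameters, the task, ResultPath, then OutputPath."""
    selected = get_path(state_input, input_path)
    task_input = selected if parameters is None else apply_parameters(parameters, selected)
    result = task(task_input)
    combined = set_path(state_input, result_path, result)  # merges into the state input
    return get_path(combined, output_path)


state_input = {"DatabaseName": "orders", "Region": "us-east-1"}
query = lambda params: {"db": params["DBName"], "rows": 3}  # stand-in for a real task

merged = run_state(state_input, query,
                   parameters={"DBName.$": "$.DatabaseName"}, result_path="$.query")
replaced = run_state(state_input, query,
                     parameters={"DBName.$": "$.DatabaseName"})
```

In merged, the task result is inserted under the query key alongside the original input. In replaced, ResultPath is left at its default of $, so the result becomes the whole output and the input is discarded.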

Data Flow Simulator

  • The best way to learn the input & output processing is to use the “Data Flow Simulator” from the navigation bar in the Step Functions console.

Error Handling

Error Names

Retries

  • ErrorEquals (required)
  • IntervalSeconds (optional)
    • 1 by default
  • MaxAttempts (optional)
    • 3 by default
  • BackoffRate (optional)
    • The multiplier by which the retry interval denoted by IntervalSeconds increases after each retry attempt.
    • The default is 2.
"Retry": [ {
   "ErrorEquals": [ "States.Timeout" ],
   "IntervalSeconds": 3,
   "MaxAttempts": 2,
   "BackoffRate": 1.5
} ]
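
To see how these fields interact, the wait before each retry can be computed as IntervalSeconds multiplied by BackoffRate once for every retry already made. A small sketch, where retry_delays is a hypothetical helper whose defaults mirror the defaults listed above:

```python
def retry_delays(interval_seconds=1, max_attempts=3, backoff_rate=2.0):
    """Return the wait in seconds before each retry attempt, in order."""
    return [interval_seconds * backoff_rate ** n for n in range(max_attempts)]


# The retrier shown above: 3-second interval, 2 attempts, backoff rate 1.5.
print(retry_delays(3, 2, 1.5))  # [3.0, 4.5]

# The defaults: waits of 1, 2, and 4 seconds.
print(retry_delays())  # [1.0, 2.0, 4.0]
```

So the States.Timeout retrier above waits 3 seconds before the first retry and 4.5 seconds before the second, after which the error is no longer retried.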

Fallback States

  • ErrorEquals (required)
  • Next (required)
    • the name of the next state
  • ResultPath (optional)
    • If you don’t specify the ResultPath field, it defaults to $, which selects and overwrites the entire input.
"Catch": [ {
   "ErrorEquals": [ "java.lang.Exception" ],
   "ResultPath": "$.error-info",
   "Next": "RecoveryState"
}, {
   "ErrorEquals": [ "States.ALL" ],
   "Next": "EndState"
} ]
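
Catchers are evaluated in the order they are listed, and the first one whose ErrorEquals matches decides the next state. The sketch below models that lookup for the “Catch” array above. It is a simplification: next_state_for is a hypothetical helper, and it treats States.ALL as matching every error name, which glosses over the few terminal errors the service does not let you catch.

```python
catchers = [
    {"ErrorEquals": ["java.lang.Exception"], "ResultPath": "$.error-info", "Next": "RecoveryState"},
    {"ErrorEquals": ["States.ALL"], "Next": "EndState"},
]


def next_state_for(catchers, error_name):
    """Return the Next state of the first catcher that matches, or None."""
    for catcher in catchers:
        names = catcher["ErrorEquals"]
        if error_name in names or "States.ALL" in names:
            return catcher["Next"]
    return None  # no catcher matched: the execution fails


print(next_state_for(catchers, "java.lang.Exception"))  # RecoveryState
print(next_state_for(catchers, "States.Timeout"))       # EndState
```

Placing the States.ALL catcher last keeps it from shadowing the more specific one.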

Best Practices

  • Use Timeouts
"ActivityState": {
  "Type": "Task",
  "Resource": "arn:aws:states:us-east-1:123456789012:activity:HelloWorld",
  "TimeoutSeconds": 300,
  "Next": "NextState"
}
  • Use Amazon S3 ARNs instead of passing large payloads
{
  "Data": "arn:aws:s3:::MyBucket/data.json"
}
  • Handle Lambda service exceptions
"Retry": [ {
   "ErrorEquals": [ "Lambda.ServiceException", "Lambda.AWSLambdaException", "Lambda.SdkClientException"],
   "IntervalSeconds": 1,
   "MaxAttempts": 3,
   "BackoffRate": 1
} ]
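
For the S3 practice above, one way to decide when to pass a reference instead of the data itself is to measure the serialized payload first. This sketch assumes a payload limit of 256 KiB (verify it against the current Step Functions quotas) and assumes the caller has already uploaded the data to the given S3 location. payload_or_reference and MAX_PAYLOAD_BYTES are hypothetical names.

```python
import json

MAX_PAYLOAD_BYTES = 262144  # assumed 256 KiB limit; check the current service quotas


def payload_or_reference(payload, s3_arn):
    """Return the payload itself if it is small enough, else a pointer to S3."""
    encoded = json.dumps(payload).encode("utf-8")
    if len(encoded) <= MAX_PAYLOAD_BYTES:
        return payload
    # Too large to pass between states: hand over the location instead.
    return {"Data": s3_arn}


small = payload_or_reference({"id": 1}, "arn:aws:s3:::MyBucket/data.json")
large = payload_or_reference({"blob": "x" * 300000}, "arn:aws:s3:::MyBucket/data.json")
```

The state that receives {"Data": ...} is then responsible for reading the object from S3 itself.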

Step Functions vs. Simple Workflow

Step Functions replaces SWF (Simple Workflow) with a serverless alternative.

  • To make a decision, SWF uses a decider program, whereas Step Functions uses a JSON-based state machine.
  • “Step Functions” provides the visual workflows.
  • “Step Functions” orchestrates multiple AWS resources.
  • SWF provides complete control over the workflow but increases the complexity.
