
DynamoDB and Java Spring as best friends

4youngpadawans.com presents DynamoDB and Java Spring as best friends featuring DynamoDB | Spring


If you're coming from the world of relational databases with their tabular, predefined models, your first encounter with a NoSql database will blow you away with new concepts: key-value databases, document-oriented databases or wide-column databases (tables with rows that can have different column formats!?). You will surely think: I have always missed this freedom of choice!

But is this freedom endless?
Does "NoSql modeling freedom" come with a hidden price?

Well... it is up to you to decide whether you are willing to dig a little deeper and try to understand what NoSql actually means.

In this article I will explore the advantages (and point out some "constraints") of the DynamoDB NoSql database using the Java DynamoDB helper libraries and Spring, a powerful back-end framework.

Use case

It is no secret that DynamoDB is not suitable for every scenario.
You can use DynamoDB like any other (NoSql) database but, if you don't embrace its major advantages, you will end up paying a lot of money for little or no gain.

I'll try to imagine a perfect scenario for DynamoDB:

  • a fleet of 1000 cars is equipped with GPS tracking devices,
  • the cars are (self-)driven through busy and crowded city streets, doing whatever they need to do (more or less successfully) and sending GPS positions to the back-end every minute,
  • each week the city mayor's deputy announces, at random, that some of the major city streets will be closed due to construction work,
  • at the end of each week, the business desperately needs the history of car positions to post-process it and optimize routes for the next week.

In this use case, where 1000 cars send 1000 x 60 x 24 x 7 = ~10 million locations per week, the true power of DynamoDB can be seen (with a little help from its friends Java and Spring to make data processing and serving more convenient).
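
The estimate above can be double-checked with a few lines of Java. This is only a back-of-the-envelope sketch; the class and method names are made up for illustration and are not part of the project.

```java
// Back-of-the-envelope estimate of how many positions the fleet produces.
class FleetVolume {

    // positions per week = cars * reports per hour * 24 hours * 7 days
    static long positionsPerWeek(long cars, long reportsPerHour) {
        return cars * reportsPerHour * 24 * 7;
    }

    public static void main(String[] args) {
        long weekly = positionsPerWeek(1000, 60); // one report per minute
        System.out.println(weekly);               // prints 10080000
        // average number of writes per second the back-end has to absorb
        System.out.println(weekly / (7 * 24 * 3600.0));
    }
}
```

The second number is the more interesting one for later: it is the average write rate the table has to sustain.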

Choose table keys wisely

DynamoDB stores related groups of table rows in physical partitions (similar to partitions on a hard disk) to speed up data access and reading.

Hash and range keys

The table key responsible for creating partitions is called the hash key (or partition key). In our case, the hash key will be the unique carId, internally dividing our CarPositions table into partitions.
We'll need historical data from our table, so we will execute range queries against the car location timestamp. DynamoDB can speed up such queries by utilizing the so-called range key (or sort key). We will define gpsTime as the range key.

The hash key (carId) and the range key (gpsTime) together form a composite primary key that uniquely identifies one row in our table.

Our CarPositions table model can be represented by a Java POJO class powered by DynamoDBMapper annotations:
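
To get a feeling for what a composite key buys us, here is a purely in-memory analogy in plain Java. It is not how DynamoDB is implemented; the class CompositeKeySketch and its methods are hypothetical and only mimic the idea of "find the partition by hash key, then walk a sorted range".

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// In-memory analogy only: this is NOT how DynamoDB works internally.
class CompositeKeySketch {

    // hash key (carId) -> items of that partition, sorted by range key (gpsTime)
    private final Map<Long, NavigableMap<String, String>> partitions = new HashMap<>();

    // Writing the same (carId, gpsTime) pair twice overwrites the item,
    // because the composite key identifies exactly one row.
    void put(long carId, String gpsTime, String payload) {
        partitions.computeIfAbsent(carId, id -> new TreeMap<>()).put(gpsTime, payload);
    }

    // In spirit: CarId = :carId and GpsTime between :from and :to
    NavigableMap<String, String> between(long carId, String from, String to) {
        NavigableMap<String, String> partition = partitions.get(carId);
        if (partition == null) {
            return new TreeMap<>();
        }
        return partition.subMap(from, true, to, true);
    }

    public static void main(String[] args) {
        CompositeKeySketch table = new CompositeKeySketch();
        table.put(1L, "2017-11-20T12:07:00Z", "position A");
        table.put(1L, "2017-11-20T12:08:00Z", "position B");
        table.put(2L, "2017-11-20T12:07:00Z", "position C");
        // prints {2017-11-20T12:07:00Z=position A}
        System.out.println(table.between(1L, "2017-11-20T12:00:00Z", "2017-11-20T12:07:30Z"));
    }
}
```

Note that the lookup never touches the data of car 2: the hash key narrows the search to one partition before the range condition is evaluated.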
CarPosition.java

import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBAttribute;
import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBHashKey;
import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBRangeKey;
import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBTable;

import java.util.Calendar;

@DynamoDBTable(tableName = "CarPositions")
public class CarPosition {

    //table partition (hash) key
    @DynamoDBHashKey(attributeName = "CarId")
    private Long carId;

    //table sort (range) key
    @DynamoDBRangeKey(attributeName = "GpsTime")
    private Calendar gpsTime;

    @DynamoDBAttribute(attributeName = "ServerTime")
    private Calendar serverTime;

    @DynamoDBAttribute(attributeName = "GpsData")
    private GpsData gpsData;

    //getters and setters...
}

Note that the gpsData attribute in the class above is complex and is itself represented by a Java class. It means that GpsData will be stored as a key-value map inside a single column of our table.
GpsData.java

import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBAttribute;
import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBDocument;

@DynamoDBDocument
public class GpsData {

    @DynamoDBAttribute(attributeName = "Latitude")
    private Float latitude;

    @DynamoDBAttribute(attributeName = "Longitude")
    private Float longitude;

    @DynamoDBAttribute(attributeName = "Speed")
    private Integer speed;

    @DynamoDBAttribute(attributeName = "Course")
    private Integer course;

    //getters and setters...
}
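
To picture what "a key-value map inside a single column" means, the following sketch builds the logical shape of one CarPositions item with plain Java maps. It is an illustration only: the attribute names are taken from the annotations above, the sample values are invented, and the way DynamoDBMapper actually serializes values (for example, timestamps) is not reproduced here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration of the logical item shape, using plain maps instead of the AWS SDK.
class ItemShapeSketch {

    static Map<String, Object> sampleItem() {
        // The nested document: one map holding all GPS fields
        Map<String, Object> gpsData = new LinkedHashMap<>();
        gpsData.put("Latitude", 41.123333);
        gpsData.put("Longitude", 21.123333);
        gpsData.put("Speed", 80);
        gpsData.put("Course", 90);

        Map<String, Object> item = new LinkedHashMap<>();
        item.put("CarId", 1L);                       // hash key
        item.put("GpsTime", "2017-11-20T12:07:00Z"); // range key
        item.put("GpsData", gpsData);                // whole document in one attribute
        return item;
    }

    public static void main(String[] args) {
        System.out.println(sampleItem());
    }
}
```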

Now we have the data table modeled, but where will we store the actual data?
Let's stop for a moment and create a DynamoDB database.

Running DynamoDB locally with Docker

DynamoDB is Amazon's in-house solution provided as a service, with advantages such as auto scaling, backup, distribution across different geographical regions and a powerful Amazon UI to create databases and to monitor and analyze performance.
Since we are (still) poor engineers, learning how to run DynamoDB locally is essential :)

Note that by running it locally we will lose the advantages mentioned above, but we will have a completely local development environment.

The easiest way to install and run DynamoDB on your local machine is to use a Docker image.
If you are on Windows and not so familiar with Docker, you can set up Docker Toolbox, which comes with the handy Kitematic UI.

In no time your DynamoDB will be up and running

[Screenshot: DynamoDB Docker container running in Kitematic]

Now let's go back to development.

Creating DynamoDB schema with Java

Since we aim to serve car positions via a web service, database client functionality is needed. To make it handy for development purposes, the database client will create the DynamoDB schema upon start.

We can make our db client a singleton that is easily accessible by using Spring's @Component annotation:
DynamoDbClient.java

import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.client.builder.AwsClientBuilder;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.document.DynamoDB;
import com.amazonaws.services.dynamodbv2.document.Table;
import com.amazonaws.services.dynamodbv2.model.AttributeDefinition;
import com.amazonaws.services.dynamodbv2.model.DeleteTableResult;
import com.amazonaws.services.dynamodbv2.model.KeySchemaElement;
import com.amazonaws.services.dynamodbv2.model.KeyType;
import com.amazonaws.services.dynamodbv2.model.ProvisionedThroughput;
import com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputDescription;
import com.amazonaws.services.dynamodbv2.model.ScalarAttributeType;
import com.amazonaws.services.dynamodbv2.model.TableDescription;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Component;

import javax.annotation.PostConstruct;
import java.util.Arrays;
import java.util.List;

@Component
public class DynamoDbClient {
    public static final String TABLE_CAR_POSITIONS = "CarPositions";
    private AmazonDynamoDB dbClient;
    
    @Value("${dynamoDB.endpoint}")
    private String dbEndpoint;

    @Value("${dynamoDB.clientId}")
    private String dbClientId;

    @Value("${dynamoDB.password}")
    private String dbPassword;

    @PostConstruct
    public void init() {
        dbClient = createDynamoDBClient();
        createTableIfNotExists(TABLE_CAR_POSITIONS);
    }

    private AmazonDynamoDB createDynamoDBClient() {
        return AmazonDynamoDBClientBuilder.standard()
                .withEndpointConfiguration(
                        new AwsClientBuilder.EndpointConfiguration(
                                dbEndpoint,
                                Regions.US_EAST_1.getName()
                        )
                )
                .withCredentials(
                        new AWSStaticCredentialsProvider(
                                new BasicAWSCredentials(dbClientId, dbPassword)
                        )
                )
                .build();
    }

    public AmazonDynamoDB getDbClient() {
        return dbClient;
    }
...

Every time the db client is constructed (initialized), it checks whether the database table CarPositions exists. If it does not, the table is created with the key schema we modeled before. Only the key attributes (CarId and GpsTime) have to be declared up front; the remaining attributes are schemaless:

...

    private void createTableIfNotExists(String tableName) {
        if (getTableList().contains(tableName)) return;
        DynamoDB dynamoDB = new DynamoDB(dbClient);
        try {
            //table keys
            KeySchemaElement partitionKey = new KeySchemaElement("CarId", KeyType.HASH);
            KeySchemaElement sortRangeKey = new KeySchemaElement("GpsTime", KeyType.RANGE);
            //table attributes
            AttributeDefinition carIdAttr = new AttributeDefinition("CarId", ScalarAttributeType.N);
            AttributeDefinition gpsTimeAttr = new AttributeDefinition("GpsTime", ScalarAttributeType.S);

            System.out.format("Creating table %s\n", tableName);
            Table table = dynamoDB.createTable(tableName,
                    Arrays.asList(partitionKey, sortRangeKey),
                    Arrays.asList(carIdAttr, gpsTimeAttr),
                    new ProvisionedThroughput(1L, 1L));
            table.waitForActive();
            System.out.format("Table %s created. Status: %s\n", tableName, table.getDescription().getTableStatus());
            dumpTableInfo(tableName);
        } catch (Exception e) {
            System.err.format("Unable to create table: %s\n", tableName);
            System.err.println(e.getMessage());
        }
    }
    
    private List<String> getTableList() {
        List<String> tableNames = dbClient.listTables().getTableNames();
        if (tableNames.isEmpty()) {
            System.out.println("There are no tables yet");
        } else {
            System.out.println("DB tables:");
            for (String tableName : tableNames) {
                dumpTableInfo(tableName);
                System.out.println("============================");
            }
        }
        return tableNames;
    }
    
...
}

Provisioned Throughput

You probably noticed that I used new ProvisionedThroughput(1L, 1L) while creating the table.

If you are using DynamoDB as Amazon's paid service, pay extra attention to the table's Provisioned Throughput, because that's where the catch is!

Provisioned Throughput is the read/write capacity of a DynamoDB table.
It determines the maximum allowed rate of reads and writes. If you are not using the On Demand setup (where capacity follows the actual rate of reads/writes), carefully planning provisioned throughput can save you a lot of money.
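
As a rough planning aid, the write capacity our fleet needs can be estimated in a few lines. The sketch below relies on the documented rule that one write capacity unit covers one write per second for an item of up to 1 KB; the item size of 200 bytes is purely an assumption, and the helper name is invented for this example. It uses the average write rate, so real traffic peaks would need extra headroom.

```java
// Rough estimate of provisioned write capacity; not an official AWS calculator.
class WriteCapacityEstimate {

    // One write capacity unit = one write per second for an item up to 1 KB.
    // Larger items consume one unit per started KB.
    static long writeCapacityUnits(double writesPerSecond, int itemSizeBytes) {
        long unitsPerWrite = Math.max(1, (itemSizeBytes + 1023) / 1024);
        return (long) Math.ceil(writesPerSecond * unitsPerWrite);
    }

    public static void main(String[] args) {
        double writesPerSecond = 1000 / 60.0; // 1000 cars, one position per minute each
        int assumedItemSizeBytes = 200;       // assumption, measure your real items
        System.out.println(writeCapacityUnits(writesPerSecond, assumedItemSizeBytes)); // prints 17
    }
}
```

Compared with the ProvisionedThroughput(1L, 1L) used for local development, this shows how far a development setting can be from what the use case would need in production.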

API endpoint with DynamoDB client

The exposed web service endpoint needs to support 2 request methods:

  • GET - to accept search queries, read from database and return car positions
  • POST - to accept positions sent by cars and save them into database

CarPositionQueryController.java

import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBMapper;
import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBMapperConfig;
import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBQueryExpression;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.dmi.dynamows.components.DynamoDbClient;
import com.dmi.dynamows.entities.CarPosition;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.format.annotation.DateTimeFormat;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.CrossOrigin;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.RequestParam;

import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

@Controller
@RequestMapping(path = "api/v1/positions")
public class CarPositionQueryController {

    @Autowired
    DynamoDbClient dynamoDbClient;

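    // Timestamps are compared as strings, so format them in UTC to match the literal 'Z'.
    // Caution: SimpleDateFormat is not thread-safe; a shared instance is acceptable only in a demo.
    private static final SimpleDateFormat ISO_DATE_TIME_FORMATTER = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
    static {
        ISO_DATE_TIME_FORMATTER.setTimeZone(java.util.TimeZone.getTimeZone("UTC"));
    }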


    @CrossOrigin(origins = "*")
    @RequestMapping(method = RequestMethod.GET)
    public ResponseEntity<?> queryPositions(@RequestParam(value = "carId") Long carId,
                                            @RequestParam(value = "startTime")
                                            @DateTimeFormat(iso = DateTimeFormat.ISO.DATE_TIME) Date startTime,
                                            @RequestParam(value = "endTime")
                                            @DateTimeFormat(iso = DateTimeFormat.ISO.DATE_TIME) Date endTime) {

        DynamoDBMapper mapper = new DynamoDBMapper(dynamoDbClient.getDbClient());

        String query = "CarId = :carId and GpsTime between :startTime and :endTime";

        /*
        Querying on the ServerTime attribute will fail because ServerTime is not defined as the range key.
        For example, try this
        String query = "CarId = :carId and ServerTime between :startTime and :endTime";
        and it will fail with database exception
        */

        Map<String, AttributeValue> valueMap = new HashMap<>();
        valueMap.put(":carId", new AttributeValue().withN(String.valueOf(carId)));
        valueMap.put(":startTime", new AttributeValue().withS(ISO_DATE_TIME_FORMATTER.format(startTime)));
        valueMap.put(":endTime", new AttributeValue().withS(ISO_DATE_TIME_FORMATTER.format(endTime)));

        DynamoDBQueryExpression<CarPosition> queryExpression = new DynamoDBQueryExpression<>();
        queryExpression.withKeyConditionExpression(query).withExpressionAttributeValues(valueMap);

        List<CarPosition> carPositions = mapper.query(CarPosition.class, queryExpression);

        /*
        Some post-processing can be done at this point
        It should be easy now because we have positions mapped as list of Java POJOs
        */

        return ResponseEntity.ok().body(carPositions);
    }

    //this method will accept positions posted by our remote cars
    @CrossOrigin(origins = "*")
    @RequestMapping(method = RequestMethod.POST)
    ResponseEntity<?> create(@RequestBody CarPosition carPosition) {

        DynamoDBMapper mapper = new DynamoDBMapper(dynamoDbClient.getDbClient());

        Calendar calendar = Calendar.getInstance();
        calendar.setTime(new Date());
        carPosition.setServerTime(calendar);

        mapper.save(carPosition, DynamoDBMapperConfig.DEFAULT);
        //It is worth peeking into the DynamoDBMapperConfig.DEFAULT value
        //to learn and understand all available options that can be used for "saving" into DynamoDB.

        return ResponseEntity.status(HttpStatus.CREATED).body(carPosition);
    }
}

Querying DynamoDB by schema keys

Now I will explore DynamoDB querying to understand its advantages and some of its constraints.
I will use the parametrized search criteria from the code above:
String query = "CarId = :carId and GpsTime between :startTime and :endTime";

This query will return a list of positions for a particular car (CarId = :carId) in a defined time interval (GpsTime between :startTime and :endTime).

Since CarId is the hash key and GpsTime is the range (sort) key, and thanks to DynamoDB-specific optimizations on schema keys, our query will be lightning fast.
But there are some DynamoDB constraints worth noticing:

  • the attribute defined as hash key (CarId) can be queried only with the = (equal) operator
  • the attribute defined as range key (GpsTime) can be queried using a wider set of operators such as =, >, <, >=, <=, between
  • attributes that are not defined as keys cannot be used in a query's key condition at all. For example, a query by the ServerTime attribute, CarId = :carId and ServerTime between :startTime and :endTime, will fail with a database exception reporting that ServerTime is not a key.
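
The between operator works on GpsTime even though the key is declared as a string, because ISO-8601 timestamps in UTC with a fixed width sort alphabetically in the same order as chronologically. The sketch below demonstrates this with plain string comparison; the class and method names are invented for this example, and it assumes that every stored value uses exactly the same format.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Shows that fixed-width ISO-8601 UTC strings compare like the instants they encode.
class IsoOrderingSketch {

    // Formats an instant as an ISO-8601 UTC string with second precision.
    static String toKey(Instant instant) {
        return instant.truncatedTo(ChronoUnit.SECONDS).toString();
    }

    // Plain lexicographic comparison, the same kind a string range key relies on.
    static boolean between(String value, String from, String to) {
        return value.compareTo(from) >= 0 && value.compareTo(to) <= 0;
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2017-11-20T12:00:00Z");
        String from = toKey(start);
        String to = toKey(start.plus(1, ChronoUnit.HOURS));
        System.out.println(between("2017-11-20T12:07:00Z", from, to)); // prints true
        System.out.println(between("2017-11-20T13:07:00Z", from, to)); // prints false
    }
}
```

Mixing formats (local time zones, or values with and without milliseconds) breaks this property, which is why the controller formats all timestamps the same way.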

I can guess that, after reading the last statement, you screamed in surprise:

WHAAAT??? We cannot query database by columns that are not keys!?

Well, actually you can search by non-key attributes, but the DynamoDB creators do not call it querying but scanning.

DynamoDB query vs scan

A DynamoDB query utilizes internal optimizations on schema keys, enabling direct retrieval of the (sorted) subset defined by the search criteria and thus much faster searches and data retrieval.
A DynamoDB scan, on the other hand, is implemented in a straightforward way: the database engine first reads all records and then filters them according to the given search criteria.
The difference between query and scan is significant, and it takes careful planning of table keys to get the best out of DynamoDB.

Just for reference, here is a Spring endpoint that uses DynamoDB scanning to retrieve car positions:
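
The cost of "read everything, then filter" is easy to simulate without a database. The following sketch is an in-memory imitation with invented names (ScanCostSketch, Position); it only counts how many items a scan-like loop has to read compared to how many it returns.

```java
import java.util.ArrayList;
import java.util.List;

// In-memory imitation of a scan: read every item, filter afterwards.
class ScanCostSketch {

    static final class Position {
        final long carId;
        final String serverTime;

        Position(long carId, String serverTime) {
            this.carId = carId;
            this.serverTime = serverTime;
        }
    }

    // Returns {itemsRead, itemsReturned} for a filter on carId and serverTime.
    static int[] scan(List<Position> table, long carId, String from, String to) {
        int read = 0;
        int returned = 0;
        for (Position p : table) {
            read++; // every item is read, whether it matches or not
            if (p.carId == carId
                    && p.serverTime.compareTo(from) >= 0
                    && p.serverTime.compareTo(to) <= 0) {
                returned++;
            }
        }
        return new int[] {read, returned};
    }

    // Builds one position per car per minute for a single hour.
    static List<Position> sampleTable(int cars, int positionsPerCar) {
        List<Position> table = new ArrayList<>();
        for (long car = 1; car <= cars; car++) {
            for (int i = 0; i < positionsPerCar; i++) {
                table.add(new Position(car, String.format("2017-11-20T12:%02d:00Z", i)));
            }
        }
        return table;
    }

    public static void main(String[] args) {
        int[] result = scan(sampleTable(1000, 60), 42L,
                "2017-11-20T12:00:00Z", "2017-11-20T12:29:00Z");
        // prints read=60000, returned=30
        System.out.println("read=" + result[0] + ", returned=" + result[1]);
    }
}
```

With a real table the gap matters twice: a scan is slower, and it consumes read capacity for the items it reads, not just for the ones it returns.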
CarPositionScanController.java

import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBMapper;
import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBScanExpression;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.dmi.dynamows.components.DynamoDbClient;
import com.dmi.dynamows.entities.CarPosition;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.format.annotation.DateTimeFormat;
import org.springframework.http.ResponseEntity;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.CrossOrigin;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.RequestParam;

import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

@Controller
@RequestMapping(path = "api/v1/positions/scan")
public class CarPositionScanController {

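    // Timestamps are compared as strings, so format them in UTC to match the literal 'Z'.
    // Caution: SimpleDateFormat is not thread-safe; a shared instance is acceptable only in a demo.
    private static final SimpleDateFormat ISO_DATE_TIME_FORMATTER = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
    static {
        ISO_DATE_TIME_FORMATTER.setTimeZone(java.util.TimeZone.getTimeZone("UTC"));
    }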

    @Autowired
    DynamoDbClient dynamoDbClient;

    @CrossOrigin(origins = "*")
    @RequestMapping(method = RequestMethod.GET)
    public ResponseEntity<?> queryPositions(@RequestParam(value = "carId") Long carId,
                                            @RequestParam(value = "startTime")
                                            @DateTimeFormat(iso = DateTimeFormat.ISO.DATE_TIME) Date startTime,
                                            @RequestParam(value = "endTime")
                                            @DateTimeFormat(iso = DateTimeFormat.ISO.DATE_TIME) Date endTime) {

        DynamoDBMapper mapper = new DynamoDBMapper(dynamoDbClient.getDbClient());

        String query = "CarId = :carId and ServerTime between :startTime and :endTime";

        Map<String, AttributeValue> valueMap = new HashMap<>();
        valueMap.put(":carId", new AttributeValue().withN(String.valueOf(carId)));
        valueMap.put(":startTime", new AttributeValue().withS(ISO_DATE_TIME_FORMATTER.format(startTime)));
        valueMap.put(":endTime", new AttributeValue().withS(ISO_DATE_TIME_FORMATTER.format(endTime)));

        DynamoDBScanExpression scanExpression = new DynamoDBScanExpression();
        scanExpression.withFilterExpression(query).withExpressionAttributeValues(valueMap);

        List<CarPosition> carPositions = mapper.scan(CarPosition.class, scanExpression);
        return ResponseEntity.ok().body(carPositions);
    }
}

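How to put all this to the test?

To see and feel the actual DynamoDB performance, you will need to fill it with millions of records. There are 2 options:

  • option 1 - send positions to the POST endpoint above one at a time, using a request body like this one: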

{
    "carId": 1,
    "gpsTime": "2017-11-20T12:07:00Z",
    "gpsData": {
        "latitude": 41.123333,
        "longitude": 21.123333,
        "speed": 80,
        "course": 90
    }
}

And do it millions of times, occasionally changing carId and gpsTime :)
Too time-consuming? Then you can use some automated tool for this purpose or skip to the next option.

  • option 2 - use all the knowledge you gained reading this and add a new Java method to the DynamoDbClient class that generates a test list of CarPosition objects and saves them to the database.
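
As a starting point for generating test data, here is a minimal sketch that produces request bodies in the same JSON shape as the sample above, one per car per minute. It deliberately uses only the JDK: TestPositionGenerator, its parameters and the random value ranges are all assumptions made for this example, and saving the generated positions (via the POST endpoint or DynamoDBMapper) is left out.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Random;

// Generates JSON request bodies for test positions; it does not talk to DynamoDB.
class TestPositionGenerator {

    // One body per car per minute, starting at `start`.
    static List<String> generate(int cars, int minutes, Instant start, long seed) {
        Random random = new Random(seed); // fixed seed keeps the test data reproducible
        List<String> bodies = new ArrayList<>(cars * minutes);
        for (int carId = 1; carId <= cars; carId++) {
            for (int minute = 0; minute < minutes; minute++) {
                Instant gpsTime = start.plus(minute, ChronoUnit.MINUTES);
                bodies.add(String.format(Locale.ROOT,
                        "{\"carId\": %d, \"gpsTime\": \"%s\", \"gpsData\": "
                                + "{\"latitude\": %.6f, \"longitude\": %.6f, \"speed\": %d, \"course\": %d}}",
                        carId, gpsTime,
                        41.0 + random.nextDouble(), 21.0 + random.nextDouble(),
                        random.nextInt(120), random.nextInt(360)));
            }
        }
        return bodies;
    }

    public static void main(String[] args) {
        List<String> bodies = generate(2, 3, Instant.parse("2017-11-20T12:00:00Z"), 42L);
        bodies.forEach(System.out::println); // 6 bodies: 2 cars x 3 minutes
    }
}
```

Calling generate(1000, 60 * 24 * 7, ...) would produce the full week from the use case, although for that volume it is wiser to save positions in chunks than to keep them all in one list.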