Datafaker Gen: Leveraging BigQuery Sink on Google Cloud Platform

By Roman Rybak

This is a continuation of the article Flexible Data Generation With Datafaker Gen about Datafaker Gen. In this section, we will explore the new BigQuery Sink feature for Google Cloud Platform, demonstrating how to utilize different field types based on the Datafaker schema. BigQuery is a fully managed, AI-ready data analytics platform available on Google Cloud Platform that gives anyone the capability to analyze terabytes of data.

Let's consider a scenario where we aim to create a dummy dataset, aligned with our actual schema, to facilitate executing and testing queries in BigQuery. By using Datafaker Gen, this data can become meaningful and predictable, based on predefined providers, thus allowing for more realistic and reliable testing environments. This solution leverages the BigQuery API Client libraries provided by Google. For more details, refer to the official documentation: BigQuery API Client Libraries.

Quick Start With BigQuery Sink

This is a simple example of BigQuery Sink, just to show that only a few simple steps are needed to see a result and to make the approach clear. The rest of this article covers detailed configuration and the flexibility of this feature. Three simple steps need to be done:
1. Download the project here, build it, and navigate to the folder with the BigQuery example:

Shell
./mvnw clean verify && cd ./datafaker-gen-examples/datafaker-gen-bigquery

2. Configure the schema in config.yaml:

YAML
default_locale: en-US
fields:
  - name: id
    generators: [ Number#randomNumber ]
  - name: lastname
    generators: [ Name#lastName ]
    nullRate: 0.1
  - name: firstname
    locale: ja-JP
    generators: [ Name#firstName ]

Configure the BigQuery Sink in output.yaml with the path to the Service Account JSON (which should be obtained from GCP):

YAML
sinks:
  bigquery:
    project_id: [gcp project name]
    dataset: datafaker
    table: users
    service_account: [path to service account json]

3. Run it:

Shell
# Format json, number of lines 10000 and new BigQuery Sink
bin/datafaker_gen -f json -n 10000 -sink bigquery

In-Depth Guide To Using BigQuery Sink

To prepare a generator for BigQuery, follow these two steps:

- Define the Datafaker schema: The schema defined in config.yaml will be reused for the BigQuery Sink.
- Configure the BigQuery Sink: In output.yaml, specify the connection credentials, connection properties, and generation parameters.

Note: Currently, BigQuery Sink only supports the JSON format. If another format is used, the BigQuery Sink will throw an exception. At the same time, it might be a good opportunity to introduce other formats, such as protobuf.

1. Define the Datafaker Schema

One of the most important preparation tasks is defining the schema in the config.yaml file. The schema specifies the field definitions of the record based on the Datafaker provider. It also allows for the definition of embedded fields like array and struct. Consider this example of a schema definition in the config.yaml file. The first step is to define the base locale that should be used for all fields. This should be done at the top of the file in the property default_locale. The locale for a specific field can be customized directly.

YAML
default_locale: en-US

This schema defines the default locale as 'en-US' and lists the fields. All required fields should then be defined in the fields section. Let's fill in the details of the field definitions. Datafaker Gen supports three main field types: default, array, and struct.

Default Type

This is a simple type that allows you to define the field name and how to generate its value using the generators property. Additionally, there are some optional parameters that allow for customization of the locale and the null rate.

YAML
default_locale: en-US
fields:
  - name: id
    generators: [ Number#randomNumber ]
  - name: lastname
    generators: [ Name#lastName ]
    nullRate: 0.1
  - name: firstname
    locale: ja-JP
    generators: [ Name#firstName ]

- name: Defines the field name.
- generators: Defines the Faker provider methods that generate the value. For BigQuery, based on the format provided by the Faker provider generators, it will generate JSON, which is then reused for the BigQuery field types. In our example, Number#randomNumber returns a long value from the Datafaker provider, which is then converted to an integer for the BigQuery schema. Similarly, the fields Name#lastName and Name#firstName return String values, which are converted to STRING in BigQuery.
- nullRate: Determines how often this field is missing or has a null value.
- locale: Defines a specific locale for the current field.
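To make the mapping from generators entries to provider calls concrete, here is a minimal, hypothetical Java sketch that invokes the same Datafaker providers directly. Datafaker Gen does this for you based on config.yaml; the class name and the printed JSON-like output below are illustrative only:

Java
import java.util.Locale;

import net.datafaker.Faker;

public class SchemaPreview {

    public static void main(String[] args) {
        // default_locale: en-US
        Faker defaultFaker = new Faker(Locale.forLanguageTag("en-US"));
        // Field-level locale override, as for the firstname field (ja-JP)
        Faker japaneseFaker = new Faker(Locale.forLanguageTag("ja-JP"));

        // Number#randomNumber returns a long, mapped to an integer-style field in BigQuery
        long id = defaultFaker.number().randomNumber();
        // Name#lastName returns a String, mapped to STRING; nullRate: 0.1 would leave it out ~10% of the time
        String lastname = defaultFaker.name().lastName();
        // Name#firstName with the ja-JP locale produces a Japanese first name
        String firstname = japaneseFaker.name().firstName();

        System.out.printf("{\"id\": %d, \"lastname\": \"%s\", \"firstname\": \"%s\"}%n",
                id, lastname, firstname);
    }
}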
Array Type

This type allows the generation of a collection of values. It reuses the fields from the default type and extends them with two additional properties: minLength and maxLength. In BigQuery, this type corresponds to a field with the REPEATED mode. The following fields need to be configured in order to enable the array type:

- type: Specifies the array type for this field.
- minLength: Specifies the minimum length of the array.
- maxLength: Specifies the maximum length of the array.

All these properties are mandatory for the array type.

YAML
default_locale: en-US
fields:
  - name: id
    generators: [ Number#randomNumber ]
  - name: lastname
    generators: [ Name#lastName ]
    nullRate: 0.1
  - name: firstname
    generators: [ Name#firstName ]
    locale: ja-JP
  - name: phone numbers
    type: array
    minLength: 2
    maxLength: 5
    generators: [ PhoneNumber#phoneNumber, PhoneNumber#cellPhone ]

It is also worth noting that the generators property can contain multiple sources of values, as with the phone numbers field.

Struct Type

This type allows you to create a substructure that can contain many nested levels based on all existing types. In BigQuery, this type corresponds to the RECORD type. The struct type doesn't have a generators property but has a new property called fields, where a substructure based on the default, array, or struct type can be defined. There are two main fields that need to be added for the struct type:

- type: Specifies the struct type for this field.
- fields: Defines a list of fields in the sub-structure.

YAML
default_locale: en-US
fields:
  - name: id
    generators: [ Number#randomNumber ]
  - name: lastname
    generators: [ Name#lastName ]
    nullRate: 0.1
  - name: firstname
    generators: [ Name#firstName ]
    locale: ja-JP
  - name: phone numbers
    type: array
    minLength: 2
    maxLength: 5
    generators: [ PhoneNumber#phoneNumber, PhoneNumber#cellPhone ]
  - name: address
    type: struct
    fields:
      - name: country
        generators: [ Address#country ]
      - name: city
        generators: [ Address#city ]
      - name: street address
        generators: [ Address#streetAddress ]

2. Configure BigQuery Sink

As previously mentioned, the configuration for sinks can be added in the output.yaml file. The BigQuery Sink configuration allows you to set up credentials, connection properties, and sink properties. Below is an example configuration for a BigQuery Sink:

YAML
sinks:
  bigquery:
    batchsize: 100
    project_id: [gcp project name]
    dataset: datafaker
    table: users
    service_account: [path to service account json]
    create_table_if_not_exists: true
    max_outstanding_elements_count: 100
    max_outstanding_request_bytes: 10000
    keep_alive_time_in_seconds: 60
    keep_alive_timeout_in_seconds: 60

Let's review the entire list of configuration options you can take advantage of:

- batchsize: Specifies the number of records to process in each batch. A smaller batch size can reduce memory usage but may increase the number of API calls.
- project_id: The Google Cloud Platform project ID where your BigQuery dataset resides.
- dataset: The name of the BigQuery dataset where the table is located.
- table: The name of the BigQuery table where the data will be inserted.

Google Credentials should be configured with sufficient permissions to access and modify BigQuery datasets and tables. There are several ways to pass the service account content:

- service_account: The path to the JSON file containing the service account credentials. This configuration should be defined in the output.yaml file.
- SERVICE_ACCOUNT_SECRET: This environment variable should contain the JSON content of the service account.
- The final option involves using the gcloud configuration from your environment (more details can be found here). This option is implicit and could potentially lead to unpredictable behavior.

The remaining sink options are:

- create_table_if_not_exists: If set to true, the table will be created if it does not already exist. A BigQuery schema will be created based on the Datafaker schema.
- max_outstanding_elements_count: The maximum number of elements (records) allowed in the buffer before they are sent to BigQuery.
- max_outstanding_request_bytes: The maximum size of the request in bytes allowed in the buffer before it is sent to BigQuery.
- keep_alive_time_in_seconds: The amount of time (in seconds) to keep the connection alive for additional requests.
- keep_alive_timeout_in_seconds: The amount of time (in seconds) to wait for additional requests before closing the connection due to inactivity.
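For context on what a BigQuery sink ultimately has to do, here is a minimal, hypothetical Java sketch that writes one generated record to the configured table using Google's google-cloud-bigquery client. It uses the simple streaming insertAll API purely for illustration and is not necessarily how Datafaker Gen's sink is implemented (its buffering and keep-alive settings above suggest a different write path); the project, dataset, table, and credential path are placeholders that should match output.yaml:

Java
import java.io.FileInputStream;
import java.util.Map;

import com.google.auth.oauth2.ServiceAccountCredentials;
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.InsertAllRequest;
import com.google.cloud.bigquery.InsertAllResponse;
import com.google.cloud.bigquery.TableId;

public class BigQueryInsertSketch {

    public static void main(String[] args) throws Exception {
        // Placeholder values; align them with project_id, dataset, table, and service_account in output.yaml.
        BigQuery bigquery = BigQueryOptions.newBuilder()
                .setProjectId("my-gcp-project")
                .setCredentials(ServiceAccountCredentials.fromStream(
                        new FileInputStream("service-account.json")))
                .build()
                .getService();

        TableId tableId = TableId.of("datafaker", "users");

        // A single generated record; a real run would batch many rows (cf. batchsize).
        InsertAllRequest request = InsertAllRequest.newBuilder(tableId)
                .addRow(Map.of("id", 12345L, "lastname", "Smith", "firstname", "太郎"))
                .build();

        InsertAllResponse response = bigquery.insertAll(request);
        if (response.hasErrors()) {
            response.getInsertErrors().forEach((row, errors) ->
                    System.err.println("Row " + row + " failed: " + errors));
        }
    }
}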
How to Run

The BigQuery Sink example has been merged into the main upstream Datafaker Gen project, where it can be adapted for your use. Running this generator is easy and lightweight. However, it requires several preparation steps:

1. Download the GitHub repository. The datafaker-gen-examples folder includes the example with BigQuery Sink that we will use.

2. Build the entire project with all modules. The current solution uses the 2.2.3-SNAPSHOT version of the Datafaker library.

Shell
./mvnw clean verify

3. Navigate to the 'datafaker-gen-bigquery' folder. This should serve as the working directory for your run.

Shell
cd ./datafaker-gen-examples/datafaker-gen-bigquery

4. Define the schema for records in the config.yaml file and place this file in the appropriate location where the generator should be run. Additionally, define the sinks configuration in the output.yaml file, as demonstrated previously.

Datafaker Gen can be executed in two ways:

1. Use the bash script from the bin folder in the parent project:

Shell
# Format json, number of lines 10000 and new BigQuery Sink
bin/datafaker_gen -f json -n 10000 -sink bigquery

2. Execute the JAR directly, like this:

Shell
java -cp [path_to_jar] net.datafaker.datafaker_gen.DatafakerGen -f json -n 10000 -sink bigquery

Query Result and Outcome

After applying all the necessary configurations and running the generator in my test environment, let's check the outcome. This is the SQL query to retrieve the generated result:

SQL
SELECT id, lastname, firstname, `phone numbers`, address FROM `datafaker.users`;

Here is the result of all our work (the result of the query): Only the first four records are shown here, with all the fields defined above. It is also worth noting that the phone numbers array field contains two or more values, depending on the entry. The address structure field has three nested fields.

Conclusion

This newly added BigQuery Sink feature enables you to publish records to Google Cloud Platform efficiently. With the ability to generate and publish large volumes of realistic data, developers and data analysts can more effectively simulate the behavior of their applications and immediately start testing in real-world conditions. Your feedback allows us to evolve this project. Please feel free to leave a comment. The full source code is available here. I would like to thank Sergey Nuyanzin for reviewing this article. Thank you for reading! Glad to be of help.

PlatformCon 2024 Session Recap: Platform Engineering and AI

By Caitlin Candelmo

Are you curious what experienced practitioners are saying about AI and platform engineering — and its growing impact on development workflows? Look no further than DZone's latest event with PlatformCon 2024, where our global software community answers these vital questions in an expert panel on all things platform engineering, AI, and beyond.
What Developers Must Know About AI and Platform Engineering

Moderated by DZone Core member and Director of Data and AI at Silk, Kellyn Pot’Vin-Gorman, panelists Ryan Murray, Sandra Borda, and Chiradeep Vittal discussed the most probing questions and deliberations facing AI and platform engineering today. Check out the panel discussion in its entirety here:

Important questions and talking points discussed include:

- How has AI transformed the platform engineering landscape?
- Examples of how AI has improved developer productivity within organizations.
- What are some of the challenges you’ve faced when integrating AI into your development workflow, and how have those been addressed?
- What are some anti-patterns or caveats when integrating GenAI into engineering platforms and the SDLC more broadly?
- What are some practical steps or strategies for organizations looking to start incorporating AI into their platform engineering efforts?
- ...and more!

Trend Report: Low-Code Development

Low code, no code, citizen development, AI automation, scalability — if you work in the tech world, it's likely that you have been encouraged to use tools in at least one of these spaces. And it's for good reason, as Gartner has projected that by 2025, 70% of applications developed within organizations will have been built using low- and/or no-code technologies. So does the practice live up to the hype? Year over year, the answer is a resounding "yes" as the industry continues to evolve. Organizations have an increased demand for more frequent application releases and updates, and with that comes the need for increased efficiencies. And this is where low-code and no-code development practices shine. Sprinkle AI automation into low- and no-code development, and the scalability opportunities are endless. This Trend Report covers the evolving landscape of low- and no-code development by providing a technical exploration of integration techniques into current development processes, the role AI plays in relation to low- and no-code development, governance, intelligent automated testing, and adoption challenges. In addition to findings from our original research, technical experts from the DZone Community contributed articles addressing important topics in the low-code space, including scalability, citizen development, process automation, and much more. To ensure that you, the developer, can focus on higher priorities, this Trend Report aims to provide all the tools needed to successfully leverage low code in your tech stack.

Comparative Analysis of pgVector and OpenSearch for Vector Databases

Vector databases allow for efficient data storage and retrieval by storing data as points, or vectors, instead of traditional rows and columns. Two popular vector database options are the pgVector extension for PostgreSQL and Amazon OpenSearch Service. This article compares the specifications, strengths, limitations, capabilities, and use cases of pgVector and OpenSearch to help inform decision-making when selecting the best-suited option for various needs.

Introduction

The rapid advancements in artificial intelligence (AI) and machine learning (ML) have necessitated the development of specialized databases that can efficiently store and retrieve high-dimensional data.
Vector databases have emerged as a critical component in this landscape, enabling applications such as recommendation systems, image search, and natural language processing. This article compares two prominent vector database solutions, the pgVector extension for PostgreSQL and Amazon OpenSearch Service, and is directly relevant to technical professionals, database administrators, and AI and ML practitioners.

Technical Background

Vector databases store data as vectors, enabling efficient similarity searches and other vector operations. pgVector enhances PostgreSQL's capabilities to handle vectors, while OpenSearch provides a comprehensive solution for storing and indexing vectors and metadata, supporting scalable AI applications.

Problem Statement

Choosing the proper vector database involves understanding the specific requirements, performance characteristics, and integration capabilities of the available options. This article provides a practical and detailed comparison to assist in making an informed, confident decision.

Methodology or Approach

This analysis reviews current practices, case studies, and theoretical models to compare pgVector and OpenSearch comprehensively. It highlights critical differences in technical specifications, performance, and use cases so that readers can weigh the options with confidence.

pgVector Extension for PostgreSQL

pgVector is an open-source extension for PostgreSQL that enables storing and querying high-dimensional vectors. It supports various distance calculations and provides functionality for exact and approximate nearest-neighbor searches. Key features include:

- Vector storage: Supports vectors with up to 16,000 dimensions.
- Indexing: Supports indexing of vector data using IVFFlat for up to 2,000 dimensions.
- Integration: Seamlessly integrates with PostgreSQL, leveraging its ACID compliance and other features.

Amazon OpenSearch Service

OpenSearch is an open-source, all-in-one vector database that supports flexible and scalable AI applications. Key features include:

- Scalability: Handles large volumes of data with distributed computing capabilities.
- Indexing: Supports various indexing methods, including HNSW and IVFFlat.
- Advanced features: Provides full-text search, security, and anomaly detection features.

Comparative Analysis

Technical Specifications

Capability: pgVector (PostgreSQL extension) / Amazon OpenSearch
- Max vector dimensions: Up to 16,000 / Up to 16,000 (various indexing methods)
- Distance metrics: L2, Inner Product, Cosine / L1, L2, Inner Product, Cosine, L-infinity
- Database type: Relational / NoSQL
- Performance: Optimized for vector operations / Variable; may not match pgVector for intensive vector operations
- Memory utilization: High control over memory settings / Limited granularity
- CPU utilization: More efficient / Higher CPU utilization
- Fault tolerance and recovery: PostgreSQL mechanisms / Automated backups and recovery
- Security: PostgreSQL features / Advanced security features
- Distributed computing capabilities: Limited / Built for distributed computing
- GPU acceleration: Supported via libraries / Supported by FAISS and NMSLIB
- Cost: Free (PostgreSQL extension) / AWS infrastructure costs
- Integration with other tools: PostgreSQL extensions and tools / AWS services and tools

Performance

pgVector is designed to optimize vector operations, offering several tuning options for performance improvement. In contrast, OpenSearch's performance can vary, particularly with complex queries or large data volumes.
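To ground the comparison, here is a minimal, hypothetical Java sketch of a pgVector round trip over plain JDBC: it creates a small table with a vector column and runs an L2 nearest-neighbor query using the <-> operator. The connection URL, credentials, and items table are assumptions for illustration; on larger tables, an IVFFlat index would typically be added for approximate search.

Java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class PgVectorExample {

    public static void main(String[] args) throws Exception {
        // Assumed connection details; requires the PostgreSQL JDBC driver and the pgvector extension on the server.
        String url = "jdbc:postgresql://localhost:5432/vectordb";
        try (Connection conn = DriverManager.getConnection(url, "postgres", "secret")) {
            try (Statement st = conn.createStatement()) {
                st.execute("CREATE EXTENSION IF NOT EXISTS vector");
                st.execute("CREATE TABLE IF NOT EXISTS items (id bigserial PRIMARY KEY, embedding vector(3))");
                st.execute("INSERT INTO items (embedding) VALUES ('[1,1,1]'), ('[2,2,2]'), ('[1,1,2]')");
            }

            // Exact L2 nearest-neighbor search; '<->' is pgvector's Euclidean distance operator.
            String sql = "SELECT id, embedding <-> ?::vector AS distance FROM items ORDER BY distance LIMIT 5";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setString(1, "[1,1,1]");
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        System.out.printf("id=%d distance=%.3f%n", rs.getLong("id"), rs.getDouble("distance"));
                    }
                }
            }
        }
    }
}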
Strengths and Limitations

pgVector Strengths
- Open-source and free
- Seamless integration with PostgreSQL
- Efficient handling of high-dimensional vectors
- Detailed tuning options for performance optimization

pgVector Limitations
- Requires knowledge of PostgreSQL and SQL
- Limited to vector indexing
- Scalability depends on the PostgreSQL setup

OpenSearch Strengths
- Highly scalable with distributed computing
- Versatile data type support
- Advanced features, including full-text search and security
- Integration with AWS services

OpenSearch Limitations
- Steeper learning curve
- Variable performance for high-dimensional vectors
- Higher latency for complex queries

Use Cases

pgVector Use Cases
- E-commerce: Recommendation systems and similarity searches.
- Healthcare: Semantic search for medical records and genomics research.
- Finance: Anomaly detection and fraud detection.
- Biotechnology and genomics: Handling complex genetic data.
- Multimedia analysis: Similarity search for images, videos, and audio files.

OpenSearch Use Cases
- Marketing: Customer behavior analysis.
- Cybersecurity: Anomaly detection in network events.
- Supply chain management: Inventory management.
- Healthcare: Patient data analysis and predictive modeling.
- Telecommunications: Network performance monitoring.
- Retail: Recommendation engines and inventory management.
- Semantic search: Contextually relevant search results.
- Multimedia analysis: Reverse image search and video recommendation systems.
- Audio search: Music recommendation systems and audio-based content discovery.
- Geospatial search: Optimized routing and property suggestions.

Conclusion: Future Trends and Developments

The field of vector databases is rapidly evolving, driven by the increasing demand for efficient storage and retrieval of high-dimensional data in AI and ML applications. Future developments may include improved scalability, enhanced performance, and new features to support advanced use cases. Understanding these trends can help you make informed decisions and plan for the future.

By Jagadish Nimmagadda

You Can Shape Trend Reports: Participate in DZone Research Surveys + Enter the Prize Drawings!

Hello, DZone Community! We have several surveys in progress as part of our research for upcoming Trend Reports. We would love for you to join us by sharing your experiences and insights (anonymously if you choose) — readers just like you drive the content that we cover in our Trend Reports. Check out the details for each research survey below. Over the coming months, we will compile and analyze data from hundreds of respondents; results and observations will be featured in the "Key Research Findings" of our Trend Reports.

Security Research

Security is everywhere; you can’t live with it, and you certainly can’t live without it! We are living in an entirely unprecedented world — one where bad actors are growing more sophisticated and are taking full advantage of the rapid advancements in AI. We will be exploring the most pressing security challenges and emerging strategies in this year’s survey for our August Enterprise Security Trend Report. Our 10-12-minute Enterprise Security Survey explores:

- Building a security-first organization
- Security architecture and design
- Key security strategies and techniques
- Cloud and software supply chain security

At the end of the survey, you're also able to enter the prize drawing for a chance to receive one of two $175 (USD) e-gift cards!
Join the Security Research

Data Engineering Research

As a continuation of our annual data-related research, we're consolidating our database, data pipeline, and data and analytics scopes into a single 12-minute survey that will help guide the narratives of our July Database Systems Trend Report and our data engineering report later in the year. Our 2024 Data Engineering Survey explores:

- Database types, languages, and use cases
- Distributed database design + architectures
- Data observability, security, and governance
- Data pipelines, real-time processing, and structured storage
- Vector data and databases + other AI-driven data capabilities

Join the Data Engineering Research

You'll also have the chance to enter the $500 raffle at the end of the survey — five random people will be drawn and will receive $100 each (USD)!

Cloud and Kubernetes Research

This year, we're combining our annual cloud native and Kubernetes research into one 10-minute survey that dives further into these topics as they relate to one another and to security, observability, AI, and more. DZone's research will be informing these Trend Reports:

- May – Cloud Native: Championing Cloud Development Across the SDLC
- September – Kubernetes in the Enterprise

Our 2024 Cloud Native Survey covers:

- Microservices, container orchestration, and tools/solutions
- Kubernetes use cases, pain points, and security measures
- Cloud infrastructure, costs, tech debt, and security threats
- AI for release management + monitoring/observability

Join the Cloud Native Research

Don't forget to enter the $750 raffle at the end of the survey! Five random people will be selected to each receive $150 (USD). Your responses help inform the narrative of our Trend Reports, so we truly cannot do this without you. Stay tuned for each report's launch and see how your insights align with the larger DZone Community. We thank you in advance for your help!

—The DZone Publications team

By Caitlin Candelmo

Faster Startup With Spring Boot 3.2 and CRaC, Part 1: Automatic Checkpoint

With Spring Boot 3.2 and Spring Framework 6.1, we get support for Coordinated Restore at Checkpoint (CRaC), a mechanism that enables Java applications to start up faster. With Spring Boot, we can use CRaC in a simplified way, known as automatic checkpoint/restore at startup. Even though it is not as powerful as the standard way of using CRaC, this blog post will show an example where the Spring Boot applications' startup times are decreased by 90%. The sample applications are from chapter 6 in my book on building microservices with Spring Boot.

Overview

The blog post is divided into the following sections:

- Introducing CRaC, benefits, and challenges
- Creating CRaC-based Docker images with a Dockerfile
- Trying out CRaC with automatic checkpoint/restore
- Summary
- Next blog post

Let's start learning about CRaC and its benefits and challenges.

1. Introducing CRaC, Benefits, and Challenges

Coordinated Restore at Checkpoint (CRaC) is a feature in OpenJDK, initially developed by Azul, to enhance the startup performance of Java applications by allowing them to restore to a previously saved state quickly. CRaC enables Java applications to save their state at a specific point in time (checkpoint) and then restore from that state at a later time. This is particularly useful for scenarios where fast startup times are crucial, such as serverless environments, microservices, and, in general, applications that must be able to scale up their instances quickly and also support scale-to-zero when not being used.
This introduction will first explain a bit about how CRaC works, then discuss some of the challenges and considerations associated with it, and finally describe how Spring Boot 3.2 integrates with it. The introduction is divided into the following subsections:

1.1. How CRaC Works
1.2. Challenges and Considerations
1.3. Spring Boot 3.2 Integration With CRaC

1.1. How CRaC Works

Checkpoint Creation

At a chosen point during the application's execution, a checkpoint is created. This involves capturing the entire state of the Java application, including the heap, stack, and all active threads. The state is then serialized and saved to the file system. During the checkpoint process, the application is typically paused to ensure a consistent state is captured. This pause is coordinated to minimize disruption and ensure the application can resume correctly. Before taking the checkpoint, some requests are usually sent to the application to ensure that it is warmed up, i.e., all relevant classes are loaded, and the JVM HotSpot engine has had a chance to optimize the bytecode according to how it is being used at runtime.

Commands to perform a checkpoint:

Shell
java -XX:CRaCCheckpointTo=<some-folder> -jar my_app.jar
# Make calls to the app to warm up the JVM...
jcmd my_app.jar JDK.checkpoint

State Restoration

When the application is started from the checkpoint, the previously saved state is deserialized from the file system and loaded back into memory. The application then continues execution from the exact point where the checkpoint was taken, bypassing the usual startup sequence.

Command to restore from a checkpoint:

Shell
java -XX:CRaCRestoreFrom=<some-folder>

Restoring from a checkpoint allows applications to skip the initial startup process, including class loading, warmup initialization, and other startup routines, significantly reducing startup times. For more information, see Azul's documentation: What is CRaC?

1.2. Challenges and Considerations

As with any new technology, CRaC comes with a new set of challenges and considerations:

State Management: Open files and connections to external resources, such as databases, must be closed before the checkpoint is taken. After the restore, they must be reopened. CRaC exposes a Java lifecycle interface that applications can use to handle this, org.crac.Resource, with the callback methods beforeCheckpoint and afterRestore.

Sensitive Information: Credentials and secrets stored in the JVM's memory will be serialized into the files created by the checkpoint. Therefore, these files need to be protected. An alternative is to run the checkpoint command against a temporary environment that uses other credentials and replace the credentials on restore.

Linux Dependency: The checkpoint technique is based on a Linux feature called CRIU, "Checkpoint/Restore In Userspace". This feature only works on Linux, so the easiest way to test CRaC on a Mac or a Windows PC is to package the application into a Linux Docker image.

Linux Privileges Required: CRIU requires special Linux privileges, which means that the Docker commands used to build Docker images and create Docker containers also require Linux privileges to run.

Storage Overhead: Storing and managing checkpoint data requires additional storage resources, and the checkpoint size can impact the restoration time. The original jar file is also required to be able to restart a Java application from a checkpoint.

I will describe how to handle these challenges in the section on creating Docker images.
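To illustrate the state-management challenge above, here is a minimal, hypothetical sketch of an org.crac.Resource implementation that closes a plain JDBC connection before the checkpoint and reopens it after restore. The connection URL and credentials are assumptions; a Spring Boot application would normally let the framework's lifecycle handling, described next, take care of this instead:

Java
import java.sql.Connection;
import java.sql.DriverManager;

import org.crac.Context;
import org.crac.Core;
import org.crac.Resource;

public class DatabaseConnectionResource implements Resource {

    private static final String URL = "jdbc:mysql://localhost/review-db"; // assumed
    private Connection connection;

    public DatabaseConnectionResource() throws Exception {
        connection = DriverManager.getConnection(URL, "user", "secret");
        // Register this resource so the CRaC runtime invokes the callbacks below.
        Core.getGlobalContext().register(this);
    }

    @Override
    public void beforeCheckpoint(Context<? extends Resource> context) throws Exception {
        // Open connections must not be part of the state written to the checkpoint files.
        connection.close();
    }

    @Override
    public void afterRestore(Context<? extends Resource> context) throws Exception {
        // Re-establish external resources once the process has been restored.
        connection = DriverManager.getConnection(URL, "user", "secret");
    }
}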
1.3. Spring Boot 3.2 Integration With CRaC

Spring Boot 3.2 (and the underlying Spring Framework) helps with closing and reopening connections to external resources. Before the creation of the checkpoint, Spring stops all running beans, giving them a chance to close resources if needed. After a restore, the same beans are restarted, allowing beans to reopen connections to the resources. The only thing that needs to be added to a Spring Boot 3.2-based application is a dependency on the crac library. Using Gradle, it looks like the following in the build.gradle file:

Groovy
dependencies {
    implementation 'org.crac:crac'
}

Note: The normal Spring Boot BOM mechanism takes care of versioning the crac dependency.

The automatic closing and reopening of connections handled by Spring Boot usually works. Unfortunately, when this blog post was written, some Spring modules lacked this support. To track the state of CRaC support in the Spring ecosystem, a dedicated test project, Spring Lifecycle Smoke Tests, has been created. The current state can be found on the project's status page.

If required, an application can register callback methods to be called before a checkpoint and after a restore by implementing the above-mentioned Resource interface. The microservices used in this blog post have been extended to register callback methods to demonstrate how they can be used. The code looks like this:

Java
import org.crac.*;

public class MyApplication implements Resource {

    public MyApplication() {
        Core.getGlobalContext().register(this);
    }

    @Override
    public void beforeCheckpoint(Context<? extends Resource> context) {
        LOG.info("CRaC's beforeCheckpoint callback method called...");
    }

    @Override
    public void afterRestore(Context<? extends Resource> context) {
        LOG.info("CRaC's afterRestore callback method called...");
    }
}

Spring Boot 3.2 provides a simplified alternative to taking a checkpoint compared to the default on-demand alternative described above. It is called automatic checkpoint/restore at startup. It is triggered by adding the JVM system property -Dspring.context.checkpoint=onRefresh to the java -jar command. When set, a checkpoint is created automatically when the application is started. The checkpoint is created after Spring beans have been created but not started, i.e., after most of the initialization work but before the application starts. For details, see the Spring Boot docs and Spring Framework docs.

With an automatic checkpoint, we don't get a fully warmed-up application, and the runtime configuration must be specified at build time. This means that the resulting Docker images will be runtime-specific and contain sensitive information from the configuration, like credentials and secrets. Therefore, the Docker images must be stored in a private and protected container registry.

Note: If this doesn't meet your requirements, you can opt for the on-demand checkpoint, which I will describe in the next blog post.

With CRaC and Spring Boot 3.2's support for CRaC covered, let's see how we can create Docker images for Spring Boot applications that use CRaC.

2. Creating CRaC-Based Docker Images With a Dockerfile

While learning how to use CRaC, I studied several blog posts on using CRaC with Spring Boot 3.2 applications. They all use rather complex bash scripts (depending on your bash experience) using Docker commands like docker run, docker exec, and docker commit. Even though they work, it seems like an unnecessarily complex solution compared to producing a Docker image using a Dockerfile.
So, I decided to develop a Dockerfile that runs the checkpoint command as a RUN command in the Dockerfile. It turned out to have its own challenges, as described below. I will begin by describing my initial attempt and then explain the problems I stumbled into and how I solved them, one by one, until I reach a fully working solution. The walkthrough is divided into the following subsections:

2.1. First Attempt
2.2. Problem #1, Privileged Builds With Docker Build
2.3. Problem #2, CRaC Returns Exit Status 137 Instead of 0
2.4. Problem #3, Runtime Configuration
2.5. Problem #4, Spring Data JPA
2.6. The Resulting Dockerfile

Let's start with a first attempt and see where it leads us.

2.1. First Attempt

My initial assumption was to create a Dockerfile based on a multi-stage build, where the first stage creates the checkpoint using a JDK-based base image, and the second stage uses a JRE-based base image for runtime. However, while writing this blog post, I failed to find a base image for a Java 21 JRE supporting CRaC. So I changed my mind and used a regular Dockerfile instead, with a base image from Azul: azul/zulu-openjdk:21-jdk-crac

Note: BellSoft also provides base images for CRaC; see Liberica JDK with CRaC Support as an alternative to Azul.

The first version of the Dockerfile looks like this:

Dockerfile
FROM azul/zulu-openjdk:21-jdk-crac
ADD build/libs/*.jar app.jar
RUN java -Dspring.context.checkpoint=onRefresh -XX:CRaCCheckpointTo=checkpoint -jar app.jar
EXPOSE 8080
ENTRYPOINT ["java", "-XX:CRaCRestoreFrom=checkpoint"]

This Dockerfile is unfortunately not possible to use, since CRaC requires the build to run privileged commands.

2.2. Problem #1, Privileged Builds With Docker Build

As mentioned in section 1.2. Challenges and Considerations, CRIU, which CRaC is based on, requires special Linux privileges to perform a checkpoint. The standard docker build command doesn't allow privileged builds, so it can't be used to build Docker images using the above Dockerfile.

Note: The --privileged flag that can be used in docker run commands is not supported by docker build.

Fortunately, Docker provides an improved builder backend called BuildKit. Using BuildKit, we can create a custom builder that is insecure, meaning it allows a Dockerfile to run privileged commands. To communicate with BuildKit, we can use Docker's CLI tool buildx. The following command can be used to create an insecure builder named insecure-builder:

Shell
docker buildx create --name insecure-builder --buildkitd-flags '--allow-insecure-entitlement security.insecure'

Note: The builder runs in isolation within a Docker container created by the docker buildx create command. You can run a docker ps command to reveal the container. When the builder is no longer required, it can be removed with the command: docker buildx rm insecure-builder.

The insecure builder can be used to build a Docker image with a command like:

Shell
docker buildx --builder insecure-builder build --allow security.insecure --load .

Note: The --load flag loads the built image into the regular local Docker image cache. Since the builder runs in an isolated container, its result will not end up in the regular local Docker image cache by default.

RUN commands in a Dockerfile that require privileges must be suffixed with --security=insecure.
The --security flag is only in preview and must therefore be enabled in the Dockerfile by adding the following line as the first line in the Dockerfile:

Dockerfile
# syntax=docker/dockerfile:1.3-labs

For more details on BuildKit and docker buildx, see Docker Build architecture. We can now perform the build; however, the way CRaC is implemented stops the build, as we will learn in the next section.

2.3. Problem #2, CRaC Returns Exit Status 137 Instead of 0

On a successful checkpoint, the java -Dspring.context.checkpoint=onRefresh -XX:CRaCCheckpointTo... command is terminated forcefully (like using kill -9) and returns the exit status 137 instead of 0, causing the Docker build command to fail. To prevent the build from stopping, the java command is extended with a test that verifies that 137 is returned and, if so, returns 0 instead. The following is added to the java command: || if [ $? -eq 137 ]; then return 0; else return 1; fi.

Note: || means that the following command will be executed if the first command fails.

With CRaC working in a Dockerfile, let's move on and learn about the challenges with runtime configuration and how to handle them.

2.4. Problem #3, Runtime Configuration

Using Spring Boot's automatic checkpoint/restore at startup, there is no way to specify runtime configuration on restore; at least, I haven't found a way to do it. This means that the runtime configuration has to be specified at build time. Sensitive information from the runtime configuration, such as credentials used for connecting to a database, will be written to the checkpoint files. Since the Docker images will contain these checkpoint files, they also need to be handled in a secure way. The Spring Framework documentation contains a warning about this, copied from the section Automatic checkpoint/restore at startup:

As mentioned above, and especially in use cases where the CRaC files are shipped as part of a deployable artifact (a container image, for example), operate with the assumption that any sensitive data "seen" by the JVM ends up in the CRaC files, and assess carefully the related security implications.

So, let's assume that we can protect the Docker images, for example, in a private registry with proper authorization in place, and that we can specify the runtime configuration at build time. In Chapter 6 of the book, the source code specifies the runtime configuration in the configuration files, application.yml, in a Spring profile named docker. The RUN command, which performs the checkpoint, has been extended to include an environment variable that declares what Spring profile to use: SPRING_PROFILES_ACTIVE=docker.

Note: If you have the runtime configuration in a separate file, you can add the file to the Docker image and point it out using an environment variable like SPRING_CONFIG_LOCATION=file:runtime-configuration.yml.

With the challenges of proper runtime configuration covered, we have only one problem left to handle: Spring Data JPA's lack of support for CRaC without some extra work.

2.5. Problem #4, Spring Data JPA

Spring Data JPA does not work out of the box with CRaC, as documented in the Smoke Tests project; see the section about Prevent early database interaction. This means that auto-creation of database tables when starting up the application is not possible when using CRaC. Instead, the creation has to be performed outside of the application startup process.

Note: This restriction does not apply to embedded SQL databases.
For example, the Spring PetClinic application works with CRaC without any modifications since it uses an embedded SQL database by default. To address these deficiencies, the following changes have been made in the source code of Chapter 6:

Manual creation of a SQL DDL script, create-tables.sql

Since we can no longer rely on the application to create the required database tables, a SQL DDL script has been created. To enable the application to create the script file, a Spring profile create-ddl-script has been added in the review microservice's configuration file, microservices/review-service/src/main/resources/application.yml. It looks like:

YAML
spring.config.activate.on-profile: create-ddl-script

spring.jpa.properties.jakarta.persistence.schema-generation:
  create-source: metadata
  scripts:
    action: create
    create-target: crac/sql-scripts/create-tables.sql

The SQL DDL file has been created by starting the MySQL database and then the application with the new Spring profile. Once the application has connected to the database, the application and database are shut down. Sample commands:

Shell
docker compose up -d mysql
SPRING_PROFILES_ACTIVE=create-ddl-script java -jar microservices/review-service/build/libs/review-service-1.0.0-SNAPSHOT.jar
# CTRL/C once "Connected to MySQL: jdbc:mysql://localhost/review-db" is written to the log output
docker compose down

The resulting SQL DDL script, crac/sql-scripts/create-tables.sql, has been added to Chapter 6's source code.

The Docker Compose file configures MySQL to execute the SQL DDL script at startup.

A CRaC-specific version of the Docker Compose file has been created, crac/docker-compose-crac.yml. To create the tables when the database is starting up, the SQL DDL script is used as an init script. The SQL DDL script is mapped into the init folder /docker-entrypoint-initdb.d with the following volume mapping in the Docker Compose file:

YAML
volumes:
  - "./sql-scripts/create-tables.sql:/docker-entrypoint-initdb.d/create-tables.sql"

Added a runtime-specific Spring profile in the review microservice's configuration file.

The guidelines in the Smoke Tests project's JPA section have been followed by adding an extra Spring profile named crac. It looks like the following in the review microservice's configuration file:

YAML
spring.config.activate.on-profile: crac

spring.jpa.database-platform: org.hibernate.dialect.MySQLDialect
spring.jpa.properties.hibernate.temp.use_jdbc_metadata_defaults: false
spring.jpa.hibernate.ddl-auto: none
spring.sql.init.mode: never
spring.datasource.hikari.allow-pool-suspension: true

Finally, the Spring profile crac is added to the RUN command in the Dockerfile to activate the configuration when the checkpoint is performed.

2.6. The Resulting Dockerfile

Finally, we are done with handling the problems resulting from using a Dockerfile to build a Spring Boot application that can restore quickly using CRaC in a Docker image. The resulting Dockerfile, crac/Dockerfile-crac-automatic, looks like:

Dockerfile
# syntax=docker/dockerfile:1.3-labs
FROM azul/zulu-openjdk:21-jdk-crac
ADD build/libs/*.jar app.jar

RUN --security=insecure \
    SPRING_PROFILES_ACTIVE=docker,crac \
    java -Dspring.context.checkpoint=onRefresh \
    -XX:CRaCCheckpointTo=checkpoint -jar app.jar \
    || if [ $? -eq 137 ]; then return 0; else return 1; fi

EXPOSE 8080
ENTRYPOINT ["java", "-XX:CRaCRestoreFrom=checkpoint"]

Note: One and the same Dockerfile is used by all microservices to create CRaC versions of their Docker images.

We are now ready to try it out!
3. Trying Out CRaC With Automatic Checkpoint/Restore

To try out CRaC, we will use the microservice system landscape used in Chapter 6 of my book. If you are not familiar with the system landscape, it looks like the following:

Chapter 6 uses Docker Compose to manage (build, start, and stop) the system landscape.

Note: If you don't have all the tools used in this blog post installed in your environment, you can look into Chapters 21 and 22 for installation instructions.

To try out CRaC, we need to get the source code from GitHub, compile it, and create the Docker images for each microservice using a custom insecure Docker builder. Next, we can use Docker Compose to start up the system landscape and run the end-to-end validation script that comes with the book to ensure that everything works as expected. We will wrap up the try-out section by comparing the startup times of the microservices when they start with and without using CRaC. We will go through each step in the following subsections:

3.1. Getting the Source Code
3.2. Building the CRaC-Based Docker Images
3.3. Running End-To-End Tests
3.4. Comparing Startup Times

3.1. Getting the Source Code

Run the following commands to get the source code from GitHub, jump into the Chapter06 folder, check out the branch SB3.2-crac-automatic, and ensure that a Java 21 JDK is used (Eclipse Temurin is used here):

Shell
git clone https://github.com/PacktPublishing/Microservices-with-Spring-Boot-and-Spring-Cloud-Third-Edition.git
cd Microservices-with-Spring-Boot-and-Spring-Cloud-Third-Edition/Chapter06
git checkout SB3.2-crac-automatic
sdk use java 21.0.3-tem

3.2. Building the CRaC-Based Docker Images

Start with compiling the microservices source code:

Shell
./gradlew build

If not already created, create the insecure builder with the command:

Shell
docker buildx create --name insecure-builder --buildkitd-flags '--allow-insecure-entitlement security.insecure'

Now we can build a Docker image, where the build performs a CRaC checkpoint, for each of the microservices with the commands:

Shell
docker buildx --builder insecure-builder build --allow security.insecure -f crac/Dockerfile-crac-automatic -t product-composite-crac --load microservices/product-composite-service
docker buildx --builder insecure-builder build --allow security.insecure -f crac/Dockerfile-crac-automatic -t product-crac --load microservices/product-service
docker buildx --builder insecure-builder build --allow security.insecure -f crac/Dockerfile-crac-automatic -t recommendation-crac --load microservices/recommendation-service
docker buildx --builder insecure-builder build --allow security.insecure -f crac/Dockerfile-crac-automatic -t review-crac --load microservices/review-service

3.3. Running End-To-End Tests

To start up the system landscape, we will use Docker Compose. Since CRaC requires special Linux privileges, a CRaC-specific docker-compose file comes with the source code, crac/docker-compose-crac.yml. Each microservice is given the required privilege, CHECKPOINT_RESTORE, by specifying:

YAML
cap_add:
  - CHECKPOINT_RESTORE

Note: Several blog posts on CRaC suggest using privileged containers, i.e., starting them with docker run --privileged or adding privileged: true in the Docker Compose file. This is a really bad idea since an attacker who gets control over such a container can easily take control of the host that runs Docker. For more information, see Docker's documentation on Runtime privilege and Linux capabilities.
The final addition to the CRaC-specific Docker Compose file is the volume mapping for MySQL to add the init file described above in section 2.5. Problem #4, Spring Data JPA:

YAML
volumes:
  - "./sql-scripts/create-tables.sql:/docker-entrypoint-initdb.d/create-tables.sql"

Using this Docker Compose file, we can start up the system landscape and run the end-to-end verification script with the following commands:

Shell
export COMPOSE_FILE=crac/docker-compose-crac.yml
docker compose up -d

Let's start with verifying that the CRaC afterRestore callback methods were called:

Shell
docker compose logs | grep "CRaC's afterRestore callback method called..."

Expect something like:

Shell
...ReviewServiceApplication : CRaC's afterRestore callback method called...
...RecommendationServiceApplication : CRaC's afterRestore callback method called...
...ProductServiceApplication : CRaC's afterRestore callback method called...
...ProductCompositeServiceApplication : CRaC's afterRestore callback method called...

Now, run the end-to-end verification script:

Shell
./test-em-all.bash

If the script ends with a log output similar to:

Shell
End, all tests OK: Fri Jun 28 17:40:43 CEST 2024

…it means all tests run OK, and the microservices behave as expected. Bring the system landscape down with the commands:

Shell
docker compose down
unset COMPOSE_FILE

After verifying that the microservices behave correctly when started from a CRaC checkpoint, we can compare their startup times with microservices started without using CRaC.

3.4. Comparing Startup Times

Now over to the most interesting part: how much faster do the microservices start when restored from a checkpoint compared to a regular cold start? The tests have been run on a MacBook Pro M1 with 64 GB memory. Let's start with measuring startup times without using CRaC.

3.4.1. Startup Times Without CRaC

To start the microservices without CRaC, we will use the default Docker Compose file. So, we must ensure that the COMPOSE_FILE environment variable is unset before we build the Docker images for the microservices. After that, we can start the database services, MongoDB and MySQL:

Shell
unset COMPOSE_FILE
docker compose build
docker compose up -d mongodb mysql

Verify that the databases are reporting healthy with the command: docker compose ps. Repeat the command until both report they are healthy. Expect a response like this:

Shell
NAME                  ... STATUS                  ...
chapter06-mongodb-1   ... Up 13 seconds (healthy) ...
chapter06-mysql-1     ... Up 13 seconds (healthy) ...

Next, start the microservices and look in the logs for the startup time (searching for the word Started). Repeat the logs command until logs are shown for all four microservices:

Shell
docker compose up -d
docker compose logs | grep Started

Look for a response like:

Shell
...Started ProductCompositeServiceApplication in 1.659 seconds
...Started ProductServiceApplication in 2.219 seconds
...Started RecommendationServiceApplication in 2.203 seconds
...Started ReviewServiceApplication in 3.476 seconds

Finally, bring down the system landscape:

Shell
docker compose down

3.4.2. Startup Times With CRaC

First, declare that we will use the CRaC-specific Docker Compose file and start the database services, MongoDB and MySQL:

Shell
export COMPOSE_FILE=crac/docker-compose-crac.yml
docker compose up -d mongodb mysql

Verify that the databases are reporting healthy with the command: docker compose ps. Repeat the command until both report they are healthy. Expect a response like this:
Shell
NAME             ... STATUS                  ...
crac-mongodb-1   ... Up 10 seconds (healthy) ...
crac-mysql-1     ... Up 10 seconds (healthy) ...

Next, start the microservices and look in the logs for the startup time (this time searching for the word Restored). Repeat the logs command until logs are shown for all four microservices:

Shell
docker compose up -d
docker compose logs | grep Restored

Look for a response like:

Shell
...Restored ProductCompositeServiceApplication in 0.131 seconds
...Restored ProductServiceApplication in 0.225 seconds
...Restored RecommendationServiceApplication in 0.236 seconds
...Restored ReviewServiceApplication in 0.154 seconds

Finally, bring down the system landscape:

Shell
docker compose down
unset COMPOSE_FILE

Now, we can compare the startup times!

3.4.3. Comparing Startup Times Between JVM and CRaC

Here is a summary of the startup times, along with calculations of how many times faster the CRaC-enabled microservice starts and the percentage reduction in startup time:

Microservice: Without CRaC (s) | With CRaC (s) | CRaC times faster | CRaC reduced startup time
- product-composite: 1.659 | 0.131 | 12.7 | 92%
- product: 2.219 | 0.225 | 9.9 | 90%
- recommendation: 2.203 | 0.236 | 9.3 | 89%
- review: 3.476 | 0.154 | 22.6 | 96%

Generally, we can see a 10-fold performance improvement in startup times, or a 90% shorter startup time; that's a lot!

Note: The improvement in the Review microservice is even better since it no longer handles the creation of database tables. However, this improvement is irrelevant when comparing improvements using CRaC, so let's discard the figures for the Review microservice.

4. Summary

Coordinated Restore at Checkpoint (CRaC) is a powerful feature in OpenJDK that improves the startup performance of Java applications by allowing them to resume from a previously saved state, a.k.a. a checkpoint. With Spring Boot 3.2, we also get a simplified way of creating a checkpoint using CRaC, known as automatic checkpoint/restore at startup. The tests in this blog post indicate a 10-fold improvement in startup performance, i.e., a 90% reduction in startup time when using automatic checkpoint/restore at startup.

The blog post also explained how Docker images using CRaC can be built using a Dockerfile instead of the complex bash scripts suggested by most blog posts on the subject. This, however, comes with some challenges of its own, like using custom Docker builders for privileged builds, as explained in the blog post.

Using Docker images created using automatic checkpoint/restore at startup comes with a price. The Docker images will contain runtime-specific and sensitive information, such as credentials to connect to a database at runtime. Therefore, they must be protected from unauthorized use. The Spring Boot support for CRaC does not fully cover all modules in Spring's ecosystem, forcing some workarounds to be applied, e.g., when using Spring Data JPA. Also, when using automatic checkpoint/restore at startup, the JVM HotSpot engine cannot be warmed up before the checkpoint. If optimal execution time for the first requests being processed is important, automatic checkpoint/restore at startup is probably not the way to go.

5. Next Blog Post

In the next blog post, I will show you how to use regular on-demand checkpoints to solve some of the considerations with automatic checkpoint/restore at startup. Specifically, the problems with specifying the runtime configuration at build time, storing sensitive runtime configuration in the Docker images, and how the Java VM can be warmed up before performing the checkpoint.
By Magnus Larsson

Agile Teams as Investors

Stakeholders often regard Scrum and other Agile teams as cost centers, primarily focused on executing projects within budgetary confines. This conventional view, however, undervalues their strategic potential. If we reconsider Agile teams as investors — carefully allocating their resources to optimize returns — they can significantly impact an organization’s strategic objectives and long-term profitability. This perspective not only redefines their role but also enhances the effectiveness of their contributions to the business by solving the customers’ problems.

Strategic Benefits of Viewing Agile Teams as Investors

Viewing Agile teams merely as task executors or internal development agencies misses a significant opportunity to harness their strategic potential. Instead, when we envision these Agile teams as investors within the organization’s strategic framework, their role undergoes a radical transformation. This shift in perspective not only emphasizes the intrinsic value Agile teams contribute but also ensures that their daily activities directly support and drive the company’s broader financial and strategic objectives. The following article will explore the multiple strategic benefits of adopting this investor-like viewpoint for Agile teams. For example, by treating each Sprint as a calculated investment with measurable returns, organizations can foster a more dynamic, responsive, and profitable development environment, maximizing operational efficiency and business outcomes. The advantages of such a viewpoint are apparent:

- Dynamic allocation of resources: Agile teams prioritize work that promises the highest return on investment (ROI), adjusting their focus as market conditions and customer needs evolve. This dynamic resource allocation is akin to managing a flexible investment portfolio where the allocation is continuously optimized in response to changing externalities.
- Cultivation of ownership and accountability: Teams that view their roles through an investor lens develop a more profound sense of ownership over the products they build. This mindset fosters a culture where every resource expenditure is scrutinized for value, encouraging more thoughtful and result-oriented work and avoiding typical blunders such as gold plating.
- Alignment with organizational goals: The investor perspective also helps bridge the gap between Agile teams and corporate strategy. It ensures that every Sprint and every project contributes directly to the organization’s overarching goals, aligning day-to-day activities with long-term business objectives. There is a reason why Scrum introduced the Product Goal with the Scrum Guide 2020.

Investor Mindset Within Agile Frameworks

When Agile teams operate as investors, they manage a portfolio of product development opportunities, each akin to a financial asset. This paradigm shift necessitates a robust understanding of value from a product functionality standpoint and a market and business perspective. Every decision to pursue a new feature, enhance an existing product, or pivot direction is an investment decision with potential returns measured in customer satisfaction, market share, revenue growth, and long-term business viability.

Supportive Practices for Agile Teams as Investors

To harness the full potential of Agile teams as investors and maximize the returns on their investments, organizations must create a conducive environment that supports this refined role.
The following practices are crucial for empowering Agile teams to operate effectively within this concept: Autonomy within guided parameters: Similar to how a fund manager operates within the confines of an investment mandate, Agile teams require the freedom to make decisions independently while adhering to the broader strategic objectives set by the organization. This autonomy empowers them to make quick, responsive decisions that align with real-time market conditions and customer feedback. Leaders must trust these teams to navigate the details, allowing them to innovate and adjust their strategies without micromanagement. Agile teams as investors require agency with known constraints. Emphasis on continuous learning: The “investment realm” is dynamic, with continuous shifts that demand ongoing education and adaptability. Agile teams similarly benefit from a continuous learning environment where they can stay updated on the latest technological trends, market dynamics, and customer preferences. This knowledge is critical for making informed decisions, anticipating market needs, and responding proactively. Organizations should facilitate this learning by providing access to training, workshops, and industry conferences and encouraging knowledge sharing within and across teams, for example, by hosting events for the Agile community. Transparent and open communication: Effective communication channels between Agile teams and stakeholders are essential for understanding project expectations, organizational goals, and resource availability. This transparency helps teams make informed decisions about where to allocate their efforts for the best possible returns. Therefore, Agile teams should collaborate with stakeholders and establish regular check-ins, such as Sprint Reviews, Retrospectives, and joint exercises and workshops, to ensure all stakeholders are on the same page and can provide timely feedback that could influence investment decisions. Strategic resource allocation: Just as investors decide how best to distribute assets to maximize portfolio returns, Agile teams must strategically allocate their time and resources. This involves prioritizing tasks based on their potential impact and aligning them with the organization’s key performance indicators (KPIs). Multiple tools, such as value stream mapping or user story mapping, can help identify the most valuable activities that contribute directly to customer satisfaction and business success. Risk management and mitigation: Risk management and mitigation are paramount in the investment world. Agile teams, too, must develop competencies in identifying, assessing, and responding to risks associated with their projects. For example, working iteratively and incrementally in Scrum helps to quickly create feedback loops and adjust course if Increments do not live up to the anticipated response, preventing the team from pouring more time into something less valuable, diluting the potential ROI of the team. (Typically, risk mitigation starts even earlier in the process, based on product discovery and refinement activities. Performance metrics and feedback loops: To understand the effectiveness of their investment decisions, Agile teams need robust metrics and feedback mechanisms to guide future improvements. Metrics such as return on investment (ROI), customer satisfaction scores, and market penetration rates are valuable in assessing the success of Agile initiatives. 
Establishing a culture of feedback where insights and learning from each project cycle are systematically collected and analyzed will enable teams to refine their approaches continually, hence the importance of Sprint Reviews and Retrospectives in Scrum for optimizing a team’s contributions to the company’s strategic goals and ensuring sustained business growth and agility. Top Ten Anti-Patterns Limiting Agile Teams as Investors It could all be so simple if it weren’t for corporate reality. Despite the usefulness of Agile teams as investors concept, teams typically face numerous obstacles. Consequently, identifying and addressing these anti-patterns is crucial for Agile teams to succeed. Here, we explore the top ten anti-patterns that can severely restrict Agile teams from maximizing their investment capabilities and suggest strategies for overcoming them: Siloed operations: When teams operate in silos, they miss critical insights from other parts of the organization that could influence strategic decisions. To break down these silos, promote cross-functional teams and encourage regular interdepartmental meetings where teams can share insights and collaborate on broader organizational goals. Open Spaces or Barcamps are a good starting point. Rigid adherence to roadmaps: While roadmaps help guide development, strict adherence can prevent teams from adapting to new information or capitalizing on emerging opportunities. Implementing a flexible roadmap approach, where adjustments are possible and expected, can help teams stay Agile and responsive. Short-term focus: Focusing solely on short-term outcomes can lead to decisions that sacrifice long-term value. Encourage teams to adopt a balanced scorecard approach that includes short-term and long-term goals, ensuring immediate achievements do not undermine future success. Insufficient stakeholder engagement: Agile teams often lack deep engagement with stakeholders, leading to misalignments and missed opportunities. To combat this, develop structured engagement plans that include regular updates and involvement opportunities for stakeholders throughout the product lifecycle, starting with Sprint Reviews, stakeholder Retrospectives, and collaborative workshops and exercises. Aversion to risk: A culture that penalizes failure stifles innovation and risk-taking. Establishing a risk-tolerant culture that rewards calculated risks and views failures as learning opportunities can encourage teams to pursue higher-return projects. Leadership needs to lead this effort — no pun intended — by sharing their experiences; “failure nights” are suitable for that purpose. Resource hoarding: When teams withhold resources to safeguard against uncertainties, it prevents those resources from being used where they could generate value. Encourage a culture of transparency and shared responsibility where resources are allocated based on strategic priorities rather than preserved for hypothetical needs. Neglect of technical debt: Ignoring technical debt can increase costs and reduce system efficiency in the long run. Task the Agile team to maintain technical excellence and allocate time for debt reduction in each Sprint, treating these efforts as critical investments in the product’s future. There is no business agility without technical excellence. Mismatched incentives: When team incentives, or, worse, personal incentives, are not aligned with organizational goals, it can lead to misdirected efforts. 
Align reward systems with desired outcomes, such as customer satisfaction, market growth, or innovation metrics, to ensure that everyone’s efforts contribute directly to business objectives. Poor market understanding: Teams cannot make informed investment decisions without a strong understanding of the market and customer needs. Invest in market research and customer interaction programs to keep teams informed and responsive to the external environment. All team members must participate in product discovery and customer research activities regularly. Resistance to organizational change: Resistance to new methodologies, practices, or tools can limit a team’s ability to adapt and grow. Foster a culture of continuous improvement and openness to change by regularly reviewing and updating practices and providing training and support for new approaches. By addressing these anti-patterns, organizations can empower their Agile teams as investors, making smarter decisions that align with long-term strategic goals and enhance the company’s overall market position. Conclusion In conclusion, reimagining Scrum and Agile teams as investors is not merely a shift in perspective but a transformative approach that aligns these teams more closely with the organization’s broader objectives. By viewing every Sprint and project through the investment lens, these teams are empowered to prioritize initiatives that promise the best returns regarding customer value and contributions to the organization’s success. This investor mindset encourages Agile teams to operate with an enhanced sense of ownership and accountability, making decisions that are not just beneficial in the short term but are sustainable and profitable over the long haul. It fosters a deeper level of strategic engagement with projects, where Agile teams are motivated to maximize efficiency and effectiveness, understanding their direct impact on the company’s performance. Moreover, the practices that support Agile teams as investors—such as granting autonomy, emphasizing continuous learning, and ensuring open communication—are foundational to creating a culture of innovation and responsiveness. These practices help break down silos, encourage risk-taking, and align team incentives with corporate goals, driving the organization forward in a competitive marketplace. It is critical to address the common anti-patterns that hinder this investment-centric approach. By actively working to eliminate these barriers, organizations can unlock the true potential of their Agile teams, transforming them into critical drivers of business value and strategic advantage. Ultimately, when Scrum and Agile teams are empowered to act as investors, they contribute not only to the immediate product development goals but also to the long-term viability and growth of the organization. This holistic integration of Agile practices with business strategy ensures that the investments made in every Sprint yield substantial and sustained returns, securing a competitive edge in the dynamic business landscape. Do you view your Agile teams as investors? Please share with us in the comments. By Stefan Wolpers CORE 7 Essential Tips for a Production ClickHouse Cluster ClickHouse is the fastest, most resource-efficient OLAP database which can query billions of rows in milliseconds and is trusted by thousands of companies for real-time analytics. Here are seven tips to help you spin up a production ClickHouse cluster and avoid the most common mistakes. 
Tip 1: Use Multiple Replicas While testing ClickHouse, it’s natural to deploy a configuration with only one host because you may not want to use additional resources or take on unnecessary expenses. There’s nothing wrong with this in a development or testing environment, but that can come at a cost if you want to use only one host in production. If there’s a failure and you only have one replica and a single host, you’re at risk of losing all your data. For production loads, you should use several hosts and replicate data across them. Not only does it ensure that data remains safe when a host fails, but also allows you to balance the user load on several hosts, which makes resource-intensive queries faster. Tip 2: Don’t Be Shy With RAM ClickHouse is fast, but its speed depends on available resources, especially RAM. You can see great performance when running a ClickHouse cluster with the minimum amount of RAM in a development or testing environment, but that may change when the load increases. In a production environment with a lot of simultaneous read and write operations, a lack of RAM will be more noticeable. If your ClickHouse cluster doesn’t have enough memory, it will be slower, and executing complex queries will take longer. On top of that, when ClickHouse is performing resource-intensive operations, it may compete with the OS itself for RAM, and that eventually leads to OOM, downtime, and data loss. Developers of ClickHouse recommend using at least 16 GB of RAM to ensure that the cluster is stable. You can opt for less memory, but only do so when you know that the load won’t be high. Tip 3: Think Twice When Choosing a Table Engine ClickHouse supports several table engines with different characteristics, but a MergeTree engine will most likely be ideal. Specialized tables are tailored for specific uses, but have limitations that may not be obvious at first glance. Log Family engines may seem ideal for logs, but they don’t support replication and their database size is limited. Table engines in the MergeTree family are the default choice, and they provide the core data capabilities that ClickHouse is known for. Unless you know for sure why you need a different table engine, use an engine from a MergeTree family, and it will cover most of your use cases. Tip 4: Don’t Use More Than Three Columns for the Primary Key Primary keys in ClickHouse don’t serve the same purpose as in traditional databases. They don’t ensure uniqueness, but instead define how data is stored and then retrieved. If you use all columns as the primary key, you may benefit from faster queries. Yet, ClickHouse performance doesn’t only depend on reading data, but on writing it, too. When the primary key contains many columns, the whole cluster slows down when data is written to it. The optimal size of the primary key in ClickHouse is two or three columns, so you can run faster queries but not slow down data inserts. When choosing the columns, think of the requests that will be made and go for columns that will often be selected in filters. Tip 5: Avoid Small Inserts When you insert data in ClickHouse, it first saves a part with this data to a disk. It then sorts this data, merges it, and inserts it into the right place in the database in the background. If you insert small chunks of data very often, ClickHouse will create a part for every small insert. It will slow down the whole cluster and you may get the “Too many parts” error. 
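As a concrete illustration of Tips 3 and 4, here is a minimal Java sketch that creates a MergeTree table whose primary key is kept to two columns; the batching advice from Tip 5 continues right after it. The sketch assumes the ClickHouse JDBC driver is on the classpath and a server reachable on localhost, and the table and column names are made up for illustration:
Java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Minimal sketch: assumes the ClickHouse JDBC driver is available and a server
// is reachable over HTTP on localhost:8123; all names are illustrative only.
public class CreateEventsTable {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:clickhouse://localhost:8123/default";
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement()) {
            // MergeTree is the default family to reach for (Tip 3); ORDER BY defines
            // the primary key and is limited to two columns (Tip 4).
            stmt.execute(
                "CREATE TABLE IF NOT EXISTS events (" +
                "  event_date Date," +
                "  user_id UInt64," +
                "  event_type String," +
                "  payload String" +
                ") ENGINE = MergeTree" +
                " PARTITION BY toYYYYMM(event_date)" +
                " ORDER BY (event_date, user_id)");
        }
    }
}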
To insert data efficiently, add data in big chunks and avoid sending more than one insert statement per second. ClickHouse can insert a lot of data at a high pace — even 100K rows per second is okay — but it should be one bulk insert instead of multiple smaller ones. If your data comes in small portions, consider using an external system such as Managed Kafka for making batches of data. ClickHouse is well integrated with Kafka and can efficiently consume data from it. Tip 6: Think of How You Will Get Rid of Duplicate Data Primary keys in ClickHouse don't ensure that data is unique. Unlike other databases, if you insert duplicate data in ClickHouse, it will be added as is. Thus, the best option would be to ensure that the data is unique before inserting it. You can do it, for example, in a stream processing application, such as one built on Apache Kafka. If that's not possible, there are ways to deal with duplicates when you run queries. One option is to use `argMax` to select only the last version of the duplicate row. You can also use the ReplacingMergeTree engine that removes duplicate entries by design. Finally, you can run `OPTIMIZE TABLE ... FINAL` to merge data parts, but that's a resource-demanding operation, and you should only run it when you know it won't affect the cluster performance. Tip 7: Don't Create an Index for Every Column Just like with primary keys, you may want to use multiple indexes to improve performance. An extra index helps when a query's filters match it, but creating indexes indiscriminately won't make your queries faster overall. At the same time, you'll certainly experience the downsides of this strategy: multiple indexes significantly slow down data inserts because ClickHouse needs to both write the data in the correct place and then update the indexes. When you want to create indexes in a production cluster, select the columns that correlate with the primary key. By Vladimir Ivoninskii Cache Wisely: How You Can Prevent Distributed System Failures Caching is often implemented as a generic solution when we think about improving the latency and availability characteristics of dependency service calls. Latency improves as we avoid the need to make the network round trip to the dependency service, and availability improves as we don't need to worry about temporary downtimes of the dependency service, given that the cache serves the required response that we are looking for. It is important to note that caching does not help if our requests to a dependency service lead to a distinct response every time, or if a client makes vastly different request types with not much overlap between responses. There are also additional constraints to using caching if our service cannot tolerate stale data. We won't be delving into caching types, techniques, and applicability, as those are covered broadly on the internet. Instead, we will focus on a less talked about risk with caching that gets ignored as systems evolve and that puts the system at risk of a broad outage. When To Use Caching In many cases, caching is deployed to mask a known scaling bottleneck in a dependency service, or it gradually takes on that role, hiding a scaling deficiency of the dependency service over time. For instance, as our service starts making fewer calls to the dependency service, its owners may start believing that this reduced volume is the norm for steady-state traffic. If our cache hit rate is 90%, meaning 9 out of 10 calls to the dependency service are served by the cache, then the dependency service only sees 10% of the actual traffic.
If client-side caching stops working due to an outage or a bug, the dependency service would suddenly receive all of the traffic, roughly ten times what it normally sees with a 90% hit rate! In almost all cases, this surge in traffic will overload the dependency service, causing an outage. If the dependency service is a data store, this will bring down multiple other services that depend on that data store. To prevent such outages, both the client and the service should consider the following recommendations to protect their systems. Recommendations For clients, it is important to stop treating the cache as a "good to have" optimization, and instead treat it as a critical component that needs the same treatment and scrutiny as a regular service. This includes monitoring and alarming on a cache hit ratio threshold as well as on the overall traffic that is sent to the dependency service. Any update or change to the caching business logic also needs to go through the same rigor for testing in development environments and in the pre-production stages. Deployments to servers participating in caching should ensure that the stored state is transferred to the new servers that come up post-deployment, or that the drop in cache hit rate during deployment is tolerable for the dependency service. If a large number of cache-serving servers are taken down during deployments, it can lead to a proportional drop in the cache hit ratio, putting pressure on the dependency service. Clients also need to implement guardrails to control the overall traffic, measured in transactions per second (TPS), sent to the dependency service. Algorithms like token buckets can help restrict TPS from the fleet when the caching fleet goes down. This needs to be periodically tested by taking down caching instances and seeing how clients send traffic to the dependency service. Clients should also think about implementing a negative caching strategy with a short time-to-live (TTL). Negative caching means that the client stores the error response from the dependency service to ensure the dependency service is not bombarded with retry requests when it is having an extended outage. Similarly, on the service side, load-shedding mechanisms need to be implemented to protect the service from getting overloaded. Overloaded in this case means that the service is unable to respond within the client-side timeout. Note that as the service load increases, it usually manifests as increased latency because server resources are overused, leading to slower responses. We want to respond before the client-side timeout for a request and start rejecting requests if the overall latency starts breaching the client-side timeout. There are different techniques to prevent overloading; one of the simplest is to restrict the number of connections from the Application Load Balancer (ALB) to your service host. However, this could mean indiscriminate dropping of requests, and if that is not desirable, then prioritization techniques could be implemented in the application layer of the service to drop less important requests. The objective of load shedding is to ensure that the service protects its goodput, i.e., the requests served within the client-side timeout, as the overall load on the service grows. The service also needs to periodically run load tests to validate the maximum TPS handled by a service host, which allows fine-tuning of the ALB connection limit. We introduced a couple of techniques to protect the goodput of a service that should be widely applicable, but there are more approaches that readers can explore depending on their service needs.
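As a rough sketch of the token bucket guardrail mentioned above, here is a minimal single-process example in Java. The capacity and refill rate are arbitrary illustration values, and a real fleet would need to coordinate or subdivide the budget across hosts:
Java
// Minimal sketch of a token bucket that caps outgoing calls to a dependency service.
public class TokenBucket {
    private final long capacity;          // maximum burst size
    private final double refillPerNano;   // tokens added per nanosecond
    private double tokens;
    private long lastRefillNanos;

    public TokenBucket(long capacity, double refillPerSecond) {
        this.capacity = capacity;
        this.refillPerNano = refillPerSecond / 1_000_000_000.0;
        this.tokens = capacity;
        this.lastRefillNanos = System.nanoTime();
    }

    // Returns true if the caller may send one more request to the dependency service.
    public synchronized boolean tryAcquire() {
        long now = System.nanoTime();
        tokens = Math.min(capacity, tokens + (now - lastRefillNanos) * refillPerNano);
        lastRefillNanos = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // Allow bursts of up to 50 calls and a steady rate of 100 calls per second.
        TokenBucket limiter = new TokenBucket(50, 100.0);
        if (limiter.tryAcquire()) {
            // within budget: call the dependency service
        } else {
            // over budget: shed the call, serve stale data, or fail fast
        }
    }
}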
Conclusion Caching offers immediate benefits for availability and latency at a low cost. However, neglecting the areas we discussed above can expose hidden scaling bottlenecks when the cache goes down, potentially leading to system failures. Regular diligence to ensure the proper functioning of the system even when the cache is down is crucial to prevent catastrophic outages that could affect your system's reliability. Here is an interesting read about a large-scale outage triggered by cache misses. By Tejas Ghadge Using Zero-Width Assertions in Regular Expressions Anchors ^ $ \b \A \Z Anchors in regular expressions allow you to specify the context in a string where your pattern should be matched. There are several types of anchors: ^ matches the start of a line (in multiline mode) or the start of the string (by default). $ matches the end of a line (in multiline mode) or the end of the string (by default). \A matches the start of the string. \Z or \z matches the end of the string. \b matches a word boundary (before the first letter of a word or after the last letter of a word). \B matches a position that is not a word boundary (between two letters or between two non-letter characters). These anchors are supported in Java, PHP, Python, Ruby, C#, and Go. In JavaScript, \A and \Z are not supported, but you can use ^ and $ instead of them; just remember to keep the multiline mode disabled. For example, the regular expression ^abc will match the start of a string that contains the letters "abc". In multiline mode, the same regex will match these letters at the beginning of a line. You can use anchors in combination with other regular expression elements to create more complex matches. For example, ^From: (.*) matches a line starting with From: The difference between \Z and \z is that \Z matches at the end of the string but also skips a possible newline character at the end. In contrast, \z is more strict and matches only at the end of the string. If you have read the previous article, you may wonder if the anchors add any additional capabilities that are not supported by the three primitives (alternation, parentheses, and the star for repetition). The answer is that they do not, but they change what is captured by the regular expression. You can match a line starting with abc by explicitly adding the newline character: \nabc, but in this case, you will also match the newline character itself. When you use ^abc, the newline character is not consumed. In a similar way, ing\b matches all words ending with ing. You can replace the anchor with a character class containing non-letter characters (such as spaces or punctuation): ing\W, but in this case, the regular expression will also consume the space or punctuation character. If the regular expression starts with ^ so that it only matches at the start of the string, it's called anchored. In some programming languages, you can do an anchored match instead of a non-anchored search without using ^. For example, in PHP (PCRE), you can use the A modifier. So the anchors don't add any new capabilities to the regular expressions, but they allow you to manage which characters will be included in the match or to match only at the beginning or end of the string. The matched language is still regular. Zero-Width Assertions (?= ) (?! ) (?<= ) (?<! ) Zero-width assertions (also called lookahead and lookbehind assertions) allow you to check that a pattern occurs in the subject string without capturing any of the characters. 
This can be useful when you want to check for a pattern without moving the match pointer forward. There are four types of lookaround assertions:
(?=abc) The next characters are "abc" (a positive lookahead)
(?!abc) The next characters are not "abc" (a negative lookahead)
(?<=abc) The previous characters are "abc" (a positive lookbehind)
(?<!abc) The previous characters are not "abc" (a negative lookbehind)
Zero-width assertions are generalized anchors. Just like anchors, they don't consume any character from the input string. Unlike anchors, they allow you to check anything, not only line boundaries or word boundaries. So you can replace an anchor with a zero-width assertion, but not vice versa. For example, ing\b could be rewritten as ing(?=\W|$). Zero-width lookahead and lookbehind are supported in PHP, JavaScript, Python, Java, and Ruby. Unfortunately, they are not supported in Go. Just like anchors, zero-width assertions still match a regular language, so from a theoretical point of view, they don't add anything new to the capabilities of regular expressions. They just make it possible to skip certain things from the captured string, so you only check for their presence but don't consume them. Checking Strings After and Before the Expression The positive lookahead checks that there is a subexpression after the current position. For example, you need to find all div selectors with the footer ID and remove the div part:
Search for: div(?=#footer)
Replace with: (empty string)
Explanation: "div" followed by "#footer". (?=#footer) checks that there is the #footer string here, but does not consume it. In div#footer, only div will match. A lookahead is zero-width, just like the anchors. In div#header, nothing will match, because the lookahead assertion fails.
Of course, this can be solved without any lookahead:
Search for: div#footer
Replace with: #footer
Explanation: A simpler equivalent.
Generally, any lookahead after the expression can be rewritten by copying the lookahead text into the replacement or by using backreferences. In a similar way, a positive lookbehind checks that there is a subexpression before the current position:
Search for: (?<=<a href=")news/
Replace with: blog/
Explanation: Replace "news/" preceded by "<a href="" with "blog/".
Search for: <a href="news/
Replace with: <a href="blog/
Explanation: The same replacement without a lookbehind.
The positive lookahead and lookbehind lead to a shorter regex, but you can do without them in this case. However, these were just basic examples. In some of the following regular expressions, the lookaround will be indispensable. Testing the Same Characters for Multiple Conditions Sometimes you need to test a string for several conditions. For example, you want to find a consonant without listing all of them. It may seem simple at first: [^aeiouy] However, this regular expression also finds spaces and punctuation marks, because it matches anything except a vowel. And you want to match any letter except a vowel. So you also need to check that the character is a letter.
(?=[a-z])[^aeiouy] A consonant
[bcdfghjklmnpqrstvwxz] Without lookahead
There are two conditions applied to the same character here: After (?=[a-z]) is checked, the current position is moved back because a lookahead has a width of zero: it does not consume characters, but only checks them. Then, [^aeiouy] matches (and consumes) one character that is not a vowel. For example, it could be H in HTML. The order is important: the regex [^aeiouy](?=[a-z]) will match a character that is not a vowel, followed by any letter. Clearly, it's not what is needed.
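To try the consonant pattern above in a concrete engine, here is a small Java sketch (java.util.regex supports lookahead; the input string is arbitrary):
Java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ConsonantDemo {
    public static void main(String[] args) {
        // (?=[a-z]) asserts that the current character is a lowercase letter,
        // then [^aeiouy] consumes it only if it is not a vowel.
        Pattern consonant = Pattern.compile("(?=[a-z])[^aeiouy]");
        Matcher m = consonant.matcher("regex demo 42!");
        StringBuilder found = new StringBuilder();
        while (m.find()) {
            found.append(m.group());
        }
        System.out.println(found); // prints "rgxdm": digits, spaces, and '!' are skipped
    }
}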
This technique is not limited to testing one character for two conditions; there can be any number of conditions of different lengths: border:(?=[^;}]*\<solid\>)(?=[^;}]*\<red\>)(?=[^;}]*\<1px\>)[^;}]* Find a CSS declaration that contains the words solid, red, and 1px in any order. This regex has three lookahead conditions. In each of them, [^;}]* skips any number of any characters except ; and } before the word. After the first lookahead, the current position is moved back and the second word is checked, etc. The anchors \< and \> check that the whole word matches. Without them, 1px would match in 21px. The last [^;}]* consumes the CSS declaration (the previous lookaheads only checked the presence of words, but didn't consume anything). This regular expression matches {border: 1px solid red}, {border: red 1px solid;}, and {border:solid green 1px red} (different order of words; green is inserted), but doesn't match {border:red solid} (1px is missing). Simulating Overlapped Matches If you need to remove repeating words (e.g., replace the the with just the), you can do it in two ways, with and without lookahead:
Search for: \<(\w+)\s+(?=\1\>)
Replace with: (empty string)
Explanation: Replace the first of two repeating words with an empty string.
Search for: \<(\w+)\s+\1\>
Replace with: \1
Explanation: Replace two repeating words with the first word.
The regex with lookahead works like this: the first parentheses capture the first word; the lookahead checks that the next word is the same as the first one. The two regular expressions look similar, but there is an important difference. When replacing 3 or more repeating words, only the regex with lookahead works correctly. The regex without lookahead replaces every two words. After replacing the first two words, it moves on to the next two words because the matches cannot overlap. However, you can simulate overlapped matches with lookaround. The lookahead will check that the second word is the same as the first one. Then, the second word will be matched against the third one, etc. Every word that has the same word after it will be replaced with an empty string. The correct regex without lookahead is \<(\w+)(\s+\1)+\> It matches any number of repeating words (not just two of them). Checking Negative Conditions The negative lookahead checks that the next characters do NOT match the expression in parentheses. Just like a positive lookahead, it does not consume the characters. For example, (?!toves) checks that the next characters are not "toves" without including them in the match.
<\?(?!php) "<?" without "php" after it
This pattern will match <? in <?echo 'text'?> or in <?xml. Another example is an anagram search. To find anagrams for "mate", check that the first character is one of M, A, T, or E. Then, check that the second character is one of these letters and is not equal to the first character. After that, check the third character, which has to be different from the first and the second one, etc.
\<([mate])(?!\1)([mate])(?!\1)(?!\2)([mate])(?!\1)(?!\2)(?!\3)([mate])\> Anagram for "mate"
The sequence (?!\1)(?!\2) checks that the next character is not equal to the first subexpression and is not equal to the second subexpression. The anagrams for "mate" are: meat, team, and tame. Certainly, there are special tools for anagram search, which are faster and easier to use.
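Here is the repeated-words replacement from the table above as a runnable Java sketch. Java's regex engine does not support the \< and \> word anchors, so the equivalent pattern uses \b word boundaries instead:
Java
import java.util.regex.Pattern;

public class RepeatedWordsDemo {
    public static void main(String[] args) {
        String text = "It was the the the best of of times";
        // \b(\w+)\s+(?=\1\b): match a word plus the following whitespace only when
        // the next word is the same; the lookahead leaves that next word unconsumed,
        // so overlapping repetitions collapse correctly.
        String collapsed = Pattern.compile("\\b(\\w+)\\s+(?=\\1\\b)")
                                  .matcher(text)
                                  .replaceAll("");
        System.out.println(collapsed); // It was the best of times
    }
}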
A lookbehind can be negative, too, so it's possible to check that the previous characters do NOT match some expression: \w+(?<!ing)\b A word that does not end with “ing” (the negative lookbehind) In most regex engines, a lookbehind must have a fixed length: you can use character lists and classes ([a-z] or \w), but not repetitions such as * or +. Aba is free from this limitation. You can go back by any number of characters; for example, you can find files not containing a word and insert some text at the end of such files. Search for Replace to Explanation (?<!Table of contents.*)$$ <a href="/toc">Contents</a> Insert the link to the end of each file not containing the words “Table of contents” ^^(?!.*Table of contents) <a href="/toc">Contents</a> Insert it to the beginning of each file not containing the words However, you should be careful with this feature because an unlimited-length lookbehind can be slow. Controlling Backtracking A lookahead and a lookbehind do not backtrack; that is, when they have found a match and another part of the regular expression fails, they don't try to find another match. It's usually not important, because lookaround expressions are zero-width. They consume nothing and don't move the current position, so you cannot see which part of the string they match. However, you can extract the matching text if you use a subexpression inside the lookaround. For example: Search for Replace to Explanation (?=\<(\w+)) \1 Repeat each word Since lookarounds don't backtrack, this regular expression never matches: (?=(\N*))\1\N A regex that doesn't backtrack and always fails \N*\N A regex that backtracks and succeeds on non-empty lines The subexpression (\N*) matches the whole line. \1 consumes the previously matched subexpression and \N tries to match the next character. It always fails because the next character is a newline. A similar regex without lookahead succeeds because when the engine finds that the next character is a newline, \N* backtracks. At first, it has consumed the whole line (“greedy” match), but now it tries to match less characters. And it succeeds when \N* matches all but the last character of the line and \N matches the last character. It's possible to prevent excessive backtracking with a lookaround, but it's easier to use atomic groups for that. In a negative lookaround, subexpressions are meaningless because if a regex succeeds, negative lookarounds in it must fail. So, the subexpressions are always equal to an empty string. It's recommended to use a non-capturing group instead of the usual parentheses in a negative lookaround. (?!(a))\1 A regex that always fails: (not A) and A By Peter Kankowski Test Smells: Cleaning up Unit Tests In practical terms, knowing how not to write code might be as important as knowing how to write it. This goes for test code, too; and today, we're going to look at common mistakes that happen when writing unit tests. Although writing unit tests is common practice for programmers, tests are still often treated as second-class code. Writing good tests isn't easy — just as in any programming field, there are patterns and anti-patterns. There are some really helpful chapters on test smells in Gerard Meszaros's book about xUnit patterns — and more great stuff around the internet; however, it's always helpful to have practical examples. Here, we're going to write one unit test quick and dirty, and then improve it like we would in our work. The full example is available on GitHub. 
One Test's Evolution To begin with, what are we testing? A primitive function: Java public String hello(String name) { return "Hello " + name + "!"; } We begin writing a unit test for it: Java @Test void test() { } And just like that, our code already smells. 1. Uninformative Name Naturally, it's much simpler to just write test, test1, test2, than to write an informative name. Also, it's shorter! But having code that is easy to write is much less important than having code that is easy to read - we spend much more time reading it, and bad readability wastes a lot of time. A name should communicate intent; it should tell us what is being tested. A test communicating its intent So maybe we could name the test testHello, since it's testing the hello function? Nope, because we're not testing a method, we're testing behavior. So a good name would be shouldReturnHelloPhrase: Java @Test void shouldReturnHelloPhrase() { assert(hello("John")).matches("Hello John!"); } Nobody (apart from the framework) is going to call the test method directly, so it's not a problem if the name seems too long. It should be a descriptive and meaningful phrase (DAMP). 2. No arrange-act-assert The name is okay, but now there is too much code stuffed into one line. It's a good idea to separate the preparation, the behavior we're testing, and the assertion about that behavior (arrange-act-assert). Arrange, act, assert Like this: Java @Test void shouldReturnHelloPhrase() { String a = "John"; String b = hello("John"); assert(b).matches("Hello John!"); } In BDD, it's customary to use the Given-When-Then pattern, and in this case, it's the same thing. 3. Bad Variable Names and No Variable Re-Usage But it still looks like it's been written in a hurry. What's "a"? What's "b"? You can sort of infer that, but imagine that this is just one test among several dozen others that have failed in a test run (perfectly possible in a test suite of several thousand tests). That's a lot of inferring you have to do when sorting test results! So — we need proper variable names. Something else we've done in a hurry — all our strings are hard-coded. It's okay to hard-code some stuff — only as long as it's not related to other hard-coded stuff! Meaning, that when you're reading your test, the relationships between data should be obvious. Is "John" in 'a' the same as "John" in the assertion? This is not a question we should be wasting time on when reading or fixing the test. So we rewrite the test like this: Java @Test void shouldReturnHelloPhrase() { String name = "John"; String result = hello(name); String expectedResult = "Hello " + name + "!"; assert(result).contains(expectedResult); } 4. The Pesticide Effect Here's another thing to think about: automated tests are nice because you can repeat them at very little cost — but that also means their effectiveness falls over time because you're just testing the exact same thing over and over. That's called the pesticide paradox (a term coined by Boris Beizer back in the 1980s): bugs build resistance to the thing you're killing them with. It's probably not possible to overcome the pesticide paradox completely — but there are tools that reduce its effect by introducing more variability into our tests, for instance, Java Faker. 
Let's use it to create a random name: Java @Test void shouldReturnHelloPhrase() { Faker faker = new Faker(); String name = faker.name().firstName(); String result = hello(name); String expectedResult = "Hello " + name + "!"; assert(result).contains(expectedResult); } Good thing we've changed the name to a variable in the previous step — now we don't have to look over the test and fish out all the "Johns." 5. Uninformative Error Messages Another thing we've probably not thought about, if we've written the test in a hurry, is the error message. You need as much data as possible when sorting test results, and the error message is the most important source of information. However, the default one is pretty uninformative: java.lang.AssertionError at org.example.UnitTests.shouldReturnHelloPhrase(UnitTests.java:58) Great. Literally the only thing we know is that the assertion hasn't passed. Thankfully, we can use assertions from JUnit's `Assertions` class. Here's how: Java @Test void shouldReturnHelloPhrase4() { Faker faker = new Faker(); String name = faker.name().firstName(); String result = hello(name); String expectedResult = "Hello " + name + ""; Assertions.assertEquals( expectedResult, result ); } Note that assertEquals takes the expected value first and the actual value second. And here's the new error message: Expected :Hello Tanja Actual :Hello Tanja! ...which immediately tells us what went wrong: we've forgotten the exclamation mark! Lessons Learned And with that, we've got ourselves a good unit test. What lessons can we glean from the process? A lot of the problems were caused by us being a bit lazy. Not the good kind of lazy, where you think hard about how to do less work. The bad kind, where you follow the path of least resistance, to just "get it over with." Hard-coding test data, doing cut and paste, and using "test" + method name (or "test1", "test2", "test3") as the name of the test are marginally easier to do in the short run, but they make the test base much harder to maintain. On the one hand, it is a bit ironic that we've been talking about readability and making tests easier on the eyes, and at the same time turned a 1-line test into 9 lines. However, as the number of tests you're running grows, the practices we're proposing here will save you a lot of time and effort. By Natalia Poliakova AI: Do You Trust It? We have been living through an AI shift for the past few years. AI is everywhere: search, learning, text processing, code review, code-writing assistance, and many other AI-powered systems have arisen in recent years. It seems everyone is eager to apply AI wherever possible, even where it might not be needed. I'm not an exception. Under the influence of this wave, I decided to try to create something on my own that would help me in everyday life. So here I will tell you my own story of writing an application with the use of AI, along with some thoughts about it, which are, of course, rather contradictory. What Is the Task? As a developer in a distributed team, I usually need to explain my weekly progress to my colleagues. I know that for some it might look counterintuitive, but we prefer text-based reports over face-to-face communication. All the benefits of this approach have been mentioned many times already (like here, here, and here), and it's just how we prefer to do it. So, after a while, we came up with a particular document format and structure for our weekly reports. It is called SIMBA. This format is extremely simple: Markdown From: Team Coordinator To: Big Boss CC: Programmer #1, Programmer #2, Friend #1, etc.
Subject: WEEK13 Dataset, Requirements, XYZ Hi all, Last week achievements: - Added 100 new files to the Dataset [100%] - Fixed the deployment of XYZ [50%] - Refined the requirements [80%] Next week plans: - To publish ABC package draft - To review first draft of the report Risks: - The server is weak, we may fail the delivery of the dataset, report milestone will be missed. Bye. As you can see, there are only three key parts ("Last week's achievements", "Next week's plans", and "Risks") which we are usually interested in. So, this report is typically short and straightforward. But if you're doing this every week, it can get tedious. Extremely tedious, I would say. Sometimes, it's a real challenge to recall what you were up to at the start of the previous week, which issues you planned to solve, and which are better to leave for the next week. Moreover, you have to keep in mind all possible risks and problems that might arise from the changes you make along the way. So why don't we generate this report automatically? We can create a small app that will generate weekly reports based on developers' GitHub activity. This information should be sufficient to build a detailed weekly report. However, the activity data is often poorly formatted due to the lack of rigid conventions for commits, issues, and pull requests. Even if such formatting existed, it might vary between repositories and projects. And frankly, we don't want to create these strict rules and style guidelines — it’s boring. Instead, we have AI to extract and format all the parts of the report for us. Can You Just Generate It for Us? We don't have much time to write a complex application for this task. We have many other responsibilities at our job, so we simply can't allocate much time for it. Let's start with a straightforward and fast attempt to generate the report. We'll focus on the "Last week's achievements" section now and delegate as much work as possible to AI. Typically, we can assess a developer's work by reviewing completed pull requests — the actual code provided. So, we'll fetch a list of closed pull requests from the previous week using the GitHub API, convert their titles and bodies to simple strings, join them with a ____ delimiter, and send them to the AI with the following prompt: Context: Plain Text You are a developer tasked with composing a concise report detailing your activities and progress for the previous week, intended for submission to your supervisor. Prompt (it's boring): Plain Text Please compile a summary of the work completed in the following Pull Requests (PRs). Each PR should be summarized in a single sentence, focusing more on the PR title and less on implementation details. Group the sentences by repositories, each identified by its name mentioned in the 'repository:[name]' attribute of the PR. The grouping is important an should be precise. Ensure that each sentence includes the corresponding issue number as an integer value. If a PR doesn't mention an issue number, just print [#chore]. Combine all the information from each PR into a concise and fluent sentence, as if you were a developer reporting on your work. Please strictly adhere to the example template provided. Example of a report: #{example}. List of Pull Requests: [#{prs}]"} That is it. We didn't do any grouping programmatically; we didn't prepare data; we didn't even write the prompt ourselves. I asked AI to generate it for us, of course. (So, am I a prompt engineer?) And... we have great results. Plain Text Last week achievements. 
jeo-maven-plugin: - Implemented disassembling of remaining instructions [#509] - Identified the problem with switch statement disassembling [#488] - Updated Qulice version to 0.22.2 [#chore] - Handled all bytecode instructions and updated plugin version [#488] - Improved performance of integration tests by optimizing cache usage [#499] - Made label identifier accessible in XmlLabel class [#497] opeo-maven-plugin: - Updated Jeo version to 0.3.4 [#190] - Enabled all integration tests and improved label handling [#189] It did manage to combine parts of each PR into concise, readable sentences — exactly what we need. However, it's hard to show you here, but the AI got confused and mixed up several pull requests across different repositories, losing some items from the report in the process. So, for now, we can review the text in the report manually, add any missing points, and fix a few sentences to restore their meaning. Once that's done, we'll be ready to send the first version of our report. Good. Going further, I won't include all the results because they would make the text excessively long. However, if you are really interested, I have published the complete history of the results I obtained along the way. Additionally, I have the repository with all the code, so you can check it as well. What About the Future? For the "Next week's plans" section, we can follow a similar approach since there is nothing special. The only difference is the source of data. In our team, we don't have any special software to track tasks like boards, backlog, and similar. We use plain GitHub issues, as many other open-source projects do. Hence, we can focus on issues opened by a developer in the last month, as these are the ones we will likely address sooner. Of course, most of them won't be resolved during the next week, so the developer will need to remove the ones they won't solve during the following week. In other words, we can get a list of issues created by a developer for the last month, join them using ____ delimiter, and send them with the following prompt. Plain Text Please compile a summary of the plans for the next week using the following GitHub Issues descriptions. Each issue should be summarized in a single sentence, focusing more on the issue title and less on implementation details. Group the sentences by repositories, each identified by its name mentioned in the 'repository:[name]' attribute of the issue. Pay attention, that you din't loose any issue. The grouping is important an should be precise. Ensure that each sentence includes the corresponding issue number as an integer value. If an issue doesn't mention an issue number, just print [#chore]. Combine all the information from each Issue into a concise and fluent sentences, as if you were a developer reporting on your work. Please strictly adhere to the example template provided: #{example_plans}. List of GitHub issues to aggregate: [#{issues}]. And again, we got more or less appropriate results in a human-readable format that are almost ready to be presented to the team. Plain Text Next week plans: jeo-maven-plugin: - Refactor Annotations Implementation in BytecodeAnnotation.java for simplification and readability [#532] - Investigate and fix the issue of automatic frame computation in CustomClassWriter to prevent test failures [#528] - Enable 'spring' Integration Test in pom.xml by adding support for various Java features [#488] Moreover, sometimes AI is smart enough to improve the report even without any special instructions from us. 
For example, once it was able to group a list of separate issues with similar content. Plain Text opeo-maven-plugin: - Add unit tests for the XmlParam class [#598], XmlAttributes class [#595], XmlAttribute class [#594], DirectivesNullable class [#593], DirectivesAttributes class [#592], and DirectivesAttribute class [#591] to improve code coverage and code quality. However, here we also encountered the same problems with the structure, formatting, and confusion as in the "Last week's achievements" section. So, we still need to perform some editing before sending the report. P.S. After several weeks, cleaning up plans that we don't want to address soon might become extremely tedious. To simplify this task, we might add (which I did) a label for the issues we plan to solve in the near future. Risks Now let's move to the most exciting part: risk identification, specifically our last "Risks" section in the report. Typically, developers mention some risks and possible problems in PR descriptions. Actually, they can be mentioned anywhere, but let's start with something simple. We can ask AI to generate the following prompt to identify risks from pull request descriptions: Plain Text Please compile a summary of the risks identified in some repositories. If you can't find anything, just leave answer empty. Add some entries to a report only if you are sure it's a risk. Developers usually mention some risks in pull request descriptions. They either mention 'risk' or 'issue'. I will give you a list of pull requests. Each risk should be summarized in a single sentence. Ensure that each sentence includes the corresponding issue number or PR number as an integer value. If a PR or an issue doesn't mention an issue number, just print [#chore]. Combine all the information from each PR into a concise and fluent sentence, as if you were a developer reporting on your work. Please strictly adhere to the example template provided. Example of a report: #{example_risks}. List of Pull Requests: ```#{all}```. Unfortunately, this time it doesn't work as expected. Not all code changes carry risks, so the AI often tries to invent new risks where there are none. Sometimes, it simply repeats the PR description without identifying any problems. Other times, it prints risks from the example provided instead of from the real data. It also frequently confuses PR numbers. In other words, it is a mess. Most likely, the key problem is with our prompt. I tried several modifications, but the results remain more or less the same. So, the only option we have is to give some clues to the AI and start writing all PR descriptions as clearly as possible. And... surprisingly, it helps. For this PR description: Plain Text During the implementation of this issue, I identified some problems which might cause issues in the future: Some of the decompiled object values look rather strange, especially the field default values - they have the '--' value. We need to pay attention to the mapping of these values and fix the problem. For now, it doesn't create any issues, but it's better to deal with it somehow. We successfully identified the risk: Plain Text Risks: jeo-maven-plugin: - In PR 'Update All Project Dependencies', there is a risk related to strange decompiled object values with -- default values that may need attention in the future [#199]. The more human-readable messages we leave, the easier it is for AI to analyze results. (Who would've thought, right?) 
As a result, we've now developed much better-styled, grammatically correct, and descriptive messages in our issues and pull requests that are more understandable. So, it’s a nice improvement for people who read our PRs, not just for AI processing. However, I should admit that in some cases when I need to go beyond that, I can leave additional markers like "Risk 1: ..., 'Risk 2:..." in the text (as I did here) to get more precise answers from the AI. By doing this, the AI almost doesn't make any mistakes. But do we really need the AI in this case at all? As you can see, it's exactly what we initially didn't want to do at all – structure text and add meta information to PRs and issues. How ironic. Let’s Improve It? Even though we've implemented all these parts, we still have to handle much of the work, including structuring, formatting, and ensuring each generated sentence makes sense. I'm unsure if we can somehow fix the issue related to meaning verification. For now, it's just easier to do it manually. Consequently, we're left with structural and formatting problems. We have several options that we can apply to improve our reports. The first thing we can improve is the entire report style. Since we made three separate requests, the responses predictably came back in different formats. To illustrate this, take a look at the report we generated. Plain Text Last week achievements: jeo-maven-plugin: * Remove Mutable Methods [#352] Next week plans: opeo-maven-plugin: - Fix 'staticize' optimization [#207] Risks: jeo-maven-plugin: - The server is weak, we may fail the delivery of the dataset, report milestone will be missed [#557]. We have at least one simple and fast solution to this problem. Can you guess which one? That's right, let's throw even more AI at it. More and more AI! Alright, let's not get carried away. For now, we can just add one more request. Plain Text I have a weekly report with different parts that use various formatting styles. Please format the entire report into a single cohesive format while preserving the original text without any changes. Ensure that the formatting is consistent throughout the document. Here is the report: #{report} And it works. Plain Text Last week achievements: jeo-maven-plugin: - Remove Mutable Methods [#352] Next week plans: opeo-maven-plugin: - Fix 'staticize' optimization [#207] Risks: jeo-maven-plugin: - The server is weak, we may fail the delivery of the dataset, report milestone will be missed [#557]. However, we have different formatting styles between reports now, which is okay for our task, though it looks a bit strange since each week we send differently formatted reports. Maybe it only gives the impression of a real person. The second improvement we can apply to our reports is to use a better AI model. I haven't mentioned this yet; all the previous requests we made were with an old but relatively cheap model, gpt-3.5-turbo. So, to provide a clear experiment, let's spend a bit more money to check out the newest gpt-4o model. It works much better. It is subjective, of course, but my perception tells me that the results look better in most cases. Again, you can check the difference here. The final improvement involves the format of the input data for the pull requests and issues we submit to the AI. Initially, as you remember, we didn't spend much time preparing the data. However, we can switch from unstructured text with delimiters to JSON. And it appears that the AI makes fewer mistakes with well-formatted data. 
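To illustrate that last point (sending the model structured JSON instead of delimiter-joined strings), here is a rough Java sketch of what the conversion could look like. The PullRequest record and its fields are assumptions for illustration, not the author's actual code, and a real implementation would use a JSON library such as Jackson instead of manual escaping:
Java
import java.util.List;

// Rough sketch: turning fetched pull requests into a JSON array for the prompt.
public class PromptInputBuilder {

    record PullRequest(String repository, int number, String title, String body) {}

    static String escape(String s) {
        // Minimal escaping for the sketch; a JSON library should do this in practice.
        return s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", " ");
    }

    static String toJson(List<PullRequest> prs) {
        StringBuilder json = new StringBuilder("[");
        for (int i = 0; i < prs.size(); i++) {
            PullRequest pr = prs.get(i);
            if (i > 0) json.append(",");
            json.append("{\"repository\":\"").append(escape(pr.repository()))
                .append("\",\"number\":").append(pr.number())
                .append(",\"title\":\"").append(escape(pr.title()))
                .append("\",\"body\":\"").append(escape(pr.body())).append("\"}");
        }
        return json.append("]").toString();
    }

    public static void main(String[] args) {
        // Illustrative data only.
        List<PullRequest> prs = List.of(
            new PullRequest("jeo-maven-plugin", 509,
                "Disassemble remaining instructions", "Closes #509"));
        System.out.println(toJson(prs));
    }
}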
In summary, we can continue building more pipelines with chained requests, spending more money, formatting the input data, and so on. While this may yield some gains, do we really need to spend more time on these tasks? I don't think so. Moreover, I strongly feel that these problems could be solved more easily programmatically, even without using AI. Therefore, I believe that our current solution is sufficient, and it's better to stop now. What Do We Have in the End? Let's agree: we completely changed the original task. We formatted the pull request and issue descriptions and added meta information like the labels and 'Risk' markers. Moreover, we spent significant time developing these scripts, configuring data, and adjusting prompts, which we initially wanted to avoid altogether. We still need to validate the report; we can't blindly trust it. And I wonder if, after all these changes, we still need an AI at all. However, did we fail in our attempt to build an AI-based application? I can't say that. Things are not so dramatically bad. Let's take a look at what we have. We started the development very quickly. Very quickly. Initially, we didn't do anything special in terms of formatting or data preparation for AI analysis. Just a simple prompt with data, and we got raw full-of-mistakes results. But we got results! In a few minutes. Later, when we needed to make our system more precise, we gradually added more code to it. We specified the solution, added meta-information, improved prompts, built a chain of requests, and so on. So, I can illustrate my observations about this development process as follows: The more you develop, the more you trust it. Final Note These days, we are experiencing significant growth in AI tools. Many of these tools have already been integrated into our work processes. They can generate code or unit tests very effectively, as well as documentation or well-written code comments. Moreover, as I have mentioned, in some cases, AI indirectly improves our systems. So, there is definite progress in many areas. Most importantly, AI might significantly change the software development process itself in the future. However, in our example with programmers' activity, the situation is still far from perfect. Clearly, we still can't assign such tasks to AI without our intervention, and I'm unsure if we ever will. If we look at other similar systems for code review or PR description summarization, for example, they lack accuracy and also produce many errors. Hence, over time, we start to view the outputs of such systems as noise and simply ignore the results. In other words, we just can't trust them. While it is possible and even likely that this will change in the future, for now, I'm still rather skeptical about AI. We still need to control and verify its outputs, refine the code to increase precision, build sophisticated chains of prompts, and more. And even after all these efforts, we still cannot blindly trust AI. Perhaps these are just my concerns. What about you? Do you trust it? By Volodya Lombrozo API Implementation on AWS Serverless Architecture This article describes the implementation of RESTful API on AWS serverless architecture. It provides a detailed overview of the architecture, data flow, and AWS services that can be used. This article also describes the benefits of the serverless architecture over the traditional approach. What Is Serverless Architecture? 
Serverless architecture, also known as serverless computing or function as a service, is a software design approach that allows developers to build and run applications without managing the underlying infrastructure. A cloud service provider is responsible for managing and scaling the cloud infrastructure, including provisioning the servers that run applications, databases, and storage.

Importance of Serverless Architecture

- Businesses only pay for the computing resources they use (e.g., number of requests, execution time, and resources consumed), so there are no upfront costs for hardware or software. This eliminates paying for idle infrastructure and leads to significant cost savings.
- Serverless architectures automatically scale up and down in response to the workload, so applications can handle varying levels of traffic. Each function scales independently, ensuring that resources are allocated efficiently based on demand.
- Serverless architecture is well suited for event-driven applications, where functions are triggered by specific events such as HTTP requests, database changes, or message queue updates.

AWS Services To Be Used for Implementation

The following AWS services can be incorporated into the implementation of the REST API; each is listed along with its purpose.

Route53

Route53 can be used for domain registration, DNS routing, traffic flow and traffic management, and health checks and monitoring.

API Gateway

Use API Gateway for creating, publishing, maintaining, monitoring, and securing REST APIs at any scale. HTTP methods (GET, POST, PUT, DELETE, PATCH, OPTIONS) can be created under API Gateway and integrated with the respective front controller Lambda function.

WAF

AWS WAF (web application firewall) helps protect against common web exploits and bots that can affect availability, compromise security, or consume excessive resources. We can associate the WAF with an API Gateway to filter out malicious requests. With WAF, we can configure the following:

- Web ACLs – rules and rule groups that determine the traffic to be allowed
- Custom rules – IP set match conditions, string and regex match conditions, geo match conditions, and rate-based rules
- Bot Control

Lambda

Lambda Function for Authorization

A Lambda authorizer takes the caller's identity as input and returns an IAM policy as output. Use a Lambda authorizer to implement custom authentication and authorization. After authenticating and authorizing the caller, the Lambda authorizer returns one of two types of policies to API Gateway:

- Allow
- Deny

A minimal sketch of such an authorizer is shown at the end of this section.

Lambda Functions for Business Logic

Lambda functions implement the business logic; they can call other Lambda functions, downstream services, and databases.

Other AWS Services

- CloudWatch – Use AWS CloudWatch to monitor the application and store logs; dashboards and alerts can also be created for reporting and proactive monitoring.
- SQS and SNS – Use AWS SQS to store asynchronous messages and SNS to push notifications to Lambda functions.
- DynamoDB or RDS – Application database.
- IAM – Identity and Access Management service to define roles and grant access to your AWS resources.
- VPC, Subnets, Security Groups – A VPC isolates AWS resources in a secure network, subnets segment the VPC for organization, and security groups control traffic with firewall rules.
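To make the authorizer contract more concrete, here is a minimal sketch of a token-based Lambda authorizer in Python. Only the shape of the returned IAM policy is dictated by API Gateway; the token check, principal ID, and environment variable are illustrative assumptions rather than a recommended validation scheme.

Python

import os

def lambda_handler(event, context):
    # For a token-based authorizer, API Gateway passes the caller's token
    # and the ARN of the invoked method in the event.
    token = event.get("authorizationToken", "")

    # Placeholder check: a real authorizer would validate a JWT signature
    # or call an identity provider instead of comparing raw strings.
    effect = "Allow" if token == os.environ.get("EXPECTED_TOKEN") else "Deny"

    # The authorizer must return an IAM policy document that API Gateway
    # evaluates before invoking the backend Lambda.
    return {
        "principalId": "example-user",  # illustrative caller identifier
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Action": "execute-api:Invoke",
                    "Effect": effect,
                    "Resource": event["methodArn"],
                }
            ],
        },
    }

When the effect is Deny, API Gateway rejects the call before it ever reaches the front controller or any business-logic function.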
Architecture and Data Flow

The architecture diagram below describes the set of AWS services used, the data flow, and the integration with other services.

At a high level, the client sends an HTTP request to Amazon API Gateway, which triggers an AWS Lambda function. The Lambda function processes the request, interacts with other AWS services if needed (such as DynamoDB for data storage), and returns a response to API Gateway, which then sends the response to the client.

Data Flow Steps

1. The user makes an HTTP request to the API with valid authorization headers (e.g., a JWT token or API keys).
2. Route 53 forwards the request to API Gateway, where it is intercepted by the web application firewall. The firewall applies its configured rules to protect the application from web attacks: if it detects a malicious request, it blocks it immediately; otherwise, it passes the request on to API Gateway.
3. The Lambda authorizer configured with API Gateway intercepts the request and authenticates and authorizes the user. If the user is authorized to access the underlying resource, the request is forwarded to the front controller Lambda.
4. The front controller Lambda delegates the request to the respective service Lambda function.
5. As per the business logic, the service Lambda processes the request and returns the appropriate response to the client. While processing the request, service Lambda functions can call downstream REST APIs or databases, and they can also consume messages from SQS queues or subscribe to SNS topics.
6. The Identity and Access Management (IAM) service is used to define roles for resources and grant access to those roles.
7. All resources push application logs to CloudWatch for monitoring and troubleshooting.
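Below is a minimal sketch of the front controller Lambda from steps 4 and 5, written in Python. The routing table and service function names are purely illustrative assumptions; the AWS-specific parts are the boto3 Lambda invocation and the proxy-style response that API Gateway expects.

Python

import json
import boto3

lambda_client = boto3.client("lambda")

# Illustrative routing table: maps an API resource path to the service
# Lambda that implements its business logic (function names are hypothetical).
ROUTES = {
    "/orders": "order-service-lambda",
    "/customers": "customer-service-lambda",
}

def lambda_handler(event, context):
    # With proxy integration, API Gateway passes the resource path and method.
    target = ROUTES.get(event.get("resource", ""))
    if target is None:
        return {"statusCode": 404, "body": json.dumps({"message": "Unknown resource"})}

    # Delegate to the service Lambda synchronously and wait for its result.
    response = lambda_client.invoke(
        FunctionName=target,
        InvocationType="RequestResponse",
        Payload=json.dumps(event).encode("utf-8"),
    )
    result = json.loads(response["Payload"].read())

    # Return a proxy-integration-style response to API Gateway.
    return {
        "statusCode": result.get("statusCode", 200),
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(result.get("body", {})),
    }

The service Lambdas invoked here would contain the actual business logic, database access, and downstream REST calls described above.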
Typical Use Cases

Serverless architecture is a good fit for:

- Event-driven applications where data needs to be processed in real time, such as data stream or notification processing.
- Microservices that are implemented and deployed independently and in isolation, for better scalability.
- Applications that process scheduled tasks and are triggered at a particular time.
- Any use case where cost is a critical component.

Infrastructure Provisioning and Deployment

In an enterprise, multiple environments exist apart from production for development and testing purposes. Creating the same set of resources in each environment and tracking configuration changes manually is challenging and error-prone. To address this, Terraform (infrastructure as code) can be used: it helps replicate resources from one environment to another and also tracks the state of the infrastructure. Deployment can be automated by any CI/CD tool (such as Jenkins or GitLab) together with Terraform.

Conclusion

In conclusion, leveraging AWS serverless architecture for developing REST APIs offers multiple advantages in terms of scalability, cost-effectiveness, and ease of management. By adopting a serverless approach, developers can focus more on building robust APIs without the overhead of managing servers. AWS Lambda's event-driven model allows for seamless scaling, ensuring your APIs can handle varying workloads efficiently.

By Shailesh Hurdale