Week 33 / 2023
Categories: aws frappe/erpnext data engineering
Tags: python
AWS
- cli installation guide
- 3 zones of networking
Frappe/ERPNext
frappe.flags
A module used to store and manage global variables or flags that can be accessed throughout the application. It allows developers to set and access variables that affect the behavior of different parts of the application. These flags are global in nature and can be used to control various aspects of the application's behavior on a per-request basis. Keep in mind that global flags should be used with caution.
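A minimal sketch of how `frappe.flags` is typically used; note that `skip_notifications` is a hypothetical flag name chosen for illustration, not one defined by the framework:

```python
import frappe

# Set a request-scoped flag early in a hook or controller.
# ("skip_notifications" is a hypothetical flag name for illustration.)
frappe.flags.skip_notifications = True

def notify_subscribers(doc):
    # frappe.flags is a frappe._dict, so unset flags read as falsy
    # and this check is safe anywhere within the same request.
    if frappe.flags.skip_notifications:
        return
    # ... send notifications ...
```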
FrappeCloud
- A managed hosting service purpose-built for Frappe Framework based sites.
- However, these sites are limited by CPU hours per day. This means if there are a lot of users using the site, the site may slow down if the CPU usage limit is reached.
- You can see which tables/doctypes are using the most space from within your site itself: search for the Database storage report.
- More often than not, log tables take up a lot of space. You can control the size of log tables via Log Settings.
ch9: Serving Data for Analytics, Machine Learning, and Reverse ETL
General Considerations for Serving Data
- Trust:
- trust is the root consideration in serving data; end users need to trust the data they’re receiving.
- A loss of trust is often a silent death knell for a data project, even if the project isn’t officially canceled until months or years later.
- High-quality data is of little value if it’s not available as expected when it’s time to make a critical business decision.
- an SLA tells users what to expect from your data product; it is a contract between you and your stakeholders.
- “Data will be reliably available and of high quality.”
- An SLO is a key part of an SLA and describes the ways you’ll measure performance against what you’ve agreed to.
- “Our data pipelines to your dashboard or ML workflow will have 99% uptime, with 95% of data free of defects.”
- What’s the Use Case, and Who’s the User?:
- The use case for data goes well beyond viewing reports and dashboards.
- Data is at its best when it leads to action.
- in seeking use cases, always ask, “What action will this data trigger, and who will be performing this action?,” with the appropriate follow-up question, “Can this action be automated?”
- Whenever possible, prioritize use cases with the highest possible ROI. Data engineers love to obsess over the technical implementation details of the systems they build while ignoring the basic question of purpose. Engineers want to do what they do best: engineer things. When engineers recognize the need to focus on value and use cases, they become much more valuable and effective in their roles.
- Again, always approach data engineering from the perspective of the user and their use case.
- ...
- A classic engineering mistake is simply building without understanding the requirements, needs of the end user, or product/market fit. A data product no one uses is a failure, no matter how well it’s built.
- Nothing will ruin the adoption of a data product more than unwanted utility and loss of trust in the data outputs.
- Self-Service or Not?
- Self-service data is a powerful concept, but it’s not always the right choice.
- What’s better than just giving the end user the ability to directly create reports, analyses, and ML models?
- The first data-serving use case you’ll likely encounter is analytics, which is discovering, exploring, identifying, and making visible key insights and patterns within data.
- the differences between business analytics (usually BI), operational analytics, and embedded analytics. Each of these analytics categories has different goals and unique serving requirements.
- Business Analytics:
- Business analytics uses historical and current data to make strategic and actionable decisions.
- Business analytics typically falls into a few big areas—dashboards, reports, and ad hoc analysis.
- Operational Analytics:
- Operational analytics versus business analytics = immediate action versus actionable insights
- The big difference between operational and business analytics is time.
- Embedded Analytics:
- Embedded analytics is the integration of analytics capabilities and data into an application or product.
- Machine Learning:
- Here are some areas of ML that we think a data engineer should be familiar with:
- The difference between supervised, unsupervised, and semisupervised learning.
- The difference between classification and regression techniques.
- The various techniques for handling time-series data. This includes time-series analysis, as well as time-series forecasting.
- When to use the “classical” techniques (logistic regression, tree-based learning, support vector machines) versus deep learning. We constantly see data scientists immediately jump to deep learning when it’s overkill. As a data engineer, your basic knowledge of ML can help you spot whether an ML technique is appropriate and scale the data you’ll need to provide.
- When would you use automated machine learning (AutoML) versus handcrafting an ML model? What are the trade-offs with each approach regarding the data being used?
- What are data-wrangling techniques used for structured and unstructured data?
- All data that is used for ML is converted to numbers. If you’re serving structured or semistructured data, ensure that the data can be properly converted during the feature-engineering process.
- How to encode categorical data and the embeddings for various types of data (a minimal encoding sketch follows this list).
- The difference between batch and online learning. Which approach is appropriate for your use case?
- How does the data engineering lifecycle intersect with the ML lifecycle at your company? Will you be responsible for interfacing with or supporting ML technologies such as feature stores or ML observability?
- Know when it’s appropriate to train locally, on a cluster, or at the edge. When would you use a GPU over a CPU? The type of hardware you use largely depends on the type of ML problem you’re solving, the technique you’re using, and the size of your dataset.
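To ground the categorical-encoding bullet above, a minimal one-hot encoding sketch with pandas (the DataFrame and column names are made up for the example):

```python
import pandas as pd

# Toy data: a categorical feature that models can't consume directly
df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# One-hot encode the category into numeric indicator columns:
# color_blue, color_green, color_red
encoded = pd.get_dummies(df, columns=["color"])
print(encoded)
```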
Cryptography
- A PEM file, which stands for "Privacy Enhanced Mail," is a widely used file format that encompasses various types of encoded data, particularly digital certificates and cryptographic keys
- PEM files are encoded in Base64, a format that converts binary data into a plain text format using a set of 64 different printable characters.
- PEM files can contain one or more elements, often encoded between "BEGIN" and "END" delimiters. These elements are encoded in Base64 and surrounded by header and footer lines that describe the type of encoded data.
- The most common elements found in PEM files include:
    - Certificates (CRT or CER): These contain public key information and are used to verify the authenticity of a particular entity, such as a website or an email server. Certificates are often issued by Certificate Authorities (CAs) and are a crucial component of the Transport Layer Security (TLS) protocol, which ensures secure communication over the internet.
    - Private Keys (KEY): These are sensitive cryptographic keys that correspond to a public key in a certificate. Private keys are used to decrypt data encrypted with the corresponding public key and to digitally sign data, providing authentication and integrity for online transactions.
    - Certificate Signing Requests (CSR): A CSR is generated when a user or organization requests a digital certificate from a CA. The CSR contains the public key and other identifying information. After receiving a CSR, the CA can issue a signed certificate.
    - Certificate Revocation Lists (CRL): A CRL is a list of certificates that have been revoked before their expiration date. This is important for maintaining the security of a PKI (Public Key Infrastructure) system.
    - PKCS#7 and PKCS#12: These are formats that can encapsulate multiple certificates, private keys, and related information in a single file. PKCS#7 files often use the .p7b or .p7c extension, while PKCS#12 files use the .p12 or .pfx extension.
    - Trusted Root Certificates: These are certificates issued by trusted Certificate Authorities and are pre-installed in many operating systems and browsers. They are used to establish a chain of trust for verifying other certificates.
- X.509 is a standard defining the format of public-key certificates. So this format describes a public key, among other information. The public key is used to encrypt data, and the private key is used to decrypt it. The public key is made available to anyone (often through a certificate authority), and the private key is kept secret.
- A PEM file also contains a header and footer describing the type of encoded data:
```
-----BEGIN CERTIFICATE-----
-----END CERTIFICATE-----
```
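As a sketch, here is how a PEM certificate might be loaded with the third-party `cryptography` package (`cert.pem` is an assumed local file):

```python
from cryptography import x509
from cryptography.hazmat.primitives import hashes

# Load a PEM-encoded certificate (Base64 body between the
# -----BEGIN/END CERTIFICATE----- delimiters)
with open("cert.pem", "rb") as f:
    cert = x509.load_pem_x509_certificate(f.read())

print(cert.subject)
print(cert.not_valid_after)
print(cert.fingerprint(hashes.SHA256()).hex())  # the "thumbprint"
```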
PKCS
- PKCS11 Intro
- PKCS#11 defines the interface between an application and a cryptographic device.
- If unfamiliar with PKCS#11, the reader is strongly advised to refer to PKCS #11: Cryptographic Token Interface Standard.
- PKCS#11 is used as a low-level interface to perform cryptographic operations without the need for the application to directly interface a device through its driver.
- PKCS#11 libraries are specific implementations of the PKCS#11 standard. These libraries provide the actual implementation of the cryptographic operations and device interactions. The libraries are often provided by hardware manufacturers and software vendors to enable applications to utilize the cryptographic capabilities of their devices.
- PKCS#11 libraries are typically distributed as shared libraries (dynamic link libraries) with various file extensions depending on the operating system (e.g., .dll for Windows, .so for Linux, .dylib for macOS).
- If you're looking for a PKCS#11 library to work with, you'll need to obtain it from the manufacturer of the hardware security module or smart card you intend to use.
- Different manufacturers provide their own PKCS#11 libraries that are tailored to their devices. These libraries might come as part of their device's software package or as separate downloads from their official websites.
- Certificate search attributes are criteria or properties that are used to search for specific certificates within a cryptographic token (such as a hardware security module or smart card) using the PKCS#11 API
- A thumbprint, often referred to as a fingerprint or hash value, is a fixed-size representation of a larger data object, typically used to uniquely identify the content of the data. In the context of certificates and cryptographic operations, a thumbprint is commonly used to uniquely identify a digital certificate.
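A small sketch of the thumbprint idea: it is simply a hash over the certificate's DER-encoded bytes (`cert.der` is an assumed local file):

```python
import hashlib

# A certificate thumbprint is just a hash of the DER-encoded bytes
with open("cert.der", "rb") as f:
    der_bytes = f.read()

print("SHA-1 thumbprint: ", hashlib.sha1(der_bytes).hexdigest())
print("SHA-256 thumbprint:", hashlib.sha256(der_bytes).hexdigest())
```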
Interview
OOP
- OOP is a programming paradigm that focuses on objects and data rather than actions and logic.
- Formally, an object is a collection of data and associated behaviors.
- It is defined by describing a collection of interacting objects via their data and behavior.
- How do we create these levels of abstraction? There are a variety of ways to do this; most design patterns rely on two basic object-oriented principles known as composition and inheritance.
- Composition is the act of collecting several objects together to create a new one. A fossil-fueled car is composed of an engine, transmission, starter, headlights, and windshield, among numerous other parts. The engine, in turn, is composed of pistons, a crank shaft, and valves. The Car object can provide the interface required by a driver, while also giving access to its component parts, which offers the deeper level of abstraction suitable for a mechanic. Those component parts can, of course, be further decomposed into details if the mechanic needs more information to diagnose a problem or tune the engine.
- Because computer systems involve a lot of peculiar concepts, identifying the component objects does not happen as naturally as with real-world valves and pistons.
- Aggregation is almost exactly like composition. The difference is that aggregate objects can exist independently.
- Also, keep in mind that composition is aggregation; aggregation is simply a more general form of composition. Any composite relationship is also an aggregate relationship, but not vice versa.
- three types of relationships between objects: association, composition, and aggregation.
- The `is a` relationship is formed by inheritance.
- The `has a` relationship is formed by composition or aggregation.
- Association is a relationship between two objects. It is the most general form of relationship. In an association, an object can be related to another object without any constraints. For example, a professor can be associated with a student. This association does not impose any constraints on the objects. The professor can be associated with any student, and the student can be associated with any professor.
- Polymorphism is the ability to treat a class differently, depending on which subclass is implemented.
- The literal meaning of polymorphism is the condition of occurrence in different forms.
- It refers to the use of a single type of entity (method, operator, or object) to represent different types in different scenarios. `1 + 1`, `1.0 + 1.0`, and `"1" + "1"` are all examples of polymorphism.
- There are some functions in Python that can run with multiple data types. `len()` is one such function: it works on strings, lists, tuples, and dictionaries. This is polymorphism.
- This sort of polymorphism in Python is typically referred to as duck typing: if it walks like a duck or swims like a duck, we call it a duck.
- Duck typing also allows a programmer to extend a design, creating completely different drop-in behaviors the original designers never planned for.
- Multiple inheritance: Most often, it can be used to create objects that have two distinct sets of behaviors. For example, an object designed to connect to a scanner to make an image and send a fax of the scanned image might be created by inheriting from two separate scanner and faxer objects.
- Python has a defined method resolution order (MRO) to help us understand which of the alternative methods will be used. While the MRO rules are simple, avoiding overlap is even simpler. Multiple inheritance as a "mixin" technique for combining unrelated aspects can be helpful. In many cases, though, a composite object may be easier to design.
- MRO is used primarily to obtain the order in which methods should be inherited in the presence of multiple inheritances.
- Python 3 uses the C3 linearization algorithm for MRO.
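A quick sketch of the diamond-inheritance case to make the MRO visible:

```python
class A:
    def greet(self) -> str:
        return "A"

class B(A):
    def greet(self) -> str:
        return "B"

class C(A):
    def greet(self) -> str:
        return "C"

class D(B, C):
    pass

# C3 linearization orders the search: D -> B -> C -> A -> object
print([cls.__name__ for cls in D.__mro__])  # ['D', 'B', 'C', 'A', 'object']
print(D().greet())                          # 'B': first match in MRO order
```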
SOLID
- SOLID is an acronym for five principles of object-oriented programming and design intended to make software designs more understandable, flexible, and maintainable.
- Single Responsibility Principle (SRP): A class should have only one reason to change. This means that a class should have only one responsibility. This principle is closely related to the concept of separation of concerns
- Open/Closed Principle (OCP): Software entities (classes, modules, functions, etc.) should be open for extension, but closed for modification. This means that a class should be easily extendable without modifying the class itself.
- Liskov Substitution Principle (LSP): Objects in a program should be replaceable with instances of their subtypes without altering the correctness of that program.
- Interface Segregation Principle (ISP): Many client-specific interfaces are better than one general-purpose interface. A class should have the smallest interface possible. This is, perhaps, the most important of these principles. Classes should be relatively small and isolated. Interfaces belong to clients, not to hierarchies. Clients should not be forced to depend upon methods that they do not use. In other words, if a class doesn’t use particular methods or attributes, then those methods and attributes should be segregated into more specific classes.
- Dependency Inversion Principle (DIP): One should depend upon abstractions, not concretions. This means that a class should not depend on a concrete class, but on an abstraction (interface or abstract class). The DIP is the most complex of these principles, and it is often misunderstood. It is not about dependency injection, although dependency injection can be used to implement the DIP. The DIP is about decoupling layers in an application. Abstractions should not depend upon details. Details should depend upon abstractions.
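A minimal DIP sketch using an abstract base class as the abstraction (Notifier, EmailNotifier, and OrderService are invented names):

```python
from abc import ABC, abstractmethod

class Notifier(ABC):
    """The abstraction that both layers depend on."""
    @abstractmethod
    def send(self, message: str) -> None: ...

class EmailNotifier(Notifier):
    """A detail: it depends on the abstraction, not the other way round."""
    def send(self, message: str) -> None:
        print(f"Emailing: {message}")

class OrderService:
    """High-level policy: depends only on the Notifier abstraction."""
    def __init__(self, notifier: Notifier) -> None:
        self.notifier = notifier

    def place_order(self, item: str) -> None:
        # ... business logic ...
        self.notifier.send(f"Order placed: {item}")

OrderService(EmailNotifier()).place_order("book")
```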
Dependency Injection
- In a traditional programming approach, when a class requires the use of another class or service, it creates an instance of that class directly within its own code. This creates tight coupling between classes, making it difficult to change or replace dependencies without affecting the entire system. It also makes unit testing more challenging because you can't easily substitute real dependencies with mock objects or stubs.
- Dependency Injection addresses these issues by inverting the control of object creation and management. Instead of a class creating its own dependencies, the dependencies are provided from the outside (usually in the form of constructor parameters or method arguments).
- This approach has several benefits:
- 1. Decoupling By injecting dependencies from the outside, classes become more independent and can be easily replaced or changed without affecting the rest of the system.
- 2. Testability It becomes easier to create unit tests for classes because you can inject mock objects or stubs for dependencies, enabling isolated testing of individual components.
- 3. Flexibility Swapping or upgrading components becomes less complex, as long as the new component adheres to the same interface or contract. Because dependencies are provided from the outside, they can be changed at runtime, enabling dynamic reconfiguration of the system.
- 4. Configuration: Dependencies can be configured and managed externally, allowing for better control and flexibility in choosing implementations.
- There are several forms of Dependency Injection:
- 1. Constructor Injection: Dependencies are injected through a class's constructor. This is the most common form of DI.
- 2. Method Injection: Dependencies are injected through a class's methods.
- 3. Property Injection: Dependencies are injected through public properties of a class.
- In addition to these forms, there are also frameworks and containers that automate the process of injecting dependencies. These frameworks manage the lifecycle of objects and handle the resolution of dependencies, often through configuration files or annotations.
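A constructor-injection sketch showing the testability benefit (all names here are invented for illustration):

```python
class SmtpMailer:
    """Production dependency (would talk to an SMTP server)."""
    def send(self, to: str, body: str) -> None:
        print(f"SMTP send to {to}: {body}")

class RecordingMailer:
    """Test double: records messages instead of sending them."""
    def __init__(self) -> None:
        self.outbox: list[tuple[str, str]] = []

    def send(self, to: str, body: str) -> None:
        self.outbox.append((to, body))

class SignupService:
    # Constructor injection: the mailer is supplied from outside,
    # so SignupService never constructs its own dependency.
    def __init__(self, mailer) -> None:
        self.mailer = mailer

    def register(self, email: str) -> None:
        self.mailer.send(email, "Welcome!")

# Production wiring
SignupService(SmtpMailer()).register("a@example.com")

# Test wiring: substitute the double, no SMTP server needed
mailer = RecordingMailer()
SignupService(mailer).register("b@example.com")
assert mailer.outbox == [("b@example.com", "Welcome!")]
```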
Design Patterns
- Design patterns are applied to solve a common problem faced by developers in some specific situation. The design pattern is a suggestion as to the ideal solution for that problem, in terms of object-oriented design
- Any one design pattern proposes a set of objects interacting in a specific way to solve a general problem. The job of the programmer is to recognize when they are facing a specific version of such a problem, then to choose and adapt the general pattern to their precise needs.
Iterator Pattern
- The `for` loop is a lightweight wrapper around a set of object-oriented principles. It is a way of iterating over a sequence of items, one at a time, without having to know the details of how the sequence is implemented.
- Conceptually, an iterator is an object with a next() method and a done() method; the latter returns True if there are no items left in the sequence.
- Rather than a done() method, Python's iterator protocol raises the StopIteration exception to notify the client that the iterator has completed.
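A minimal sketch of the protocol (Countdown is an invented example class):

```python
class Countdown:
    """Counts down from start to 1 using the iterator protocol."""
    def __init__(self, start: int) -> None:
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__
        return self

    def __next__(self) -> int:
        if self.current <= 0:
            # Python's replacement for a done() method
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

for n in Countdown(3):
    print(n)  # 3, 2, 1
```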
Decorator Pattern
- This is used to combine different kinds of functionality into a single resulting object.
- The Decorator pattern allows us to wrap an object that provides core functionality with other objects that alter this functionality.
- two primary uses of the Decorator pattern:
    - Enhancing the response of a component as it sends data to a second component
    - Supporting multiple optional behaviors
- The second option is often a suitable alternative to multiple inheritance. We can construct a core object, and then create a decorator wrapping that core. Since the decorator object has the same interface as the core object, we can even wrap the new object in other decorators.
- When called, the decorator does some added processing before or after calling its wrapped interface. The wrapped object may be another decorator, or the core functionality.
- In Python, because of duck typing, we don't need to formalize these relationships with an official abstract interface definition.
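A minimal sketch of wrapping and stacking decorators via duck typing (Greeter and ShoutDecorator are invented names):

```python
class Greeter:
    """Core object providing the base behavior."""
    def greet(self) -> str:
        return "Hello"

class ShoutDecorator:
    """Wraps anything exposing greet(); duck typing means no formal
    interface declaration is required."""
    def __init__(self, wrapped) -> None:
        self.wrapped = wrapped

    def greet(self) -> str:
        # added processing around the wrapped call
        return self.wrapped.greet().upper() + "!"

print(ShoutDecorator(Greeter()).greet())                  # HELLO!
# Decorators share the core's interface, so they can stack:
print(ShoutDecorator(ShoutDecorator(Greeter())).greet())  # HELLO!!
```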
Observer Pattern
- The Observer pattern is useful for state monitoring and event handling situations.
- This pattern allows a given object to be monitored by an unknown and dynamic group of observer objects. The core object being observed needs to implement an interface that makes it observable.
- Whenever a value on the core object changes, it lets all the observer objects know that a change has occurred, by calling a method announcing there's been a change of state
- In Python, the observer can be notified via the `__call__()` method, making each observer behave like a function or other callable object.
- This allows tremendous flexibility by decoupling the response to a state change from the change itself.
- the Core class must adhere to a common understanding of observability; specifically, it must provide a list of observers and a way to attach new observers.
- the Observer pattern is useful for saving intermediate states of objects.
- When defining a Protocol, we're relying on mypy to confirm that all observers actually implement the required method.
- An observable object can append an observer, remove an observer, and – most important – notify all the observers of a state change.
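A minimal sketch of an observable core with callable observers, along the lines described above (Inventory and ConsoleObserver are invented names):

```python
class Inventory:
    """Observable core: keeps a list of observers and notifies them
    whenever its state changes."""
    def __init__(self) -> None:
        self._observers = []
        self._product = ""

    def attach(self, observer) -> None:
        self._observers.append(observer)

    @property
    def product(self) -> str:
        return self._product

    @product.setter
    def product(self, value: str) -> None:
        self._product = value
        self._notify()

    def _notify(self) -> None:
        for observer in self._observers:
            observer()  # each observer is a callable

class ConsoleObserver:
    def __init__(self, inventory: Inventory) -> None:
        self.inventory = inventory

    def __call__(self) -> None:
        print(f"Product changed to {self.inventory.product!r}")

inv = Inventory()
inv.attach(ConsoleObserver(inv))
inv.product = "Widget"  # triggers the notification
```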
Strategy Pattern
- The pattern implements different solutions to a single problem, each in a different object. The core class can then choose the most appropriate implementation dynamically at runtime.
- Each of the implementations should perform the same task, but in different ways. The implementation interfaces need to be identical, and it's often helpful to leverage an abstract base class to make sure the implementations match.
- One common example of the Strategy pattern is sort routines
- Strategy pattern vs. Abstract Factory pattern.
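As a sketch of the sort-routine example above: in Python, a strategy is often just a callable passed as an argument (the function and names here are invented):

```python
from typing import Callable

def sort_names(names: list[str], strategy: Callable[[str], object]) -> list[str]:
    # The key callable is the interchangeable strategy object
    return sorted(names, key=strategy)

names = ["banana", "Apple", "cherry"]
print(sort_names(names, str.lower))  # case-insensitive strategy
print(sort_names(names, len))        # sort-by-length strategy
```

Thanks to first-class functions, Python rarely needs the class-per-strategy structure found in more rigid languages.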
State Pattern
- The goal of the State pattern is to represent state transition systems: systems where an object's behavior is constrained by the state it's in, and there are narrowly defined transitions to other states.
- To make this work, we need a manager or context class that provides an interface for switching states. Internally, this class contains a pointer to the current state. Each state knows what other states it is allowed to be in and will transition to those states depending on the actions invoked upon it.
- The State pattern decomposes the problem into two types of classes: the Core class and multiple State classes. The Core class maintains the current state, and forwards actions to a current state object.
- The State objects are typically hidden from any other objects that are calling the Core object; it acts like a black box that happens to perform state management internally.
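A minimal State sketch, assuming a simple traffic-light transition system (all class names invented for the example):

```python
class TrafficLightState:
    """Base state: each concrete state knows its legal transition."""
    name = ""

    def next(self) -> "TrafficLightState":
        raise NotImplementedError

class Green(TrafficLightState):
    name = "green"
    def next(self): return Yellow()

class Yellow(TrafficLightState):
    name = "yellow"
    def next(self): return Red()

class Red(TrafficLightState):
    name = "red"
    def next(self): return Green()

class TrafficLight:
    """Core/context class: holds the current state and forwards to it."""
    def __init__(self) -> None:
        self._state: TrafficLightState = Red()

    def advance(self) -> None:
        # state management stays hidden inside the core object
        self._state = self._state.next()

    @property
    def color(self) -> str:
        return self._state.name

light = TrafficLight()
for _ in range(4):
    light.advance()
    print(light.color)  # green, yellow, red, green
```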
Singleton Pattern
- In Python, if someone is using the Singleton pattern, they're almost certainly doing something wrong, probably because they're coming from a more restrictive programming language.
- Singleton is useful in overly object-oriented languages and is a vital part of traditional object-oriented programming
- More relevantly, the idea behind singleton is useful, even if we implement the concept in a totally different way in Python.
- The basic idea behind the Singleton pattern is to allow exactly one instance of a certain object to exist.
- In most programming environments, singletons are enforced by making the constructor private (so no one can create additional instances of it), and then providing a static method to retrieve the single instance. This method creates a new instance the first time it is called, and then returns that same instance for all subsequent calls.
- We don't actually need this. Python provides two built-in Singleton patterns we can leverage:
    - Modules are singletons. When we import a module, Python creates an instance of that module and stores it in `sys.modules`. All subsequent imports of that module will return the same instance.
    - A Python class definition can also be pressed into service as a singleton. A class can only be created once in a given namespace. Consider using a class with class-level attributes as a singleton object. This means defining methods with the @staticmethod decorator because there will never be an instance created, and there's no self variable.
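A small sketch of the class-as-singleton idea (Config is an invented name):

```python
class Config:
    """Class used directly as the singleton: state lives on the class,
    methods are @staticmethod, and no instance is ever created."""
    settings: dict = {}

    @staticmethod
    def set(key: str, value) -> None:
        Config.settings[key] = value

    @staticmethod
    def get(key: str):
        return Config.settings.get(key)

Config.set("debug", True)
print(Config.get("debug"))  # True, shared everywhere Config is imported
```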
Asynchronous Programming
- Asynchronous IO (async IO): a language-agnostic paradigm (model) that has implementations across a host of programming languages
- Coroutines (specialized generator functions) are the heart of async IO in Python, and we’ll dive into them later on.
- Parallelism consists of performing multiple operations at the same time.
- Multiprocessing is a means to effect parallelism, and it entails spreading tasks over a computer’s central processing units (CPUs, or cores). Multiprocessing is well-suited for CPU-bound tasks.
- Concurrency is a slightly broader term than parallelism.
- Threading is a concurrent execution model whereby multiple threads take turns executing tasks.
- concurrency encompasses both multiprocessing (ideal for CPU-bound tasks) and threading (suited for IO-bound tasks).
- Multiprocessing is a form of parallelism, with parallelism being a specific type (subset) of concurrency.
- async IO is a single-threaded, single-process design: it uses cooperative multitasking.
- async IO gives a feeling of concurrency despite using a single thread in a single process.
- To reiterate, async IO is a style of concurrent programming, but it is not parallelism. It’s more closely aligned with threading than with multiprocessing but is very much distinct from both of these and is a standalone member in concurrency’s bag of tricks.
- The white terms represent concepts, and the green terms represent ways in which they are implemented or effected:
![concurrency’s bag of tricks](concurrency_bag_of_tricks.png)
- Q: How does something that facilitates concurrent code use a single thread and a single CPU core?
    - Synchronous version: Judit plays one game at a time, never two at the same time, until the game is complete. Each game takes `(55 + 5) * 30 == 1800` seconds, or 30 minutes. The entire exhibition takes `24 * 30 == 720` minutes, or 12 hours.
    - Asynchronous version: Judit moves from table to table, making one move at each table. She leaves the table and lets the opponent make their next move during the wait time. One move on all 24 games takes Judit `24 * 5 == 120` seconds, or 2 minutes. The entire exhibition is now cut down to `120 * 30 == 3600` seconds, or just 1 hour.
- So, cooperative multitasking is a fancy way of saying that a program’s event loop (more on that later) communicates with multiple tasks to let each take turns running at the optimal time.
- Luckily, asyncio has matured to a point where most of its features are no longer provisional, its documentation has received a huge overhaul, and some quality resources on the subject are starting to emerge as well.
- Python’s asyncio package (introduced in Python 3.4) and its two keywords, async and await, serve different purposes but come together to help you declare, build, execute, and manage asynchronous code.
- A coroutine is a function that can suspend its execution before reaching return, and it can indirectly pass control to another coroutine for some time.
- The syntax async def introduces either a native coroutine or an asynchronous generator.
- The keyword `await` passes function control back to the event loop. (It suspends the execution of the surrounding coroutine.) If Python encounters an `await f()` expression in the scope of `g()`, this is how await tells the event loop, “Suspend execution of g() until whatever I’m waiting on—the result of f()—is returned. In the meantime, go let something else run.”
- There’s also a strict set of rules around when and how you can and cannot use async/await:
    - A function that you introduce with async def is a coroutine. It may use await, return, or yield, but all of these are optional.
    - Using await and/or return creates a coroutine function. To call a coroutine function, you must await it to get its results.
    - Using yield in an async def block creates an asynchronous generator, which you iterate over with `async for` to get its results.
    - Anything defined with async def may not use yield from, and anything defined with def may not use await.
- For now, just know that an awaitable object is either (1) another coroutine or (2) an object defining an `__await__()` dunder method that returns an iterator.
- Generator-based coroutines are deprecated and were removed in Python 3.11.
- A key feature of coroutines is that they can be chained together. (Remember, a coroutine object is awaitable, so another coroutine can await it.) This allows you to break programs into smaller, manageable, recyclable coroutines.
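A minimal runnable sketch tying these pieces together (standard library only):

```python
import asyncio

async def fetch(n: int) -> int:
    # Simulated IO: await suspends fetch() and hands control
    # back to the event loop so other coroutines can run
    await asyncio.sleep(1)
    return n * 2

async def main() -> None:
    # Run three coroutines concurrently: ~1 second total, not 3
    results = await asyncio.gather(fetch(1), fetch(2), fetch(3))
    print(results)  # [2, 4, 6]

asyncio.run(main())
```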
HTTP
- an application-layer protocol for transmitting hypermedia documents, such as HTML.
- HTTP follows a classical client-server model, with a client opening a connection to make a request, then waiting until it receives a response.
- HTTP is a stateless protocol, meaning that the server does not keep any data (state) between two requests.
- It is an application layer protocol that is sent over TCP, or over a TLS-encrypted TCP connection, though any reliable transport protocol could theoretically be used.
- Between the client and the server there are numerous entities, collectively called proxies, which perform different operations and act as gateways or caches, for example.
- The user-agent is any tool that acts on behalf of the user. This role is primarily performed by the Web browser, but it may also be performed by programs used by engineers and Web developers to debug their applications.
- A server appears as only a single machine virtually; but it may actually be a collection of servers sharing the load (load balancing), or other software (such as caches, a database server, or e-commerce servers), totally or partially generating the document on demand.
- HTTP headers make this protocol easy to extend and experiment with. New functionality can even be introduced by a simple agreement between a client and a server about a new header's semantics.
- HTTP is stateless: there is no link between two requests being successively carried out on the same connection. This immediately has the prospect of being problematic for users attempting to interact with certain pages coherently, for example, using e-commerce shopping baskets. But while the core of HTTP itself is stateless, HTTP cookies allow the use of stateful sessions. Using header extensibility, HTTP Cookies are added to the workflow, allowing session creation on each HTTP request to share the same context, or the same state.
- A connection is controlled at the transport layer
- Among the two most common transport protocols on the Internet, TCP is reliable and UDP isn't. HTTP therefore relies on the TCP standard, which is connection-based.
- Before a client and server can exchange an HTTP request/response pair, they must establish a TCP connection, a process which requires several round-trips.
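To make the request/response-over-TCP model concrete, a minimal sketch that speaks HTTP/1.1 over a raw socket (example.com is just a convenient public host):

```python
import socket

HOST = "example.com"
request = (
    "GET / HTTP/1.1\r\n"
    f"Host: {HOST}\r\n"
    "Connection: close\r\n"
    "\r\n"
)

# One request/response pair over a single TCP connection
with socket.create_connection((HOST, 80)) as sock:
    sock.sendall(request.encode("ascii"))
    response = b""
    while chunk := sock.recv(4096):
        response += chunk

# Status line, e.g. "HTTP/1.1 200 OK"
print(response.split(b"\r\n", 1)[0].decode("ascii"))
```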
REST
- REST is an architectural style.
- REST stands for representational state transfer
- REST is a set of architectural constraints, not a protocol or a standard. API developers can implement REST in a variety of ways.
Monolith vs. Microservices
- "Monolith" and "Microservices" are architectural styles used in software development to structure and organize applications. Each approach has its own advantages and challenges, and the choice between them depends on various factors, including the project's complexity, team size, scalability requirements, and more.
- Monolith:
    - A monolithic architecture involves building the entire application as a single, cohesive unit. In a monolithic application, all components and functionality are tightly integrated and deployed together. Common characteristics of a monolith include:
        - Simplicity: Monoliths are simpler to develop, test, and deploy since everything is bundled together.
        - Ease of Development: Developing features is straightforward as everything shares the same codebase.
        - Single Technology Stack: Monoliths usually use a single technology stack for the entire application.
        - Lower Latency: Communication between components within the application is usually faster, as they are all part of the same process.
    - However, there are challenges associated with monolithic architectures:
        - Scalability: It can be difficult to scale individual components independently. You might need to scale the entire application, even if only certain parts require more resources.
        - Maintenance Complexity: As the application grows, it can become harder to manage and understand the entire codebase.
        - Deployment Risks: Deploying updates or changes might require downtime for the entire application.
        - Technology Diversity: If your application requires different technologies for different components, a monolith might not be the best fit.
- Microservices:
    - Microservices architecture involves breaking down the application into smaller, loosely coupled services that can be developed, deployed, and scaled independently. Each service typically focuses on a specific business function or feature. Characteristics of microservices include:
        - Modularity: Services are independently developed and deployed, allowing teams to focus on specific parts of the application.
        - Scalability: Services can be scaled independently, allowing you to allocate resources where needed.
        - Technology Diversity: Different services can use different technologies, languages, and frameworks based on their requirements.
        - Isolation: Failures in one service generally don't affect others, enhancing overall system resilience.
        - Continuous Deployment: Smaller services are easier to deploy frequently without affecting the entire application.
    - However, microservices also have challenges:
        - Complexity: The increased number of services can introduce complexity in terms of communication, coordination, and deployment.
        - Communication Overhead: Communication between services, often through APIs, can introduce latency and potential failure points.
        - Distributed Systems Complexity: Debugging and monitoring distributed services can be more complex.
        - Operational Overhead: Operating and managing multiple services requires additional tools and expertise.
- In summary, monolithic architectures are well-suited for simpler projects or when you're just starting out, while microservices are beneficial for complex applications that require scalability and flexibility and are developed and maintained by larger teams. The choice between the two depends on factors such as project requirements, team expertise, and long-term goals. It's also worth noting that there's a spectrum between these two extremes, and some projects might benefit from a hybrid approach that combines elements of both monoliths and microservices.