diff --git a/.ci/check-format.sh b/.ci/check-format.sh index b272faf7..c6da0c97 100755 --- a/.ci/check-format.sh +++ b/.ci/check-format.sh @@ -6,7 +6,7 @@ set -x for file in ${SOURCES}; do - clang-format-12 ${file} > expected-format + clang-format-18 ${file} > expected-format diff -u -p --label="${file}" --label="expected coding style" ${file} expected-format done -exit $(clang-format-12 --output-replacements-xml ${SOURCES} | egrep -c "") +exit $(clang-format-18 --output-replacements-xml ${SOURCES} | egrep -c "") diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml index d4311349..ac0fe073 100644 --- a/.github/workflows/main.yml +++ b/.github/workflows/main.yml @@ -4,7 +4,7 @@ on: [push, pull_request] jobs: semu-linux: - runs-on: ubuntu-22.04 + runs-on: ubuntu-24.04 steps: - uses: actions/checkout@v4 - name: install-dependencies @@ -37,11 +37,11 @@ jobs: shell: bash coding_style: - runs-on: ubuntu-22.04 + runs-on: ubuntu-24.04 steps: - uses: actions/checkout@v4 - name: coding convention run: | - sudo apt-get install -q -y clang-format-12 + sudo apt-get install -q -y clang-format-18 .ci/check-format.sh shell: bash diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 4cd2fd03..3b090198 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -11,6 +11,24 @@ Use your best judgment, and feel free to propose changes to this document in a p This project uses GitHub Issues to track ongoing development, discuss project plans, and keep track of bugs. Be sure to search for existing issues before you create another one. +Initially, it is advisable to create an issue on GitHub for bug reports, feature requests, +or substantial pull requests, as this offers a platform for discussion with both the community and project maintainers. + +Engaging in a conversation through a GitHub issue before making a contribution is crucial to ensure the acceptance of your work. +We aim to prevent situations where significant effort is expended on a pull request that might not align with the project's design principles. +For example, it might turn out that the feature you propose is more suited as an independent module that complements this project, +in which case we would recommend that direction. + +For minor corrections, such as typo fixes, small refactoring, or updates to documentation/comments, +filing an issue is not typically necessary. +What constitutes a "minor" fix involves discretion; however, examples include: +- Correcting spelling mistakes +- Minor code refactoring +- Updating or editing documentation and comments + +Nevertheless, there may be instances where, upon reviewing your pull requests, +we might request an issue to be filed to facilitate discussion on broader design considerations. + Visit our [Issues page on GitHub](https://github.com/sysprog21/semu/issues) to search and submit. ## Coding Convention @@ -21,15 +39,19 @@ However, participation requires adherence to fundamental ground rules: While there is some flexibility in basic style, it is crucial to stick to the current coding standards. Complex algorithmic constructs without proper comments will not be accepted. * External pull requests should include thorough documentation in the pull request comments for consideration. +* When composing documentation, code comments, and other materials in English, + please adhere to the American English (`en_US`) dialect. + This variant should be considered the standard for all documentation efforts. + For instance, opt for "initialize" over "initialise" and "color" rather than "colour". -Software requirement: [clang-format](https://clang.llvm.org/docs/ClangFormat.html) version 12 or later. +Software requirement: [clang-format](https://clang.llvm.org/docs/ClangFormat.html) version 18 or later. This repository consistently contains an up-to-date `.clang-format` file with rules that match the explained ones. -For maintaining a uniform coding style, execute the command `clang-format -i *.[ch]`. +For maintaining a uniform coding style, execute the command `clang-format -i *.{c,h}`. ## Coding Style for Modern C -This coding style is a variant of the K&R style. +This coding style is a variant of the [K&R style](https://en.wikipedia.org/wiki/Indentation_style#K&R). Adhere to established practices while being open to innovation. Maintain consistency, adopt the latest C standards, and embrace modern compilers along with their advanced static analysis capabilities and sanitizers. @@ -37,7 +59,7 @@ and embrace modern compilers along with their advanced static analysis capabilit ### Indentation In this coding style guide, the use of 4 spaces for indentation instead of tabs is strongly enforced to ensure consistency. -Consistently apply a single space before and after comparison and assignment operators to maintain readable code. +Always apply a single space before and after comparison and assignment operators to maintain readable code. Additionally, it is crucial to include a single space after every comma. e.g., ```c @@ -47,19 +69,24 @@ for (int i = 0; i < 10; i++) { } ``` +The tab character (ASCII 0x9) should never appear within any source code file. +When indentation is needed in the source code, align using spaces instead. +The width of the tab character varies by text editor and programmer preference, +making consistent visual layout a continual challenge during code reviews and maintenance. + ### Line length -All lines should typically stay within 80 characters, and longer lines should be wrapped. -There are valid rationales for this practice: -* It encourages concise code writing by developers. +All lines should typically remain within 80 characters, with longer lines wrapped as needed. +This practice is supported by several valid rationales: +* It encourages developers to write concise code. * Smaller portions of information are easier for humans to process. -* It helps users of vi/vim (and potentially other editors) who use vertical splits. +* It assists users of vi/vim (and potentially other editors) who use vertical splits. +* It is especially helpful for those who may want to print code on paper. ### Comments -Multi-line comments shall have the opening and closing characters -in a separate line, with the lines containing the content prefixed by a space -and the `*` characters for alignment, e.g., +Multi-line comments should have the opening and closing characters on separate lines, +with the content lines prefixed by a space and an asterisk (`*`) for alignment, e.g., ```c /* * This is a multi-line comment. @@ -68,20 +95,39 @@ and the `*` characters for alignment, e.g., /* One line comment. */ ``` -Use multi-line comments for more elaborative descriptions or before more -significant logical block of code. +Use multi-line comments for more elaborate descriptions or before significant logical blocks of code. -Single-line comments shall be written in C89 style: +Single-line comments should be written in C89 style: ```c return (uintptr_t) val; /* return a bitfield */ ``` Leave two spaces between the statement and the inline comment. +Avoid commenting out code directly. +Instead, use `#if 0` ... `#endif` when it is intentional. + +All assumptions should be clearly explained in comments. +Use the following markers to highlight issues and make them searchable: +* `WARNING`: Alerts a maintainer to the risk of changing this code. + e.g., a delay loop counter's terminal value was determined empirically and may need adjustment when the code is ported or the optimization level is tweaked. +* `NOTE`: Provides descriptive comments about the "why" of a chunk of code, + as opposed to the "how" usually included in comments. + e.g., a chunk of driver code may deviate from the datasheet due to a known erratum in the chip, + or an assumption made by the original programmer is explained. +* `TODO`: Indicates an area of the code that is still under construction and explains what remains to be done. + When appropriate, include an all-caps programmer name or set of initials before the word `TODO`. + +Keep the documentation as close to the code as possible. ### Spacing and brackets -Use one space after the conditional or loop keyword, no spaces around -their brackets, and one space before the opening curly bracket. e.g., +Ensure that the keywords `if`, `while`, `for`, `switch`, and `return` are always followed by a single space when there is additional code on the same line. +Follow these spacing guidelines: +* Place one space after the keyword in a conditional or loop. +* Do not use spaces around the parentheses in conditionals or loops. +* Insert one space before the opening curly bracket. + +For example: ```c do { /* some operations */ @@ -99,38 +145,87 @@ unsigned obj_len = sizeof(obj_t); Use brackets to avoid ambiguity and with operators such as `sizeof`, but otherwise avoid redundant or excessive brackets. -### Variable names and declarations +Assignment operators (`=`, `+=`, `-=`, `*=`, `/=`, `%=`, `&=`, `|=`, `^=`, `~=`, and `!=`) should always have a space before and after them. +For example: +```c +count += 1; +``` + +Binary operators (`+`, `-`, `*`, `/`, `%`, `<`, `<=`, `>`, `>=`, `==`, `!=`, `<<`, `>>`, `&`, `|`, `^`, `&&`, and `||`) should also be surrounded by spaces. +For example: +```c +current_conf = prev_conf | (1 << START_BIT); +``` + +Unary operators (`++`, `--`, `!`, and `~`) should be written without spaces between the operator and the operand. +For example: +```c +bonus++; +if (!play) + return STATE_QUITE; +``` + +The ternary operator (`?` and `:`) should have spaces on both sides. +For example: +```c +uint32_t max(uint32_t a, uint32_t b) +{ + return (a > b) ? a : b; +} +``` + +Structure pointer (`->`) and member (`.`) operators should not have surrounding spaces. +Similarly, array subscript operators (`[` and `]`) and function call parentheses should be written without spaces around them. -- Ensure that functions, variables, and comments are consistently named using English names/text. +### Parentheses -- Use descriptive names for global variables and short names for locals. -Find the right balance between descriptive and succinct. +Avoid relying on C’s operator precedence rules, as they may not be immediately clear to those maintaining the code. +To ensure clarity, always use parentheses to enforce the correct execution order within a sequence of operations, +or break long statements into multiple lines if necessary. -- Use [snakecase](https://en.wikipedia.org/wiki/Snake_case). -Do not use "camelcase". +When using logical AND (`&&`) and logical OR (`||`) operators, each operand should be enclosed in parentheses, +unless it is a single identifier or constant. +For example: +```c +if ((count > 100) && (detected == false)) { + character = C_ASSASSIN; +} +``` -- Do not use Hungarian notation or other unnecessary prefixing or suffixing. +### Variable names and declarations -- Use the following spacing for pointers: +Ensure that functions, variables, and comments are consistently named using English words. +Global variables should have descriptive names, while local variables can have shorter names. +It is important to strike a balance between being descriptive and concise. +Each variable's name should clearly reflect its purpose. + +Use [snake_case](https://en.wikipedia.org/wiki/Snake_case) for naming conventions, +and avoid using "camelCase." +Additionally, do not use Hungarian notation or any other unnecessary prefixes or suffixes. + +When declaring pointers, follow these spacing conventions: ```c -const char *name; /* const pointer; '*' with the name and space before it */ -conf_t * const cfg; /* pointer to a const data; spaces around 'const' */ -const uint8_t * const charmap; /* const pointer and const data */ -const void * restrict key; /* const pointer which does not alias */ +const char *name; /* Const pointer; '*' with the name and a space before it */ +conf_t * const cfg; /* Pointer to const data; spaces around 'const' */ +const uint8_t * const charmap; /* Const pointer and const data */ +const void * restrict key; /* Const pointer that does not alias */ ``` -- Local variables of the same type should be declared in the same line. +Local variables of the same type should be declared on the same line. +For example: ```c void func(void) { - char a, b; /* OK */ + char a, b; /* OK */ char a; - char b; /* Incorrect: A variable with char type already exists. */ + char b; /* Incorrect: a variable with char type already exists. */ } ``` -- Always include a trailing comma in the last element of structure initialization, including its children, to assist clang-format in correctly formatting structures. However, this can be omitted in very simple and short structures. +Always include a trailing comma in the last element of a structure initialization, +including its nested elements, to help `clang-format` correctly format the structure. +However, this comma can be omitted in very simple and short structures. ```c typedef struct { int width, height; @@ -167,6 +262,20 @@ typedef struct { ### Initialization +Do not initialize static and global variables to `0`; the compiler will do this. +When a variable is declared inside a function, it is not automatically initialized. + +```c +static uint8_t a; /* Global variable 'a' is set to 0 by the compiler */ + +void foo() +{ + /* 'b' is uninitialized and set to whatever happens to be in memory */ + uint8_t b; + ... +} +``` + Embrace C99 structure initialization where reasonable, e.g., ```c static const crypto_ops_t openssl_ops = { @@ -184,13 +293,24 @@ static const uint8_t tcp_fsm[TCP_NSTATES][2][TCPFC_COUNT] = { [TCPS_CLOSED] = { [FLOW_FORW] = { /* Handshake (1): initial SYN. */ - [TCPFC_SYN] = TCPS_SYN_SENT, + [TCPFC_SYN] = TCPS_SYN_SENT, }, }, ... } ``` +Any pointer variable that does not have an initial address should be explicitly initialized to `NULL`. +This practice helps prevent undefined behavior caused by dereferencing uninitialized pointers. + +In accordance with modern C standards (such as C99 and later), it is preferable to define local variables as needed, +rather than declaring them all at the beginning of a function. +Declaring variables close to their first use enhances code readability and helps maintain a clear, logical flow within the function. + +Additionally, static analysis tools can be employed to scan the entire source code before each build, +providing warnings about variables that are used before being properly initialized. +This helps catch potential bugs early and ensures code quality and safety. + ### Control structures Try to make the control flow easy to follow. Avoid long convoluted logic @@ -222,7 +342,16 @@ the semicolon when `for` has expressions: #### Avoid unnecessary nesting levels -Avoid: +It is generally preferred to place the shorter clause (measured in lines of code) first in `if` and `else if` statements. +Long clauses can distract the reader from the core decision-making logic, making the code harder to follow. +By placing the shorter clause first, the decision path becomes clearer and easier to understand, which can help reduce bugs. + +Avoid nesting `if`-`else` statements deeper than two levels. +Instead, consider using function calls or `switch` statements to simplify the logic and enhance readability. +Deeply nested `if`-`else` statements often indicate a complex and fragile state machine implementation, +which can be refactored into a safer and more maintainable structure. + +For example, avoid this: ```c int inspect(obj_t *obj) { @@ -236,19 +365,18 @@ int inspect(obj_t *obj) } ``` -Consider: +Instead, consider this approach: ```c int inspect(obj_t *obj) { if (!cond) return -1; - ... return 0; } ``` -However, do not make logic more convoluted. +However, be careful not to make the logic more convoluted in an attempt to simplify nesting. ### `if` statements @@ -336,6 +464,117 @@ ssize_t hex_write(FILE *stream, const void *buf, size_t len) Do not use old style K&R style C definitions. +Introduced in C99, `restrict` is a pointer qualifier that informs the compiler no other pointer will access the same object during its lifetime, +enabling optimizations such as vectorization. Violating this assumption leads to undefined behavior. +Use `restrict` judiciously. + +For function parameters, place one space after each comma, except at the end of a line. + +### Function-like Macros + +When using function-like macros (parameterized macros), adhere to the following guidelines: +- Enclose the entire macro body in parentheses. +- Surround each parameter usage with parentheses. +- Limit the use of each parameter to no more than once within the macro to avoid unintended side effects. +- Never include control flow statements (e.g., `return`) within a macro. +- If the macro involves multiple statements, encapsulate them within a `do`-`while (0)` construct. + +For example: +```c +#define SET_POINT(p, x, y) \ + do { \ + (p)->px = (x); \ + (p)->py = (y); \ + } while (0) +``` + +While the extensive use of parentheses, as shown above, helps minimize some risks, +it cannot prevent issues like unintended double increments from calls such as `MAX(i++, j++)`. + +Other risks associated with macros include comparing signed and unsigned data or testing floating-point values. +Additionally, macros are not visible at runtime, making them impossible to step into with a debugger. +Therefore, use them with caution. + +In general, macro names are typically written in all capitals, except in cases where readability is improved by using lowercase. +For example: +``` +#define countof(a) (size)(sizeof(a) / sizeof(*(a))) +#define lengthof(s) (countof(s) - 1) +``` + +Although all capitals are generally preferred for constants, +lowercase can be used for function-like macros to improve readability. +These function-like macros do not share the same namespace concerns as other macros. + +For example, consider the implementation of a simple memory allocator. +An arena can be represented by a memory buffer and an offset that begins at zero. +To allocate an object, record the pointer at the current offset, +advance the offset by the size of the object, and return the pointer. +Additional considerations, such as alignment and checking for available space, are also required. +```c +#define new(a, n, t) alloc(a, n, sizeof(t), _Alignof(t)) + +typedef struct { + char *begin, *end; +} arena_t; + +void *alloc(arena_t *a, ptrdiff_t count, ptrdiff_t size, ptrdiff_t align) +{ + ptrdiff_t pad = -(uintptr_t)a->begin & (align - 1); + assert(count < (a->end - a->begin - pad) / size); + + void *result = a->begin + pad; + a->begin += pad + (count * size); + return memset(result, 0, count * size); +} +``` + +Using the `new` macro helps prevent several common errors in C programs. +If types are mixed up, the compiler generates errors or warnings. +Moreover, naming a macro `new()` does not conflict with variables or fields named `new`, +because the macro form does not resemble a function call. + +### Use `const` and `static` effectively + +The `static` keyword should be used for any variables that do not need to be accessible outside the module where they are declared. +This is particularly important for global variables defined in C files. +Declaring variables and functions as `static` at the module level protects them from external access, reducing coupling between modules and improving encapsulation. + +For functions that do not need to be accessible outside the module, use the `static` keyword. +This is especially important for private functions, where `static` should always be applied. + +For example: +```c +static bool verify_range(uint16_t x, uint16_t y); +``` + +The `const` keyword is essential for several key purposes: +- Declaring variables that should not change after initialization. +- Defining fields within a `struct` that must remain immutable, such as those in memory-mapped I/O peripheral registers. +- Serving as a strongly typed alternative to `#define` for numerical constants. + +For example, instead of using: +```c +#define MAX_SKILL_LEVEL (100U) +``` + +Use: +```c +const uint8_t max_skill_level = 100; +``` + +Maximizing the use of `const` provides the advantage of compiler-enforced protection against unintended modifications to data that should be read-only, +thereby enhancing code reliability and safety. + +Additionally, when one of your function arguments is a pointer to data that will not be modified within the function, +you should use the `const` keyword. +This is particularly useful when comparing a character array with predefined strings without altering the array’s contents. + +For example: +```c +static bool is_valid_cmd(const char *cmd); +``` + ### Object abstraction Objects are often "simulated" by the C programmers with a `struct` and @@ -423,13 +662,22 @@ Use `unsigned` for general iterators; use `size_t` for general sizes; use `ssize_t` to return a size which may include an error. Of course, consider possible overflows. -Avoid using `uint8_t` or `uint16_t` or other sub-word types for general -iterators and similar cases, unless programming for micro-controllers or -other constrained environments. +Avoid using fixed-width types like `uint8_t`, `uint16_t`, or other smaller integer types for general iterators or similar cases unless there is a specific need for size-constrained operations, +such as in fixed-width data processing or resource-limited environments. C has rather peculiar _type promotion rules_ and unnecessary use of sub-word types might contribute to a bug once in a while. +Boolean variables should be declared using the `bool` type. +Non-Boolean values should be converted to Boolean by using relational operators (e.g., `<` or `!=`) rather than by casting. + +For example: +```c +#include +... +bool inside = (value < expected_range); +``` + ### Embrace portability #### Byte-order @@ -440,24 +688,48 @@ cases where it is appropriate. #### Types -- Do not assume a particular 32-bit vs 64-bit architecture, e.g., do not -assume the size of `long` or `unsigned long`. Use `int64_t` or `uint64_t` -for the 8-byte integers. +Do not assume a particular 32-bit or 64-bit architecture; for example, do not assume the size of `long` or `unsigned long`. +Instead, use `int64_t` or `uint64_t` for 8-byte integers. + +Fixed-width types, such as `uint32_t`, are particularly useful when memory size is critical, +as in embedded systems, communication protocols requiring specific data sizes, +or when interacting with hardware registers that require precise bit-width operations. +In these scenarios, fixed-width types ensure consistent behavior across different platforms and compilers. -- Do not assume `char` is signed; for example, on Arm it is unsigned. +Do not assume `char` is signed; for example, on Arm architectures, it is unsigned by default. -- Use C99 macros for constant prefixes or formatting of the fixed-width -types. +Avoid defining bit-fields within signed integer types. +Additionally, do not use bitwise operators (such as `&`, `|`, `~`, `^`, `<<`, and `>>`) on signed integer data. +Refrain from combining signed and unsigned integers in comparisons or expressions, as this can lead to unpredictable results. + +When using `#define` to declare decimal constants, append a `U` to ensure they are treated as unsigned. +For example: +```c +#define SOME_CONSTANT (6U) + +uint16_t unsigned_a = 6; +int16_t signed_b = -9; +if (unsigned_a + signed_b < 4) { + /* This block might appear logically correct, as -9 + 6 is -3 */ + ... +} +/* but compilers with 16-bit int may legally interpret it as (0xFFFF – 9) + 6. */ +``` + +It is important to note that certain aspects of manipulating binary data within signed integer containers are implementation-defined behaviors according to ISO C standards. +Additionally, mixing signed and unsigned integers can lead to data-dependent results, as demonstrated in the example above. + +Use C99 macros for constant prefixes or formatting of the fixed-width types. Use: ```c -#define SOME_CONSTANT (UINT64_C(1) << 48) +#define SOME_CONSTANT (UINT64_C(1) << 48) printf("val %" PRIu64 "\n", SOME_CONSTANT); ``` Do not use: ```c -#define SOME_CONSTANT (1ULL << 48) +#define SOME_CONSTANT (1ULL << 48) printf("val %lld\n", SOME_CONSTANT); ``` @@ -467,6 +739,43 @@ Avoid assuming that unaligned access is safe. It is not secure on architectures like Arm, POWER, and others. Additionally, even on x86, unaligned access can be slower. +#### Structures and Unions + +Care should be taken to prevent the compiler from inserting padding bytes within `struct` or `union` types, +as this can affect memory layout and portability. +To control padding and alignment, consider using structure packing techniques specific to your compiler. + +Additionally, take precautions to ensure that the compiler does not alter the intended order of bits within bit-fields. +This is particularly important when working with hardware registers or communication protocols where bit order is crucial. + +According to the C standard, the layout of structures, including padding and bit-field ordering, +is implementation-defined, meaning it can vary between different compilers and platforms. +Therefore, it is essential to verify that the structure's layout meets your expectations, especially when writing portable code. + +For example: +```c +typedef struct { + uint16_t count; /* offset 0 */ + uint16_t max_count; /* offset 2 */ + uint16_t unused0; /* offset 4 */ + uint16_t enable : 2; /* offset 6 bits 15-14 */ + uint16_t interrupt : 1; /* offset 6 bit 13 */ + uint16_t unused1 : 7; /* offset 6 bits 12-6 */ + uint16_t complete : 1; /* offset 6 bit 5 */ + uint16_t unused2 : 4; /* offset 6 bits 4-1 */ + uint16_t periodic : 1; /* offset 6 bit 0 */ +} mytimer_t; + +/* Preprocessor check of timer register layout byte count. */ +#if (sizeof(mytimer_t) != 8) +#error mytimer_t struct size incorrect (expected 8 bytes) +#endif +``` + +To enhance portability, use standard-defined types (e.g., `uint16_t`, `uint32_t`) and avoid relying on compiler-specific behavior. +Where precise control over memory layout is required, such as in embedded systems or when interfacing with hardware, +always verify the structure size and layout using static assertions or preprocessor checks. + #### Avoid extreme portability Unless programming for micro-controllers or exotic CPU architectures, diff --git a/utils.h b/utils.h index 159002b5..b6c872e2 100644 --- a/utils.h +++ b/utils.h @@ -87,7 +87,7 @@ static inline void list_del_init(struct list_head *node) }) #else #define container_of(ptr, type, member) \ - ((type *) ((char *) (ptr) -offsetof(type, member))) + ((type *) ((char *) (ptr) - offsetof(type, member))) #endif #define list_entry(node, type, member) container_of(node, type, member)