Porting Linux applications to 64-bit systems
Tips and techniques for a smooth transition
Harsha S. Adiga (haradiga@in.ibm.com), Software Engineer, IBM
12 Apr 2006
With the pervasiveness of 64-bit architectures, it's more important than ever that your Linux® software be 64-bit ready. Learn how to avoid portability pitfalls when making declarations and assignments, bit shifting, typing, formatting strings, and more.
Linux was one of the first cross-platform operating systems to use 64-bit processors, and now 64-bit systems are becoming commonplace in servers and desktops. Many developers are now facing the need to port applications from 32-bit to 64-bit environments. With the introduction of Intel® Itanium® and other 64-bit processors, making software 64-bit-ready has become increasingly important.
As with UNIX® and other UNIX-like operating systems, Linux uses the LP64 standard, where pointers and long integers are 64 bits but regular integers remain 32-bit entities. Although some high-level languages are not affected by the size differences, others such as the C language may be.
The effort to port an application from 32 bits to 64 bits might range from trivial to very difficult, depending on how these applications were written and maintained. Many subtle issues can cause problems even in a well-written, highly portable application, so this article outlines these issues and suggests ways to deal with them.
32-bit platforms have a number of limitations that are increasingly frustrating to developers of large applications such as databases, especially those developers who wish to take advantage of advances in computer hardware. While scientific calculations normally rely on floating-point mathematics, a few applications such as financial calculations need a narrower numeric range but higher precision than floating point offers. 64-bit math provides this higher precision fixed-point math, with an adequate range. There is much discussion today in the computer industry about the barrier presented by 32-bit addresses. 32-bit pointers can address only 4GB of virtual address space. You can overcome this limitation, but application development becomes more complicated, and performance is significantly reduced.
As far as language implementation is concerned, the current C language standard requires the "long long" data type to be at least 64 bits wide. An implementation may, however, define it as a larger size.
Another area that requires improvement is dates. On 32-bit Linux systems, dates are expressed as signed 32-bit integers representing the number of seconds since January 1, 1970. This count overflows and turns negative in 2038. On 64-bit systems, dates are expressed as signed 64-bit integers, which greatly extends the usable range.
In summary, the 64-bit architecture has the following advantages:
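- A 64-bit application can directly address a far larger virtual address space: a full 64-bit pointer spans 2 to the power of 64 bytes (16 exabytes), although implementations typically expose only part of that range. The Intel Itanium processor provides a contiguous linear address space.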
- 64-bit Linux allows for file sizes up to 8 exabytes (2 to the power of 63 bytes), a very significant advantage to servers accessing large databases.
Unfortunately, the C programming language does not provide a mechanism for adding new fundamental data types. Thus, providing 64-bit addressing and integer arithmetic capabilities involves changing the bindings or mappings of the existing data types, or adding new data types to the language.
Table 1. 32-bit and 64-bit data models (sizes in bits)
| | ILP32 | LP64 | LLP64 | ILP64 |
|---|---|---|---|---|
| char | 8 | 8 | 8 | 8 |
| short | 16 | 16 | 16 | 16 |
| int | 32 | 32 | 32 | 64 |
| long | 32 | 64 | 32 | 64 |
| long long | 64 | 64 | 64 | 64 |
| pointer | 32 | 64 | 64 | 64 |
The difference among the three 64-bit models (LP64, LLP64, and ILP64) lies in the non-pointer data types. When the width of one or more of the C data types changes from one model to another, applications may be affected in various ways. These effects fall into two main categories:
- Size of data objects. The compilers align data types on a natural boundary; in other words, 32-bit data types are aligned on a 32-bit boundary on 64-bit systems, and 64-bit data types are aligned on a 64-bit boundary on 64-bit systems. This means that the size of data objects such as a structure or a union will be different on 32-bit and 64-bit systems.
- Size of fundamental data types. Common assumptions about the relationships between the fundamental data types may no longer be valid in a 64-bit data model. Applications that depend on those relationships will fail when compiled on a 64-bit platform. For example, the assumption sizeof (int) = sizeof (long) = sizeof (pointer) is valid for the ILP32 data model, but not valid for others.
In summary, compilers align data types on a natural boundary, which means that the compiler inserts "padding" into a C structure or union to enforce this alignment. The structure or union as a whole is aligned according to its widest member. Listing 1 shows an example structure.
Listing 1. C structure
|
struct test {
    int i1;
    double d;
    int i2;
    long l;
};
|
Table 2 shows the size of each member of the structure and the structure size itself on 32-bit and 64-bit systems.
Table 2. Size of structure and structure members
| Structure member | Size on 32-bit system | Size on 64-bit system |
|---|---|---|
| struct test { | | |
| int i1; | 32 bits | 32 bits |
| | | 32-bit filler |
| double d; | 64 bits | 64 bits |
| int i2; | 32 bits | 32 bits |
| | | 32-bit filler |
| long l; | 32 bits | 64 bits |
| }; | Structure size 20 bytes | Structure size 32 bytes |
Note here that on a 32-bit system, the compiler may not align the variable d, even though it is a 64-bit object, because the hardware treats it as two 32-bit objects. However, a 64-bit system aligns both d and l causing two 4-byte fillers to be added.
This section shows you how to correct common trouble spots:
- Declarations
- Expressions
- Assignments
- Numeric constants
- Endianism
- Type definitions
- Bit shifting
- Formatting strings
- Function parameters
Declarations
To enable your code to work on both 32-bit and 64-bit systems, note the following regarding declarations:
- Declare integer constants using "L" or "U", as appropriate.
- Ensure that an unsigned int is used where appropriate to prevent sign extension.
- If you have specific variables that need to be 32-bits on both platforms, define the type to be int.
- If the variable should be 32-bits on 32-bit systems and 64-bits on 64-bit systems, define them to be long.
- Declare numeric variables as int or long for alignment and performance. Don’t try to save bytes using char or short.
- Declare character pointers and character bytes as unsigned to avoid sign extension problems with 8-bit characters.
Expressions
In C/C++, expressions are based upon associativity, precedence of operators and a set of arithmetic promotion rules. To enable your expression to work correctly on both 32-bit and 64-bit systems, note the following rules:
- Addition of two signed ints results in a signed int.
- Addition of an int and a long results in a long.
- If one of the operands is unsigned and the other is a signed int, the expression becomes an unsigned.
- Addition of an int and a double results in a double. Here, the int is converted to a double before addition.
Assignments
Since pointer, int, and long are no longer the same size on 64-bit systems, problems may arise depending on how the variables are assigned and used within an application. A few tips in this regard:
- Do not use int and long interchangeably because of the possible truncation of significant digits. For example, don't do this:
|
int i;
long l;
i = l;
|
- Do not use an int to store a pointer. The following example works on a 32-bit system but fails on a 64-bit system, because a 32-bit integer cannot hold a 64-bit pointer:
|
unsigned int i, *ptr;
i = (unsigned) ptr;
|
- Do not use a pointer to store an int. For example, don't do this:
|
int *ptr;
int i;
ptr = (int *) i;
|
- In cases where unsigned and signed 32-bit integers are mixed in an expression and assigned to a signed long, cast one of the operands to its 64-bit type. This will cause the other operands to be promoted to 64-bits and no further conversion is needed when the expression is assigned. Another solution is to cast the entire expression such that sign extension occurs on assignment. For example, consider the problem caused by the following:
|
long n;
int i = -2;
unsigned k = 1;
n = i + k;
|
Arithmetically, the result of the expression i + k above should be -1. But since the expression is unsigned, no sign extension occurs, and on a 64-bit system n receives 4294967295 (0xFFFFFFFF) instead. The solution is to cast one of the operands to its 64-bit type (as in the first line below) or cast the entire expression (as in the second line below):
|
n = (long) i + k;
n = (int) (i + k);
|
Numeric constants
Hexadecimal constants are commonly used as masks or specific bit values. A hexadecimal constant without a suffix is treated as an unsigned int if it fits into 32 bits and its high-order bit is set.
For example, the constant 0xFFFFFFFFL carries the L suffix, making it a long constant. On a 32-bit system, this sets all the bits, but on a 64-bit system, only the low-order 32 bits are set, resulting in the value 0x00000000FFFFFFFF.
If you want to turn all the bits on, a portable way to do this is to define a signed long constant with a value of -1. This turns all the bits on because two's-complement arithmetic is used:
|
long x = -1L;
|
Another problem that might arise is the setting of the most significant bit. On a 32-bit system, the constant 0x80000000 is used. But a more portable way of doing this is to use a shift expression:
|
1L << ((sizeof(long) * 8) - 1);
|
Endianism
Endianism refers to the way in which data is stored, and defines how bytes are addressed in integral and floating point data types.
Little-endian means that the least significant byte is stored at the lowest memory address and the most significant byte is stored at the highest memory address.
Big-endian means that the most significant byte is stored at the lowest memory address and the least significant byte is stored at the highest memory address.
Table 3 shows a sample layout of a 64-bit long integer.
Table 3. Layout of a 64-bit long int
| | Low address | | | | | | | High address |
|---|---|---|---|---|---|---|---|---|
| Little endian | Byte 0 | Byte 1 | Byte 2 | Byte 3 | Byte 4 | Byte 5 | Byte 6 | Byte 7 |
| Big endian | Byte 7 | Byte 6 | Byte 5 | Byte 4 | Byte 3 | Byte 2 | Byte 1 | Byte 0 |
For example, the 32-bit word 0x12345678 will be laid out on a big endian machine as follows:
Table 4. Layout of 0x12345678 on a big-endian system
| Memory offset | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| Memory content | 0x12 | 0x34 | 0x56 | 0x78 |
If we view 0x12345678 as two half words, 0x1234 and 0x5678, we would see the following in a big endian machine:
Table 5. 0x12345678 viewed as two half words on a big-endian system
| Memory offset | 0 | 2 |
|---|---|---|
| Memory content | 0x1234 | 0x5678 |
However, on a little endian machine, the word 0x12345678 will be laid out as follows:
Table 6. Layout of 0x12345678 on a little-endian system
| Memory offset | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| Memory content | 0x78 | 0x56 | 0x34 | 0x12 |
Similarly, the two half-words 0x1234 and 0x5678 would look like the following:
Table 7. 0x12345678 viewed as two half words on a little-endian system
| Memory offset | 0 | 2 |
|---|---|---|
| Memory content | 0x5678 | 0x1234 |
The following example illustrates the difference in byte order between big endian and little endian machines.
The C program below will print out "Big endian" when compiled and run on a big endian machine, and "Little endian" when compiled and run on a little endian machine.
Listing 2. Big endian vs. little endian
|
#include <stdio.h>

int main (void) {
    int i = 0x12345678;
    /* Inspect the byte stored at the lowest address of i. */
    if (*(char *)&i == 0x12)
        printf ("Big endian\n");
    else if (*(char *)&i == 0x78)
        printf ("Little endian\n");
    return 0;
}
|
Endianism is important when:
- Bit masks are used
- Indirect pointers address portions of an object
C and C++ provide bit fields that help to deal with endian issues. I recommend the use of bit fields rather than mask fields or hexadecimal constants. Several functions convert 16-bit and 32-bit values between "host byte order" and "network byte order." For example, htonl (3) and ntohl (3) convert 32-bit integers. Similarly, htons (3) and ntohs (3) convert 16-bit integers. However, there is no standard set of functions for 64-bit values. Linux does provide the following byte-swapping macros, declared in <byteswap.h>, on both big and little endian systems:
- bswap_16
- bswap_32
- bswap_64
Type definitions
I recommend that you do not code your applications with the native C/C++ data types that change size on a 64-bit operating system, but rather use type definitions or macros that explicitly call out the size and type of data contained in a variable. Some type definitions help make the code more portable.
- ptrdiff_t: A signed integer type that results from subtracting two pointers.
- size_t: An unsigned integer type and the result of the sizeof operator. This is used when passing parameters to functions such as malloc (3), and returned from several functions such as fread (3).
- int32_t, uint32_t etc.: Define integer types of a predefined width.
- intptr_t and uintptr_t: Define integer types to which any valid pointer to void can be converted.
Example 1:
The 64-bit return value from sizeof in the following statement is truncated to 32-bits when assigned to bufferSize.
int bufferSize = (int) sizeof (something);
The solution is to cast the return value using size_t and assign it to bufferSize declared as size_t as shown below:
size_t bufferSize = (size_t) sizeof (something);
Example 2:
On a 32-bit system, int and long are of the same size. Due to this, some developers use them interchangeably. This can cause pointers to be assigned to int and vice-versa. But on a 64-bit system, assigning a pointer to an int causes the truncation of the high-order 32-bits.
The solution is to store pointers as pointer types or the special types defined for this purpose, such as intptr_t and uintptr_t.
Bit shifting
Untyped integral constants are of type (unsigned) int. This might lead to unexpected truncation while shifting.
For example, in the following code snippet, the maximum value for a can be 31. This is because the type of 1 << a is int.
long t = 1 << a;
To get the shift done on a 64-bit system, 1L should be used as shown below:
long t = 1L << a;
Formatting strings
The function printf (3) and related functions can be a major source of problems. For example, on 32-bit platforms, using %d to print either an int or a long will usually work, but on 64-bit platforms, this would truncate a long to its least significant 32-bits. The proper specification for a long is %ld.
Similarly, when a small integer (char, short, int) is passed into printf (3), it will be widened to 64-bits and the sign will be extended if appropriate. In the example below, the printf (3) assumes that a pointer is 32-bits.
char *ptr = &something;
printf ("%x\n", ptr);
The above code snippet will fail on 64-bit systems and will display only the lower 4 bytes.
The solution for this is to use the %p specification as shown below, which will work fine on both 32-bit and 64-bit systems.
char *ptr = &something;
printf ("%p\n", (void *) ptr);
Function parameters
There are a few things that you need to remember while passing parameters to functions:
- In the case where the data type of the parameter is defined by a function prototype, the parameter is converted to that type according to the standard rules.
- When the type of the parameter is not specified, the parameter is promoted to the larger type.
- On a 64-bit system, integral types are converted to 64-bit integral types, and single precision floating point types are promoted to double precision.
- If a return value is not otherwise specified, the default return value for a function is int.
The problem arises when passing the sum of signed and unsigned ints as long. Consider the following case:
Listing 3. Passing the sum of signed and unsigned ints as long
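|
/* Identity function; its prototype converts the argument to long. */
long function (long l) { return l; }

int main (void) {
    int i = -2;
    unsigned k = 1U;
    /* i + k is evaluated as unsigned int, then converted to long. */
    long n = function (i + k);
    (void) n;
    return 0;
}
|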
The above code snippet will fail on 64-bit systems, because the expression (i + k) is an unsigned 32-bit expression, and when promoted to a long, the sign doesn’t extend. The solution is to cast one of the operands to its 64-bit type.
There is another problem on register-based systems where registers are used to pass parameters to functions rather than the stack. Consider the following example:
float f = 1.25;
printf ("The hex value of %f is %x", f, f);
On a stack-based system, the appropriate hexadecimal value is printed. But on a register-based system, the hexadecimal value is read from an integer register, not the floating point register.
The solution is to cast the address of the floating point variable to a pointer to an int, which is then de-referenced as shown below:
printf ("The hex value of %f is %x", f, *(int *)&f);
Major hardware vendors have recently expanded their 64-bit offerings because of the performance, value, and scalability that 64-bit platforms can provide. The constraints of 32-bit systems, particularly the 4GB virtual memory ceiling, have spurred companies to consider migrating to 64-bit platforms. Knowing how to port applications to comply with a 64-bit architecture can help you write portable and efficient code.
Harsha Adiga works in the IBM Software Group in Bangalore, India, and is heavily involved in various Linux and open source communities and working groups.