Questions tagged [extended-precision]
45 questions
9
votes
1 answer
Convert Extended (80-bit) to string
How can I convert an Extended precision floating point value to a string?
Background
The Intel CPU supports three floating point formats:
32-bit Single precision
64-bit Double precision
80-bit Extended precision
Delphi has native support for the…

Ian Boyd
- 246,734
- 253
- 869
- 1,219
8
votes
4 answers
Convert extended precision float (80-bit) to double (64-bit) in MSVC
What is the most portable and "right" way to convert an extended-precision float (80-bit value, also known as long double in some compilers) to double (64-bit) in MSVC win32/win64?
MSVC currently (as of 2010) assumes that long double is…
user313885
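For readers following the 80-bit-to-double questions in this tag, here is a minimal Python sketch of decoding the x87 extended format by hand. It assumes the 10 bytes are stored little-endian (as x87 writes them) with 1 sign bit, a 15-bit exponent biased by 16383, and a 64-bit significand with an explicit integer bit. The function name `extended80_to_float` is made up for illustration. NaN payloads and invalid encodings are simplified, and results in the double subnormal range may be rounded twice.

```python
import math

def extended80_to_float(raw: bytes) -> float:
    """Decode a 10-byte little-endian x87 extended value into a Python float (a double)."""
    if len(raw) != 10:
        raise ValueError("expected exactly 10 bytes")
    bits = int.from_bytes(raw, "little")
    significand = bits & 0xFFFFFFFFFFFFFFFF      # 64 bits, explicit integer bit on top
    exponent = (bits >> 64) & 0x7FFF             # 15-bit biased exponent
    sign = -1.0 if (bits >> 79) & 1 else 1.0
    if exponent == 0x7FFF:
        # All-ones exponent: infinity if the fraction bits are zero, otherwise NaN.
        if (significand & 0x7FFFFFFFFFFFFFFF) == 0:
            return sign * math.inf
        return math.nan
    # Denormals (exponent field 0) use the same scale as exponent field 1.
    unbiased = (exponent if exponent != 0 else 1) - 16383
    try:
        # The significand is an integer scaled by 2**63, hence the extra -63.
        return sign * math.ldexp(significand, unbiased - 63)
    except OverflowError:
        # Magnitude is too large for a double.
        return sign * math.inf
```

The precision beyond 53 bits is lost in the conversion, which is unavoidable when the target is a 64-bit double.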
6
votes
1 answer
Manipulating 32-bit numbers with 16-bit registers in 8086
I'm trying to write a program that reads two 6-digit decimal numbers and shows their sum, but on a 16-bit 8086.
I defined the numbers as double words and put the LO word in word 1 and the HO word in word 2, similar to the code below,
but I don't have any idea how to do…

Amir Reza Asadi
- 97
- 1
- 7
5
votes
1 answer
Convert 80-bit extended precision in java
I'm working on a program which converts really old Open Access 4 .df files into
other formats and also creates database scripts - I'm already able to convert every
possible type except the decimal type. I found out that the byte order has to be a…

daTake
- 87
- 7
4
votes
2 answers
Can XMM registers be used to do any 128 bit integer math?
My impression is definitely not but perhaps there is a clever trick?
Thanks.

Upper
- 51
- 3
3
votes
0 answers
Disassembling simple C function. (128-bit multiplication in 64-bit machine)
I'm solving a problem in a book called 'Computer Systems'. Here is the problem I'm struggling with.
Question:
The following code computes the 128-bit product of two 64-bit signed values x and y and stores the result in memory:
typedef __int128…

SteveLim
- 31
- 2
3
votes
1 answer
Assembly: Increment by 2 (or larger number) without destroying CF in an ADC loop?
I am trying to test an addition function in TDM-GCC 64-bit assembly on Windows. I searched for resources on this a while back and came across code similar to this (I made some changes to compile it in TDM-GCC).
typedef struct
{
int size;
…

Dominic Jung
- 33
- 4
3
votes
1 answer
How to format MinExtended80Denormal?
The unit System.Math defines the constant
MinExtended80Denormal
Can I convert this number to string with the given rtl functions?
I tried
FormatFloat('#.##############E+####', System.math.MinExtended80Denormal)
which results in…

ventiseis
- 3,029
- 11
- 32
- 49
3
votes
2 answers
In J, how can I find the extended precision integer floor of a square root
I understand that when I take the square root (%:) of a number that does not result in an integer, my answer is a float. I'm looking to find the floor (<.) of the square root in order to get an integer result. Does J have a built-in way to achieve…

Dane
- 1,201
- 8
- 17
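As a language-neutral illustration of the integer square root question above, here is a hedged Python sketch that uses Newton's method on integers only, so no float ever appears and the result stays exact for arbitrarily large inputs. The function name `isqrt_floor` is hypothetical, and this is not a claim about how J implements it internally.

```python
def isqrt_floor(n: int) -> int:
    """Return floor(sqrt(n)) for a non-negative integer n using integer Newton iteration."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return 0
    # Start from a power of two that is guaranteed to be >= sqrt(n).
    x = 1 << ((n.bit_length() + 1) // 2)
    while True:
        y = (x + n // x) // 2   # Newton step with floor division
        if y >= x:              # the sequence stopped decreasing: x is the floor
            return x
        x = y
```

For example, `isqrt_floor(10**40 - 1)` gives `10**20 - 1` exactly, where a float-based square root would lose the low digits.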
3
votes
2 answers
Function that returns whether the floating-point type is fully compliant with IEEE-754?
I would like to write a function that checks that float, double or long double are fully compliant with the IEEE-754 format. I mean:
float = IEEE-754 binary32
double = IEEE-754 binary64
long double = IEEE-754 binary128
I thought that…

Vincent
- 57,703
- 61
- 205
- 388
2
votes
2 answers
Is it possible to perform 128-bit / 64-bit division without branching, in terms of 64-bit division?
I'm working with Algorand contract code, whose assembly language allows only a very limited set of operations - e.g., it is not possible to control the flow of the code. Basic 64-bit arithmetic operations are available.
What I need to do is…

Sebastian
- 33
- 3
2
votes
2 answers
How can I add two numbers of 12 bytes each?
I want to add two numbers that are 12 bytes each and store the result in a 16-byte variable. How can I do this?
section .data
big_num1 dd 0x11111111, 0x22222222, 0x33333333
big_num2 dd 0xffffffff, 0x22222222, 0x33333333
section .bss
…

ssrvz
- 33
- 1
- 6
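The usual approach to the 12-byte addition above is to add the numbers one 32-bit limb at a time and carry into the next limb, which is what an `add` followed by `adc` instructions does in assembly. Below is a small Python sketch of that logic. It assumes the dwords are listed least significant first, as in the `dd` lines of the question. The names `add_limbs` and `MASK32` are illustrative only.

```python
MASK32 = 0xFFFFFFFF

def add_limbs(a, b, out_limbs=4):
    """Add two little-endian lists of 32-bit limbs and return out_limbs result limbs."""
    result = []
    carry = 0
    for i in range(out_limbs):
        x = a[i] if i < len(a) else 0   # missing high limbs count as zero
        y = b[i] if i < len(b) else 0
        total = x + y + carry           # same role as add (first limb) / adc (the rest)
        result.append(total & MASK32)   # low 32 bits stay in this limb
        carry = total >> 32             # the carry flag for the next limb
    return result

big_num1 = [0x11111111, 0x22222222, 0x33333333]
big_num2 = [0xFFFFFFFF, 0x22222222, 0x33333333]
print([hex(limb) for limb in add_limbs(big_num1, big_num2)])
```

The fourth limb of the 16-byte result holds only the final carry, which is why a 16-byte destination is always enough for two 12-byte operands.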
2
votes
1 answer
Convert extended value to Time
I want to convert an extended value to a time.
The part before the DecimalSeparator is the hours;
the part after the DecimalSeparator is the minutes, as a decimal fraction of an hour.
8,62944444444444 --> 8:37
1,41666666666667 --> 1:25
I've made this function, but for 1,41666666666667 I get 1:24…

Ravaut123
- 2,764
- 31
- 46
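An off-by-one minute in the decimal-hours conversion above typically comes from truncating a fractional part that lands a hair below a whole minute. One way around it, sketched here in Python under the assumption that the input is a non-negative number of hours, is to round to whole seconds first and only then truncate to minutes. The function name `hours_to_hmm` is hypothetical and this is not the asker's code.

```python
def hours_to_hmm(hours: float) -> str:
    """Format non-negative decimal hours as h:mm, truncating to whole minutes."""
    total_seconds = round(hours * 3600)   # absorb tiny floating-point error first
    total_minutes = total_seconds // 60   # then drop the leftover seconds
    h, m = divmod(total_minutes, 60)
    return f"{h}:{m:02d}"

print(hours_to_hmm(8.62944444444444))   # 8:37
print(hours_to_hmm(1.41666666666667))   # 1:25
```

Rounding straight to minutes would turn the first value into 8:38, so the two-step approach matches both expected results from the question.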
1
vote
0 answers
Handling very (while not arbitrary) small or big floating point numbers in C++
Our program makes extensive use of real numbers that happen to be very small or very big,
while we don't need very high precision. We are strongly concerned about performance (CPU usage).
Such numbers could be 2.5687e-45785, for instance.
Remark: as…

Arnaud Mégret
- 189
- 8
1
vote
1 answer
AArch64: compare 256-bit unsigned integers
While learning the Arm NEON instruction set, I tried to implement a 256-bit number comparison (A <= B). Below is the implementation I ended up with, but I doubt my approach is good. Maybe there's some wiser and more optimized way to compare large…

Alexander Zhak
- 9,140
- 4
- 46
- 72