Questions tagged [extended-precision]

45 questions
9 votes, 1 answer

Convert Extended (80-bit) to string

How can I convert an Extended precision floating point value to a string? Background: The Intel CPU supports three floating point formats: 32-bit Single precision, 64-bit Double precision, and 80-bit Extended precision. Delphi has native support for the…
Ian Boyd
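For readers comparing the three formats named in the excerpt above, here is a minimal Python sketch of how the 80-bit layout (1 sign bit, 15 exponent bits, 64-bit significand with an explicit integer bit) maps to a value. The function name `extended80_to_float` and the little-endian byte order are assumptions made for illustration; they do not come from the question. The result is rounded to a 64-bit double, so anything beyond 53 significant bits is lost, and magnitudes too large for a double raise `OverflowError`.

```python
import math


def extended80_to_float(raw: bytes) -> float:
    """Decode 10 little-endian bytes of x87 extended precision (hypothetical helper)."""
    if len(raw) != 10:
        raise ValueError("expected exactly 10 bytes")

    # Bytes 0..7: 64-bit significand, including the explicit integer bit (bit 63).
    significand = int.from_bytes(raw[:8], "little")
    # Bytes 8..9: sign bit (bit 15) and 15-bit biased exponent.
    sign_exp = int.from_bytes(raw[8:], "little")
    sign = -1.0 if sign_exp & 0x8000 else 1.0
    exponent = sign_exp & 0x7FFF

    if exponent == 0x7FFF:
        # All-ones exponent: infinity if the fraction bits are zero, else NaN.
        if significand & 0x7FFFFFFFFFFFFFFF == 0:
            return sign * math.inf
        return math.nan

    if exponent == 0:
        # Zero and denormals use the same scale as biased exponent 1.
        exponent = 1

    # value = significand * 2**(exponent - bias - 63), with bias 16383.
    # Unnormal encodings (integer bit clear, exponent nonzero) are not rejected here.
    return sign * math.ldexp(significand, exponent - 16383 - 63)


# 1.0 is significand 0x8000000000000000 with biased exponent 0x3FFF.
print(extended80_to_float(bytes.fromhex("0000000000000080ff3f")))
```

Values smaller than the double range underflow to zero in this sketch, which is one reason converting an Extended value to a string via a double is lossy.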
8 votes, 4 answers

Convert extended precision float (80-bit) to double (64-bit) in MSVC

What is the most portable and "right" way to do conversion from extended precision float (80-bit value, also known as long double in some compilers) to double (64-bit) in MSVC win32/win64? MSVC currently (as of 2010) assumes that long double is…
user313885
6 votes, 1 answer

Manipulating 32-bit numbers with 16-bit registers in 8086

I'm trying to write a program that gets two 6-digit decimal numbers and shows their sum, but on a 16-bit 8086. I defined the numbers as double words and put the low-order (LO) half in word 1 and the high-order (HO) half in word 2, similar to the code below, but I don't have any idea how to do…
5 votes, 1 answer

Convert 80-bit extended precision in Java

I'm working on a program which converts really old Open Access 4 .df files into other formats and also creates database scripts. I'm already able to convert every possible type except the decimal type. I found out that the byte order has to be a…
daTake
4 votes, 2 answers

Can XMM registers be used to do any 128 bit integer math?

My impression is definitely not but perhaps there is a clever trick? Thanks.
Upper
3 votes, 0 answers

Disassembling simple C function. (128-bit multiplication in 64-bit machine)

I'm solving a problem from a book called 'Computer Systems'. Here is the problem I'm struggling with. Question: The following code computes the 128-bit product of two 64-bit signed values x and y and stores the result in memory: typedef __int128…
SteveLim
3 votes, 1 answer

Assembly: Increment by 2 (or larger number) without destroying CF in an ADC loop?

I am trying to test an addition function in TDM-GCC 64-bit assembly on Windows. I searched for resources on this a while back and came across code similar to this (I made some changes to compile it in TDM-GCC). typedef struct { int size; …
3 votes, 1 answer

How to format MinExtended80Denormal?

The unit System.Math defines the constant MinExtended80Denormal. Can I convert this number to a string with the given RTL functions? I tried FormatFloat('#.##############E+####', System.math.MinExtended80Denormal), which results in…
ventiseis
3 votes, 2 answers

In J, how can I find the extended precision integer floor of a square root

I understand that when I take the square root (%:) of a number that does not result in an integer, my answer is a float. I'm looking to find the floor (<.) of the square root in order to get an integer result. Does J have a built-in way to achieve…
Dane
3 votes, 2 answers

Function that returns whether the floating-point type is fully compliant to IEEE-754?

I would like to write a function that checks that float, double or long double are fully compliant with the IEEE-754 format. I mean: float = IEEE-754 binary32, double = IEEE-754 binary64, long double = IEEE-754 binary128. I thought that…
Vincent
2 votes, 2 answers

Is it possible to perform 128-bit / 64-bit division without branching, in terms of 64-bit division?

I'm working with Algorand contract code, which has a very limited set of possible operations in its assembly code; e.g., it is not possible to control the flow of the code. Basic 64-bit arithmetic operations are available. What I need to do is…
2 votes, 2 answers

How can I add two numbers of 12 bytes each?

I want to add two 12-byte numbers and store the result in a 16-byte variable. How can I do this? section .data big_num1 dd 0x11111111, 0x22222222, 0x33333333 big_num2 dd 0xffffffff, 0x22222222, 0x33333333 section .bss …
ssrvz
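The multi-word addition asked about above can be sketched in Python as limb-by-limb addition with a carry, which is roughly what a chain of `add`/`adc` instructions does. This is only an illustration: the helper name `add_limbs` is hypothetical, and it assumes the `dd` values are stored least significant dword first, which the excerpt does not state.

```python
MASK32 = 0xFFFFFFFF


def add_limbs(a, b):
    """Add two equal-length lists of 32-bit limbs, least significant limb first.

    Returns one more limb than the inputs so the final carry has room,
    e.g. three 4-byte limbs in, four 4-byte limbs (16 bytes) out.
    """
    if len(a) != len(b):
        raise ValueError("operands must have the same number of limbs")

    result = []
    carry = 0
    for x, y in zip(a, b):
        total = x + y + carry          # at most 33 bits wide
        result.append(total & MASK32)  # low 32 bits become the result limb
        carry = total >> 32            # bit 32 is the carry into the next limb
    result.append(carry)               # top limb holds only the final carry
    return result


# Operands copied from the question's .data section (limb order assumed).
big_num1 = [0x11111111, 0x22222222, 0x33333333]
big_num2 = [0xFFFFFFFF, 0x22222222, 0x33333333]

total = add_limbs(big_num1, big_num2)
print([hex(limb) for limb in total])
```

The extra limb is why a 16-byte result is a natural fit for two 12-byte inputs.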
2 votes, 1 answer

Convert extended value to Time

I want to convert an extended value to a time. The part before the DecimalSeparator is hours and the part after it gives the minutes. Examples: 8,62944444444444 --> 8:37, 1,41666666666667 --> 1:25. I've made this function, but for 1,41666666666667 I get 1:24…
Ravaut123
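An off-by-one minute like the one described above is typical when a binary fraction that lands just below a whole minute is truncated. A hedged Python sketch of one way around it: round to whole seconds first, then split into hours and minutes. The helper name `hours_to_hm` is hypothetical, and truncating the leftover seconds is an assumption inferred from the two examples in the excerpt (8:37 and 1:25), not something the question confirms.

```python
def hours_to_hm(hours: float) -> str:
    """Format a non-negative fractional hour count as H:MM (hypothetical helper)."""
    # Rounding to whole seconds absorbs tiny binary representation errors,
    # so 24.999999... minutes becomes 1500 seconds rather than 24 minutes.
    total_seconds = round(hours * 3600)
    whole_hours, remainder = divmod(total_seconds, 3600)
    minutes = remainder // 60  # leftover seconds are dropped
    return f"{whole_hours}:{minutes:02d}"


print(hours_to_hm(8.62944444444444))
print(hours_to_hm(1.41666666666667))
```

Rounding directly to minutes would not match the first example, because 0.62944444444444 hours is about 37.77 minutes and would round up to 38.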
1 vote, 0 answers

Handling very (while not arbitrary) small or big floating point numbers in C++

Our program extensively manipulates real numbers that happen to be very small or very big, while we don't need very high precision. We are strongly concerned about performance (CPU usage). Such numbers could be 2.5687e-45785, for instance. Remark: as…
1 vote, 1 answer

AArch64: compare 256-bit unsigned integers

While learning the Arm NEON instruction set, I tried to implement a 256-bit number comparison (A <= B). Below is the implementation I ended up with, but I doubt my approach is good. Maybe there's a wiser and more optimized way to compare large…
Alexander Zhak