Numerical Analysis Questions & Answers  
Question by Student 201627148
Professor, in the roundoff error section, you wrote $$x=\left(-2g+\sqrt{4g*g+4}\right)/2$$ $${\rm sqrt}(4*g*g+4)=2000.001+\sqrt{8000\epsilon_{mach}}$$ But I think you forgot to include $$\sqrt{8000\epsilon_{mach}}$$ when computing x. Can it be dropped because it is so small?
09.16.17
Let's do it again step by step. $$ {\rm sqrt}(4.0*g*g+4.0)=\sqrt{4(g\pm\epsilon_{\rm mach}g)^2+4\pm 4\epsilon_{\rm mach}} $$ $$ {\rm sqrt}(4.0*g*g+4.0)=\sqrt{4g^2(1\pm\epsilon_{\rm mach})^2+4 \pm 4\epsilon_{\rm mach}} $$ For $\epsilon_{\rm mach}\ll 1$: $$ {\rm sqrt}(4.0*g*g+4.0)\approx\sqrt{4g^2(1 \pm 2\epsilon_{\rm mach})+4\pm 4\epsilon_{\rm mach}} $$ Or $$ {\rm sqrt}(4.0*g*g+4.0)\approx\sqrt{4g^2+4 \pm 8g^2 \epsilon_{\rm mach}\pm 4\epsilon_{\rm mach}} $$ But for $8g^2 \gg 4$: $$ {\rm sqrt}(4.0*g*g+4.0)\approx\sqrt{4g^2+4 \pm 8g^2 \epsilon_{\rm mach}} $$ And for $8g^2\epsilon_{\rm mach}\ll 4g^2+4$, one can show with a first-order expansion of the square root that $$ {\rm sqrt}(4.0*g*g+4.0)\approx\sqrt{4g^2+4} \pm \frac{1}{2} \frac{8g^2 \epsilon_{\rm mach}}{4g^2+4}\sqrt{4g^2+4} $$ If $4g^2\gg 4$: $$ {\rm sqrt}(4.0*g*g+4.0)\approx\sqrt{4g^2+4} \pm \epsilon_{\rm mach}\sqrt{4g^2+4} $$ Thus, for $\epsilon_{\rm mach}$ much smaller than 1 and $g$ much greater than 1, a very good approximation to the error associated with the sqrt of $4g^2+4$ is $\epsilon_{\rm mach}$ times the sqrt of $4g^2+4$. I think that is what I wrote last class; if not, correct your notes. Good question. I'll give you 2 points bonus boost.
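To get a feel for the size of this error, here is a minimal Python sketch. It assumes $g=1000$ (which is consistent with the value 2000.001 quoted in the question) and uses double precision, so $\epsilon_{\rm mach}$ here is the double-precision value, not necessarily the one used in class. The function names and the 50-digit reference computed with the `decimal` module are choices made for this illustration only.

```python
import math
import sys
from decimal import Decimal, getcontext

getcontext().prec = 50  # digits used for the reference value only


def naive_root(g):
    """Evaluate x = (-2g + sqrt(4g^2 + 4)) / 2 directly in double precision."""
    return (-2.0 * g + math.sqrt(4.0 * g * g + 4.0)) / 2.0


def reference_root(g):
    """Same root written as sqrt(g^2 + 1) - g, evaluated with 50-digit decimals."""
    gd = Decimal(g)
    return (gd * gd + 1).sqrt() - gd


g = 1000.0                    # assumed value, gives sqrt(4g^2 + 4) close to 2000.001
eps = sys.float_info.epsilon  # machine epsilon of a double, about 2.2e-16

x = naive_root(g)
x_ref = reference_root(g)

# Absolute error of the double-precision result against the reference.
abs_err = float(abs(Decimal(x) - x_ref))

# Error estimate for the square root derived above: eps * sqrt(4g^2 + 4).
sqrt_err_estimate = eps * math.sqrt(4.0 * g * g + 4.0)

print("x (double)            :", x)
print("absolute error in x   :", abs_err)
print("estimate for sqrt / 2 :", sqrt_err_estimate / 2.0)
print("relative error in x   :", abs_err / float(x_ref))
print("machine epsilon       :", eps)
```

The error term is tiny compared with 2000.001, but $x$ itself is only about $5\times 10^{-4}$, so the same absolute error can be a much larger fraction of $x$ than $\epsilon_{\rm mach}$. Comparing the last two printed lines shows whether that happens for this $g$.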
Question by Student 201427564
Professor, when you calculate the smallest possible positive denormal number, why did you put $0.f_{min}$ instead of $1.f_{min}$ ?
Because, for the same exponent (the smallest possible one), $0.f_{\rm min}$ is smaller than $1.f_{\rm min}$, so it gives a smaller positive number. 1 point bonus boost.
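As a quick numerical check, the sketch below builds both candidates for a double, assuming the usual IEEE double layout (52 fraction bits, smallest exponent $p=-1022$) and that Python's `float` is such a double, which holds on common platforms. The variable names are made up for this example.

```python
import sys

F_BITS = 52     # fraction bits of an IEEE double (assumed layout)
P_MIN = -1022   # smallest exponent p of an IEEE double

f_min = 2.0 ** -F_BITS  # smallest nonzero fraction: only the last bit set

# 1.f with f = 0 at the smallest exponent: the smallest normal number.
smallest_normal = 2.0 ** P_MIN * (1.0 + 0.0)

# 0.f_min at the same exponent: the smallest positive denormal number.
smallest_denormal = 2.0 ** P_MIN * (0.0 + f_min)

print("smallest normal   :", smallest_normal)
print("smallest denormal :", smallest_denormal)
print("half of denormal  :", smallest_denormal / 2.0)
```

Halving the smallest denormal once more leaves no bit to hold the result, which is why the last printed value is zero.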
Question by Student 201700278
Dear Professor, from what I found on the Internet, there are two different interpretations of a signed int: the ones' complement interpretation and the two's complement interpretation.
For eight-bit ones' complement, $10000000 = −127$, while for eight-bit two's complement, $10000000 = −128$. May I know which of these is correct? I am still quite confused about the smallest number for a signed int. Thank you.
On gcc on x86 Linux, the smallest value of a signed 1-byte integer is -128 and the largest is +127 (this can be easily tested). As I mentioned in class, this is because the bit pattern that would otherwise represent negative zero is used for one extra negative number. I couldn't find a reference thus far on whether this is a strict standard for C. I also couldn't find a reference on how gcc does the arithmetic, which is why I mentioned in class that there are several ways the compiler could achieve this. As you mention, one way this could be done is with “two's complement” [l]. Then the bits look this way:
     
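Bits in Base 2 | Base 10 (unsigned) | Two's Complement
-------------- | ------------------ | ----------------
0111 1111      | 127                | 127
0111 1110      | 126                | 126
0000 0010      | 2                  | 2
0000 0001      | 1                  | 1
0000 0000      | 0                  | 0
1111 1111      | 255                | −1
1111 1110      | 254                | −2
1000 0010      | 130                | −126
1000 0001      | 129                | −127
1000 0000      | 128                | −128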
I'm not sure if gcc on x86 Linux uses this approach. Maybe. You can add it to your notes for completeness. But anyway, this is beyond the scope of this course. Numerical analysis is not concerned with how compilers do bitwise arithmetic; this belongs to a computer engineering course. Rather, what we care about here is the smallest and largest possible numbers that can be obtained with a certain number of bits. Thus, all you need to remember for this course is the minimum and maximum possible numbers that a signed and an unsigned integer will give you. Although your question goes outside the scope of the course, I'll give you 2 points bonus boost nonetheless as it helps clarify what I mentioned last class.
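For those who want to check the two interpretations by hand, here is a small Python sketch. It only illustrates how an 8-bit pattern maps to a value under each convention and says nothing about what gcc does internally. The function names are invented for this example.

```python
def unsigned_value(bits):
    """Read a string such as '1000 0010' as an unsigned base-2 number."""
    return int(bits.replace(" ", ""), 2)


def twos_complement_value(bits, width=8):
    """Two's complement: a pattern with the top bit set represents u - 2^width."""
    u = unsigned_value(bits)
    return u - (1 << width) if u >= (1 << (width - 1)) else u


def ones_complement_value(bits, width=8):
    """Ones' complement: a pattern with the top bit set is minus its bitwise inverse."""
    u = unsigned_value(bits)
    if u >= (1 << (width - 1)):
        return -(~u & ((1 << width) - 1))
    return u


patterns = ["0111 1111", "0000 0001", "0000 0000",
            "1111 1111", "1000 0001", "1000 0000"]

for p in patterns:
    print(p, unsigned_value(p), twos_complement_value(p), ones_complement_value(p))
```

Under ones' complement, `1111 1111` is a second (negative) zero and the range is -127 to +127. Under two's complement there is a single zero and the range is -128 to +127, which matches the minimum and maximum quoted above.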
Question by Student 201427106
Professor, I have a question about the denormal numbers of float and double. When we calculated the float denormal number with the given condition e=00000000, I wonder why you used p=-126 instead of p=-127 in class. Also, when we calculated the double denormal number with the given condition e=00000000000, you used p=-1023 instead of p=-1024. Is there any special reason for using p=-1023?
09.17.17
This was explained in class.
Question by Student 201427122
Professor, I don't understand what you wrote. When we found the minimum and maximum P, we used 'P=e-127' for the float format. But in the next step, on the IEEE double precision format, you wrote 'P=127-e' for float. I don't know which one is the correct answer for float. Please answer me. Thank you for reading.
This should be obvious. The exponent is $p=e-b$ where $b$ is a constant that is used to split the possible exponents equally between positive and negative values (the exponent $p$ is either negative or positive while the biased exponent $e$ is all positive). For float, $b=127$, so $p=e-127$.
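The relation $p=e-b$ can be tabulated with a few lines of Python. This sketch assumes the usual IEEE convention that the bias is $b=2^{k-1}-1$ for $k$ exponent bits, and that the all-zeros and all-ones values of $e$ are reserved for special cases, so normal numbers use $e$ from 1 to $2^k-2$. The function name is made up for this example.

```python
def exponent_range(exp_bits):
    """Return (b, p_min, p_max) for a format with exp_bits exponent bits."""
    b = 2 ** (exp_bits - 1) - 1   # bias that centers the exponents around zero
    e_min = 1                     # e = 0 is reserved (zero and denormals)
    e_max = 2 ** exp_bits - 2     # all-ones e is reserved (infinity and NaN)
    return b, e_min - b, e_max - b


for name, k in [("float", 8), ("double", 11)]:
    b, p_min, p_max = exponent_range(k)
    print(name, "bias =", b, "p_min =", p_min, "p_max =", p_max)
```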
Question by Student 201627148
Professor, in the denormal case when e=0000 0000, 1.f changes to 0.f: $$\#=(-1)^s*2^p*0.f$$ Here, if p=e-127, shouldn't p=-127 rather than p=-126, because e=0? But you wrote p=-126.
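But $e=0$ indicates a special case, not an exponent. Thus, the minimum exponent $p_{\rm min}$ is not calculated using $e=0$ but using $e=1$, which gives $p_{\rm min}=1-127=-126$ for float. The denormal numbers use this same $p_{\rm min}$ together with $0.f$.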
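One way to convince yourself is to decode a few float bit patterns by hand and compare with what the machine returns. The sketch below is only an illustration: `decode_float32` and `machine_float32` are names invented here, the comparison uses Python's standard `struct` module, and infinity and NaN ($e$ all ones) are deliberately left out.

```python
import struct


def decode_float32(bits):
    """Decode a 32-bit pattern with #=(-1)^s * 2^p * (1.f or 0.f)."""
    s = (bits >> 31) & 0x1       # sign bit
    e = (bits >> 23) & 0xFF      # biased exponent, 8 bits
    f = bits & 0x7FFFFF          # fraction, 23 bits
    if e == 0xFF:
        raise ValueError("infinity and NaN are not handled in this sketch")
    if e == 0:
        p = 1 - 127              # special case: reuse p_min = -126
        mantissa = f / 2 ** 23   # 0.f
    else:
        p = e - 127
        mantissa = 1 + f / 2 ** 23   # 1.f
    return (-1) ** s * 2.0 ** p * mantissa


def machine_float32(bits):
    """Let the machine interpret the same 32 bits as a float."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


for bits in (0x00000001, 0x007FFFFF, 0x00800000, 0x3F800000, 0xC0490FDB):
    print(hex(bits), decode_float32(bits), machine_float32(bits))
```

If $p=-127$ were used for $e=0$, the largest denormal (`0x007FFFFF`) would decode to about half of the smallest normal number (`0x00800000`), leaving a gap between the two ranges. With $p=-126$ the two values are adjacent.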