Numerical Analysis Questions & Answers  
Question by Student 201700278
Dear Professor, from what I found on the Internet, there are two different interpretations of a signed int: the ones' complement interpretation and the two's complement interpretation.
For eight-bit ones' complement, $10000000 = −127$, while for eight-bit two's complement, $10000000 = −128$. May I know which of these is correct? I am still quite confused about the smallest number for a signed int. Thank you.
09.16.17
With gcc on x86 Linux, the smallest number for a signed 1-byte integer will be -128 and the largest will be +127 (this can be easily tested). As I mentioned in class, this is due to the negative zero being used for an extra negative number. I couldn't find a reference, though, on whether this is a strict standard for C. I also couldn't find a reference on how gcc does the arithmetic, which is why I mentioned in class that there are several ways the compiler could achieve this. As you mention, one way this could be done is with “two's complement” [l]. Then the bits will look this way:
     
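| Bits in Base 2 | Base 10 (unsigned) | Two's Complement |
|---|---|---|
| 0111 1111 | 127 | 127 |
| 0111 1110 | 126 | 126 |
| ⋮ | ⋮ | ⋮ |
| 0000 0010 | 2 | 2 |
| 0000 0001 | 1 | 1 |
| 0000 0000 | 0 | 0 |
| 1111 1111 | 255 | −1 |
| 1111 1110 | 254 | −2 |
| ⋮ | ⋮ | ⋮ |
| 1000 0010 | 130 | −126 |
| 1000 0001 | 129 | −127 |
| 1000 0000 | 128 | −128 |
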
I'm not sure if gcc on x86 Linux uses this approach. Maybe. You can add it to your notes for completeness. But anyway, this is beyond the scope of this course. Numerical analysis is not concerned with how compilers do bitwise arithmetic; that belongs to a computer engineering course. Rather, what we care about here is the smallest and largest numbers that can be obtained with a certain number of bits. Thus, all you need to remember for this course is the minimum and maximum numbers that a signed and an unsigned integer can represent. Although your question goes outside the scope of the course, I'll give you a 2-point bonus boost nonetheless, as it helps clarify what I mentioned last class.
Question by Student 201427106
Professor, I have a question about denormal numbers in float and double. When we calculated the float denormal numbers for the case e=00000000, why did you use p=-126 instead of p=-127 in class? Similarly, when we calculated the double denormal numbers for e=00000000000, you used p=-1022 instead of p=-1023. Is there any special reason for this?
09.17.17
This was explained in class.
Question by Student 201427122
Professor, I don't understand what you wrote. When finding the minimum and maximum P, we use 'P=e-127' in float format. But in the next step, on the IEEE double precision format, you wrote 'P=127-e' for float. I don't know which one is correct for float. Please let me know. Thank you for reading.
This should be obvious. The exponent is $p=e-b$, where $b$ is a constant used to split the possible exponents equally between positive and negative values (the exponent $p$ can be negative or positive, while the biased exponent $e$ is never negative). For float, $b=127$, so $p=e-127$.
Question by Student 201627148
Professor, for denormals, when e=0000 0000, 1.f changes to 0.f: $$\#=(-1)^s \times 2^p \times 0.f$$ Here, if p=e-127, shouldn't p be -127 rather than -126, since e=0? But you wrote p=-126.
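But $e=0$ indicates a special case, not an exponent. Thus, the minimum exponent $p_{\rm min}$ is not calculated using $e=0$ but using $e=1$, which gives $p_{\rm min}=1-127=-126$. Denormals reuse this $p_{\rm min}$, with $0.f$ in place of $1.f$.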
Question by Student 201427143
Professor, sorry, but I have a question about assignment 5.
The upper limit of the range is $10^{32}$, and the lower limit is $3\times10^{-65}$.
So I set up two equations:

1) $(-1)^s \times 2^{P_{max}} \times 1.f_{max}=10^{32}$

2) $(-1)^s \times 2^{P_{min}} \times 1.f_{min}=3\times10^{-65}$

From these I got two bit counts: 8 from eq. 1 and 9 from eq. 2.
So I thought the minimum number of bits is 9.
But in 5(b), when I substituted 9 into the equation $(-1)^s \times 2^{P_{min}} \times 0.f_{min}=3\times10^{-65}$, I got a negative number of bits for the significand.
What is wrong with that?
09.18.17
I'm not sure why you're getting a negative number of bits. Your logic is more or less right, though not perfect. Keep working on it.
Question by Student 201529190
Dear Professor, in IEEE single precision format p=e-127. I know 127 comes from (2^8-2)/2 (i.e., the middle). In Assignment#1 Question#5, we are asked to minimize the number of bits. When I calculate p=e-something, does "something" need to be the number in the middle, just like 127 in single precision format? Or can I define this "something" however I want? (If it can change, maybe I can find a smaller number of bits.) I also wonder whether we need to consider subnormal numbers in this question. If the "something" has to be in the middle and we need not consider subnormal numbers, then we just need to find a range (a,b) such that a<3*10^-65, 10^32<b. Then the number of bits needed to store the exponent is 9, and maybe the number of bits needed to store the significand is zero. Zero is weird, but the question does not mention machine precision, so I think it can be zero. If I made a mistake or missed something, please forgive me. Please do not restrict my messaging permissions (I saw that someone was restricted in last year's messages). Thank you very much.
09.19.17
I will answer your question if the numbers are properly typeset. Post it again below and use LaTeX to write the math and numbers properly.
Question by Student 201529190
Dear Professor, in IEEE single precision format, $p=e-127$. I know 127 comes from $(2^8-2)/2$ (i.e., the middle). In Assignment #1 Question #5, we are asked to minimize the number of bits. When I calculate $p=e-\text{something}$, does "something" need to be the number in the middle, just like 127 in single precision format? Or can I define this "something" however I want? (If it can change, maybe I can find a smaller number of bits.) I also wonder whether we need to consider subnormal numbers in this question. If the "something" has to be in the middle and we need not consider subnormal numbers, then we just need to find a range $(a,b)$ such that $a<3\times10^{-65}$ and $10^{32}<b$. Then the number of bits needed to store the exponent is 9, and maybe the number of bits needed to store the significand is zero. Zero is weird, but the question does not mention machine precision, so I think it can be zero. If I made a mistake or missed something, please forgive me. Please do not restrict my messaging permissions (I saw that someone was restricted in last year's messages). Thank you very much.
The number has to be constructed with the same rules as the float and double types shown in class. So yes, the exponents must be split equally between positive and negative values, and yes, the exponent must include the usual exceptions ($e=0$ for zero and denormals, $e$ all ones for infinity and NaN). Thus, denormal numbers will of course be present.
Question by Student 201427116
Professor, I have a question about machine epsilon. We use machine epsilon to account for the rounding errors that can occur in C. In class, for example, we calculated: $$g \times g = (g\pm g\,\epsilon_{\rm mach})(g\pm g\,\epsilon_{\rm mach})$$ But in our textbook ('Introduction to Numerical Methods', Jeffrey R. Chasnov), the author says: "A rough estimate would be $5(1+\epsilon_{\rm mach})=5+5\epsilon_{\rm mach}$, but this is not exact. The exact answer can be found by writing $$5 = 2^2\left(1+\frac{1}{4}\right)$$ so that the next largest number is $$2^2\left(1+\frac{1}{4}+2^{-23}\right) = 5 + 2^{-21} = 5+4\epsilon_{\rm mach}.$$" I don't understand why writing 5 as $2^2(1+\frac{1}{4})$ gives a different result. I also wonder how I can express certain numbers to reduce errors, as the author has done in our textbook.
09.23.17
Yes, the relative error can be less than $\epsilon_{\rm mach}$. In class, $\epsilon_{\rm mach}$ was defined as the maximum relative error on a non-denormal float. For most non-denormal floats, the error will of course be less. 1 point bonus boost.
09.24.17
Question by Student 201542124
Professor, I have a question about Assignment #2, question 1(a). First, I applied the bisection method to f=sin(x) on the interval [3,4]. Then I tried it again with [3.1, 3.2]. The number of iterations was different in each case, so I think we may get different answers depending on the interval. Are there rules for choosing the interval?
10.05.17
Yes, of course, the number of iterations will depend on the initial interval chosen. This is why the initial interval is specified in the question. There are no rules that can be used to guess a proper initial interval that would apply in the general case because each function has a different behaviour. But if you do know the function, then it may be possible to specify an initial interval that is quite close to the root through a good guess for the root. 1 point bonus boost.
Question by Student 201427122
Professor, when I solved Question #1 (a), I chose $$X_{\min}=\pi/2,\qquad X_{\max}=3\pi/2$$ Then I used the bisection method. The result was $X_{13}$=3.1412..., so $X_{13}$ is 3.141 to 4 significant digits. Continuing, $$X_{14}=3.14140,\quad X_{15}=3.14149,\quad X_{16}=3.1415$$ So I think $X_{16}$ = 3.142 (to 4 significant digits) is the answer, because π is 3.142 to 4 significant digits. I don't understand why your answer is 13. Did I set up the wrong equation for this question?
10.06.17
Here's a hint. Simply obtaining the 4th digit is not enough: you need to make sure the 4th significant digit is actually valid. The best way to solve this problem is to specify the maximum allowable value of the relative error $\epsilon$ needed here and to determine analytically how many iterations are needed to reach this level. In the exam, I may ask you a similar question but, instead of 4 significant digits, I may ask you for 100 significant digits. Solving the problem by hand one iteration at a time would take too long. Thus, you should aim to find an analytical expression giving the number of iterations needed for a given $\epsilon$. 1 point bonus.
Question by Student 201627131
Professor, I have a question about radians in C code. If I use radians in C, can the compiler handle them? Also, π has infinitely many decimals, so how can I express it? I think if I declare PI = 3.14, the result will include a large error.
Just define $\pi$ at the top of your source file (after the #include lines) with
#define pi 3.14159265358979323846
In C, the functions sin, cos, etc. must always be given their arguments in radians.
Question by Student 201627148
Professor, in Q#4(b) I found something strange. I guess the root is π because $x_1,x_2,x_3$ converge to π, so I set $\epsilon=x-\pi$, since $\epsilon=x-r$. Then I computed $\epsilon_1,\epsilon_2,\epsilon_3$ and $p$. But $p$ is different at each iteration. I think the $x$ values have infinitely many decimals, so they contain some error. Should the answer for $p$ be exactly 1.618?
10.07.17
Yes, $p$ varies as the iteration count varies. You should explain why. There is one reason for the discrepancy in the first few iterations and another when approaching machine accuracy. Explain both.
Question by Student 201427122
Professor, in Question #5 1, $$\frac{|\epsilon_{n}|_{\rm Newton}}{|\epsilon_{n}|_{\rm secant}}=\left(\left|\frac{f^{\prime\prime}(r)}{2f^{\prime}(r)}\right| |\epsilon_{0}|\right)^{2^{n}-1.618^{n}}$$ When I wrote it down in my notes, the two sides were proportional to each other rather than equal. Which of the two equations should I use?
Proportional to does not exclude equal to.
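A short derivation, assuming the standard asymptotic error relations with the same constant $C=f''(r)/(2f'(r))$ for both methods, shows where the equality comes from. Write $u_n=|C\epsilon_n|$. For Newton's method, $$\epsilon_{n+1}\approx C\epsilon_n^2 \;\Rightarrow\; u_{n+1}=u_n^2 \;\Rightarrow\; u_n=u_0^{2^n}.$$ For the secant method, $$\epsilon_{n+1}\approx C\epsilon_n\epsilon_{n-1} \;\Rightarrow\; u_{n+1}=u_n u_{n-1} \;\Rightarrow\; \ln u_{n+1}=\ln u_n+\ln u_{n-1}.$$ This is the Fibonacci recurrence, so $\ln u_n=\alpha\varphi^n+\beta\psi^n$ with $\varphi=(1+\sqrt5)/2\approx1.618$ and $\psi=(1-\sqrt5)/2\approx-0.618$, where $\alpha$ and $\beta$ depend on the two starting errors. If the iteration starts on the asymptotic track, $u_1=u_0^{\varphi}$, then $\beta=0$ and $u_n=u_0^{\varphi^n}$ (because $\varphi^{n+1}=\varphi^n+\varphi^{n-1}$). Dividing, the factor $|C|$ cancels and $$\frac{|\epsilon_n|_{\rm Newton}}{|\epsilon_n|_{\rm secant}}=\frac{u_0^{2^n}}{u_0^{\varphi^n}}=\left(\left|\frac{f''(r)}{2f'(r)}\right||\epsilon_0|\right)^{2^n-\varphi^n}.$$ In general $\beta\neq0$, and the secant errors follow $u_0^{\varphi^n}$ only in their leading behaviour, so the relation is approximate; this is one reason it can also appear with $\propto$.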