Chrome re-orders object keys if they are numeric: is that normal/expected?

I noticed that certain code that evaluates some shoe sizes for an e-commerce site and outputs them on screen is messing up the order in Chrome. JSON given can be: { "7": ["9149", "9139",...

Why is processing a sorted array faster than processing an unsorted array?

Here is a piece of C++ code that shows some very peculiar behavior. For some strange reason, sorting the data (before the timed region) miraculously makes the loop almost six times...

Does mov rax,0x12345678; jmp rax still kill branch prediction?

I'm having trouble finding information specific to the two cases described above, and thought of asking for your expert opinion. The first thing is: I know indirect jmps hurt branch prediction, and...

Return stack buffer?

As I understand it, the Return Stack Buffer only supports 4 to 16 entries (from wiki: http://en.wikipedia.org/wiki/Branch_predictor#Prediction_of_function_returns) and is not a key-value pair (based on...

What is the overhead of using Intel Last Branch Record?

Last Branch Record refers to a collection of register pairs (MSRs) that store the source and destination addresses related to recently executed branches....

What is the point of delay slots?

So from my understanding of delay slots, they occur when a branch instruction is called and the next instruction following the branch also gets loaded from memory. What is the point of this?...

How do SYSCALL/SYSRET instructions perform across x86 CPUs?

SYSCALL and SYSRET (and their 32-bit-only Intel counterparts SYSENTER and SYSEXIT) are usually described as a “generally faster” way to enter and exit supervisor mode in x86 processors than...

How does the branch predictor know if it is not correct?

This is the second time I'm asking this question; the first time someone did reply but I took too long to reply back to them and therefore didn't get the full understanding. What I'm trying to do...

Branch prediction: Does avoiding the "else" branch for simple operations make code faster (Java example)?

Option 1: boolean isFirst = true; for (CardType cardType : cardTypes) { if (!isFirst) { descriptionBuilder.append(" or "); } else { isFirst = false; } //other code...

Is there a compiler hint for GCC to force branch prediction to always go a certain way?

For the Intel architectures, is there a way to instruct the GCC compiler to generate code that always forces branch prediction a particular way in my code? Does the Intel hardware even support...

What branch misprediction does the Branch Target Buffer detect?

I am currently looking at the various parts of the CPU pipeline which can detect branch mispredictions. I have found these are: Branch Target Buffer (BPU CLEAR), Branch Address Calculator (BA...

MIPS - Branching convention with bne

In lecture, our professor said that there is a reason behind using bne in branching rather than using beq (and left us to figure it out), like the example shown below. if ( i == j ) i++ ; j--...

Are branch predictor results saved after a process uses up its timeslice?

During a discussion, a developer told me that the likely/unlikely GCC optimization, which places the most common branch first in the code, has no effect and should be ignored on Intel processors. The stated reason is...

Slow jmp-instruction

As a follow-up to my question "The advantages of using 32bit registers/instructions in x86-64", I started to measure the costs of instructions. I'm aware that this has been done multiple times (e.g....

Cost of a 64-bit jump, always 10-22 cycles the first time?

In x86_64 there is no direct jump with a 64-bit address, only a 32-bit one. With indirect jumps I understand the pipeline HAS TO BE RESOLVED ONCE before branch prediction comes into play. My...

An expensive jump with GCC 5.4.0

I had a function which looked like this (showing only the important part): double CompareShifted(const std::vector<uint16_t>& l, const std::vector<uint16_t> &curr, int shift, int shiftY) { ... ...

Is there a way to convert a conditional assignment to branch free code?

Is there a way to convert the following C code to something without any conditional statements? I have profiled some of my code and noticed that it is getting many branch misses on an if statement...
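
For readers skimming this entry, here is a minimal sketch of one common way to express a conditional assignment without a branch. It is not the asker's code (the excerpt above is truncated); the function names select_branchy and select_branchless, and the choice of 32-bit unsigned operands, are assumptions made for illustration. Whether it is actually faster depends on the compiler and CPU, since optimizers often emit a conditional move for the plain if form anyway.

```cpp
#include <cstdint>

// Branchy reference version (hypothetical name): the kind of conditional
// assignment that a profiler may flag for branch misses.
inline std::uint32_t select_branchy(bool cond, std::uint32_t a, std::uint32_t b) {
    if (cond) {
        return a;
    }
    return b;
}

// Branch-free version (hypothetical name): turn the condition into an
// all-ones or all-zeros mask, then blend the two candidates bitwise.
inline std::uint32_t select_branchless(bool cond, std::uint32_t a, std::uint32_t b) {
    // cond converts to 0 or 1; 0 - 1 wraps to 0xFFFFFFFF in unsigned arithmetic.
    const std::uint32_t mask = 0u - static_cast<std::uint32_t>(cond);
    return (a & mask) | (b & ~mask);
}
```

Both functions return a when cond is true and b otherwise, so one can be swapped for the other and the effect measured with a profiler rather than assumed.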

ARM prefetch workaround

I have a situation where some of the address space is sensitive, in that if you read it you crash, as there is nobody there to respond to that address. pop {r3,pc} bx r0 0: e8bd8008 pop {r3, pc} ...

Does the Harvard architecture have the von Neumann bottleneck?

From the naming and this article I feel the answer is no, but I don't understand why. The bottleneck is how fast you can fetch data from memory. Whether you can fetch instruction at the same time...

LIME Image classification interpretation for multi-input DNN

I am fairly new to Deep Learning, but I managed to build a multi-branch Image Classification architecture yielding quite satisfactory results. Not so important: I am working on KKBox customer...

Conditional jump instructions in MSROM procedures?

This relates to this question. Thinking about it though, on a modern Intel CPU the SEC phase is implemented in microcode, meaning there would be a check whereby a burned-in key is used to verify the...

What is faster in C++: mod (%) or another counter?

At the risk of this being a duplicate, maybe I just can't find a similar post right now: I am writing in C++ (C++20 to be specific). I have a loop with a counter that counts up every turn. Let's...
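
The two approaches named in the title can be sketched as follows. This is an illustration under assumptions, not the asker's loop (the excerpt above is truncated): the function names count_hits_modulo and count_hits_counter are hypothetical, and period is assumed to be greater than zero. Both functions count how many iterations are "every period-th" one, so they can be compared in a benchmark.

```cpp
#include <cstddef>

// Variant 1 (hypothetical name): test the loop counter with the modulo operator.
// Assumes period > 0.
inline std::size_t count_hits_modulo(std::size_t iterations, std::size_t period) {
    std::size_t hits = 0;
    for (std::size_t i = 1; i <= iterations; ++i) {
        if (i % period == 0) {
            ++hits;
        }
    }
    return hits;
}

// Variant 2 (hypothetical name): keep a second counter and reset it when it
// reaches the period, so no division is needed. Assumes period > 0.
inline std::size_t count_hits_counter(std::size_t iterations, std::size_t period) {
    std::size_t hits = 0;
    std::size_t since_last = 0;
    for (std::size_t i = 1; i <= iterations; ++i) {
        if (++since_last == period) {
            since_last = 0;
            ++hits;
        }
    }
    return hits;
}
```

The two variants return the same count for the same inputs. Which one runs faster is something to measure: when period is a compile-time constant, compilers typically replace the division with cheaper arithmetic, which can narrow or remove the difference.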

Issue installing zip file created by setup.py for deploying custom prediction to AI platform

I am following the Google doc on creating a custom prediction routine (https://cloud.google.com/ai-platform/prediction/docs/custom-prediction-routines). While building a new version for a model AI-platform...

Why is processing an unsorted array the same speed as processing a sorted array with modern x86-64 clang?

I discovered this popular ~9-year-old SO question and decided to double-check its outcomes. So, I have an AMD Ryzen 9 5950X, clang++ 10 and Linux. I copy-pasted the code from the question and here is...

Why does a loop transitioning from having its uops fed by the Uop Cache to LSD cause a spike in branch-misses?

All benchmarks are run on either Icelake or Whiskey Lake (in the Skylake family). Summary: I am seeing a strange phenomenon where it appears that when a loop transitions from running out of the Uop...

Large performance variability of fully CPU-bound code for no clear reason

I apologize for the lack of a minimal reproducible example for the problem that follows. I will continue investigating, and if I isolate it to a small section of code, I will add this information...

Why does adding an if(!memcmp()) speed up a loop that makes random short strides through a huge byte array?

Sorry, I have never understood the rules here. I have deleted all the duplicate posts I have posted. This is the first related issue. Please do not mark this post as a duplicate of my other...

Why, in a C loop, is accessing array elements so much slower than accessing variables?

The only difference between the following two tested loops is i += shift and i += shifts[str[i]]. They have the same character set size, the same stride of each shift, and the same number of...

Why does my Intel Skylake / Kaby Lake CPU incur a mysterious factor 3 slowdown in a simple hash table implementation?

In short: I have implemented a simple (multi-key) hash table with buckets (containing several elements) that exactly fit a cacheline. Inserting into a cacheline bucket is very simple, and the...

Branch-mispredictions versus cache misses

Consider the following two alternative pieces of code: Alternative 1: if (variable != new_val) // (1) variable = new_val; f(); // This function reads `variable`. Alternative 2: variable =...