jordanclock - Wednesday, May 30, 2018 - link
So, just in raw numbers, this seems like it has a long way to go. Correct me if I'm wrong, but we're looking at a tenth the bandwidth and a thousand times the latency, based on a best-case scenario of 2.5GB/s and 20μs.

Still, I'm sure there will be an improvement over going out to disk for some workloads, even if the performance of the actual XPoint chips isn't improved over NVMe drives or the like.
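For a rough sense of where those guessed figures would sit next to DRAM, here is a minimal ballpark sketch; the DDR4 reference numbers (DDR4-2666 at ~21.3 GB/s per channel, ~15 ns CAS, ~100 ns full random load-to-use) are assumed typical values for illustration, not figures from the article.

```c
/* Ballpark sketch only: the XPoint DIMM numbers below are the guesses from the
 * comment above, and the DDR4 reference figures are assumed typical values,
 * not numbers from the article. */
#include <stdio.h>

int main(void) {
    double xpoint_bw_gbs = 2.5;      /* guessed XPoint DIMM bandwidth, GB/s */
    double xpoint_lat_ns = 20000.0;  /* guessed XPoint DIMM latency, 20 us  */

    double ddr4_bw_gbs  = 2666e6 * 8.0 / 1e9; /* DDR4-2666, one channel: ~21.3 GB/s */
    double ddr4_cas_ns  = 15.0;               /* CAS latency alone, ~15 ns           */
    double ddr4_full_ns = 100.0;              /* full random load-to-use, ~100 ns    */

    printf("bandwidth: DDR4 is ~%.0fx higher\n", ddr4_bw_gbs / xpoint_bw_gbs);
    printf("latency:   XPoint is ~%.0fx-%.0fx higher\n",
           xpoint_lat_ns / ddr4_full_ns, xpoint_lat_ns / ddr4_cas_ns);
    return 0;
}
```

Depending on which DRAM latency you count, 20μs works out to roughly 200x-1300x, so "a tenth the bandwidth and a thousand times the latency" is in the right ballpark even if the exact multipliers are debatable.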
CajunArson - Wednesday, May 30, 2018 - link

"Correct me if I'm wrong, but we're looking at a tenth the bandwidth and a thousand times the latency, based on the best case scenario of 2.5GB/s and 20μs."

Well considering you just pulled those numbers out of your backside with no evidence to support them whatsoever, you might want to not jump to stupid conclusions.
jordanclock - Wednesday, May 30, 2018 - link
I actually pulled them from very generous estimates of improvements over the numbers we have seen for the P4800X, the fastest XPoint implementation so far.

https://www.anandtech.com/show/11930/intel-optane-...
AnandTech's benchmarks show a bit over 2GB/s and about 30μs. Again, assuming improvements to the controller or memory packages, the reduced overhead from dropping the PCI-Express bus, and some tuning for better response times, I think 2.5GB/s and 20μs are reasonable numbers.
So these were not pulled from my backside and you're just a rude ass.
CajunArson - Wednesday, May 30, 2018 - link
Yeah, assuming that because the PCIe P4800X trivially saturates its PCIe connection, the inherent bandwidth of Optane is limited to that of a PCIe connection kind of shows that you really don't understand what this technology is all about.

Maybe you should go back to the phone reviews.
CajunArson - Wednesday, May 30, 2018 - link
As a followup to my earlier reply, you literally just said that the fastest HBM2 solutions on the market are pathetically limited to 16GB/s of bandwidth because a CPU connected to a $10,000 GPU over a PCIe connection can only get data from the HBM2 memory at 16GB/sec.

jordanclock - Wednesday, May 30, 2018 - link
The P4800X uses a PCI-E 3.0 x4 interface, which would have a max bandwidth of around 4GB/s, not 2GB/s. So there is plenty of headroom available there if the P4800X were capable of higher throughput.

Also I said nothing about HBM2. At all. I think you're confusing me with someone else you're trolling. That's a completely different memory interface and those numbers have nothing to do with the maximum bandwidth of an Optane DIMM.
jordanclock - Wednesday, May 30, 2018 - link
Actually, you know what, you're right. The PCI-E 3.0 x4 bus would be limited to 2GB/s, so it is possible that the P4800X is bottlenecked AND the XPoint DIMMs could perform higher.

But you found the absolute worst way to say it and just come across as another combative troll.
p1esk - Wednesday, May 30, 2018 - link
More importantly, how much of that 30μs latency for the P4800X comes from the PCIe bus?

jordanclock - Wednesday, May 30, 2018 - link
I'm going to guess better than half? Between the clock rate of the bus, the physical distance, and the logical overhead.

Samus - Thursday, May 31, 2018 - link
I think the important thing to focus on here is that Optane was never meant to replace SSDs or DRAM. It combines the benefits of both. But it doesn't completely match the benefits of DRAM, which is still faster.
It's an extension, NOT A REPLACEMENT. If your server needs more (and I know a lot of use cases that would benefit from more), the highest capacity I've found is around 64GB per slot. With 512GB DIMMs, even if they're not as fast, a lot of servers can keep the same old fast/cold storage config we know. We will have a lot more server RAM for real-time operations, while inactive, less important data stays inside Optane. This is great news.

peevee - Thursday, May 31, 2018 - link

"The PCI-E 3.0 x4 bus would be limited to 2GB/s"

It is not.
repoman27 - Thursday, May 31, 2018 - link
Achievable throughput after accounting for protocol overhead is largely dependent on the TLP maximum payload size, but for 128B it would be right around 3.24 GB/s.

At IDF back in 2015 Intel suggested ~6 GB/s per channel and ~250 ns latency for 3D XPoint DIMMs: https://www.kitguru.net/components/memory/anton-sh...
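For anyone wondering where a figure like ~3.24 GB/s comes from, here is a minimal sketch of the arithmetic. The ~28 bytes of per-TLP overhead (headers, framing, LCRC, plus amortized DLLP traffic) is an assumed typical value; the real figure depends on header size, ECRC, and flow-control settings.

```c
/* Sketch of PCIe 3.0 x4 effective throughput with a 128-byte max payload.
 * The per-TLP overhead of ~28 bytes is an assumed typical figure; real
 * numbers vary with header size, ECRC, and flow-control traffic. */
#include <stdio.h>

int main(void) {
    double lanes = 4.0;
    double raw_gbps_per_lane = 8.0 * 128.0 / 130.0;     /* 8 GT/s, 128b/130b encoding */
    double raw_gbs = lanes * raw_gbps_per_lane / 8.0;    /* ~3.94 GB/s for x4 */

    double payload = 128.0;                              /* max payload size, bytes */
    double overhead = 28.0;                              /* assumed per-TLP overhead, bytes */
    double efficiency = payload / (payload + overhead);  /* ~0.82 */

    printf("raw link: %.2f GB/s, effective: %.2f GB/s\n", raw_gbs, raw_gbs * efficiency);
    return 0;
}
```

With a 256- or 512-byte maximum payload the efficiency climbs to roughly 90% or better, which is why the usable ceiling of an x4 link sits well above the "bit over 2GB/s" measured for the P4800X.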
emvonline - Friday, June 1, 2018 - link
That is a great article to look at for perspective! XPoint DIMMs are about 10x slower in latency compared to DRAM, which is fine since the CPU will not be writing directly to the DIMM. Latency is only one variable... DDR4 has higher latency than DDR3, but it's much faster.

tomatotree - Wednesday, May 30, 2018 - link
From the previous page of that very same article, the P4800X can hit <10µs latency in many scenarios -- and that still includes all of the overhead of the PCIe link, the filesystem, and the OS call/storage driver. Low-to-mid single-digit µs latencies are not unthinkable when those factors are removed.

As for bandwidth, that could likely be scaled out just by increasing the number of channels on the controller, just like NAND drives do.
In short, we don't know enough about it yet to make any claims about what performance will be. A fixed percentage boost from the current product might make sense if it were just a gen 2 SSD, but this is a very different product, with a different architecture on a different bus.
jordanclock - Wednesday, May 30, 2018 - link
Right, but for ballpark numbers, we're still looking at a very large gulf in throughput and latency, even in the best-case scenario, for these XPoint modules vs. DDR4 modules.
If I were to start guessing, I'd go back to Intel's original slides on Xpoint, not an SSD implementation. That admits there's a gap, but you know, there's that whole persistency thing. It's still meant as storage, not RAM.

This honestly has me interested because it's the first time we can actually see what it's capable of in the best scenario (short of being on-die with the processor).
Dr. Swag - Wednesday, May 30, 2018 - link
If you look at this slide from the liveblog, it's supposed to be 2-3x faster and have around 2-3x lower latency than a P4800X:

https://images.anandtech.com/doci/12826/1527704407...
frenchy_2001 - Thursday, May 31, 2018 - link
This slide is about their hypervisor, used with the Optane SSD... Nothing about the Optane Persistent Memory.
CheapSushi - Wednesday, May 30, 2018 - link
It's not replacing DRAM, you idiots. It's supplemental.
Well, ideally, in terms of application programming, it would. Persisted direct memory access would be incredible. I keep thinking how much this could transform database applications or other large data maps. Right now, there's a lot of overhead in the OS->FS->RAM->transform path, with further seeks, etc. If it were all direct memory, not having to flow through various FS/OS conventions, that could be incredible.

frenchy_2001 - Thursday, May 31, 2018 - link
It would, and it's the long-term goal, but we're not there yet.

XPoint, ReRAM, MRAM and others are aiming for this, but they'd need faster access and better reliability/endurance.
Realistically, as long as we have a faster technology (DRAM), the new ones will not replace, but supplement.
It may make sense for *some* applications (huge database, HPC with huge dataset) to get more local storage, even if slower, but most applications will behave better with limited, faster DRAM.
So, for the moment, they will ADD some intermediary persistent storage, not replace DRAM with it. It just bridges the gap between DRAM (ns scale) and storage (SSDs are µs scale, 1000x slower).
Yojimbo - Thursday, May 31, 2018 - link
The point is that compared with DRAM it costs less, has higher capacity, and uses significantly less power per unit of capacity. Compared with NAND it is faster and has much lower latency. There are applications that may benefit from such a mix of attributes.

I'm not sure what you mean by "very large gulf". That is a judgment on relative size. What is important is how it matches up with the demands of applications. Machine learning and databases, for example, may see large benefits from using this type of memory. The only thing that's clear is that your off-the-cuff dismissal of its potential is inappropriate.
Yojimbo - Thursday, May 31, 2018 - link
jordanclock, you can't estimate the performance of a part that operates through the memory bus by looking at a different part that runs through the I/O bus, especially as it relates to latency. Those numbers are about as useless as if you had pulled them from your ass.

Spunjji - Thursday, May 31, 2018 - link

He could have been more polite, but you really have just made those numbers up (at best it's a semi-educated guess) so he's not wrong either.

jordanclock - Thursday, May 31, 2018 - link

Yeah, but that's why I started my first comment with "correct me if I'm wrong." Not "berate me if you disagree." I admitted I am getting my numbers from the next closest related product AND that they are broad estimates. It just seems like everyone else is getting their panties in a bunch thinking that we can't discuss the potential performance until we have EXACT numbers.

Billy Tallis - Wednesday, May 30, 2018 - link
20µs sounds pretty conservative, especially for read latency. PCIe and NVMe are responsible for at least half of that latency, judging by what's been reported for DRAM-backed NVMe drives.

jordanclock - Wednesday, May 30, 2018 - link
That's what I was suspecting. Still, we're talking tens of microseconds versus tens of nanoseconds. Waaaay closer than we've ever been, but I sure wouldn't want it for my system's main memory yet!

tracker1 - Thursday, May 31, 2018 - link
You might... I mean, let's say DRAM becomes more common in smaller amounts and Optane becomes the bulk, with DRAM becoming another cache layer and Optane becoming direct-memory storage space.

Spunjji - Thursday, May 31, 2018 - link
Nobody's trying to sell it to you as such, though. That's made pretty clear a few times over.

eddman - Wednesday, May 30, 2018 - link
3D XPoint =/= Optane

PCIe vs. DIMM: apples vs. oranges
Also: https://www.theregister.co.uk/2016/09/29/xpoint_pr...
Lolimaster - Wednesday, May 30, 2018 - link
It's not about the raw transfer rate, IT'S ABOUT LATENCY. Z-NAND or regular NVMe drives pretty much have the same latency, 50+ microseconds. Optane is already 10x better in that scenario.

tuxRoller - Wednesday, May 30, 2018 - link
You're wrong, but understandably so.

There are two unknowns, afaics. One is the overhead imposed by NVMe, while the other is the media access time of XPoint.
An estimate of the first one can be had by diffing the results of a loop ramdisk against an NVMe RAM device (I'm not even sure this exists). If the NVMe RAM device isn't feasible, you can then just test against the ramdisk. That would at least give us an idea of the block-layer overhead.
With the previous results you could then test an XPoint device, and the results should provide, at a minimum, a ceiling for XPoint latency.
tomatotree - Thursday, May 31, 2018 - link
DRAM-backed NVMe devices exist, at least in labs. They were used when the NVMe spec was being developed, to make sure it had headroom to support devices faster than NAND, which weren't yet ready for prime time. Not sure if anyone ever productized such a device, though. They still had latencies in the 10µs range, due to the latency of the PCIe interrupt, the OS syscall, and the time it takes for the CPU to wake up and service the interrupt on completion (though polling drivers can avoid this, at the cost of higher CPU usage). Getting rid of those latencies is the whole motivation for putting XPoint on the DRAM bus in the first place.

Billy Tallis - Thursday, May 31, 2018 - link
Lite-On recently introduced an NVMe drive for servers that provides ~200GB of flash storage and a few GB of DRAM-backed storage that gets saved to flash if there's a power failure. The intention is that the flash is used as a boot drive, and rather than let that PCIe port be underutilized after boot, they give the SSD some extra DRAM and make it a fast journal device.

tuxRoller - Friday, June 1, 2018 - link
Sorry, yes, I meant as a product. I've read about these, indirectly, on the LKML, and the lowest latencies were around 5µs, but I don't recall the specifics of the system being mentioned.

Regardless, you'll read no disagreements from me. My only reason for posting was to mention a few areas of uncertainty, the union of which is likely to contain the really key data w.r.t. XPoint.
nagi603 - Wednesday, May 30, 2018 - link
Finally, enough RAM for Chrome.... until a new version comes out :D

Lolimaster - Wednesday, May 30, 2018 - link
Just change the Chrome shortcut to "process per site" and you'll fix Chrome, especially when you have many tabs originating from the same site. For me it went from unusable to smooth.

40-50 tabs with the default settings is a nightmare.
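For reference, the setting being described here appears to be Chromium's --process-per-site command-line switch, which makes tabs from the same site share one renderer process. A sketch of a Windows shortcut target (the install path is illustrative):

```
"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe" --process-per-site
```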
Lolimaster - Wednesday, May 30, 2018 - link
Performance-wise, at least in terms of speed, it's something around DDR2-667/800. Latency is still 100x higher.

Optane was supposed to have latency in the range of hundreds of nanoseconds (0.1-0.99 microseconds); right now it's in the range of 6-10 microseconds (6000-10000 nanoseconds).
Old_Fogie_Late_Bloomer - Wednesday, May 30, 2018 - link
Either your numbers are wrong or your math is. If the target latency was 0.1-1µs and the actual latency is 6-10µs, then the discrepancy is closer to 10x.

Unless you're disingenuously suggesting that the 10µs absolute worst case can be compared to the 0.1µs absolute best case, I guess. Anyway, my point is that if you want to convince people that there's a "100x" problem, your numbers don't support your case.
rahvin - Wednesday, May 30, 2018 - link
0.1µs to 10µs is 100x, or am I just not using that newfangled math where 0.1µs x 100 = 10µs? Now admittedly the range he gave was 10x to 100x, which is a little silly in its extreme scale.
No newfangled math involved, you and he are just using regular math wrong.

6/0.1 = 60, 10/0.99 = 10.1 (but let's stop beating around the bush and call it 10). Unless he has more data than he's presented here, "100x" simply isn't justifiable.
I'm not arguing that Optane isn't overly hyped. I'm just saying if your best hypothetical case is 0.1 and your worst real case is 10, then 100x is not an honest number.
Lolimaster - Thursday, May 31, 2018 - link
Optane latency was supposed to be in the sub-microsecond range, hence the 0.1+ microseconds.

Lolimaster - Wednesday, May 30, 2018 - link
My idea is that the regular DRAM works as a cache for the most frequently changing data, keeping Optane for working-set data and read operations.

ಬುಲ್ವಿಂಕಲ್ ಜೆ ಮೂಸ್ - Wednesday, May 30, 2018 - link
I'd like to test the ease of keeping persistent malware in that working-set data for these new DIMMs.

CheapSushi - Wednesday, May 30, 2018 - link
That's exactly what is going on. In fact, a 3D XPoint DIMM is PAIRED with a RAM DIMM. Idiots on here keep thinking it's replacing the DRAM. It's NOT.

alpha754293 - Wednesday, May 30, 2018 - link
Sorry for coming to the party late, but what is Persistent Memory? And how is it different from Optane?

I'm not sure if I really understand it.
Is it just "RAM"? NV RAM? ???
If someone can just walk back about 100 steps, or point to a reference, that would be greatly appreciated.
Thank you.
Billy Tallis - Wednesday, May 30, 2018 - link
Persistent Memory is an umbrella term for memory technologies that can be accessed like RAM, but don't lose their contents when unplugged the way DRAM and SRAM do. NAND flash memory is non-volatile, but its block/page oriented structure makes it impractical to directly access in a RAM-like fashion, and its performance is nowhere close to DRAM in any respect. 3D XPoint is a persistent memory technology from Intel, and Optane is their brand for products using 3D XPoint memory. Optane DC Persistent Memory is the brand for Optane products that go in DDR4 memory module slots, as opposed to Optane SSDs that use the NVMe block storage interface and behave more like NAND flash based SSDs.

faiakes - Wednesday, May 30, 2018 - link
Would it make sense to have an OS-bootable 512GB stick?

CaptCalamity - Wednesday, May 30, 2018 - link
Honestly, noticeable differences in "normal" use on a home PC diminish greatly beyond a SATA SSD. M.2 drives are nice and speedy, but you're not going to see the benefit moving to them from an SSD that you would see moving from a mechanical hard drive to an SSD. With this, unless you are in some very specific niche workloads, you won't see any benefit worth the cost.

That being said, it will eventually be in one of my systems.
CaptCalamity - Wednesday, May 30, 2018 - link
Strike "With this,"tomatotree - Thursday, May 31, 2018 - link
Honestly, where I see this going, "bootable" could become a thing of the past. If your working memory never loses data, then when you power cycle you should still be in the exact same state as you left it, with the same apps running. No need to reinitialize everything -- that's an artifact of losing RAM data.

tmbm50 - Wednesday, May 30, 2018 - link
I wonder how latency compares to existing NVDIMMs?

Several companies offer DDR4 DIMMs with an external battery backup and flash backup.
For example, the motherboard used on NetApp filers has 2 DIMM sockets per CPU that are battery-backed and used to log/journal disk writes. Intel Xeon chipsets already support this. It does not go through the PCIe bus, though some vendors do make NVDIMM PCIe add-on cards.
While not truly persistent, you can get days of standby on a battery and flush contents to a flash drive for long-term lights-out.
Seems it would be way faster (true DIMM speed) and might be cost-competitive, using commodity DIMMs instead of paying Optane premiums.
This article suggests pricing between the Optane SSD and DDR4, but DDR4 pricing really varies.
Just makes me wonder if the folks that need super-fast persistent storage already have a faster option than Optane, and whether the price difference won't scale down enough to offset the performance loss (relative to NVDIMMs).
Peter2k - Wednesday, May 30, 2018 - link
I would like to think that part of the reason for Optane's existence is the fact that you can have a lot more GB per stick than with traditional RAM.

Not sure how interesting "persistent" memory is in the equation. I'm sure there are some use cases (off the top of my head, none really, not when you have to sacrifice RAM speed/latency with persistence being the only gain). Servers aren't exactly shut down usually.
Having several TB of "RAM" might be more useful, even if it's slower than normal RAM, as long as you don't have to access any drives.
Maybe
invasmani - Wednesday, May 30, 2018 - link
That part is easy enough to understand, but you could just use RAM to cache traditional storage with things like AMD's SenseMI, Samsung's Magician Magic, SuperCache/SuperVolume, or FancyCache, and those are way more cost-effective and probably quicker alternatives anyway. It doesn't take much of my actual DRAM to cache other storage with the right software for massive performance gains, and it scales pretty linearly with RAM bandwidth as well, meaning quad/octa-channel is even more insane, especially with faster/higher-quality DRAM.

peevee - Friday, June 1, 2018 - link

"but you could just use RAM to cache traditional storage with things like AMD's SenseMI, Samsung's Magician Magic, SuperCache/SuperVolume, or FancyCache"

Which all make zero sense for RAM, because every OS since time immemorial (even DOS with the right driver) uses RAM to cache traditional disks.
invasmani - Wednesday, May 30, 2018 - link
It's a big maybe, but maybe in certain workloads. Though why not just use DRAM to cache a mechanical HD, SSD, or NVMe drive? If you're really limited by storage speed and need massive storage density, I'd think EPYC plus SuperCache/FancyCache would be the clear winner.

invasmani - Wednesday, May 30, 2018 - link
Yeah, I've been wondering this as well. In any case, why would I want to lose DRAM DIMM slots in the first place? Moreover, if you also have to reduce DRAM speed to that of Optane or NVDIMMs, that's a huge negative as well. Personally, I like the SenseMI approach of using a tiny % of RAM to cache traditional storage and greatly increase its performance, and in a very cost-effective manner by contrast.

tomatotree - Thursday, May 31, 2018 - link
NVDIMMs are on the market and are indeed faster, since they're just DRAM when powered, but they're *extremely* expensive. Even regular DRAM is very expensive compared to Optane, especially if you need a lot in one server, since that usually means adding more CPUs as well. Just being able to get 512GB in a single DIMM is a huge advantage.

eastcoast_pete - Wednesday, May 30, 2018 - link
If I read this and the companion live blog correctly, the real use scenario targeted here is very large, high-availability databases. Intel used Cassandra (and HANA) in their presentation. Intelsomebody running SAP's HANA might want to take this for a spin, once Tier-1 OEMs have systems ready. Having your precious database in a non-volatile

eastcoast_pete - Wednesday, May 30, 2018 - link
Disregard the last two sentences of the above - stupid mobile interface + touch typing. What I meant to say in my last two sentences is that having your precious database always updated in non-volatile memory might be worth it if your business depends on it. For home use, this is still years away from being useful.

Jon Tseng - Thursday, May 31, 2018 - link
You hit the nail on the head. The most obvious use case at present is running large (terabyte-scale and potentially petabyte-scale) databases in-memory. HANA is an obvious example.

I don't think the non-volatile part is a big thing (let's face it, any datacenter worth its salt will have a decent UPS system), but the directly accessible memory + performance increase vs. NAND is the game changer for this workload.
peevee - Friday, June 1, 2018 - link
Non-huge real databases (meaning with ACID) will also benefit, because ACID requires write-through, which limits transaction rates by storage performance (especially latency).

There's potential for improvement here if the OS maps persistent pages directly into the database process's memory and the database software is aware of their persistence, rather than treating it as a dumb RAM disk that puts all kinds of unnecessary overhead on top of relatively fast DDR4 (fast relative to disks, still super slow relative to CPUs). A rough sketch of that kind of direct mapping follows below.
The next step would be replacing DRAM with smaller, more-expensive-per-GB SRAM on a much faster bus (stacked with the CPU for shorter lines) and having X-Point pick up the rest, at least on single-CPU machines. And replacing the last-level cache (which is up to 30MB on Xeons) with more cores.
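To make the "persistent pages mapped directly into the process" idea concrete, here is a minimal sketch using plain POSIX mmap() with msync() as the commit point. The file path is a placeholder, and a real persistent-memory deployment would typically use a DAX-mounted filesystem plus user-space cache flushes (e.g., through a library such as PMDK) rather than msync(), but the programming model is the same: update structures in place with ordinary loads and stores, then make the update durable before acknowledging the transaction.

```c
/* Minimal sketch of load/store access to a memory-mapped "persistent" region.
 * Plain POSIX is used here (mmap + msync as the durability point); on a real
 * DAX/persistent-memory setup the flush mechanism differs, but the model is
 * the same. "/mnt/pmem/journal" is a placeholder path. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define REGION_SIZE 4096

struct record {
    unsigned long txn_id;
    char payload[64];
};

int main(void) {
    int fd = open("/mnt/pmem/journal", O_RDWR | O_CREAT, 0600);
    if (fd < 0 || ftruncate(fd, REGION_SIZE) != 0) { perror("open/ftruncate"); return 1; }

    /* Map the file so it can be read and written with ordinary loads/stores. */
    struct record *rec = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE,
                              MAP_SHARED, fd, 0);
    if (rec == MAP_FAILED) { perror("mmap"); return 1; }

    /* Update the record in place -- no read()/write() syscalls, no buffer copies. */
    rec->txn_id += 1;
    snprintf(rec->payload, sizeof rec->payload, "committed txn %lu", rec->txn_id);

    /* Durability point: flush the dirty page before acknowledging the transaction. */
    if (msync(rec, REGION_SIZE, MS_SYNC) != 0) { perror("msync"); return 1; }

    munmap(rec, REGION_SIZE);
    close(fd);
    return 0;
}
```

The win over an NVMe block device is that the commit path becomes a flush of already-written memory rather than a syscall plus an interrupt-driven I/O completion, which is exactly the overhead tomatotree and Billy Tallis describe above.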
duploxxx - Thursday, May 31, 2018 - link
So now I will have to buy the next Xeon generation, which will be a poor update, to get the Intel Optane support? So now there will again be less space for memory on the Intel part, as these replace DIMMs, which is already less than the AMD counterpart, plus artificially limited capacity depending on CPU type... way to go, INTEL.

Oh wait, I can already buy it today, it's called HPE Persistent Memory.....
Let's be honest about the implementation: the OS support for this is limited, and the use case is limited. It has a future, but a limited one because of the NVMe introduction. NVMe slots are expandable; memory channels are always limited.
Landos - Thursday, May 31, 2018 - link
Are there applications here for scientific computing as well? Problems like computational chemistry and physics that require operations on very large, non-sparse matrices or grid based solvers?flgt - Thursday, May 31, 2018 - link
Seems like everyone here is stuck on general compute. Intel is providing yet another piece of hardware to crush very specific tasks. If you’re one of the big boys and can afford the engineers to put these systems together, it’s not a big deal to pay the premium to Intel. It’s another nice diversification area for Intel.eva02langley - Thursday, May 31, 2018 - link
Seems to me that Intel is trying to solve one of the biggest problems of quantum computing, memory amount. I see potential in this area at least.
This tech leaves a bad taste in my mouth. For some reason it "feels" short-lived. Is this a workaround for limited I/O availability? I'm thinking 48 dedicated PCIe lanes, directly connected to the CPU, for a dozen NVMe drives in RAID 0.
If it's a workaround for anything, it's the difficulty of improving the latency of an interrupt-driven block storage protocol. NVMe was designed to be more or less the lowest latency storage protocol possible to layer over PCIe, and that combination still adds substantial overhead when you're using a storage medium as fast as 3D XPoint. The memory bus is the only place you can attach storage and avoid that overhead.peevee - Friday, June 1, 2018 - link
PCIe interrupt processing latencies (through MSI) on Intel were about 500ns 10 years ago, on old platforms with a separate MCH. See:
https://www.intel.com/content/dam/www/public/us/en...
I am sure that now, with newer PCIe versions and newer, faster CPUs, it can only be less.
Of course NVMe introduces its own overhead. But even the 30-microsecond latencies are far more than just one interrupt.
Trackster11230 - Monday, June 4, 2018 - link
I apologize for my ignorance (or if it's been answered), but is the ultimate goal with these memory types (ReRAM, MRAM, XPoint) to effectively combine RAM and storage (e.g. RAM would essentially become the hard drive, assuming large enough storage sizes)?

I understand the intermediate steps of using it as RAM or as supplemental memory in the meantime, but is my understanding correct?
Lycanphoenix - Monday, February 18, 2019 - link
128 gigabytes is the smallest size? Jeeze! My CPU can't even support more than 64 gigabytes of RAM.

If I could get just four 16-gigabyte sticks of DRAM/Optane-hybrid NVDIMMs (using a supercapacitor, the contents of DRAM are automatically flushed to the Optane memory when external power is lost), I'd be happy with that.