
Researchers Claim Locally-Testable-Code Breakthrough With Exotic Multi-Dimensional Graph (quantamagazine.org)

"A team of researchers has finally created a long-sought locally testable code that can immediately betray whether it's been corrupted..." reports Quanta magazine.

"Many thought local testability would never be achieved in its ideal form." Now, in a preprint released on November 8, the computer scientist Irit Dinur of the Weizmann Institute of Science and four mathematicians, Shai Evra, Ron Livne, Alex Lubotzky and Shahar Mozes, all at the Hebrew University of Jerusalem, have found it. "It's one of the most remarkable phenomena that I know of in mathematics or computer science," said Tom Gur of the University of Warwick. "It's been the holy grail of an entire field."

Their new technique transforms a message into a super-canary, an object that testifies to its health better than any other message yet known. Any corruption of significance that is buried anywhere in its superstructure becomes apparent from simple tests at a few spots. "This is not something that seems plausible," said Madhu Sudan of Harvard University. "This result suddenly says you can do it."

Most prior methods for encoding data relied on randomness in some form. But for local testability, randomness could not help. Instead, the researchers had to devise a highly nonrandom graph structure, entirely new to mathematics, on which they based their new method. It is both a theoretical curiosity and a practical advance in making information as resilient as possible....

To get a sense of what their graph looks like, imagine observing it from the inside, standing on a single edge. They construct their graph such that every edge has a fixed number of squares attached. Therefore, from your vantage point you'd feel as if you were looking out from the spine of a booklet. However, from the other three sides of the booklet's pages, you'd see the spines of new booklets branching from them as well. Booklets would keep branching out from each edge ad infinitum. "It's impossible to visualize. That's the whole point," said Lubotzky. "That's why it is so sophisticated...."

[A] test at one node can reveal information about errors from far away nodes. By making use of higher dimensions, the graph is ultimately connected in ways that go beyond what we typically even think of as connections... It establishes a new state of the art for error-correcting codes, and it also marks the first substantial payoff from bringing the mathematics of high-dimensional expanders to bear on codes...

Practical and theoretical applications should soon follow. Different forms of locally testable codes are now being used in decentralized finance, and an optimal version will allow even better decentralized tools.



Comments:
  • WTH? (Score:5, Interesting)

    by robi5 ( 1261542 ) on Sunday November 28, 2021 @12:44PM (#62027789)

    What the hell is it about? The summary talks about local testability, ideal code, canary etc. as if all of us here shared a single understanding of it. Maybe start the summary with framing WTH it's even about? Otherwise it reads like spiritualism

    • by Anonymous Coward

      To get a sense of what their graph looks like, imagine observing it from the inside, standing on a single edge.
      (...)
      "It's impossible to visualize. That's the whole point," said Lubotzky. "That's why it is so sophisticated...."

      Well, there's two minutes of my life I'll never get back. Thanks a lot Slashdot, for letting this garbage through.

      • "It's been the holy grail of an entire field."

        Which field? Whatever field it is, it's so obscure that I've never heard of it, and I took coding theory when I was still at Uni. Comments from the paper like "High rate LTCs could be useful in practice" don't exactly inspire confidence; it means "we have a solution and think we may possibly have managed to think up a problem that it solves".

        I suspect the field that it's the holy grail of is "artificial problems for which this conference paper is the solution".

    • by Fly Swatter ( 30498 ) on Sunday November 28, 2021 @12:59PM (#62027827) Homepage
      Sounds like they are talking about a better checksum algorithm to verify your data is in fact not corrupt.

      Their starting point in the article is equivalent to a parity bit, which can be fooled by just changing certain bits within the data to get the same parity so that it still looks authentic. Their breakthrough is an algorithm that cannot be fooled by things like this, even for very large data sets.

      -Just a layman here so I might be completely wrong.
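
      (To make the parity-bit point concrete, here is a minimal sketch in plain Python. It is a generic textbook illustration, not the construction from the paper: a single parity bit catches an odd number of flipped bits but is blind to an even number.)

        # One even-parity bit: detects any odd number of flips,
        # but an even number of flips slips through unnoticed.
        def parity(bits):
            p = 0
            for b in bits:
                p ^= b
            return p

        data = [1, 0, 1, 1, 0, 0, 1, 0]
        check = parity(data)               # stored alongside the data

        one_flip = data[:]
        one_flip[3] ^= 1
        print(parity(one_flip) == check)   # False: a single flip is detected

        two_flips = data[:]
        two_flips[3] ^= 1
        two_flips[5] ^= 1
        print(parity(two_flips) == check)  # True: two flips look authentic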
      • by leptons ( 891340 )
        If that's what it's about, then they should have just said it. They lost me within a few paragraphs because it wasn't really clear any of it was going to start making sense.
        • by Entrope ( 68843 ) on Sunday November 28, 2021 @03:10PM (#62028075) Homepage

          In short, someone figured out how to generate an error-correcting code that can be short relative to the input message, but which lets a receiver detect errors with high probability from a small number of bits, randomly chosen from the coded message. One immediate application is that a sender can spread the message across multiple blocks, and the receiver can detect errors on each block independently to request a retransmission of a block with errors, before receiving the whole message. (In contrast, other schemes do this by adding overhead before encoding the message: For example, a sequence number, some indication of message boundaries, a per-message CRC, and a per-block CRC.)

          Probably the most important detail -- which is only implicit in the paper, and not explicitly stated in the summary or the linked article -- is that this code is non-systematic [wikipedia.org]: The plaintext message is not embedded directly in the codewords that can be sent. Instead, the codewords are generated through some transformation of the message that creates outputs that are longer than the inputs.

          Making a code non-systematic is not nearly enough to make it locally testable, though, and a key part of what this method does is to spread the error detection information very evenly across the output. That makes it feasible to check errors even given randomly selected bits from a coded message.
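
          (For contrast, here is a rough sketch of the conventional per-block CRC scheme mentioned above, using only Python's standard library; it is unrelated to the new code. The receiver can check and re-request individual blocks, but testing a block still means reading every bit of it.)

            # Conventional scheme: split the message into blocks and attach a
            # CRC-32 to each, so the receiver can validate blocks independently
            # and request retransmission of only the bad ones. Unlike a locally
            # testable code, checking a block means reading all of it.
            import zlib

            BLOCK_SIZE = 64  # bytes per block; arbitrary for this example

            def encode(message: bytes):
                blocks = [message[i:i + BLOCK_SIZE]
                          for i in range(0, len(message), BLOCK_SIZE)]
                return [(blk, zlib.crc32(blk)) for blk in blocks]

            def bad_blocks(received):
                return [i for i, (blk, crc) in enumerate(received)
                        if zlib.crc32(blk) != crc]

            coded = encode(b"some long message " * 20)
            blk, crc = coded[2]
            coded[2] = (blk[:5] + b"X" + blk[6:], crc)  # corrupt one byte in block 2
            print(bad_blocks(coded))                    # [2]: only block 2 is re-sent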

          • And yet, most people will still use crc/md5/sha for checksums, because they are good enough.

            • by Entrope ( 68843 )

              CRC is the only one of those that is remotely similar in use case. This is for a link-layer protocol with significant chance of bit errors and messages considerably larger than the delay-bandwidth product. Not checking for file corruption, or even applications like BitTorrent, but things like mobile wireless links. The base station will have a large bunch of data it wants to send, and if it can tailor the code rate to the link conditions while getting quick feedback on corrupted blocks, the effective throughput can be much higher.

          • The size of the error correction is optimal for any number of bit errors. Want one-bit ECC? Then your code is a certain size larger than raw. Want two bits? Then larger. Etc.

            So can we make a code that can detect any number of errors?
            Well, imagine I had two messages that were encoded this way, each K raw bits and K+C total bits long. Now I induce errors that flip the bits in one set of K+C bits to make the bit string identical to the other encoded message. Well, that is by definition a perfectly valid codeword, so no test can flag it as corrupted.

            • by Entrope ( 68843 ) on Sunday November 28, 2021 @09:25PM (#62028699) Homepage

              Nobody claimed this code can detect all errors. Still, most codes can detect an arbitrary number of errors by making the rate small enough. For example, a 32-bit CRC will detect up to 8 bit errors in the codewords for an 8-bit input. As the input gets bigger, a 32-bit CRC becomes less able to do that. Notably, the relevant corruption model for ECCs is random noise (hard errors, erasures or soft errors) rather than intentional corruption -- cryptography is the domain of security against a threat that "knows" your message structure.

              Still, ECCs vary widely in strength. For example, Hamming codes are very simple, but only provide a distance of 3 bits (valid codewords differ in at least 3 bits). The version usually used in ECC RAM adds a parity bit to make the distance 4 bits, enabling single error correction with double error detection (SECDED). Hamming codes use the fewest possible bits to do that, so they are good in a code-rate sense as well as in simplicity, but they compromise on code distance, and you need essentially the entire message to check for decodability; they are not locally testable.
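
              (As a concrete illustration of the distance-3 Hamming code described above, here is a toy Hamming(7,4) encoder and single-error corrector in Python. It is the standard textbook construction, not anything from the preprint.)

                # Hamming(7,4): 4 data bits, 3 parity bits, distance 3,
                # so any single flipped bit can be located and fixed.
                def hamming74_encode(d):
                    d1, d2, d3, d4 = d
                    p1 = d1 ^ d2 ^ d4          # covers positions 1, 3, 5, 7
                    p2 = d1 ^ d3 ^ d4          # covers positions 2, 3, 6, 7
                    p3 = d2 ^ d3 ^ d4          # covers positions 4, 5, 6, 7
                    return [p1, p2, d1, p3, d2, d3, d4]

                def hamming74_correct(c):
                    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
                    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
                    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
                    syndrome = s1 + 2 * s2 + 4 * s3   # 0 = clean; else 1-based error position
                    if syndrome:
                        c = c[:]
                        c[syndrome - 1] ^= 1
                    return c

                word = hamming74_encode([1, 0, 1, 1])
                corrupted = word[:]
                corrupted[5] ^= 1                            # flip one bit anywhere
                print(hamming74_correct(corrupted) == word)  # True: fully recovered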

              The paper here claims it is the first code that can provide constant values for all of code rate, distance and checking locality. That does not mean it is a great code for a given application, but it is at a minimum a useful new discovery and way of thinking about how to build ECCs, and its relationship to other good codes (expander codes) suggests that it is good for some applications.

          • It's a hand-crafted error correction code that is short relative to the input message, but very long relative to other error correction codes.

            Furthermore, as with other types of error correction, if there is no corruption you still have to check the whole thing. All this does is speed up detection of the corrupted samples, at the cost of a larger code that is hand-crafted, and increased processing requirements for non-corrupt messages.

            • by Entrope ( 68843 )

              ... and by "hand-crafted" you mean (from the preprint):

              an infinite family of left-right Cayley complexes and base codes that yield locally testable codes.

              • So you're gonna lie and cherry pick a phrase that doesn't refute what I said, and ignore all the parts that talk about it having to be hand-crafted?

                You thought the words "infinite family" refuted what I said? If that's your level of understanding, why are you arguing with anybody in any way?

                • by Entrope ( 68843 )

                  You were entirely wrong. Don't be an asshole about it. You still don't point to anything that says the code is hand-crafted. Your claim is even worse than saying that, for example, CRC-32C and RFC 6330 are hand-crafted, because most of this preprint describes how to construct codes with user-chosen parameters.

          • by gweihir ( 88907 )

            That does make some sense. It also makes it clear that this is about error detection, not about cryptographic integrity verification. It does not seem to be about error correction, though.

            • by Entrope ( 68843 )

              Section 4.2 of the preprint gives directions on how to perform local error correction.

              • by gweihir ( 88907 )

                You are correct. Obviously, I should have looked at the preprint and not the garbled summary here.

        • by sd4f ( 1891894 )
          They may have, I don't know, but the question is: has the person writing the article understood anything whatsoever? Do they have even basic knowledge of the subject they're writing an article about? I would hazard a guess that the answer is: not a fucking clue!
    • Otherwise it reads like spiritualism

      You might be somewhat over-stating the benefits of numerism.

    • I'm trying to guess what this is about too. My current best guess is "corrupt" doesn't mean "damaged" but instead means "intentionally altered in a way that still passes". Then I could believe a small sample would have to include some bits that were altered. But if that's right, I don't see how a reader could detect it if the checksums all pass, unless they have a correct copy to compare to. So I'm probably wrong.

      If they did mean damaged, well, if a message has just one bit flipped, you could find it by reading the entire message, but a small random sample would likely miss that one bit.

  • by JoshuaZ ( 1134087 ) on Sunday November 28, 2021 @12:47PM (#62027795) Homepage

    Since Shannon and Hamming's work about 60 years ago, there's been a lot of interest in error correcting codes. These are ways of embedding data which allows one to detect or correct errors in transmission. Natural languages do this mostly automatically. For example, if I inclde a typo in this sentence, you can probably notice it and figure out what I wanted. But they aren't perfect at this. If I write "I have a pet bat" it might mean "I have a pet cat" with a typo, but you can't be certain. Error correcting codes let you handle at least some number of errors of a certain type.

    But how do you do this? Well, one naive thing is to just repeat each bit in your message a whole bunch of times. Let's say we repeat everything three times. So if you get "ccb" you can be pretty sure that what I meant was "c", not "b". To do this in a mathematically precise fashion, we generally talk about this in terms of strings of 0s and 1s, rather than letters, but the same idea applies. (I tried writing out the whole sentence of "I have a pet cat" repeating each word but the stupid ASCII art filter thought I was trying to do ASCII art.)

    Notice that in order for this code to work we had to send things which are three times as long as our original message, and we can correct any single typo. That's not great. We can do a bit better. For example, Hamming codes https://en.wikipedia.org/wiki/Hamming_code [wikipedia.org] only increase the message size a teeny bit and still allow us to do a good job correcting a single error like this.

    There's some technical subtlety here between "detecting" and "correcting" an error. For example, if you get the sentence "I have a pet dat" you can tell there's at least one error, but correcting it is tough. Should that have read "dog" (two errors), "cat" (one error) or "bat" (one error)? All you can tell is that "dat" is not a type of animal, so something has gone wrong here. You can guess what is most likely, but you can't be certain.

    One issue is that we have an existence proof of some very good codes. These codes are good in the sense that they don't require messages to be much longer than the message you intend to send, and they also let you detect even large numbers of errors. The argument essentially works by, roughly speaking, picking an assignment rule at random and showing that with non-zero probability the code will do a good job. But actually constructing such codes is tough. This work, roughly speaking, manages to create one of those close-to-ideal codes that we knew had to exist but had trouble constructing. They did so by using some ideas from graph theory that had previously been used to construct codes but had not succeeded at making really good codes. But these are very good, due to some new insights and careful constructions. One really nice thing about the codes they constructed is that errors can be seen by only checking a small part of the message.
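
    (A toy illustration of that last point, checking only a small part of the message: the crude 3x repetition code from above happens to be easy to spot-check. Sampling a handful of random triples catches, with high probability, any corruption that touches a decent fraction of the positions. This sketch is mine, and the actual construction is nothing like it; the point of the new work is getting this behavior without the huge overhead.)

      # Spot-checking a 3x repetition code: a few random samples reveal
      # widespread corruption without reading the whole codeword.
      import random

      def encode(bits):
          return [b for b in bits for _ in range(3)]   # repeat every bit 3 times

      def local_test(codeword, samples=10):
          n = len(codeword) // 3
          for _ in range(samples):
              i = random.randrange(n)
              if len(set(codeword[3 * i:3 * i + 3])) != 1:
                  return False       # a triple disagrees: corruption detected
          return True                # nothing suspicious in the sampled spots

      message = [random.randint(0, 1) for _ in range(1000)]
      codeword = encode(message)
      print(local_test(codeword))    # True: an intact codeword always passes

      corrupted = codeword[:]
      for i in random.sample(range(len(corrupted)), len(corrupted) // 5):
          corrupted[i] ^= 1          # flip 20% of the positions
      print(local_test(corrupted))   # very likely False after only 10 samples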

    It is worth noting that the work here is using a variant of "expander graphs" and one of the authors of this is Irit Dinur. She previously used expander graphs to prove a version of the PCP theorem https://en.wikipedia.org/wiki/PCP_theorem [wikipedia.org] which essentially says that you can turn any mathematical proof into a proof which can be verified with a high probability by only checking a small section of the proof chosen at random. So there's some definite commonality of theme here.

  • As another poster alluded to, no error-correcting code can detect an error that transforms one valid message into another valid message.

    It is only possible to make the probability of such errors low. Think of the empty barstool joke. There is the possibility of the beautiful woman materializing and it's nonzero.

    • What does that have to do with anything?

      • It has to do with people thinking that magic exists in computing.

        Perfect lossless compression of arbitrarily large files down to a single byte.

        Forward error correcting codes for free-space optics or microwave links that will "fix in software" not just a bird or insect flying through the beam but also a tree falling on the tower and knocking it down.

        Etc.

    • No, this method requires a hand-crafted code, there is no such thing as a valid message other than the one it was crafted for.

      This story is about creating an error-detection code that is 4x the size of the data, by which you can more rapidly detect large errors.

      Of course for single-bit errors, it is better to just compare each bit.

  • ...BUT...if the math and mechanics behind it are so dense that your average "layman" can't understand it, then it's functionally useless.

    Layman in this case would be the person whose job it is to ensure said messages are uncorrupted. If that person can't, with confidence, say that the message is uncorrupted, then how is this in any way useful to them and their company at large?

    • You already have specialists implement the most difficult error-correcting and cryptographic algorithms in libraries that everyone else uses. In the case of crypto, I certainly hope that you aren't writing your own code; so many holes in cryptographic implementations are due to subtle issues which experts would be aware of, but regular programmers might not. When I teach crypto algorithms to my students I always make sure that they understand that even for relatively simple things like RSA and Diffie-Hellman, they should not roll their own implementations.
    • by Nkwe ( 604125 ) on Sunday November 28, 2021 @01:42PM (#62027911)

      ...BUT...if the math and mechanics behind it are so dense that your average "layman" can't understand it, then it's functionally useless.

      Layman in this case would be the person whose job it is to ensure said messages are uncorrupted. If that person can't, with confidence, say that the message is uncorrupted, then how is this in any way useful to them and their company at large?

      How many "layman" (or even developers for that matter) actually understand the the math and technical mechanics of existing error correction systems that are in use today? Assuming the research checks out, it will likely end up as a set of libraries that software architects and developers use to ensure more solid communications. All the layman really needs to know is that error correction is being used and all (most) developers need to know is how to make the EncodeForTransmission() and DecodeAndValidate() calls. Software architects will need to know how solid the algorithms are (how many and what kind of errors can be detected / corrected), and how much overhead (message size, cpu) is added. Only a handful of folks need to know how it really works.

      • all (most) developers need to know is how to make the EncodeForTransmission() and DecodeAndValidate() calls.

        You'd be surprised how many developers make mistakes because they incorrectly call those two functions.

      • developers need to know is how to make the EncodeForTransmission() and DecodeAndValidate() calls.
        Error detection/correction works on a much lower level than this.

        No one is doing an EncodeForTransmission("text of this email") ...

        • by decep ( 137319 )

          developers need to know is how to make the EncodeForTransmission() and DecodeAndValidate() calls.
          Error detection/correction works on a much lower level than this.

          No one is doing an EncodeForTransmission("text of this email") ...

          I see you have never used PHP.

    • by dargaud ( 518470 )
      It will be just 2 math transforms: one to convert a block into verifiable code, then transmit, then another to convert it back with the certainty that it hasn't been corrupted. The only expertise will be to read the 'PASSED' confirmation.
      • by PPH ( 736903 )

        Anecdote: Back in my days at Boeing, we had a s/w engineer responsible for developing some automated test routines for aircraft acceptance testing. Initially, he wrote some subroutine stubs that did nothing except return the 'PASS' flag. The intent being to test integration into the main program. And then come back and fill in the actual functioning code. And also to get credit for completing his assigned tasks. But he fell behind and never had the opportunity to complete the subroutine definitions. Finally, those do-nothing stubs were still returning 'PASS' when the tests ran for real.

        • by YVRGeek ( 669881 )
          Well, the whole point of this technique is precisely that it is fundamentally, provably and demonstrably NOT able to be "faked out". You're missing the entire point of it I'm afraid. We've long suspected that this was possible but no one had actually "discovered" the mathematical technique to do this... until now.
          • PPH was discussing this in terms of the implementation of the algorithm; if you scroll up the thread, you'll see they were responding to the thread which started with grasshoppa discussing the difficulty of implementation if people didn't understand it. You still need to implement the actual algorithms, and PPH is telling a story where instead of an implementation of an algorithm someone did a lot less work.
            • by PPH ( 736903 )

              Pretty much this.

              You have to trust the entire process that provides this function. The more complex the tool, the fewer people that understand it and the more we have to just trust them.

              Now if you'll excuse me, I have to install Windows 11.

        • Obviously if you write skeleton code, the result should be "failed".
          Or a "throw RuntimeException('unimplemented method')".

        • Anecdote: Back in my days at Boeing ....

          I was going to say that this could never happen, code reviews etc etc... And then I remembered my days working adjacent to the aircraft industry, and I thought; Yes, this could totally have happened.

          They're safer now, so that's good.

    • by ceoyoyo ( 59147 )

      Do you know how your credit card number is validated? Have you ever worked it out by hand? The average layman doesn't, and hasn't. Yet it's still quite useful.

      The average layman also doesn't know how a fin field effect transistor works, yet they find them incredibly useful for posting cat pictures on the Internet.
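
      (The credit-card check being alluded to is the Luhn checksum; a minimal version, for the curious:)

        # Luhn check: from the right, double every second digit (subtracting 9
        # if the result exceeds 9) and require the total to be divisible by 10.
        def luhn_valid(number: str) -> bool:
            total = 0
            for i, c in enumerate(reversed(number)):
                d = int(c)
                if i % 2 == 1:
                    d *= 2
                    if d > 9:
                        d -= 9
                total += d
            return total % 10 == 0

        print(luhn_valid("79927398713"))   # True: the standard Luhn test number
        print(luhn_valid("79927398710"))   # False: last digit altered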

    • Do you thoroughly understand how the computer in your car detects and computes the proper amount of gas to inject?
      Do you fully understand how the routers between you and Slashdot decide which queue the packets of your message should be in, and why they should NOT be in the same queue as your Youtube or Netflix packets?

      It's unlikely you really understand both, yet you use these things all the time. Certainly you don't understand the CPU you are using to read this message. Yet you're using it.

      For non-human

        • For non-human animals, an individual animal will hunt down some food, kill it and eat it. That same individual will dig a hole or otherwise make a shelter for themselves. The same individual goes to the creek to get water for themselves. Humans don't do this. That's one major way we're different from the other animals, and why we can do things other animals can't, like building cars or rockets.

        A rabbit only has what that rabbit can get or make for itself.

        Ants and other eusocial insects do this sort of thing. Some

        • True. Once we're done, I fully expect them to take over.
        • That's true, a few species have individuals in a few different roles.

          The US Department of Labor uses a list of 867 occupations in 98 groups. None of those 867 really fits my job. Meaning there are actually several thousand different jobs for humans.

  • Really, break it down to something that is at least close to fucking English.

    What I hope I got correct from this is that an error in one part of a system will create a very noticeable error in a piece of information that is making its way through the entire system.

    I send the number "3" to wander through the entire city, it gets beat up and robbed in the ghetto, and it returns as a "0".

    Or maybe I'm staring at a panel of lights, all of which should be lit. If one or more of the bulbs are dark, then something upstream has gone wrong.

    • You're very close.

      As I understand it, this is an error detection (or correction?) code that lets you check any arbitrary segment of data you like; you don't need a whole block or a whole message.

      This might be useful where you're using streams rather than blocked data, or for very slow data.

      Uploading new firmware to a satellite or a rover would be a perfect situation where being able to error detect/correct as the stream progresses would be useful as you'd have sufficient otherwise idle time.

    • JoshuaZ said "Since Shannon and Hamming's work about 60 years ago, there's been a lot of interest in error correcting codes. These are ways of embedding data which allows one to detect or correct errors in transmission." The buzzwords were needed to add sex to an old boring subject. Adding more parity and checksums enhances message integrity, at the cost of sending more bits. The existing digital hashes are about optimum, and good work by the Chinese exposed vulnerabilities in the old ones, and the probability of an undetected error with the modern ones remains vanishingly small.
  • The description of other methods is all wrong. Verifying the integrity of a message very much does not rely "on randomness". It relies on a checksum, a signature, and trust if it is done cryptographically. Also, what does "encoding data" have to do with it?

    Whoever wrote that "article" did not even understand whether this is error detection or cryptographic integrity checks or something else. That person just seems to have taken words at random from some source and connected them in some way that seemed to fit.

  • This is AFAICT quite deep into Math brainfuck territory for me, but as I understand it, this is basically, in software terms, a package that can prove its own validity in a way that is unbreakable and works with single chunks of the package even if other chunks aren't available. Am I getting this right?

  • Higher-dimensional maths have been chasing relevance since before the 1970s. That's when I benefited from a 13th-dimensional mathematician on sabbatical teaching ComSci. Twenty years later, re-upping that degree at Columbia, TAs were on sabbatical trying to factor cryptography.

    Here authentication results from 40 yrs. in the trenches! No small accomplishment.
