本文共 8036 字,大约阅读时间需要 26 分钟。
霍夫曼算法
In this tutorial, we’ll be discussing and implementing the Huffman Coding Algorithm in Java.
在本教程中,我们将讨论和实现Java中的霍夫曼编码算法。
The Huffman Coding Algorithm was discovered by David A. Huffman in the 1950s. The purpose of the Algorithm is lossless data compression.
霍夫曼编码算法由戴维·霍夫曼(David A. Huffman)在1950年代发现。 该算法的目的是无损数据压缩。
This algorithm is commonly used in JPEG Compression.
此算法通常在JPEG压缩中使用。
Now traditionally to encode/decode a string, we can use ASCII values. But this doesn’t compress it. The number of bits involved in encoding the string isn’t reduced.
现在传统上是对字符串进行编码/解码,我们可以使用ASCII值。 但这并没有压缩它。 编码字符串所涉及的位数不会减少。
Unlike ASCII codes, Huffman codes use lesser number of bits.
与ASCII码不同,霍夫曼码使用的位数更少。
The principle of Huffman code is based on the frequency of each data item.
霍夫曼编码的原理是基于每个数据项的频率。
Every data item is assigned a variable length of prefix code (typically a binary string).
每个数据项都分配有可变长度的前缀代码(通常是二进制字符串)。
The length of each prefix code is based on the frequency such that the most commonly occurring data has the smallest prefix code.
每个前缀码的长度基于频率,以使最常出现的数据具有最小的前缀码。
No two different data items would have the same prefix. Hence the prefixes are known as Prefix Codes. This is important to make sure decoded string is the same as the original string.
没有两个不同的数据项具有相同的前缀。 因此,前缀称为前缀代码 。 确保解码后的字符串与原始字符串相同非常重要。
The more often the data is used the lesser bits it takes up.
使用数据的频率越高,占用的比特就越少。
A Disadvantage of Huffman codes is that a minor change in any bit of the encoded string would break the whole message.
霍夫曼代码的缺点是,编码字符串的任何位的微小变化都会破坏整个消息。
The basic idea of the Huffman Algorithm is building a Tree from bottom up.
霍夫曼算法的基本思想是自下而上构建一棵树。
The steps involved in building the Huffman Tree are:
建立霍夫曼树的步骤包括:
Once the tree is built, to find the prefix code of each character we traverse the tree as:
构建完树后,为了找到每个字符的前缀代码,我们遍历树为:
During decoding, we just need to print the character of each leaf traversed by the above prefix code in the Huffman tree.
在解码期间,我们只需要在霍夫曼树中打印以上前缀代码遍历的每个叶子的字符。
An example of a Huffman tree is given below:
霍夫曼树的例子如下:
The string to be encoded needs the prefix codes for all the characters built in a bottom-up manner.
要编码的字符串需要以自下而上的方式构建的所有字符的前缀代码。
The internal node of any two Nodes should have a non-character set to it.
任何两个节点的内部节点都应设置一个非字符。
The code for the HuffmanNode class is given below:
下面给出了HuffmanNode类的代码:
class HuffmanNode implements Comparable{ int frequency; char data; HuffmanNode left, right; public int compareTo(HuffmanNode node) { return frequency - node.frequency; }}
It holds the character and its frequency.
它包含字符及其频率。
The comparable interface is implemented. This would be used when comparing the values in the PriorityQueue in order to get the minimum value always.
实现了可比较的接口。 在比较PriorityQueue中的值时始终使用该值,以便始终获得最小值。
The code for the HuffmanCodeSolution.java class is given below:
下面给出了HuffmanCodeSolution.java类的代码:
package com.journaldev.huffmancoding;import java.util.HashMap;import java.util.Map;import java.util.PriorityQueue;import java.util.Set;public class HuffmanCodeSolution { private static MapcharPrefixHashMap = new HashMap<>(); static HuffmanNode root; public static void main(String[] args) { String test = "ABCD1%$^"; System.out.println("Original Text = "+test); String s = encode(test); decode(s); } private static HuffmanNode buildTree(Map freq) { PriorityQueue priorityQueue = new PriorityQueue<>(); Set keySet = freq.keySet(); for (Character c : keySet) { HuffmanNode huffmanNode = new HuffmanNode(); huffmanNode.data = c; huffmanNode.frequency = freq.get(c); huffmanNode.left = null; huffmanNode.right = null; priorityQueue.offer(huffmanNode); } assert priorityQueue.size() > 0; while (priorityQueue.size() > 1) { HuffmanNode x = priorityQueue.peek(); priorityQueue.poll(); HuffmanNode y = priorityQueue.peek(); priorityQueue.poll(); HuffmanNode sum = new HuffmanNode(); sum.frequency = x.frequency + y.frequency; sum.data = '-'; sum.left = x; sum.right = y; root = sum; priorityQueue.offer(sum); } return priorityQueue.poll(); } private static void setPrefixCodes(HuffmanNode node, StringBuilder prefix) { if (node != null) { if (node.left == null && node.right == null) { charPrefixHashMap.put(node.data, prefix.toString()); } else { prefix.append('0'); setPrefixCodes(node.left, prefix); prefix.deleteCharAt(prefix.length() - 1); prefix.append('1'); setPrefixCodes(node.right, prefix); prefix.deleteCharAt(prefix.length() - 1); } } } private static String encode(String test) { Map freq = new HashMap<>(); for (int i = 0; i < test.length(); i++) { if (!freq.containsKey(test.charAt(i))) { freq.put(test.charAt(i), 0); } freq.put(test.charAt(i), freq.get(test.charAt(i)) + 1); } System.out.println("Character Frequency Map = " + freq); root = buildTree(freq); setPrefixCodes(root, new StringBuilder()); System.out.println("Character Prefix Map = " + charPrefixHashMap); StringBuilder s = new StringBuilder(); for (int i = 0; i < test.length(); i++) { char c = test.charAt(i); s.append(charPrefixHashMap.get(c)); } return s.toString(); } private static void decode(String s) { StringBuilder stringBuilder = new StringBuilder(); HuffmanNode temp = root; System.out.println("Encoded: " + s); for (int i = 0; i < s.length(); i++) { int j = Integer.parseInt(String.valueOf(s.charAt(i))); if (j == 0) { temp = temp.left; if (temp.left == null && temp.right == null) { stringBuilder.append(temp.data); temp = root; } } if (j == 1) { temp = temp.right; if (temp.left == null && temp.right == null) { stringBuilder.append(temp.data); temp = root; } } } System.out.println("Decoded string is " + stringBuilder.toString()); }}class HuffmanNode implements Comparable { int frequency; char data; HuffmanNode left, right; public int compareTo(HuffmanNode node) { return frequency - node.frequency; }}
So our string to be compressed and encoded/decoded is “ABA11$A”.
因此,我们要压缩和编码/解码的字符串是“ ABA11 $ A”。
In encoding
method we store the frequencies of each character in a map first.
在encoding
方法中,我们首先将每个字符的频率存储在地图中。
Then we call the buildTree
method using the approach discussed before.
然后,我们使用前面讨论的方法调用buildTree
方法。
We save the prefix code for each character in a HashMap inside the setPrefixCode
method.
我们将每个字符的前缀代码保存在setPrefixCode
方法内的HashMap中。
In the decode
method, we just traverse the encoded string left or right based on 0 or 1. Once we reach the leaf, we append that leaf data to the StringBuilder and again start from the top for the remaining encoded string.
在decode
方法中,我们仅基于0或1向左或向右遍历编码的字符串。一旦到达叶子,我们会将叶子数据附加到StringBuilder,然后从顶部开始为其余的编码字符串。
Following is the output of the above program:
以下是上述程序的输出:
Reference:
参考:
翻译自:
霍夫曼算法
转载地址:http://uamzd.baihongyu.com/