博客
关于我
强烈建议你试试无所不能的chatGPT,快点击我
霍夫曼算法_霍夫曼编码算法
阅读量:2529 次
发布时间:2019-05-11

本文共 8036 字,大约阅读时间需要 26 分钟。

霍夫曼算法

In this tutorial, we’ll be discussing and implementing the Huffman Coding Algorithm in Java.

在本教程中,我们将讨论和实现Java中的霍夫曼编码算法。

霍夫曼编码 (Huffman Coding)

The Huffman Coding Algorithm was discovered by David A. Huffman in the 1950s. The purpose of the Algorithm is lossless data compression.

霍夫曼编码算法由戴维·霍夫曼(David A. Huffman)在1950年代发现。 该算法的目的是无损数据压缩。

This algorithm is commonly used in JPEG Compression.

此算法通常在JPEG压缩中使用。

Now traditionally to encode/decode a string, we can use ASCII values. But this doesn’t compress it. The number of bits involved in encoding the string isn’t reduced.

现在传统上是对字符串进行编码/解码,我们可以使用ASCII值。 但这并没有压缩它。 编码字符串所涉及的位数不会减少。

Unlike ASCII codes, Huffman codes use lesser number of bits.

与ASCII码不同,霍夫曼码使用的位数更少。

The principle of Huffman code is based on the frequency of each data item.

霍夫曼编码的原理是基于每个数据项的频率。

Every data item is assigned a variable length of prefix code (typically a binary string).

每个数据项都分配有可变长度的前缀代码(通常是二进制字符串)。

The length of each prefix code is based on the frequency such that the most commonly occurring data has the smallest prefix code.

每个前缀码的长度基于频率,以使最常出现的数据具有最小的前缀码。

No two different data items would have the same prefix. Hence the prefixes are known as Prefix Codes. This is important to make sure decoded string is the same as the original string.

没有两个不同的数据项具有相同的前缀。 因此,前缀称为前缀代码 。 确保解码后的字符串与原始字符串相同非常重要。

The more often the data is used the lesser bits it takes up.

使用数据的频率越高,占用的比特就越少。

A Disadvantage of Huffman codes is that a minor change in any bit of the encoded string would break the whole message.

霍夫曼代码的缺点是,编码字符串的任何位的微小变化都会破坏整个消息。

The basic idea of the Huffman Algorithm is building a Tree from bottom up.

霍夫曼算法的基本思想是自下而上构建一棵树。

The steps involved in building the Huffman Tree are:

建立霍夫曼树的步骤包括:

  • Get Frequency of each character and store it in a Map.

    获取每个字符的频率并将其存储在地图中。
  • Create a node for each character with its frequency and insert it into a Min Priority Queue.

    为每个字符及其频率创建一个节点,并将其插入“最小优先级队列”。
  • Extract the two nodes with the minimum frequency from the priority queue.

    从优先级队列中提取频率最小的两个节点。
  • Create a new node by merging these two nodes as children and with weight equal to the sum of the two nodes frequencies.

    通过合并这两个节点作为子节点并权重等于两个节点频率之和来创建一个新节点。
  • Add this back to the Priority Queue.

    将此添加回优先级队列。
  • Repeat till the Priority Queue has only one node left. That node becomes the root of the Huffman Tree.

    重复直到“优先级队列”只剩下一个节点。 该节点成为霍夫曼树的根。

Once the tree is built, to find the prefix code of each character we traverse the tree as:

构建完树后,为了找到每个字符的前缀代码,我们遍历树为:

  • Starting at the top when you go left, append 0 to the prefix code string.

    左移时从顶部开始,将0附加到前缀代码字符串。
  • When you go right, append 1.

    右移时,附加1。
  • Stop when you have reached the Leaf nodes.

    到达Leaf节点后停止。
  • The string of 0 and 1s created till now is the prefix code of that particular Node in the tree.

    到现在为止创建的0和1的字符串是树中该特定Node的前缀代码。

During decoding, we just need to print the character of each leaf traversed by the above prefix code in the Huffman tree.

在解码期间,我们只需要在霍夫曼树中打印以上前缀代码遍历的每个叶子的字符。

All Input characters are present only in the leaves of the Huffman tree.
所有输入字符仅出现在霍夫曼树的叶子中。

An example of a Huffman tree is given below:

霍夫曼树的例子如下:

The string to be encoded needs the prefix codes for all the characters built in a bottom-up manner.

要编码的字符串需要以自下而上的方式构建的所有字符的前缀代码。

The internal node of any two Nodes should have a non-character set to it.

任何两个节点的内部节点都应设置一个非字符。

霍夫曼编码算法的实现 (Huffman Coding Algorithm Implementation)

The code for the HuffmanNode class is given below:

下面给出了HuffmanNode类的代码:

class HuffmanNode implements Comparable
{ int frequency; char data; HuffmanNode left, right; public int compareTo(HuffmanNode node) { return frequency - node.frequency; }}

It holds the character and its frequency.

它包含字符及其频率。

The comparable interface is implemented. This would be used when comparing the values in the PriorityQueue in order to get the minimum value always.

实现了可比较的接口。 在比较PriorityQueue中的值时始终使用该值,以便始终获得最小值。

The code for the HuffmanCodeSolution.java class is given below:

下面给出了HuffmanCodeSolution.java类的代码:

package com.journaldev.huffmancoding;import java.util.HashMap;import java.util.Map;import java.util.PriorityQueue;import java.util.Set;public class HuffmanCodeSolution {	private static Map
charPrefixHashMap = new HashMap<>(); static HuffmanNode root; public static void main(String[] args) { String test = "ABCD1%$^"; System.out.println("Original Text = "+test); String s = encode(test); decode(s); } private static HuffmanNode buildTree(Map
freq) { PriorityQueue
priorityQueue = new PriorityQueue<>(); Set
keySet = freq.keySet(); for (Character c : keySet) { HuffmanNode huffmanNode = new HuffmanNode(); huffmanNode.data = c; huffmanNode.frequency = freq.get(c); huffmanNode.left = null; huffmanNode.right = null; priorityQueue.offer(huffmanNode); } assert priorityQueue.size() > 0; while (priorityQueue.size() > 1) { HuffmanNode x = priorityQueue.peek(); priorityQueue.poll(); HuffmanNode y = priorityQueue.peek(); priorityQueue.poll(); HuffmanNode sum = new HuffmanNode(); sum.frequency = x.frequency + y.frequency; sum.data = '-'; sum.left = x; sum.right = y; root = sum; priorityQueue.offer(sum); } return priorityQueue.poll(); } private static void setPrefixCodes(HuffmanNode node, StringBuilder prefix) { if (node != null) { if (node.left == null && node.right == null) { charPrefixHashMap.put(node.data, prefix.toString()); } else { prefix.append('0'); setPrefixCodes(node.left, prefix); prefix.deleteCharAt(prefix.length() - 1); prefix.append('1'); setPrefixCodes(node.right, prefix); prefix.deleteCharAt(prefix.length() - 1); } } } private static String encode(String test) { Map
freq = new HashMap<>(); for (int i = 0; i < test.length(); i++) { if (!freq.containsKey(test.charAt(i))) { freq.put(test.charAt(i), 0); } freq.put(test.charAt(i), freq.get(test.charAt(i)) + 1); } System.out.println("Character Frequency Map = " + freq); root = buildTree(freq); setPrefixCodes(root, new StringBuilder()); System.out.println("Character Prefix Map = " + charPrefixHashMap); StringBuilder s = new StringBuilder(); for (int i = 0; i < test.length(); i++) { char c = test.charAt(i); s.append(charPrefixHashMap.get(c)); } return s.toString(); } private static void decode(String s) { StringBuilder stringBuilder = new StringBuilder(); HuffmanNode temp = root; System.out.println("Encoded: " + s); for (int i = 0; i < s.length(); i++) { int j = Integer.parseInt(String.valueOf(s.charAt(i))); if (j == 0) { temp = temp.left; if (temp.left == null && temp.right == null) { stringBuilder.append(temp.data); temp = root; } } if (j == 1) { temp = temp.right; if (temp.left == null && temp.right == null) { stringBuilder.append(temp.data); temp = root; } } } System.out.println("Decoded string is " + stringBuilder.toString()); }}class HuffmanNode implements Comparable
{ int frequency; char data; HuffmanNode left, right; public int compareTo(HuffmanNode node) { return frequency - node.frequency; }}

So our string to be compressed and encoded/decoded is “ABA11$A”.

因此,我们要压缩和编码/解码的字符串是“ ABA11 $ A”。

In encoding method we store the frequencies of each character in a map first.

encoding方法中,我们首先将每个字符的频率存储在地图中。

Then we call the buildTree method using the approach discussed before.

然后,我们使用前面讨论的方法调用buildTree方法。

We save the prefix code for each character in a HashMap inside the setPrefixCode method.

我们将每个字符的前缀代码保存在setPrefixCode方法内的HashMap中。

In the decode method, we just traverse the encoded string left or right based on 0 or 1. Once we reach the leaf, we append that leaf data to the StringBuilder and again start from the top for the remaining encoded string.

decode方法中,我们仅基于0或1向左或向右遍历编码的字符串。一旦到达叶子,我们会将叶子数据附加到StringBuilder,然后从顶部开始为其余的编码字符串。

Following is the output of the above program:

以下是上述程序的输出:

. 检出完整的代码以及更多DS和算法示例。

Reference:

参考:

翻译自:

霍夫曼算法

转载地址:http://uamzd.baihongyu.com/

你可能感兴趣的文章
两台电脑如何实现共享文件
查看>>
组合模式Composite
查看>>
程序员最想得到的十大证件,你最想得到哪个?
查看>>
我的第一篇CBBLOGS博客
查看>>
【MyBean调试笔记】接口的使用和清理
查看>>
07 js自定义函数
查看>>
jQueru中数据交换格式XML和JSON对比
查看>>
form表单序列化后的数据转json对象
查看>>
[PYTHON]一个简单的单元測试框架
查看>>
[BZOJ4303]数列
查看>>
一般处理程序在VS2012中打开问题
查看>>
C语言中的++和--
查看>>
thinkphp3.2.3入口文件详解
查看>>
POJ 1141 Brackets Sequence
查看>>
Ubuntu 18.04 root 使用ssh密钥远程登陆
查看>>
Servlet和JSP的异同。
查看>>
虚拟机centOs Linux与Windows之间的文件传输
查看>>
ethereum(以太坊)(二)--合约中属性和行为的访问权限
查看>>
IOS内存管理
查看>>
middle
查看>>