大数实现闲谈：无符号大整数的实现与优化

毕竟是大整数，标题图就放个大点的罢

和之前的post一样：

想着网上博客要么是数学公式，要么是什么离散数学，代码又（对我来说）特别难理解，所以就写了这篇post，希望能帮助到什么有缘人。
毕竟我粗人一个，看不懂什么公式也不喜欢搞一堆乱七八糟的位操作。

大概是这样一回事：51放假比较闲，想写有关浮点转字符串的相关内容，但是发现大数是其基础，所以闲着也是闲着，写就对了。
这篇blog会基于elc@911b1c2f的大数实现进行讲解。

代码路径在parts/header_file/files/elc/_files/bignum/bignum

咱们看看_body.hpp里面的内容：

#include "ubigint.hpp"
#include "bigint.hpp"
#include "ubigfloat.hpp"
#include "bigfloat.hpp"
#include "literal.hpp"

不难猜到：bigint以及其他的类都是基于ubigint实现
所以我们先介绍ubigint的实现

实现本质

ubigint的定义本质实际上是

class ubigint{
public:
	#if in_debug && !defined(ELC_SPEED_TEST)
		typedef unsigned char base_type;
	#else
		typedef unsigned_specific_size_fast_t<sizeof(uintmax_t)/2> base_type;
	#endif
private:
	typedef array_t<base_type> data_type;

	static constexpr auto base_type_mod=number_of_possible_values_per<base_type>;

	data_type _data;
};

如你所见，ubigint的本质是一个array_t<base_type>，base_type是unsigned char或者unsigned_specific_size_fast_t<sizeof(uintmax_t)/2>，这基于具体的编译情况而定
就我所知，无损大整数的实现一般都是模拟小学必教的多位数加减乘除，ubigint的实现也是如此
首先让我们看看base_type的选择问题：
在debug环境下编译时，base_type首选是unsigned char：其数值范围小，可以在有限的数值下经历更多的处理，便于bug的发现
在release环境下或涉及到速度的测试时，base_type首选是unsigned_specific_size_fast_t<sizeof(uintmax_t)/2>：其数值范围大，可以在有限的数值下经历更少的处理，便于提高速度：

大整数实现常识：base_type的数值范围越大，大整数的运算速度越快
但是你必须考虑到乘法实现时运算类型必须是base_type的2倍大小，所以至少你不能选取uintmax_t作为base_type

base_type的数值范围是[0,base_type_mod)，base_type_mod是base_type的可能取值的个数（即base_type的最大值加1）
比方说debug环境下（假设CHAR_BIT是8，并且环境是2进制计算机），base_type的数值范围是[0,256)，base_type_mod是256，那么ubigint实际上是256进制的数组
顺便一提，array_t是elc的变长数组实现，不是定长的，别误会了

低位在前还是高位在前？

我们假设_data有两个元素[1 2]。
这时你有两种解释思路：若低位在前，那么这个数代表2*base_type_mod+1；若高位在前，那么这个数代表1*base_type_mod+2。
这两种思路都是可以的，但是我们选择低位在前，尽管这有些反直觉，但是加减、比较时低位比高位更常被访问，所以低位在前更快

构造函数

在我们确定了高低位顺序后，我们就可以开始写构造函数了：

template<unsigned_basic_integer_type T>
static constexpr size_t base_type_size_of=ceil_div(sizeof(T),sizeof(base_type));

template<unsigned_basic_integer_type T>
ubigint(T value)noexcept{
	constexpr auto size = base_type_size_of<T>;
	_data.resize(size);
	auto i=_data.begin();
	while(value){
		const auto info=divmod(value,base_type_mod);
		*i=base_type(info.mod);
		value=T(info.quot);
		i++;
	}
	const auto used_size=i-_data.begin();
	if(used_size!=size)
		_data.resize(used_size);
}

你可以看到ubigint的构造函数实际上异常简单：

先在编译时计算好T类型的值需要多少个base_type来表示，然后resize到这个大小
然后不断地取模取商以初始化_data，直到value为0
最后resize到value实际使用的大小

ceil_div是向上取整的除法，ceil_div(a,b)等价于(a/b)+((a%b)!=0)
比方说ceil_div(5,2)是3，ceil_div(4,2)是2

divmod是取模取商的函数，它返回有两个成员mod和quot的struct，分别代表取模和取商的结果

这两都是elc的私货（定义在math），你可以自己实现ceil_div和divmod
也可以使用std::div代替divmod

shrink?

与[1 2]相同，[1 2 0 0 0]也代表2*base_type_mod+1，考虑到我们可能在加法乘法时预先分配足够的空间，我们是否该允许_data的大小比实际使用的大小更大？
答案是否定的。
shrink（移除多余的0）的意义不仅仅是节省空间：在同样的值所使用的size相同的情况下，很多逻辑可以直接通过读取_data.size()来判断，而不需要遍历_data，这可以节约不少时间

static void shrink_to_fit(data_type&a)noexcept{
	auto size=a.size();
	while(size--)
		if(a[size]!=0){
			a.resize(size+1);
			return;
		}
	a.clear();
}

加减乘除

只要你上过小学你就能理解这些函数的实现，本质上就是竖式的加减乘除
唯一不同的点是现在是base_type_mod进制，而不是10进制

让我们先看一看加减乘除中都要用到的一些东西：

typedef array_like_view_t<const base_type> data_view_type;
typedef unsigned_specific_size_fast_t<2*sizeof(base_type)> calc_type;

仍然，array_like_view_t和unsigned_specific_size_fast_t是elc的私货，你可以自己实现差不多的东西

array_like_view_t保留一个T*和一个size_t，并提供begin()和end()函数，使得你可以像使用string_view一样使用它
它的好处是极度低廉的构造和传递代价，之后的乘法优化中它会大显身手

unsigned_specific_size_fast_t是一个类型模板，它根据你传入的size_t的大小来得到一个至少有给定大小的快速无符号整数类型
我们需要保证calc_type至少有2*sizeof(base_type)的大小
因为我们在乘法中会用到calc_type来存储两个base_type的乘积

加法

首先我们需要一套体系来判断应当预先分配多少个base_type给结果：

[[nodiscard]]static size_t get_safety_add_buf_size_diff(data_view_type a,data_view_type b)noexcept{
	//判断进位所需空间
	if(a.size()!=b.size()){
		auto i = a.size();
		while(i-- != b.size())//判断进位区是否没有足够的空间以至于需要进位
			if(a[i]!=max(type_info<base_type>))//任意一位不是最大值就不需要进位
				return 0;
		return 1;
	}
	else{
		//只需要判断最高位是否需要进位
		const auto res=calc_type(a.back())+calc_type(b.back())+1;//+1是因为次高位可能进位
		return static_cast<size_t>(res>>bitnum_of(base_type));
	}
}
[[nodiscard]]static size_t get_safety_add_buf_size(data_view_type a,data_view_type b)noexcept{
	return a.size()+get_safety_add_buf_size_diff(a,b);
}
[[nodiscard]]static size_t get_safety_add_buf_size_with_not_compared_buf(data_view_type a,data_view_type b)noexcept{
	if(a.size()<b.size())
		swap(a,b);
	return get_safety_add_buf_size(a,b);
}

get_safety_add_buf_size_diff返回a+b可能所需的base_type个数与a现有的base_type个数的差（其实就是进位可能的判断）
get_safety_add_buf_size返回a+b可能所需的base_type个数
get_safety_add_buf_size_with_not_compared_buf返回a+b可能所需的base_type个数，但是不会比较a和b的大小

随后我们就可以实现加法了：

[[nodiscard]]static data_type add_base(data_view_type a,data_view_type b)noexcept{
	if(a.size()<b.size())
		swap(a,b);
	auto base_size = a.size();
	const auto size_diff = get_safety_add_buf_size_diff(a,b);
	const auto size = base_size+size_diff;

	array_t<base_type> tmp(note::size(size));
	copy_assign[base_size](tmp.data(),a.data());
	copy_assign[size_diff](note::to(tmp.data()+base_size),base_type{0});
	add_to_base(tmp.data(),b);
	if(size_diff)
		shrink_to_fit(tmp);
	return tmp;
}
static void add_to_base(base_type*buf,data_view_type b)noexcept{
	bool is_overflows = 0;
	const auto end_size = b.size();
	auto other_begin = b.begin();

	for(size_t i=0;i<end_size;++i)
		*buf++ = add_carry(*buf,*other_begin++,is_overflows);
	while(is_overflows)
		*buf++ = add_carry(*buf,is_overflows);
}

add_base分配足够的空间，然后调用add_to_base来实现加法
add_to_base自低位向高位使用add_carry实现带进位的加法

add_carry的实现在此处
简单来说，它接受1-2个任意类型的无符号整数和一个bool的引用，并用尽可能快速的方法将所有的参数相加并感知进位
如果发生了溢出（即应当进位）它会将bool的引用置为1，否则置为0
最后返回一个无符号整数代表所有参数相加的结果

减法

减法的实现和加法大同小异，但是不用额外计算需要分配的空间

[[nodiscard]]static data_type sub_base(data_view_type a,data_view_type b)noexcept{
	//调用方保证a>=b
	const auto size = a.size();

	array_t<base_type> tmp(note::size(size));
	copy_assign[size](tmp.data(),a.data());
	sub_with_base(tmp,b);//already shrink_to_fit ed
	return tmp;
}
static void sub_with_base(base_type*buf,data_view_type b)noexcept{
	//调用方保证a>=b
	bool is_overflows = 0;
	const auto end_size = b.size();
	auto other_begin = b.begin();

	for(size_t i=0;i<end_size;++i)
		*buf++ = sub_borrow(*buf,*other_begin++,is_overflows);
	while(is_overflows)
		*buf++ = sub_borrow(*buf,is_overflows);
}
static void sub_with_base(data_type&buf,data_view_type b)noexcept{
	sub_with_base(buf.data(),b);
	shrink_to_fit(buf);
}

sub_with_base自低位向高位使用sub_borrow实现带借位的减法
sub_base创建一个副本，然后调用sub_with_base实现减法

sub_borrow和add_carry一样是elc的私货，其定义在相同的文件

乘法

在实现乘法前，让我们先考虑下末尾的0：
正常人计算3000000*5000000时，会先将两数的末尾的0去掉，然后计算3*5，最后再补上末尾的0
大数也是同理，可以用此方法加速计算

//shrink_of_end_zeros
//去掉（数理上）末尾的（实现上）开头的0以减少乘法的次数
[[nodiscard]]static size_t shrink_of_end_zeros(data_view_type&buf)noexcept{
	if(buf.empty())
		return 0;
	auto begin=buf.begin();
	const auto end=buf.end();
	while(begin!=end && !*begin)
		++begin;
	const size_t aret=begin-buf.begin();
	const size_t size=end-begin;
	buf=get_data_view_of_data(begin,size);
	return aret;
}
//unshrink_of_end_zeros
//就是他妈的撤销
[[nodiscard]]static data_view_type unshrink_of_end_zeros(data_view_type a,size_t zeros)noexcept{
	return get_data_view_of_data(a.begin()-zeros,a.size()+zeros);
}
//apply_shrink_of_end_zeros
//对于data_type和data_view_type应用已经获得的zeros大小进行shrink
static void apply_shrink_of_end_zeros(data_type&buf,size_t zeros)noexcept{
	if(zeros)
		buf.forward_resize(buf.size()-zeros);
}
static void apply_shrink_of_end_zeros(data_view_type&buf,size_t zeros)noexcept{
	if(zeros)
		buf=get_data_view_of_data(buf.begin()+zeros,buf.size()-zeros);
}

这里面唯一要注意的是forward_resize，它是array_t的成员函数，用于向前调整容器的大小
比方说，array_t的大小是7，现在调用forward_resize(5)，那么容器的前2个元素会被丢弃，后续元素向前移动2个位置
再比方说，array_t的大小是7，现在调用forward_resize(9)，那么容器的前端添加2个元素，后续元素向后移动2个位置
据我所知没有任何标准库容器有这个功能，但是在这里你可以用erase实现取代它

然后我们来实现乘法

static void muti_with_base_no_zero_check(base_type*buf,data_view_type a,base_type b)noexcept{
	size_t i=0;
	calc_type num=0;
	while(i!=a.size()){
		num+=calc_type(a[i])*calc_type(b);
		*buf = base_type(num%base_type_mod);
		num/=base_type_mod;

		i++;
		buf++;
	}
	if(num)
		*buf = base_type(num);
	else
		*buf = base_type{0};
}
static void muti_with_base(base_type*buf,data_view_type a,data_view_type b)noexcept{
	{
		const auto zeros=shrink_of_end_zeros(a)+shrink_of_end_zeros(b);
		copy_assign[zeros](note::to(buf),base_type{0});
		buf+=zeros;
	}
	array_t<base_type> tmp(note::size(a.size()+1));
	size_t muti_scale=0;
	while(muti_scale!=b.size()){
		if(b[muti_scale]){
			muti_with_base_no_zero_check(tmp.data(),a,b[muti_scale]);
			add_to_base(buf+muti_scale,get_shrinked_data_view_of_data(tmp));
		}
		muti_scale++;
	}
}
[[nodiscard]]static data_type muti_base(data_view_type a,data_view_type b)noexcept{
	if(a.size()<b.size())swap(a,b);//大数在前循环数小
	array_t<base_type> tmp(note::size(a.size()+b.size()),0);
	muti_with_base(tmp.data(),a,b);
	shrink_to_fit(tmp);
	return tmp;
}

在上述代码中，muti_with_base_no_zero_check实现了不检查末尾0的单行乘法
muti_with_base实现了多行乘法，并且在计算前会先去掉末尾的0
muti_base则为muti_with_base提前分配了空间，随后调用muti_with_base实现乘法

乘法的优化

在实现乘法后，我们可以考虑一下优化：
elc只实现了Karatsuba分治法，对于更大的数的ntt没有实现
对于Karatsuba分治法，参考熊佬的介绍：

我们假设要相乘的两个数，都有2n位，那么这两个数就可以分别表示为$a_1base^n+a_2, b_1base^n+b_2$，其中，$a_1,a_2,b_1,b_2$是n位的大整数，那么，它们的积就是
\[\begin{align} (a_1base^n+a_2) \times (b_1base^n+b_2) \\ = a_1b_1base^{2n} + (a_1b_2+a_2b_1)base^n + a_2b_2 \\ = a_1b_1base^{2n} + ((a_1+a_2)(b_1+b_2)-a_1b_1-a_2b_2)base^n + a_2b_2 \end{align}\]
如果这样不够明显的话，我们用$c_1$代替$a_1b_1$，用$c_3$代替$a_2b_2$，得到

$c_1base^{2n} + ((a_1+a_2)(b_1+b_2)-c_1-c_3)base^n + c_3$

于是，这里一共有3次乘法，比起原来的4次暴力乘法减少了1次。而里面的乘法又可以进行递归优化，时间复杂度从$O(n^2)$下降到$O(n^{log_23})$约$O(n^{1.585})$

和熊佬的实现大同小异

移位加法实现

/// 将b向前偏移offset个base_type后加到buf上，等价于buf+=b<<offset*bitnum_of(base_type)
/// 用于乘法优化
static void offset_add_to_base(base_type*buf,const data_type&b,size_t offset)noexcept{
	add_to_base(buf+offset,get_data_view_of_data(b));
}
static void offset_add_to_base(data_type&buf,data_view_type b,size_t offset)noexcept{
	//检查是否需要扩容
	if(buf.size()<=offset+b.size()){
		const auto size_now = buf.size();
		const auto size_need = offset+b.size()+1;//考虑进位
		const auto size_diff = size_need - size_now;
		buf.insert(size_now,size_diff,base_type{0});//扩容&填充0
	}
	offset_add_to_base(buf.data(),b,offset);
	shrink_to_fit(buf);
}
auto& offset_add_to_base(data_view_type b,size_t offset)noexcept{
	offset_add_to_base(_data,b,offset);
	return*this;
}

分治乘法实现

//分割乘法以提高效率
[[nodiscard]]static data_type fast_muti_base(data_view_type a,data_view_type b)noexcept{
	constexpr auto fast_muti_base_threshold=1<<6;//对小于1000的随机数阶乘测试后选择此数
	if(min(a.size(),b.size())<fast_muti_base_threshold)
		return muti_base(a,b);
	//计算分割点
	const auto split_point=max(
		min((a.size()+1)/2,b.size()-1),
		min(a.size()-1,(b.size()+1)/2)
	);
	//拆成4个数
	const auto a_split_point=a.data()+split_point;
	const data_view_type a_low=get_shrinked_data_view_of_data(a.data(),split_point);
	const data_view_type a_high{a_split_point,a.size()-split_point};
	const auto b_split_point=b.data()+split_point;
	const data_view_type b_low=get_shrinked_data_view_of_data(b.data(),split_point);
	const data_view_type b_high{b_split_point,b.size()-split_point};
	//计算结果
	ubigint high{fast_muti_base(a_high,b_high)};
	ubigint low{fast_muti_base(a_low,b_low)};
	ubigint middle{fast_muti_base(add_base(a_high,a_low),add_base(b_high,b_low))};
	//合并结果
	middle -= high+low;
	return move(low.offset_add_to_base(high,split_point*2).
					offset_add_to_base(middle,split_point)._data);
}

比起熊佬在blog中的实现，我这里的实现在a_X和b_X上使用了data_view_type避免不必要的内存复制，提高效率
其他的基本一致

除法

让我们先来点针对不需要余数时的除法的优化：
考虑10000465468721除以1000000000，我们可以直接进行如下操作化简两个数：

获取除数的末尾0的个数
将除数和被除数右移相应的位数

同样的优化方式还有“被除数不及两倍除数长度减2时”（来自熊佬博客）
这样的优化可以些微的提高效率

//除法优化：在不需要余数的情况下可以进行的优化
[[nodiscard]]static size_t get_no_mod_optimisation_size(data_view_type a,data_view_type b)noexcept{
	return no_mod_optimisation_of_div(a,b);
}
static size_t no_mod_optimisation_of_div(data_view_type&a,data_view_type&b)noexcept{
	size_t aret=0;
	//去除末尾0
	{
		const auto zeros=shrink_of_end_zeros(b);
		apply_shrink_of_end_zeros(a,zeros);
		aret+=zeros;
	}
	//被除数不及两倍除数长度减2时，可以忽略一部分最低位且不影响结果
	if ((b.size()-1)*2 > a.size()) {
		const auto ans_len = a.size() - b.size() + 2;
		const auto shr = b.size() - ans_len;
		a=a.subview(shr);
		b=b.subview(shr);
		aret+=shr;
		aret+=no_mod_optimisation_of_div(a,b);
	}
	return aret;
}

然后让我们实现一个简单的除法：

//除法实现
[[nodiscard]]static base_type div_with_base_no_optimisation(data_type&buf,base_type*a,data_view_type b)noexcept{
	data_view_type tryto{a,b.size()+1};
	const calc_type dividend=exlambda{
		const base_type*p=tryto.rbegin();
		auto tmp=calc_type(*p);tmp*=base_type_mod;tmp+=calc_type(p[-1]);
		return tmp;
	}();
	const calc_type divisor=calc_type(b.back());
	calc_type left=dividend/(divisor+1);
	calc_type right=calc_type(dividend/divisor);
	right=min(right,(calc_type)max(type_info<base_type>));
	if(right==0)return 0;//a/b<=right==0
	base_type last_work_able=0;
	//left<=a/b<=right
	tryto=get_shrinked_data_view_of_data(tryto.data(),tryto.size());
	while(left<=right) {
		const calc_type test=(left+right)/2;//二分法
		muti_with_base(buf.data(),b,base_type(test));
		const auto myview=get_shrinked_data_view_of_data(buf);
		const auto cmp=compare(tryto,myview);
		if(cmp>=0)
			last_work_able=base_type(test);
		if(cmp>0){//tryto>myview：测试值太小或合适
			left=test+1;
			if(!base_type(left))//溢出了，test是最大的可用值
				break;
		}
		elseif(cmp<0)//tryto<myview：测试值太大
			right=test-1;
		else//tryto==myview：测试值合适
			break;
	}
	if(last_work_able==0)return 0;
	muti_with_base(buf.data(),b,last_work_able);
	sub_with_base(a,get_shrinked_data_view_of_data(buf));
	return last_work_able;
}
[[nodiscard]]static base_type div_with_base(data_type&buf,base_type*a,data_view_type b)noexcept{
	return div_with_base_no_optimisation(buf,a,b);
}
[[nodiscard]]static base_type div_base(base_type*a,data_view_type b)noexcept{
	array_t<base_type> fortry(note::size(b.size()+1));
	return div_with_base(fortry,a,b);
}

这是一个简单的单步除法，它接受长度有b.size()+1的base_type数组a，和长度为b.size()的data_view_type数组b
它通过头两位计算出该步商的left和right，然后通过二分法找到最大的可用值（真正的商）
并且修改a，使得a的值为a-b*result
至于buf，它是一个用于存储中间结果的数组，长度至少应该为b.size()+1，为了避免重复分配内存，这里将其作为参数传入

有了单步的除法，我们就可以搭配循环来实现完整的除法了：

static data_type div_with_base_no_optimisation_impl(data_type&a,data_view_type b)noexcept{
	array_t<base_type> tmp(note::size(a.size()-b.size()),0);
	const auto end=a.rend();
	auto begin=a.rbegin()+b.size();
	auto tmpwritter=tmp.rbegin();
	array_t<base_type> fortry(note::size(b.size()+1));
	while(begin!=end){
		auto result=div_with_base(fortry,begin,b);
		*tmpwritter=result;
		tmpwritter++;
		begin++;
	};
	shrink_to_fit(tmp);
	return tmp;
}
static data_type div_with_base_no_optimisation(data_type&a,data_view_type b)noexcept{
	a.push_back(0);
	return div_with_base_no_optimisation_impl(a,b);
}
static data_type div_with_base(data_type&a,data_view_type b)noexcept{
	no_mod_optimisation_of_div(a,b);
	return div_with_base_no_optimisation(a,b);
}
[[nodiscard]]static data_type div_base_no_optimisation(data_view_type a,data_view_type b)noexcept{
	array_t<base_type> tmp(note::size(a.size()+1));
	copy_assign[a.size()](tmp.data(), a.data());
	tmp.back()=0;
	tmp=div_with_base_no_optimisation_impl(tmp,b);
	return tmp;
}
[[nodiscard]]static data_type div_base(data_view_type a,data_view_type b)noexcept{
	no_mod_optimisation_of_div(a,b);
	return div_base_no_optimisation(a,b);
}

这里的div_with_base_no_optimisation_impl是一个健全的除法，对于a的每一位调用单步除法，并将结果写入tmp。
最后剩余的a就是余数，tmp就是商。
由于div_with_base_no_optimisation_impl期待a以0（数理上）开头|（实现上）结尾，所以div_with_base_no_optimisation作为其简易封装，会在调用div_with_base_no_optimisation_impl之前在a的末尾添加一个0。
而div_with_base则附带了我们先前的“不要余数时可以进行的优化”，在调用div_with_base_no_optimisation之前会先调用no_mod_optimisation_of_div优化参数以提高效率。

除法优化？

首先要说的一点是：二分法是仍然有优化空间的。
如果你的base_type满足bitnum_of(base_type)*2<=float_info::precision_base_bit<T>（或者你并不是满位实现大数），那么你可以使用T实现和乘法复杂度相同的除法（详情见熊佬博客）

其次，基于除法的分治优化需要等实现高速divmod时顺便实现，而牛顿迭代等针对更大的数的优化则参见前一篇post
此处让我们先继续实现取模。

取模

取模的实现和除法的实现非常相似，只是我们不能使用“不要余数时可以进行的优化”，然后最后返回的是a而不是tmp。

static void mod_with_base_impl(data_type&a,data_view_type b)noexcept{
	const auto end=a.rend();
	auto begin=a.rbegin()+b.size();
	array_t<base_type> fortry(note::size(b.size()+1));
	while(begin!=end){
		discard(div_with_base_no_optimisation(fortry,begin,b));
		begin++;
	};
}
static void mod_with_base(data_type&a,data_view_type b)noexcept{
	a.push_back(0);
	mod_with_base_impl(a,b);
	shrink_to_fit(a);
}
[[nodiscard]]static data_type mod_base(data_view_type a,data_view_type b)noexcept{
	array_t<base_type> tmp(note::size(a.size()+1));
	copy_assign[a.size()](tmp.data(), a.data());
	tmp.back()=0;
	mod_with_base_impl(tmp,b);
	shrink_to_fit(tmp);
	return tmp;
}

没什么好讲的，看懂除法就能看懂取模。

`divmod`

现在我们终于可以来实现divmod了。
先声明返回类型。

template<class=void>
struct divmod_result_t_base{
	ubigint quot;
	ubigint mod;
};
typedef divmod_result_t_base<void> divmod_result_t;

then

[[nodiscard]]static divmod_result_t divmod_with_base(data_type&a,data_view_type b)noexcept{
	const auto opt_size=get_no_mod_optimisation_size(a,b);
	if(!opt_size){
		data_type quot = div_with_base_no_optimisation(a, b);
		return {ubigint{move(quot)},ubigint{move(a)}};
	}
	else{
		const auto a_view=get_data_view_of_data(a).subview(opt_size);
		const auto ori_b_view=b; b=b.subview(opt_size);
		data_type quot = div_base_no_optimisation(a_view, b);
		sub_with_base(a,fast_muti_base(quot,ori_b_view));//already shrink_to_fit ed
		return {ubigint{move(quot)},ubigint{move(a)}};
	}
}
[[nodiscard]]static divmod_result_t divmod_base(data_view_type a,data_view_type b)noexcept{
	array_t<base_type> tmp=a;
	return divmod_with_base(tmp,b);
}

divmod_with_base假设第一个参数是用不到的右值，省略一次复制和分配
divmod_base则是divmod_with_base的简易封装，用于处理左值。

divmod_with_base先调用get_no_mod_optimisation_size判断是否可以使用“不要余数时可以进行的优化”，如果可以则使用，并根据乘法和减法来间接计算模数
如果不可以则直接调用div_with_base_no_optimisation来计算商，余数存留在a中。

`div&mod`优化

重头戏来喽，我们来实现divmod的优化。
首先进行规则化的定义

//除法分治：规则化
[[nodiscard]]static auto fast_div_regularisation(data_view_type a,data_view_type b)noexcept{
	constexpr auto max_calc_type=min(max(type_info<calc_type>),base_type_mod*base_type_mod-1);
	const auto first2=(b.back()*base_type_mod+*(b.rbegin()+1)+1);
	const ubigint muti=max_calc_type/first2;
	const ubigint a_after_reg{fast_muti_base(a,muti)};
	const ubigint b_after_reg{fast_muti_base(b,muti)};
	struct result_t{
		ubigint a_after_reg;
		ubigint b_after_reg;
		ubigint muti;
	};
	return result_t{move(a_after_reg),move(b_after_reg),move(muti)};
}

规则化可以使得除法分治的试商次数降低
然后就是喜闻乐见的分治了

//除法分治：阈值
static constexpr auto fast_div_base_threshold=1<<5;
//除法分治：递归
[[nodiscard]]static divmod_result_t fast_divmod_base_impl(data_view_type a,data_view_type b)noexcept{
	if(compare(a,b)<0)return {ubigint{},ubigint{a}};
	if(a.size()<=fast_div_base_threshold)return divmod_base(a,b);
	size_t base = (b.size()+1) / 2;
	//符合3/2时，进行试商
	if(a.size() <= base*3) {
		base = b.size() / 2;
		auto a_high = a.subview(base);//不需要re_shrink：subview是舍弃低位，下同
		auto b_high = b.subview(base);
		//数值优化，这意味着余数不可用（下方的remain确实被舍弃了所以可以这样优化）
		no_mod_optimisation_of_div(a_high,b_high);
		auto result=fast_divmod_base_impl(a_high, b_high);
		auto&ans_base=result.quot._data;auto&remain=result.mod._data;
		remain=fast_muti_base(ans_base,b);
		while (compare(remain, a) > 0) {
			sub_with_base(remain, b);
			sub_one_from_base(ans_base);
		}
		remain=sub_base(a,remain);
		return result;
	}
	//不符合3/2时，进行递归
	//选择合适的base长度做分割
	if(a.size() > base*4)
		base = a.size() / 2;
	const auto a_high = a.subview(base);
	auto result=fast_divmod_base_impl(a_high, b);
	auto&ans_base=result.quot._data;auto&remain=result.mod._data;
	ans_base.insert(0,base,base_type{0});
	data_type m{note::size(base + remain.size())};
	copy_assign[base](note::from(a.data()),note::to(m.data()));
	copy_assign[remain.size()](add_const(remain.data()),m.data()+base);
	//这里不需要对m进行shrink，因为remain是已经shrink过的
	{
		auto another_result=fast_divmod_base_impl(m, b);
		add_to_base(ans_base, move(another_result.quot._data));
		remain=move(another_result.mod._data);
	}
	return result;
}

最后便是fast_divmod和fast_div、fast_mod、fast_mod_with的实现了

[[nodiscard]]static divmod_result_t fast_divmod_base(data_view_type a,data_view_type b)noexcept{
	if(a.size()<=fast_div_base_threshold)return divmod_base(a,b);
	const auto [a_after_reg,b_after_reg,muti]=fast_div_regularisation(a,b);
	auto result=fast_divmod_base_impl(a_after_reg,b_after_reg);
	result.mod/=muti;
	return result;
}
[[nodiscard]]static data_type fast_div_base(data_view_type a,data_view_type b)noexcept{
	if(a.size()<=fast_div_base_threshold)return div_base(a,b);
	no_mod_optimisation_of_div(a,b);
	const auto [a_after_reg,b_after_reg,muti]=fast_div_regularisation(a,b);
	return fast_divmod_base_impl(a_after_reg,b_after_reg).quot._data;
}
static void fast_mod_with_base(data_type&a,data_view_type b)noexcept{
	if(a.size()<=fast_div_base_threshold)return mod_with_base(a,b);
	const auto [a_after_reg,b_after_reg,muti]=fast_div_regularisation(a,b);
	auto result=fast_divmod_base_impl(a_after_reg,b_after_reg);
	result.mod/=muti;
	a=move(result.mod._data);
}
[[nodiscard]]static data_type fast_mod_base(data_view_type a,data_view_type b)noexcept{
	if(a.size()<=fast_div_base_threshold)return mod_base(a,b);
	const auto [a_after_reg,b_after_reg,muti]=fast_div_regularisation(a,b);
	auto result=fast_divmod_base_impl(a_after_reg,b_after_reg);
	result.mod/=muti;
	return move(result.mod._data);
}

这几个大同小异，看懂一个就能看懂其他的了

若小于阈值，调用基础版本
若返回值不需要余数，进行no_mod_optimisation_of_div
规则化
递归
若需要余数，除以规则化得到的muti

封装

封装的部分就是对外的接口了，各种operator和friend之类，我就不贴了
你可以自己翻

字符串转换

先让我们看看elc的to_string的整数实现

template<unsigned_integer_type T>//适用于任何c++无符号整数类型以及elc的ubigint
[[nodiscard]]string to_string_base(T num)const noexcept{
	string aret;
	const auto radix=_repres.get_radix();
	if constexpr(is_basic_type<T>){
		//基本类型有最大值，可以预分配足够的空间来提高效率
		constexpr auto info_threshold_base = pow(BIT_POSSIBILITY, bitnum_of(T));
		const auto info_threshold = to_size_t(ceil(log(info_threshold_base, radix)));
		aret.pre_alloc_before_begin(info_threshold);
	}
	if constexpr(is_big_type<T>){
		//大整数类型可以通过分治法来提高效率
		constexpr auto partition_method_threshold=max(type_info<size_t>);
		if(num>partition_method_threshold){
			T base{radix};
			size_t len=1;//计算余数部分要补的前导0
			//计算分割点
			while(base.memory_usage()*3 < num.memory_usage()){
				len *= 2;
				base *= base;
			}
			//算出分割后的高位和低位
			auto result = divmod(num, base);
			auto&high = result.quot;
			auto&low = result.mod;
			return to_string(move(high)) + to_string(move(low)).pad_left(_repres.get_char(0), len);
		}
		else
			return to_string(num.convert_to<size_t>());
	}
	else{
		push_and_disable_msvc_warning(4244);
		do{//do-while是为了保证至少有一位"0"
			auto res=divmod(move(num),radix);
			const auto index=to_size_t(move(res.mod));
			const auto ch=_repres.get_char(index);
			aret.push_front(ch);
			num=move(res.quot);
		}while(num);
		pop_msvc_warning();
		return aret;
	}
}

没啥好说的，分治到size_t，然后再用size_t的版本不断divmod直到num为0

from_string_get的实现也大差不差

template<unsigned_integer_type T>
[[nodiscard]]T from_string_get_unsigneded(const string&str,convert_state_t&state)const noexcept{
	if constexpr(type_info<T> == type_info<bool>)
		return union_cast<bool>(from_string_get<unsigned_specific_size_t<sizeof(bool)>>(str,state));
	else{
		const auto radix=_repres.get_radix();
		if constexpr(is_big_type<T>){
			//大整数类型可以通过分治法来提高效率
			const auto partition_method_threshold=trunc(log(max(type_info<size_t>),radix));
			if(str.size()>partition_method_threshold){
				T base{radix};
				size_t len=1;
				//计算分割点
				while(len*3 < str.size()){
					len *= 2;
					base *= base;
				}
				const auto split_pos=str.size()-len;
				string high_str=str.substr(0,split_pos);
				string low_str=str.substr(split_pos);
				T high=from_string_get_unsigneded<T>(high_str,state);
				if(!state.success())
					return {};
				T low=from_string_get_unsigneded<T>(low_str,state);
				if(!state.success())
					return {};
				return move(high)*move(base)+move(low);
			}
			else
				return from_string_get_unsigneded<size_t>(str,state);
		}
		else{
			push_and_disable_msvc_warning(4267);
			T aret={};
			const auto end=str.end();
			for(auto i=str.begin();i!=end;++i){
				aret*=radix;
				const char_t ch=*i;
				if(_repres.is_valid_char(ch))
					aret+=_repres.get_index(ch);
				else{//error
					state.set_error();
					state.set_has_invalid_char();
					return {};
				}
			}
			return aret;
			pop_msvc_warning();
		}
	}
}

rand

大整数由于定义域在[0,inf)，所以rand对大整数只能支持给定范围内的随机数

首先定义gen_randbit_with_bitnum<T>

template<typename T> requires unsigned_integer_type<T>
[[nodiscard]]force_inline constexpr T gen_randbit_with_bitnum(size_t bitnum)noexcept{
	if constexpr(is_big_type<T>){
		typedef T::base_type base_type;
		constexpr size_t base_bitnum = bitnum_of(base_type);
		T aret{};
		while(bitnum>base_bitnum){
			bitnum-=base_bitnum;
			apply_basetype_to_head(aret,gen_randbit<base_type>());
		}
		apply_basetype_to_head(aret,gen_randbit_with_bitnum<base_type>(bitnum));
		return aret;
	}
	else
		return gen_randbit<T>()&((T{1}<<bitnum)-1);
}

用于生成指定位数的随机数

然后是between_integral_t<T>的默认做法：随机_diff同样位数的随机数直到小于_diff，然后加上_min

struct between_integral_t{
private:
	rand_seed_t& _seed;
	typedef to_unsigned_t<T> unsigned_T;
	T _min;
	unsigned_T _diff;
	size_t _bitnum;
public:
	constexpr between_integral_t(rand_seed_t&seed,T amin,T amax)noexcept:_seed(seed),_min(amin),_diff(amax-amin){
		_bitnum=get_bitnum(_diff);
	}
	[[nodiscard]]force_inline T operator()()const noexcept{return inclusive();}
	[[nodiscard]]force_inline operator T()const noexcept{return operator()();}
	[[nodiscard]]force_inline T exclusive()const noexcept{
		unsigned_T diff;
		do diff=_seed.gen_randbit_with_bitnum<unsigned_T>(_bitnum);while(diff>=_diff);
		return _min+diff;
	}
	[[nodiscard]]force_inline T inclusive()const noexcept{
		unsigned_T diff;
		do diff=_seed.gen_randbit_with_bitnum<unsigned_T>(_bitnum);while(diff>_diff);
		return _min+diff;
	}
};

结尾

写到这里大整数的实现就结束了，有了这些前提知识，介绍完其他大数实现后就可以开始介绍浮点输出该怎么做了
yysy，写blog让我回头又检视了一下代码，在可读性方面又改了不少，挺好的

摸了

若无特殊声明，本作品网页、文字、程序部分与原本无许可部分采用知识共享署名-非商业性使用-相同方式共享 4.0 国际许可协议进行许可，以外的部分遵循其原有许可协议。
若您于许可外使用本作品任一部分，其作者有权依法追究您一切民事与刑事责任。

大数实现闲谈：无符号大整数的实现与优化

摸了

实现本质

低位在前还是高位在前？

构造函数

shrink?

加减乘除

加法

减法

乘法

乘法的优化

除法

除法优化？

取模

`divmod`

`div&mod`优化

封装

字符串转换

rand

结尾

CATALOG

FEATURED TAGS

FRIENDS

实现本质

低位在前还是高位在前？

构造函数

shrink?

加减乘除

加法

减法

乘法

乘法的优化

除法

除法优化？

取模

divmod

div&mod优化

封装

字符串转换

rand

结尾

CATALOG

FEATURED TAGS

FRIENDS

`divmod`

`div&mod`优化