Inference of plasma parameters from fixed-bias multi-needle Langmuir probes (m-NLP)

New approaches are presented to infer plasma densities and satellite floating potentials from currents collected with fixed-bias multi-needle Langmuir probes (m-NLP). Using synthetic data obtained from kinetic simulations, comparisons are made with inference techniques developed in previous studies and, in each case, model skills are assessed by comparing their predictions with known values in the synthetic data set. The new approaches presented rely on a combination of an approximate analytic scaling law for the current collected as a function of bias voltage, and multivariate regression. Radial basis function regression (RBF) is also applied to Jacobsen et al’s procedure (2010 Meas. Sci. Technol. 21 085902) to infer plasma density, and shown to improve its accuracy. The direct use of RBF to infer plasma density is found to provide the best accuracy, while a combination of analytic scaling laws with RBF is found to give the best predictions of a satellite floating potential. In addition, a proof-of-concept experimental study has been conducted using m-NLP data, collected from the Visions-2 sounding rocket mission, to infer electron densities through a direct application of RBF. It is shown that RBF is not only a viable option to infer electron densities, but has the potential to provide results that are more accurate than current methods, providing a path towards the further use of regression-based techniques to infer space plasma parameters.


Introduction
Original Content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
Our reliance on space technology requires a good first-principles understanding of the complex dynamics occurring in our near-space environment. Space weather events can affect communications, remote sensing, and scientific satellites in orbit, as well as large power grids and pipelines at Earth's surface. Monitoring the state of our space environment is the most basic requirement for understanding and developing reliable interpretive and predictive models. Among the many parameters characterizing the state of a plasma, the density, temperature, and plasma flow velocity are the most fundamental, as they are always affected by changes in our environment, and consequently, they can serve as proxies to infer the state of the ionosphere and magnetosphere. Many instruments have been developed over the years to measure these parameters. Ionospheric plasmas can be monitored remotely with ground-based instruments such as incoherent scatter radars (ISR) and ionosondes [26,27], but such measurements lack the spatial and temporal resolution of in situ measurements made with instruments mounted on rockets or satellites. Ground-based instruments are generally not mobile, which limits their use to specific regions of space. In comparison, in situ measurements made with satellites provide a broader coverage, and higher spatial and temporal resolutions. Among those, Langmuir probes have been the instruments of choice in labs and in space, because of their relative simplicity and the many theories developed to describe their interaction with plasma. Langmuir probes are typically operated in sweep mode, where a bias voltage with respect to a ground is varied periodically between negative and positive values, resulting respectively in ions and electrons being collected.
Parameters such as density or temperature can then be inferred from probe characteristics (current as a function of applied voltage), based on convenient analytic inference procedures. In this study we consider the use of fixed-bias multi-needle Langmuir probes (m-NLP), first proposed by Jacobsen et al [21] to infer plasma density. Many articles have been written on the inference of plasma parameters from Langmuir probe characteristics. Over the years, several different probe geometries have been considered theoretically [1,24,25,41] and experimentally [4,13,19], for plasma in different regimes. The approach considered here is to use m-NLPs with fixed-bias voltages with respect to a spacecraft, to infer the density, and possibly other physical parameters. Compared to sweep-voltage probes, the advantage of fixed-bias Langmuir probes is that they can provide a much higher temporal and, owing to the high speed of spacecraft, spatial resolution. Assuming a sufficiently long and thin probe, the orbital-motion-limited (OML) approximation for probe current collection implies that the square of the current collected by such a probe should vary linearly with the square of the density times the probe voltage. This led Jacobsen et al to propose fixed positively biased multi-needle Langmuir probes as a means of measuring plasma density independently of the temperature or the satellite floating potential. This approach was justified by the fact that electron thermal speeds are much larger than the combined satellite speeds in low Earth orbit (LEO) and ionospheric winds.
As a result, electrons appear as stationary in a satellite rest frame, and the current collected by a long, positive cylindrical probe is approximated as

$$I = n_e e A \frac{2}{\sqrt{\pi}} \sqrt{\frac{k T_e}{2\pi m_e}} \left(1 + \frac{e(V_f + V_b)}{k T_e}\right)^{\beta} \qquad (1)$$

with β = 0.5, and where $e$, $n_e$, $T_e$ and $m_e$ are the elementary charge, the plasma density, the electron temperature and mass, $A$ is the surface area of the probe, $k$ is the Boltzmann constant, $V_b$ is the probe bias voltage, $V_f$ is the spacecraft floating potential, and $V_f + V_b$ is the probe potential with respect to the background plasma. In equation (1), and in what follows, $I$ is the absolute value of the collected electron current. From this expression, it readily follows that $I^2$ varies linearly with the bias voltage, with a slope that is independent of the temperature. As a result, the slope $K$ of $I^2$ as a function of $V_b$, obtained from two or more currents from probes biased to different voltages, can be used to infer the density with

$$n_e = \sqrt{\frac{\pi^2 m_e}{2 A^2 e^3}\, K}. \qquad (2)$$

This result has motivated laboratory and rocket experiments, and it led to the use of m-NLPs on several of the CubeSats deployed in the QB50 program [12,37], and to the use of such an array of probes on the larger NorSat-1 satellite [14]. One point to keep in mind when considering equations (1) and (2), however, is that OML equations are derived under somewhat stringent simplifications. In particular it is assumed that (a) the probe radius is much smaller than the plasma Debye length, (b) plasma is unmagnetized, (c) the background plasma velocity distribution function is Maxwellian, (d) plasma flow speed is negligible compared to the thermal speed of the species being collected, and (e) the probe is much longer than the Debye length in order for end effects to be negligible. For probe diameters of order 0.5 mm or less, assumption (a) is satisfied.
The thermal gyroradius of electrons in the ionosphere being of order 1 cm or more, (b) is also satisfied. Frequent collisions with neutrals cause electron velocity distributions to be nearly Maxwellian at low and mid latitudes, where (c) is satisfied. As for electron thermal speeds, they are typically more than an order of magnitude larger than low Earth orbit satellite speeds, so that (d) is well satisfied as well. The problem with equation (2) stems from the use of the OML theory, in which probe lengths are assumed to be sufficiently long for end effects to be negligible. Indeed as demonstrated experimentally [19,41], and theoretically [22,23,33], probe lengths have to be much larger than the Debye length, by factors ranging between several tens, to hundreds, for the OML approximation to be applicable with β = 0.5. This condition, however, is generally not satisfied with needle probes mounted on CubeSats or NorSat-1 in ionospheric plasma. As a result, while equation (1) can still provide a good scaling law for the collected current as a function of voltage, the value of β is no longer 0.5, and equation (2) no longer provides an accurate estimate of the density. For finite-length probes, in which end effects contribute to the collected current, β is found experimentally and theoretically to range between 0.5 and 1.0, and the technique used to infer the density must be modified accordingly.
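The effect of β on the linear-fit inference can be illustrated with a short numerical sketch. It assumes the scaling law of equation (1) in the form $I = n_e e A (2/\sqrt{\pi}) \sqrt{kT_e/(2\pi m_e)}\,(1 + e(V_f+V_b)/kT_e)^{\beta}$ and the density formula $n_e = \sqrt{\pi^2 m_e K/(2A^2e^3)}$ of equation (2). The function names, the plasma values, and the choice β = 0.75 for a short probe are illustrative assumptions, not values taken from the simulations.

```python
import math

E = 1.602176634e-19      # elementary charge (C)
K_B = 1.380649e-23       # Boltzmann constant (J/K)
M_E = 9.1093837015e-31   # electron mass (kg)


def probe_current(n_e, t_e, v_f, v_b, area, beta=0.5):
    """Electron current (A) from the scaling law of equation (1)."""
    thermal = n_e * E * area * (2.0 / math.sqrt(math.pi)) * math.sqrt(
        K_B * t_e / (2.0 * math.pi * M_E))
    return thermal * (1.0 + E * (v_f + v_b) / (K_B * t_e)) ** beta


def slope_of_i2(biases, currents):
    """Least squares slope K of I^2 as a function of V_b."""
    n = len(biases)
    squares = [i * i for i in currents]
    mean_v = sum(biases) / n
    mean_y = sum(squares) / n
    num = sum((v - mean_v) * (y - mean_y) for v, y in zip(biases, squares))
    den = sum((v - mean_v) ** 2 for v in biases)
    return num / den


def density_from_slope(slope, area):
    """Density from equation (2); only exact when beta = 0.5."""
    return math.sqrt(math.pi ** 2 * M_E * slope / (2.0 * area ** 2 * E ** 3))


# Illustrative values (assumptions): lateral area of a 50 mm needle of
# radius 0.255 mm, and an arbitrary ionospheric-like plasma.
AREA = 2.0 * math.pi * 0.255e-3 * 50e-3
BIASES = [2.0, 3.0, 4.0, 5.0]
N_TRUE, T_E, V_F = 1.0e11, 1500.0, -0.5

# Ideal long probe: the linear fit recovers the density.
ideal = [probe_current(N_TRUE, T_E, V_F, v, AREA, beta=0.5) for v in BIASES]
n_ideal = density_from_slope(slope_of_i2(BIASES, ideal), AREA)

# Finite-length probe (assumed beta = 0.75): the same fit overestimates it.
short = [probe_current(N_TRUE, T_E, V_F, v, AREA, beta=0.75) for v in BIASES]
n_short = density_from_slope(slope_of_i2(BIASES, short), AREA)
```

With β = 0.5 the recovered density matches the input to rounding error, whereas currents generated with a larger exponent lead the same formula to overestimate the density, which is the failure mode discussed above.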
In response to this predicament, two solutions were proposed by Hoang et al [16], and Barjatya et al [2], in order to improve the inference of the plasma density, while accounting for the fact that β may be different from 0.5. In both studies, the scaling law given in equation (1) is assumed, but β is now treated as a parameter to be determined. In their approach, Barjatya et al use a nonlinear fit to determine the unknown parameters $n_e$, $V_f$, $T_e$, and β appearing in equation (1), in order to match the currents collected by the four probes in the m-NLP. Similar fits had previously been used by Barjatya et al to determine β, the floating potential, and the electron density, from the Floating Potential Measurement Unit attached to the International Space Station [3]. While the determination of four parameters from four independent measurements is possible in principle, it was found that the determination of the temperature could not be made accurately in this straightforward approach. Noting that the inference of other parameters was relatively insensitive to $T_e$, the solution proposed consisted of specifying an approximate value for the temperature, and then using a nonlinear fit to determine the remaining three parameters from the currents collected by the probes with the three largest bias voltages. This was justified by the fact that, based on synthetic data generated with equation (1), assuming a range of β values between 0.5 and 0.65, significantly more accurate inferences of the density were made than with Jacobsen's original technique (that is, assuming β = 0.5) even if the temperature used in the nonlinear fit was varied by ±100% relative to the actual temperature used to generate the data set.
Alternatively, Hoang et al assessed two approaches: the first one based on Jacobsen's original least squares linear fit to multiple collected currents assuming β = 0.5, and a second one based on different implementations of nonlinear fits, similar to those considered in [2]. With nonlinear fits, four or three currents with different relative weights are considered to determine β, $V_f$, and $n_e$. As in [2], it was noted that nonlinear fits were relatively insensitive to the temperatures assumed in equation (1). Temperature values are nonetheless needed in order to perform nonlinear fits and obtain good accuracy for the inferred parameters. In space, it was suggested that temperature estimates be obtained from the International Reference Ionosphere (IRI) model [5] or incoherent scatter radar measurements. The article also reports comparisons of inferred densities with those obtained from the IRI model, and independent measurements in a laboratory plasma [10]; both being deemed satisfactory, and constituting an improvement over estimates made with Jacobsen's original technique.
In the following we present two further approaches based on a combination of analytic approximation and multivariate regression, for which inference skills are assessed using a synthetic data set obtained from simulations, as well as actual data from a rocket mission. While model validation should ideally be made with actual measurements, the advantage of synthetic data is that it enables the assessment of predictive models with known plasma densities, temperatures, and floating potentials. Constructing such data sets from simulations also avoids the bias that would result, for example, from an assumed analytic expression for the scaling of current as a function of voltage. In the next section we explain how kinetic simulations were used to construct a synthetic data set, and present our two inference approaches. In section 3, inference skills are assessed for each of these two approaches, and compared with those of previous models. The application of radial basis function (RBF) regression to experimental data collected from the Visions-2 [39] sounding rocket is presented in section 4. Finally, a summary of our findings and concluding remarks are presented in section 5.

Methodology
In order to train and validate inference models such as those for m-NLPs, it is necessary to have data sets with low-level (L1B in satellite data parlance) currents and associated plasma and satellite parameters ($n_e$, $T_e$, $V_f$). Ideally such a data set should be constructed from actual accurate measurements, cross-validated with different instruments, but this is rarely possible in practice, owing to challenges in making such measurements in space or lab plasma [13,40-42]. An alternative is to use synthetic data sets constructed from analytic models [2,7], or computer simulations [31,33], from which precise values of collected currents and corresponding plasma parameters are known. Admittedly, computer simulations, while more accurate than analytic models, do not account for all processes at play in an experiment or in space. They nonetheless make it possible to construct self-consistent databases from which inference models can be tested and predictive skills quantified. Data sets constructed from simulations are also free from the bias that would result from using analytic expressions. This is particularly important when these same expressions and scaling laws are used to construct and assess inference models. Given data sets, the next step is then to construct and validate models capable of inferring plasma parameters from measured currents. These procedures are described in detail below.

Data sets
Two distinct synthetic data sets are constructed and used in our model skill assessments. The first one uses the three-dimensional particle-in-cell (PIC) simulation code PTetra to simulate a needle probe in a flowing plasma, with velocity perpendicular to the probe axis. In PTetra, the simulation domain consists of an unstructured adaptive tetrahedral mesh in which Poisson's equation is solved at each time step using Saad's GMRES sparse matrix solver [36]. The validity of simulation results obtained with PTetra has been assessed in previous publications, in which results were compared with theory [28], and those obtained with independently developed computer models [8,29,30]. More information about the code can be found in [28,29]. The probes simulated have the same radius, r = 0.255 mm, as the ones on the QB50 CubeSats, but they are twice as long, with a length L = 50 mm instead of 25 mm [17]. Longer probes would have the advantage of collecting more current with higher signal-to-noise ratio, while being less affected by end effects. The probe dimensions and plasma parameters used in the simulations are given in table 1. Variations in the plasma flow speed by 1 km s$^{-1}$ in different directions relative to the probe, as well as different ion compositions, have been considered in selected cases, and resulted in only minor effects on collected currents. This is why we limited our study to the parameters listed in table 1.

Table 1. Probe dimensions and plasma environment conditions assumed in PTetra simulations. All probes are on the ram side of the satellite, and oriented perpendicularly to the ram direction.

Simulations are made for 25 combinations of densities and temperatures (five densities, each with five temperatures), and we consider four probes biased to 2, 3, 4 and 5 V with respect to the spacecraft. However, since the spacecraft floating potential varies, it is necessary to obtain collected currents for a range of probe voltages with respect to the background plasma. Simulated currents are therefore fitted with the analytic expression

$$I = a \left(b + \frac{e(V_f + V_b)}{k T_e}\right)^{c} \qquad (3)$$

in order to enable the calculation of collected current for arbitrary bias and floating voltages in the range 0 […]. This analytic expression involves three adjustable parameters a, b, c, which are determined from the four or more currents obtained from simulations, and calculated with a nonlinear least squares fit, with differential evolution [9] as the optimization algorithm. Clearly, equation (3) is very similar to equation (1) found in the OML approximation for a probe of infinite length. The parameter b, however, has no counterpart in equation (1). It was introduced to avoid biasing the fits toward the expression resulting from the OML approximation, which is assumed in some of the inference approaches considered below, and to account for the fact that equation (1) is not exact, even if it is generally a good approximation for the collected current. By setting the fitting parameter b to unity, we would constrain the interpolation of our simulation results to have exactly the form prescribed in equation (1), while our goal here is to have the best analytic fit for currents computed in our simulations. In practice, depending on the parameters, our fits produce values for b ranging from approximately 0.7 to 1.1. We remark that our simulations do not account for a guard cylinder that would support the probe and be set to the same potential. For that reason, in order to approximate the effect of a guard, the probes are subdivided into five segments of equal lengths (10 mm each), and the current collected by one of the end segments is replaced by the current collected by the segment next to it. A correlation plot of fitted currents as a function of actual (simulation) collected currents in figure 1 shows the excellent agreement between fits and data. In most cases, the relative error in the fits is of order 1%, and the maximum relative error among all the cases is under 4%.
Given coefficients a, b and c for each of the 25 combinations of densities and temperatures, it is then possible to construct a data set with 4-tuples of currents corresponding to 4-tuples of bias voltages $V_b$ and arbitrary floating potentials $V_f$ in a range such that 0 […]. Several increments have been tried between successive bias voltages, ranging from 0.75 to 1.5 V, and found to have relatively little impact on prediction accuracy. In this data set, bias voltages of 2, 3, 4, and 5 V are considered, with 21 uniformly distributed floating potentials in the range (−2, 2) V, thus forming a set of 25 × 21 = 525 entries or nodes. Each entry in the data set consists of a 4-tuple of currents, along with the associated density, temperature and floating potential. The second data set considered is constructed using the Langmuir software [32], which uses the fits reported in [33]. In that article, fits were constructed for a thin cylindrical probe in a wide range of non-dimensionalised plasma parameters. These can be used to predict the current per unit length along a probe, as well as the total current collected, for different ratios of probe length to the Debye length, and ratios of the probe voltage to the electron temperature. Prescriptions were also derived to approximate the effect of a guard, which would reduce or eliminate end effects on one end of a probe. The simulation results used for the fits in [33] were obtained using PTetra. Since the work in [33] made similar assumptions to those in the OML theory, except for the finite length of the probe, one would expect the results of Langmuir to approach those of OML for a cylinder as the probe length is increased. This is indeed the case, the worst-case discrepancy being less than 5% [32].
Further on, as the probe is shortened to less than the Debye length, one expects the collected current to be proportional to the probe voltage, similar to a spherical probe (though the exact current may not be known, since the effective spherical surface area may differ from the true surface area). This is also observed in [33], where the current-voltage characteristic for a probe shorter than the Debye length fits a function similar to equation (1), but with β ∼ 1. This model is used to generate a data set consisting of 10 000 4-tuples of currents for randomly distributed temperatures, densities, and floating potentials in the same range in parameter space as assumed in the first data set. The temperature and floating potential are uniformly distributed within their ranges, whereas the density is logarithmically distributed (i.e. $\log n_e$ is uniformly distributed). The bias voltages are also the same as in the first data set. Contrary to the first data set, however, this one does not account for a plasma flow, which is deemed negligible, due to the small drift velocity compared to the electron thermal speed. Moreover, the data set is generated assuming an ideal (infinite) guard on one end of the probes.
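The layout of the two data sets can be sketched as follows. This is only an illustration of the bookkeeping: the fit form $I = a\,(b + V/T_e)^c$ with $T_e$ in eV, the coefficient values, the five densities and temperatures, the temperature range, and the inclusion of both end points of the floating potential range are all assumptions made here, since the actual values come from the simulations and from table 1.

```python
import random

BIASES = (2.0, 3.0, 4.0, 5.0)  # probe bias voltages (V), as in the text


def fitted_current(coeffs, t_ev, v_probe):
    """Three-parameter fit I = a * (b + V / T_e)**c, T_e in eV (assumed form)."""
    a, b, c = coeffs
    return a * (b + v_probe / t_ev) ** c


def build_grid(fits, n_vf=21, vf_min=-2.0, vf_max=2.0):
    """One node per (density, temperature, floating potential) combination."""
    nodes = []
    for (n_e, t_ev), coeffs in fits.items():
        for k in range(n_vf):
            v_f = vf_min + k * (vf_max - vf_min) / (n_vf - 1)
            currents = tuple(
                fitted_current(coeffs, t_ev, v_f + v_b) for v_b in BIASES)
            nodes.append({"I": currents, "n_e": n_e, "T_e": t_ev, "V_f": v_f})
    return nodes


def sample_parameters(count, rng):
    """Random (n_e, T_e, V_f) triples with a log-uniform density."""
    samples = []
    for _ in range(count):
        n_e = 10.0 ** rng.uniform(10.0, 12.0)  # log10(n_e) uniform, m^-3
        t_ev = rng.uniform(0.08, 0.3)          # assumed temperature range (eV)
        v_f = rng.uniform(-2.0, 2.0)           # floating potential (V)
        samples.append((n_e, t_ev, v_f))
    return samples


# Placeholder fit coefficients (hypothetical, NOT simulation results).
DENSITIES = [10.0 ** (10.0 + 0.5 * k) for k in range(5)]
TEMPERATURES = [0.08, 0.12, 0.16, 0.2, 0.3]
FITS = {(n, t): (1.0e-19 * n * t ** 0.5, 0.9, 0.75)
        for n in DENSITIES for t in TEMPERATURES}

grid = build_grid(FITS)                                # first data set layout
random_set = sample_parameters(10000, random.Random(0))  # second data set inputs
```

The grid contains 25 × 21 = 525 nodes, each holding a 4-tuple of currents with its density, temperature and floating potential. In the second data set the currents would be computed from the sampled parameters with the finite-length fits rather than with the placeholder expression used here.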

Inference models
We now turn to the construction of models capable of inferring plasma parameters from measured currents. In addition to the methods already mentioned in the introduction, with which comparisons will be made, two approaches are presented. In model 1, parameters are obtained in part from the assumed relation between current and voltage in equation (1), and in part from multivariate regression. In model 2, inference is made directly using multivariate regression, without relying on any analytic scaling law between current and voltage. Since both models 1 and 2 make use of RBF regressions, we start with a brief presentation of the method.

Radial basis function regression (RBF)
Multivariate regression offers a general means of inferring dependent variables from scattered data in a multidimensional space. Among the several possible approaches, RBF was chosen for its relative simplicity and accuracy [6,20,35]. The method consists of a weighted superposition of functions of the 'radial' distances between points in a multidimensional space, where regression is to be made. Given a set of N reference nodes, or 'pivots' $\{(X_i, Y_i),\ i = 1, \ldots, N\}$, where $X_i$ and $Y_i$ are respectively independent and dependent vectors, and assuming an $L^2$ norm, or Euclidean metric, for the distance between two points in X space, RBF regression consists of approximating Y for an arbitrary X, as

$$Y \simeq \sum_{j=1}^{N} a_j\, G(|X - X_j|) \qquad (4)$$

where $a_j$ are regression coefficients, and G is a radial basis function. Regression coefficients $a_j$ can then be determined by requiring exact collocation at pivots; that is, by solving the set of linear equations

$$\sum_{j=1}^{N} a_j\, G(|X_i - X_j|) = Y_i, \quad i = 1, \ldots, N. \qquad (5)$$

The choice of the G function is arbitrary, with the only constraint that the equations in (5) be linearly independent from one another. The construction of an RBF regression model is done in two steps. The interpolation function and pivots are first set so as to best approximate dependent variables in a 'training set' in which X and Y are known. The trained model is then applied to a distinct 'validation set', not used in training, and covering the same range in parameter space. In each case, model prediction skill is assessed with a 'cost function' C, which vanishes if predictions match data values exactly, and increases with increasing discrepancies. Several functions have been tried for training, and $G(x) = x^{1.8}$ is used throughout because of the good results that it produces in our problem. Given a function G, the choice of pivots is critical in order to construct an accurate model. In our analysis, training and validation sets are subsets of a larger set, or solution library, constructed from kinetic simulations, as described in section 2.1.
In training, given a function G, the objective is to distribute pivots in order to obtain the highest accuracy when applying the model to a data set in which both X and Y values are known. Different approaches have been proposed to achieve this task, including k-clustering [18], and Gaussian clustering [34]. Assuming a number N of pivots, and a number $\mathcal{N}$ of nodes in the training set, we adopt a straightforward strategy, consisting of trying all possible combinations of N pivots among the $\mathcal{N}$ nodes in the training set, and selecting the distribution of pivots for which C is minimum. Two cost functions are used in this study, depending on the nature of the physical parameter being modeled. For the density, which varies over two orders of magnitude, we use the maximum relative error between predictions and actual values in a given data set. For the floating potential, which can vary continuously between negative and positive values, C is the maximum absolute error between predicted and actual values.
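The training procedure described above (exact collocation at the pivots, followed by an exhaustive search over pivot combinations) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used in the study: it assumes a scalar dependent variable, $G(r) = r^{1.8}$, a small hand-written Gaussian elimination in place of a library solver, and made-up training data. All function names are hypothetical.

```python
import itertools
import math


def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-14:  # absolute tolerance, assumes O(1) inputs
            raise ValueError("singular collocation matrix")
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def basis(r, power=1.8):
    """Radial basis function G(r) = r**1.8."""
    return r ** power


def distance(p, q):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def fit_rbf(pivots_x, pivots_y):
    """Coefficients a_j from exact collocation at the pivots."""
    matrix = [[basis(distance(xi, xj)) for xj in pivots_x] for xi in pivots_x]
    return solve_linear(matrix, pivots_y)


def predict(x, pivots_x, coeffs):
    """Weighted superposition of radial functions evaluated at x."""
    return sum(a * basis(distance(x, xj)) for a, xj in zip(coeffs, pivots_x))


def max_relative_error(pred, actual):
    """Cost for the density: largest |(p - a) / p| over the set."""
    if any(p == 0.0 for p in pred):
        return math.inf
    return max(abs((p - a) / p) for p, a in zip(pred, actual))


def best_pivots(train_x, train_y, n_pivots, cost):
    """Try every combination of pivots and keep the one with minimum cost."""
    best = None
    for combo in itertools.combinations(range(len(train_x)), n_pivots):
        px = [train_x[i] for i in combo]
        try:
            coeffs = fit_rbf(px, [train_y[i] for i in combo])
        except ValueError:
            continue
        pred = [predict(x, px, coeffs) for x in train_x]
        c = cost(pred, train_y)
        if best is None or c < best[0]:
            best = (c, combo, coeffs)
    return best


# Made-up training data: two inputs per node, one positive output.
TRAIN_X = [(0.0, 0.1), (0.2, 0.5), (0.4, 0.3), (0.6, 0.9), (0.8, 0.2), (1.0, 0.7)]
TRAIN_Y = [1.0 + 2.0 * a + b for a, b in TRAIN_X]
best_cost, best_combo, best_coeffs = best_pivots(
    TRAIN_X, TRAIN_Y, 3, max_relative_error)
```

The exhaustive search is only practical for a small number of pivots: choosing 5 pivots among 200 training nodes already gives about 2.5 billion combinations, each requiring a linear solve and an evaluation over the training set.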

Model 1: analytic-regression based
The first model considered consists of three steps in which (a) the exponent β, (b) the floating potential and the temperature, and (c) the density are successively estimated. The starting point in (a) is the empirical relation between collected current and voltage in equation (1). By raising each side of the equation to the power α = 1/β, we obtain

$$I^{\alpha} = \left(n_e e A \frac{2}{\sqrt{\pi}} \sqrt{\frac{k T_e}{2\pi m_e}}\right)^{\alpha} \left(1 + \frac{e(V_f + V_b)}{k T_e}\right). \qquad (6)$$

Given two currents $I_1$, $I_2$ collected by two probes biased to voltages $V_1$ and $V_2$, it is straightforward to solve for $V_f + kT_e/e$ in terms of the two voltages and currents and obtain

$$V_f + \frac{k T_e}{e} = \frac{V_1 I_2^{\alpha} - V_2 I_1^{\alpha}}{I_1^{\alpha} - I_2^{\alpha}}. \qquad (7)$$

Now, if a third probe is used, with bias voltage $V_3$, collecting current $I_3$, the following identity must be satisfied:

$$\frac{V_1 I_2^{\alpha} - V_2 I_1^{\alpha}}{I_1^{\alpha} - I_2^{\alpha}} = \frac{V_2 I_3^{\alpha} - V_3 I_2^{\alpha}}{I_2^{\alpha} - I_3^{\alpha}}, \qquad (8)$$

since $T_e$ and $V_f$ are constants, independent of the bias voltages or currents. In this equation, only α is unknown, because the currents are measured, and the bias voltages are set by design of the instrument. It is then straightforward to solve for α, and hence β, using a standard numerical root finder. In step (b), given β, neglecting $kT_e/e$, which is generally small compared to $V_f$, equation (7) is used to make a first estimate of the floating potential, as

$$V_{f1} = \frac{V_1 I_2^{\alpha} - V_2 I_1^{\alpha}}{I_1^{\alpha} - I_2^{\alpha}}, \qquad (9)$$

where subscript 1 is used to label this first approximation of $V_f$. This first estimate of $V_f$ can now be improved by using regression to approximate the error in our first inference, $\delta V_{f1} = V_{f1} - V_f$. This is carried out with RBF, using 4-tuples of measured currents as input, and as output, the known difference $\delta V_{f1} = V_{f1} - V_f$ between the first inference $V_{f1}$ and the floating potential in our data set. This inferred correction is then used to construct a second inference, $V_{f2}$, with improved accuracy. Referring to equations (7) and (9), it is seen that the model of the correction $\delta V_{f1}$ also provides an estimate of the electron temperature: $kT_e/e \sim \delta V_{f1}$. Finally in step (c), given the estimates β, $V_{f2}$, and $T_e$, it is possible to make a first inference of the density $n_{e1}$ analytically from equation (1).
As a final improvement, the relative difference between $n_{e1}$ and the known value from our data set, $\delta n_{e1} = (n_{e1} - n_e)/n_{e1}$, is modeled with RBF, again with the 4-tuples of currents as input. The modeled correction is then applied to $n_{e1}$ to yield a further improved density estimate $n_{e2}$. In practice, the increase in accuracy between $n_{e1}$ and $n_{e2}$ is only modest, but $n_{e2}$ is found to be better centered around the exact values. For that reason, only the inferred $n_{e2}$ is considered below.
To summarize, model 1 involves several steps consisting of analytic and regression estimates, from which the four parameters β, $V_f$, $T_e$, and $n_e$ are estimated from 4-tuples of currents obtained with four given bias voltages as the only input. It is noted that the procedure involving equations (8) and (9) only requires 3-tuples of currents and bias voltages. The RBF corrections to $V_f$ and $n_e$, however, are done using the four currents and voltages, owing to the fact that four parameters (β, $V_f$, $T_e$, and $n_e$) need to be determined. Results obtained with both three and four sets of collected currents are presented in section 3.3 below.
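Steps (a) and (b) of model 1 can be illustrated with a short sketch. It assumes that currents follow $I \propto (1 + e(V_f + V_b)/kT_e)^{\beta}$, that any probe pair gives $V_f + kT_e/e = (V_1 I_2^{\alpha} - V_2 I_1^{\alpha})/(I_1^{\alpha} - I_2^{\alpha})$ with α = 1/β, and that consistency between probe pairs (1,2) and (2,3) fixes α. The choice of pairs, the bisection root finder, the search bracket, and the test values are assumptions made for illustration. The text only specifies "a standard numerical root finder".

```python
def pair_estimate(alpha, v_a, i_a, v_b, i_b):
    """Estimate of V_f + kT_e/e from two probes, for a trial exponent alpha."""
    p_a, p_b = i_a ** alpha, i_b ** alpha
    return (v_a * p_b - v_b * p_a) / (p_a - p_b)


def solve_alpha(biases, currents, lo=0.5, hi=4.0, iterations=80):
    """Bisection on the mismatch between probe pairs (1,2) and (2,3)."""
    scale = max(currents)               # normalise to avoid tiny powers
    i_1, i_2, i_3 = (i / scale for i in currents)
    v_1, v_2, v_3 = biases

    def mismatch(alpha):
        return (pair_estimate(alpha, v_1, i_1, v_2, i_2)
                - pair_estimate(alpha, v_2, i_2, v_3, i_3))

    f_lo = mismatch(lo)
    if f_lo * mismatch(hi) > 0.0:
        raise ValueError("no sign change in the search bracket")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        f_mid = mismatch(mid)
        if f_lo * f_mid <= 0.0:
            hi = mid                    # root lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid       # root lies in [mid, hi]
    return 0.5 * (lo + hi)


# Synthetic test values (assumptions): beta = 0.7, kT_e/e = 0.1 V, V_f = -0.5 V.
BETA_TRUE, KT_OVER_E, V_F_TRUE = 0.7, 0.1, -0.5
BIASES3 = (2.0, 3.0, 4.0)
CURRENTS3 = tuple(1.0e-7 * (1.0 + (V_F_TRUE + v) / KT_OVER_E) ** BETA_TRUE
                  for v in BIASES3)

alpha = solve_alpha(BIASES3, CURRENTS3)
beta = 1.0 / alpha
# First estimate of the floating potential; it differs from V_f by kT_e/e.
v_f1 = pair_estimate(alpha, BIASES3[0], CURRENTS3[0], BIASES3[1], CURRENTS3[1])
```

For these synthetic currents the recovered exponent is the input β, and the first estimate $V_{f1}$ equals $V_f + kT_e/e$ = −0.4 V, which is why the RBF correction $\delta V_{f1}$ also carries information on the temperature.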

Model 2: direct RBF regression
In this approach, RBF regression is used to directly infer physical parameters, without relying on any analytic approximation for the currents and bias voltage, as in model 1. The advantage here is that the resulting models are unbiased relative to any approximate theory; that is, they are purely data-driven. The added challenge, however, is that they must accurately reproduce the full dependence of parameters such as densities and satellite floating potentials, directly from 4-tuples of collected currents.

Assessment of model inference skills
In this section, the models proposed independently by Barjatya et al [2], and Hoang et al [15], and models 1 and 2 described in sections 2.4 and 2.5 are assessed with data sets obtained from kinetic simulations as described in section 2.1. For models 1 and 2, 200 randomly selected nodes are used for training, and the remaining 325 nodes are used for validation.
The models are assessed using different skill metrics. For the floating potential we define the error of a data point as $V_{f,\mathrm{model}} - V_{f,\mathrm{actual}}$, and compute the maximum absolute error (MAE) and the root mean square of the error (RMSE). In addition, we decompose $\mathrm{RMSE}^2 = \mu^2 + \sigma^2$ into a bias/offset µ and a (population) standard deviation σ of the error, to more clearly identify to what extent the error is caused by a systematic offset or a less predictable spread. The offset µ is calculated as the average error, and σ is the standard deviation of the error in a given data set. For the density, because it spans several orders of magnitude, we use the relative error, defined as $(n_{e,\mathrm{model}} - n_{e,\mathrm{actual}})/n_{e,\mathrm{model}}$. This is chosen rather than the more usual definition where relative errors are with respect to exact values, because, from an operational point of view, model inference is made for variables that are not known otherwise. It is therefore more convenient to assess margins of uncertainty with respect to prediction values, which are known, than with respect to exact values, which are not. With this, we report the maximum (absolute value of the) relative error (MRE) and the root mean square of the relative error (RMSrE) for the density. Again, we decompose RMSrE into the bias/offset $\mu_r$ and standard deviation $\sigma_r$ of the relative error. For consistency with past literature, we also report the Pearson correlation coefficient R for both the density and floating potential, although R is known to be close to unity even for relatively large errors [2].
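The skill metrics defined above can be computed as in the following sketch. The function names and the sample values are illustrative assumptions. The relative error is taken with respect to the predicted value, as in the text, and σ is the population standard deviation so that the decomposition $\mathrm{RMSE}^2 = \mu^2 + \sigma^2$ holds exactly.

```python
import math


def error_metrics(predicted, actual, relative=False):
    """Maximum error, RMS error, offset mu and spread sigma of the errors."""
    if relative:
        # Relative error with respect to the model prediction.
        errors = [(p - a) / p for p, a in zip(predicted, actual)]
    else:
        errors = [p - a for p, a in zip(predicted, actual)]
    n = len(errors)
    mu = sum(errors) / n
    sigma = math.sqrt(sum((e - mu) ** 2 for e in errors) / n)
    rms = math.sqrt(sum(e * e for e in errors) / n)
    return {"max": max(abs(e) for e in errors), "rms": rms,
            "mu": mu, "sigma": sigma}


def pearson_r(x, y):
    """Pearson correlation coefficient between two sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Made-up floating potentials (V) and densities (m^-3) for illustration.
vf_metrics = error_metrics([1.1, 1.9, 3.2], [1.0, 2.0, 3.0])
ne_metrics = error_metrics([1.2e10, 0.9e11, 1.1e12], [1.0e10, 1.0e11, 1.0e12],
                           relative=True)
r_vf = pearson_r([1.0, 2.0, 3.0], [1.1, 1.9, 3.2])
```

Separating µ from σ in this way makes it possible to tell a model with a systematic offset, which a simple correction could remove, from one whose errors are scattered.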

Jacobsen et al's linear fit approach, with β = 0.5
Using the linear fit approach proposed in [21], and summarized in section 1, densities are inferred from the 4-tuples of currents in our solution library constructed from kinetic simulations described in section 2.1. The correlation plot of these results is shown in figure 2, with selected skill metrics. The linear fit inference is seen to significantly overestimate densities, by factors ranging from 3 to 9, relative to densities in our data set. It is interesting to note that each vertical cluster in the figure consists of 21 × 5 = 105 circles, which is the number of combinations in floating potentials and temperatures considered in the construction of the data set. It follows that the spread in inferred density, using this approach, is mainly caused by the spread in floating potentials and temperatures, which are not accounted for in this linear fit formalism.
The regularity in the discrepancies between inferred and database densities, however, suggests that it should be possible to improve model predictions in this case, using regression. Thus, RBF was used to construct a model for the relative difference between predicted and data densities seen in figure 2. Three pivots were found to be optimal in this case, to shift the centroid of predicted densities close to the ideal correlation curve, without over-fitting. This model was then applied to the validation set to assess the skill. As shown in figure 3, while the spread in the vertical clusters of circles remains significant, the centroid of the predicted densities is now much closer to the ideal correlation curve, resulting in a notable improvement in the skill metrics.

Nonlinear least squares fits
Nonlinear least squares fits, similar to those proposed by Barjatya et al and Hoang et al and summarized in section 1, are now considered for determining the floating potential and electron density. This is done by considering three possible implementations consisting of (a) a 4-parameter fit using all four currents, (b) a 3-parameter fit using currents from the three largest bias voltages (to account for the possibility of a probe with negative voltage), and (c) a 3-parameter fit using currents from all four probes. In all cases, nonlinear least squares fits are made using the Python library SciPy, to perform differential evolution optimization [38]. In both cases (b) and (c), the temperature appearing in equation (1) is set to the exact value from the solution library, in order to reduce the number of fitting parameters from four to three. This is similar to the approach taken by Barjatya et al who used estimated values of the temperature, and by Hoang et al who used estimates from the International Reference Ionosphere (IRI) [5], and EISCAT measurements [11], in their fits. By setting the temperature to its true value, these results produce the best possible fits with these approaches. Model skills are summarized in tables 2 and 3 for each case. In case (a), consistently with findings from [2], fitted temperatures are found to be very inaccurate, which explains the lower performance of the four-parameter fit approach compared to the other two. From the tables, cases (b) and (c) are seen to result in nearly identical skills, although the three-parameter fit from the four probe currents (case (c)) is found to be slightly more accurate. Inference skills obtained in case (c) are shown in figures 4 and 5 for the floating potential and density, respectively. Excellent agreement is seen for inferred $V_f$ compared to known values from our validation set, with a maximum absolute error of 0.101 V in the range (−2, 2) V of possible floating potentials.
Densities are also modeled with good accuracy, with a maximum relative error of approximately ±59% over the (10 10 , 10 12 )m −3 range.
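A minimal sketch of a three-parameter fit in the spirit of case (c) is given below, using SciPy's differential evolution. Several elements are assumptions made for illustration only: the functional form taken for equation (1) (a generalized OML law with exponent β), the probe dimensions, the bias voltages, the parameter bounds, and the choice of fitting the logarithm of the currents. The currents are synthetic and noise-free, generated from the same assumed law.

```python
import numpy as np
from scipy.optimize import differential_evolution

E = 1.602176634e-19                       # elementary charge (C)
ME = 9.1093837015e-31                     # electron mass (kg)
AREA = 2 * np.pi * 0.255e-3 * 25e-3       # hypothetical needle: r = 0.255 mm, l = 25 mm
BIASES = np.array([2.5, 4.0, 5.5, 10.0])  # hypothetical bias voltages (V)

def current(n, v_f, beta, t_ev, v_b=BIASES):
    """Assumed form of the scaling law: n e A (2/sqrt(pi)) v_th (1 + (V_f + V_b)/T)^beta."""
    v_th = np.sqrt(E * t_ev / (2 * np.pi * ME))
    return n * E * AREA * (2 / np.sqrt(np.pi)) * v_th * (1 + (v_f + v_b) / t_ev) ** beta

def fit_three_parameters(i_obs, t_ev, seed=0):
    """Fit (log10 n, V_f, beta) with T fixed by minimizing squared log-current residuals."""
    def cost(p):
        model = current(10.0 ** p[0], p[1], p[2], t_ev)
        return np.sum((np.log(model) - np.log(i_obs)) ** 2)

    # Bounds keep 1 + (V_f + V_b)/T positive for all four biases.
    bounds = [(10.0, 12.0), (-2.0, 2.0), (0.5, 1.0)]
    res = differential_evolution(cost, bounds, seed=seed)
    return 10.0 ** res.x[0], res.x[1], res.x[2]

# Synthetic currents from known (hypothetical) plasma parameters
true = dict(n=2.0e11, v_f=-1.0, beta=0.75, t_ev=0.1)
i_obs = current(true["n"], true["v_f"], true["beta"], true["t_ev"])
n_fit, v_f_fit, beta_fit = fit_three_parameters(i_obs, true["t_ev"])
```

Fitting log10 of the density rather than the density itself keeps the three parameters on comparable scales, which tends to help a population-based optimizer. With measured or simulated currents that do not follow the assumed law exactly, the fitted values would of course carry the errors discussed above.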

Model 1: analytic-regression based
Following the procedure outlined in section 2.4, which consists of several steps involving a root finder, analytic expressions, and regression, models were constructed with synthetic training and validation sets obtained from simulations. Correlation plots computed with the validation sets are shown in figure 6 for the floating potential V_f2, and figure 7 for the density n_e2. Referring to tables 2 and 3, inferred floating potentials are seen to be slightly less accurate than those obtained from three-parameter fits, while inferred densities are more accurate. We must recall, however, that in the three-parameter fits, known temperatures from the data sets were used, while in model 1 the temperature is one of the fitting parameters, and its determination is rather inaccurate. Thus, in order to have a fair comparison between model 1 and the three-parameter fit approaches, model 1 inferences were also made using the known temperatures. The results in the tables show that, in this case, model 1 predictions of the floating potential are as accurate as those from the three-parameter fits, while the density predictions are significantly more accurate. For either V_f or n_e, model 1 predictions, with or without specifying known values of T_e, are also significantly more accurate than those obtained from a four-parameter fit, which would be required should an accurate measurement of the temperature not be available.

Model 2: direct RBF regression
The most straightforward model consists of using RBF regression directly to infer the floating potential and plasma density. This is done without any intermediate analytic steps, which results in models that are fully data-driven and not biased by analytic approximations. Considering that in this approach regression is expected to reproduce the full dependence of the variables of interest on 4-tuples of currents, as opposed to small corrections to estimates obtained by other means, it is not clear a priori whether inferences should be more or less accurate than those of method 1. Correlation plots are shown in figures 8 and 9 for inferred floating potentials and densities, respectively, using RBF with five pivots. While model predictions of V_f follow the ideal correlation line in figure 8, with nearly the same slope and cluster centroids close to that line, their vertical spread is larger than in plots of V_f from other models, and prediction skills are seen to be the lowest among all models considered. The opposite holds for the density, for which inferred values show the best agreement with those from the validation data set. This is also clear from the skill metrics listed in table 2, which are the best among the eight models considered. An interesting observation is that, with our training and validation data sets, the direct RBF approach produces the highest accuracy for the density, and the lowest one for the floating potential. This shows that different approaches may be better adapted to model different physical parameters. As a final remark, very little has been said so far about modeling the temperature. The reason is that all attempts have produced very scattered and inaccurate estimates of the temperature, whether with four-parameter fits, model 1, or direct RBF. This is consistent with findings reported by Barjatya et al and Hoang et al, and it is a consequence of the relatively weak dependence of collected currents on the temperature for these types of probes.
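To make the idea of direct regression from 4-tuples of currents concrete, the sketch below fits a small RBF model with five pivots to synthetic data. It is a generic illustration and not the implementation used in this work: the kernel G(r) = r, the added constant term, the random choice of pivots, the least squares solution for the coefficients, and the scaling law used to generate the currents (in arbitrary units) are all assumptions.

```python
import numpy as np

def rbf_design(x, pivots):
    """Design matrix: a constant column plus G(r) = r evaluated at each pivot."""
    r = np.linalg.norm(x[:, None, :] - pivots[None, :, :], axis=2)
    return np.hstack([np.ones((x.shape[0], 1)), r])

def rbf_fit(x, y, pivots):
    """Least squares coefficients of the RBF expansion."""
    coef, *_ = np.linalg.lstsq(rbf_design(x, pivots), y, rcond=None)
    return coef

def rbf_predict(x, pivots, coef):
    return rbf_design(x, pivots) @ coef

# Synthetic 4-tuples of currents from an assumed scaling law (arbitrary units)
rng = np.random.default_rng(0)
n_samples = 1000
log_n = rng.uniform(10.0, 12.0, n_samples)      # log10 of density (m^-3)
t_ev = rng.uniform(0.07, 0.17, n_samples)       # temperature (eV)
v_f = rng.uniform(-2.0, 0.0, n_samples)         # floating potential (V)
biases = np.array([2.5, 4.0, 5.5, 10.0])        # hypothetical biases (V)
amps = (10.0 ** log_n[:, None] * np.sqrt(t_ev[:, None])
        * (1 + (v_f[:, None] + biases) / t_ev[:, None]) ** 0.8)
x = np.log10(amps)                              # inputs: log10 of the four currents

# Train on 300 samples with five pivots drawn from the training set
x_train, y_train = x[:300], log_n[:300]
x_val, y_val = x[300:], log_n[300:]
pivots = x_train[rng.choice(300, size=5, replace=False)]
coef = rbf_fit(x_train, y_train, pivots)

pred_val = rbf_predict(x_val, pivots, coef)
rms_train = np.sqrt(np.mean((rbf_predict(x_train, pivots, coef) - y_train) ** 2))
rms_const = np.sqrt(np.mean((y_train - y_train.mean()) ** 2))
rms_val = np.sqrt(np.mean((pred_val - y_val) ** 2))
```

Working with logarithms of currents and densities is a design choice made here because both span orders of magnitude. A careful selection of pivots, rather than the random draw used in this sketch, would be expected to matter for the accuracy on the validation set.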

Application to other data sets
In this section method 2 is further tested by inferring densities in two data sets constructed independently from the one considered in the previous sections. The focus here on method 2 and density predictions is motivated in part by the original purpose of using m-NLPs, to infer densities independently of temperatures. Another reason is that method 2 is independent of a priori analytic expressions such as equation (1), which makes it applicable under more general conditions, in which equation (1) may not be a good approximation. The first set is constructed with the Langmuir program, in which the current collected by a probe is interpolated from numerically computed currents on a grid of non-dimensionalised plasma and probe parameters [32]. The second, experimental, consists of 4-tuples of currents measured in the Visions-2 rocket mission, with densities inferred using two independent techniques.

Blind test with Langmuir generated data
To better assess method 2, it has been applied to a second data set in a blind study, where one author (SM) assessed the skill of predictions made by the other authors. As mentioned in section 2.1, this second data set consists of 10 000 4-tuples of currents for different plasma parameters. Of these, 200 were used for training, and another 800 were made available for quick assessments and experimentation during the training phase. The true plasma parameters behind the remaining 9000 4-tuples of currents were not seen by the experimenters, and were only used later to compute skill metrics of the predictions by the last author.
Correlation plots for predicted densities are shown in figure 10, with corresponding skill metrics included in table 3. With Langmuir, currents are calculated for a probe geometry and plasma conditions different from those assumed in section 2.1. The excellent correlation between given and inferred densities, with metrics similar to those seen in figure 9, is promising and provides strong support for the applicability of the method to experimental data. For comparison, predictions of the density from the Langmuir data set using Jacobsen et al's linear fit with β = 0.5 are also included. Compared with the linear fit, which largely overestimates the density with a 52% bias, model 2 predictions have the lowest bias and standard deviation, thus providing a significant improvement to the predicted density.
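For reference, the linear fit used in this comparison can be sketched as follows. With β = 0.5, the square of the OML current is linear in the bias voltage, with a slope 2 e^3 A^2 n^2 / (π^2 m_e) that does not depend on temperature or floating potential, so the density follows from a straight-line fit of I^2 against V. The probe radius, length and bias voltages below are those quoted for Visions-2, while the plasma parameters are hypothetical.

```python
import numpy as np

E = 1.602176634e-19     # elementary charge (C)
ME = 9.1093837015e-31   # electron mass (kg)

def oml_current(n, t_ev, v_f, v_b, radius, length):
    """OML electron current to a long cylinder (beta = 0.5), T in eV."""
    area = 2 * np.pi * radius * length
    v_th = np.sqrt(E * t_ev / (2 * np.pi * ME))
    return n * E * area * (2 / np.sqrt(np.pi)) * v_th * np.sqrt(1 + (v_f + v_b) / t_ev)

def density_linear_fit(currents, v_b, radius, length):
    """Density from the slope of I^2 versus bias voltage."""
    slope = np.polyfit(v_b, currents ** 2, 1)[0]
    area = 2 * np.pi * radius * length
    return np.sqrt(np.pi ** 2 * ME * slope / (2 * E ** 3 * area ** 2))

v_b = np.array([3.0, 4.5, 6.0, 7.5])   # bias voltages (V)
radius, length = 0.255e-3, 39e-3       # probe radius and length (m)
n_true = 1.0e11                        # hypothetical density (m^-3)
currents = oml_current(n_true, t_ev=0.1, v_f=-1.0, v_b=v_b, radius=radius, length=length)
n_hat = density_linear_fit(currents, v_b, radius, length)
```

The estimate reproduces the input density here only because the synthetic currents follow the β = 0.5 law exactly. When currents depart from that law, the slope no longer has this simple interpretation and the estimate becomes biased.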

Application of RBF to Visions-2 experimental data
Method 2, our best density inference technique, is also tested against experimentally inferred densities from sounding rocket 35.039 of the Visions-2 mission. The m-NLP system aboard rocket 35.039 consisted of four cylindrical Langmuir probes of length 39 mm and diameter 0.51 mm, biased to 3, 4.5, 6 and 7.5 V [39]. The model is trained with synthetic data produced by the Langmuir library, and inferences of the density are compared with those obtained with two techniques used in this mission. The synthetic training data set of currents and densities was constructed for probes of diameter 0.51 mm and length 40 mm. The same bias voltages of 3, 4.5, 6 and 7.5 V were assumed as in the experiment. The data set was generated with randomly distributed electron densities in the range 10^10 to 10^12 m^−3 on a logarithmic scale and, on a linear scale, temperatures from 0.07 to 0.17 eV and spacecraft floating potentials between −4 and −0.5 V. The RBF model was trained with 300 randomly selected currents and densities from a 10 000 node Langmuir data set, using five pivots. A comparison of inferred densities with those reported in the Visions-2 mission, as a function of time and altitude, is shown in figure 11. The two densities reported in the mission, shown in the figure, were obtained with Jacobsen's β = 0.5 linear fit and the β = 0.8 nonlinear fit techniques.
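The sampling of plasma parameters described above can be sketched as follows. The seed and variable names are arbitrary, and the computation of the corresponding currents with the Langmuir library is deliberately left out, since its interface is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(42)   # arbitrary seed
n_nodes, n_train = 10_000, 300

# Densities uniform on a logarithmic scale; temperatures and floating
# potentials uniform on a linear scale, over the ranges quoted in the text.
density = 10.0 ** rng.uniform(10.0, 12.0, n_nodes)   # m^-3
temperature = rng.uniform(0.07, 0.17, n_nodes)       # eV
v_float = rng.uniform(-4.0, -0.5, n_nodes)           # V

# Random subset of nodes used for training
train_idx = rng.choice(n_nodes, size=n_train, replace=False)
train_density = density[train_idx]
```

Note that with floating potentials down to −4 V and a lowest bias of 3 V, some sampled nodes place a probe at a negative potential relative to the plasma.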
As a final test of our proposed technique, RBF was trained with, and applied to, experimental measurements. This is to ascertain whether the method is applicable to more general cases than those obtained with synthetic data constructed from simulations or computer models. To this end, the model was trained using five pivots and 300 randomly selected entries from the Visions-2 data set, consisting of 4-tuples of collected currents. In one case, the densities used in training and validation were inferred with Jacobsen's linear fit technique, while in the other case, they were inferred with a β = 0.8 nonlinear fit; both inferences are reported in the Visions-2 data set. A comparison between RBF inferences and experimental densities, including selected skill metrics, is shown in the two panels of figure 12 for the full Visions-2 data set. The ability of RBF to be trained with, and accurately reproduce, densities inferred with these two different techniques is yet another demonstration of the applicability of method 2 based exclusively on RBF regression. While the comparison made here cannot be used to ascertain the accuracy of either inference technique used in the experiment, it clearly shows that, given accurately measured currents and densities, RBF can be used to construct high skill inference models for the density.

Figure 11. Comparison between densities from Visions-2 data inferred with Jacobsen's linear fit, β = 0.8 nonlinear fit, and RBF regression trained with data consisting of 300 randomly selected nodes from the Langmuir model, using five pivots.

Summary and conclusion
New procedures are presented to infer a satellite floating potential and plasma density from currents collected with fixed-bias multi-needle Langmuir probes (m-NLP). The use of such probes was first considered by Jacobsen et al as a means of inferring plasma density, with high temporal and spatial resolution, independently of the electron temperature. Recognizing the limits of the OML approximation with β = 0.5, linear and nonlinear least squares fit approaches have been developed in order to infer the plasma density from probe measurements. In this paper we revisited these procedures and introduced two alternatives to infer plasma parameters from low level (L1B) m-NLP measurements. The first method makes use of a generalized orbital-motion-limited (OML) scaling law (see equation (1)), combined with radial basis function (RBF) regression to correct discrepancies in the analytic estimates. The second method relies solely on RBF regression to infer the density and floating potential. In both cases, physical parameters of interest are inferred from 4-tuples of currents collected by as many probes biased to known voltages. With the first method, the intermediate analytic expressions involve the plasma temperature, which can be obtained from independent measurements if possible, or from the regression procedure itself. In all cases considered, the inference of temperature comes with significant uncertainties, consistent with findings from previous studies. These large uncertainties result from the weak dependence of collected currents on temperature which, as noted in previous studies, enables good quality inferences of the density, even with rough estimates of the temperature. With the second method, inference of both plasma density and satellite floating potential relies exclusively on RBF regression. The absence of analytic approximations in this case implies that inferences are not biased to any a priori theory, and are therefore exclusively data-driven. The added challenge, however, is that without being 'aided' by analytic approximations, regression now has to 'do all the work', resulting in possible accuracy loss for some parameters.
The procedures reviewed and presented were assessed by applying them to a synthetic data set constructed with kinetic simulations, consisting of 4-tuples of collected currents with corresponding bias voltages, for a range of assumed densities, temperatures, and floating potentials. Although simulations do not account for the full complexity of processes at play near satellites in space, they do provide consistent data sets with known density, temperature and floating potential, against which inference algorithms can be tested. Model prediction skills were assessed graphically and quantitatively using adapted metrics. Consistent with what was reported by Hoang et al, the linear fit approach leads to significant systematic overestimates of the density. We found that this can be corrected in part with RBF regression, to bring the centroid of predicted densities close to actual values. When inference is made using known (from the data sets) temperatures, all approaches provide good accuracy for the floating potential, but method 1, based on a combination of analytic scaling laws and regression, is found to be appreciably more accurate. A loss of accuracy is noted in floating potentials resulting from method 1, and particularly from 4-parameter nonlinear fits, when temperatures are inferred from the model. Interestingly, this is not the case for predicted densities as, with model 1, inference accuracy is found to be slightly better when temperatures are calculated in the model. While the difference here is small (0.27 vs. 0.32 for the maximum relative error), this may result from the approximate nature of equation (1) assumed in method 1. Conversely, method 2, relying exclusively on RBF regression, is found to have a relatively low inference skill for the floating potential, but an excellent one for the density. This indicates that an optimal strategy might require different algorithms to infer different physical parameters from a given instrument.
Finally, to show the applicability of direct RBF inference to more general data sets, a proof-of-concept study was conducted using two additional and independent data sets. One was generated with the Langmuir code, in which assumed plasma conditions and probe geometry were different from the ones assumed in our first assessments. The other consisted of currents measured experimentally in the Visions-2 mission, with densities inferred with two distinct inference algorithms. Although the true electron densities were unknown in this case, RBF produced results similar to those obtained when considering β = 0.8, which is believed to be more accurate than the original β = 0.5 method. Just as importantly, to show that RBF has the potential to accurately infer densities in experiments, given accurate training data (either through simulations or more accurate experimental methods), two RBF models were constructed by training on small subsets of the Visions-2 data with densities inferred from (a) Jacobsen's linear fit, and (b) the β = 0.8 nonlinear fit technique. It was then shown that RBF accurately reproduces densities when trained with subsets of experimental data, independently of the experimental data analysis technique used to infer the density. These results provide strong evidence that direct RBF methods can be used to accurately infer densities from experimental data, provided that the models are trained using sufficiently accurate data sets.
In conclusion, the methods presented show promise for improving the accuracy of plasma densities and satellite floating potentials inferred from m-NLP measurements. Our analysis shows that RBF alone (method 2) should be the preferred approach to infer densities from m-NLP measurements, whether training is made with synthetic simulation-based data or with data measured and validated experimentally. Conversely, based on our assessments made with data sets constructed with kinetic simulations, for which the empirical equation (1) is a good approximation, method 1 (combining regression and an empirical expression for the collected currents) should be preferred for inferring a satellite potential. More generally, however, with different configurations of the probes relative to other satellite components, or different plasma environment conditions, equation (1) might not accurately describe currents collected by the probes, which would then result in a loss of skill in inferences made with method 1. The possible variations on multivariate regression techniques and data sets are of course endless, and comparing further variants and data sets could lead to different results. While the determination of plasma density with m-NLP is not as straightforward as initially assumed on the basis of OML theory and the assumption of sufficiently long probes, this type of instrument offers interesting possibilities for measuring the density, as well as a satellite floating potential. Part of our analysis is based on synthetic data generated with kinetic simulations, in which many processes at play in actual measurements are not accounted for. The results obtained are nonetheless sufficiently encouraging to motivate further computational and experimental studies with more physics, more detailed geometry, and broader expected space environment conditions, to support specific space missions.

Data availability statement
The data that support the findings of this study are available upon reasonable request from the authors.

Acknowledgments
being allowed to participate in this research, as well as Wojciech J Miloch for his support. The m-NLP experiment on Visions-2 and the University of Oslo participation in the Grand Challenge Initiative Cusp rocket campaign were funded through the Research Council of Norway Grant No. 275653. Thanks to Andres Spicher, Espen Trondsen, David Michael Bang-Hauge, and the Mechanical Workshop at the University of Oslo, Norway for the data. SM also thanks Andres Spicher for valuable discussions and input.