Similar Documents
20 similar documents retrieved.
1.
When using digital devices and services, individuals provide their personal data to organizations in exchange for gains in various domains of life. Organizations use these data to run technologies such as smart assistants, augmented reality, and robotics. Most often, these organizations seek to make a profit. Individuals can, however, also provide personal data to public databases that enable nonprofit organizations to promote social welfare if sufficient data are contributed. Regulators have therefore called for efficient ways to help the public collectively benefit from its own data. In an online experiment among 1696 US citizens, we find that individuals are willing to donate their data even when the data are at risk of being leaked. The willingness to provide personal data depends on the perceived risk level of a data leak but not on the realistic impact of the data on social welfare. Individuals are less willing to donate their data to private industry than to academia or the government. Finally, individuals are not sensitive to whether the data are processed by a human-supervised or a self-learning smart assistant.

2.
Principal component analysis (PCA) is a method of choice for dimension reduction. In the current context of data explosion, online techniques that do not require storing all data in memory are indispensable for performing PCA on streaming and/or massive data. Despite the wide availability of recursive algorithms that can efficiently update the PCA when new data are observed, the literature offers little guidance on how to select a suitable algorithm for a given application. This paper reviews the main approaches to online PCA, namely perturbation techniques, incremental methods and stochastic optimisation, and compares the most widely employed techniques in terms of statistical accuracy, computation time and memory requirements using artificial and real data. Extensions of online PCA to missing data and to functional data are detailed. All studied algorithms are available in the R package onlinePCA on CRAN.
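As an illustration of the incremental family of methods reviewed here, the following minimal sketch uses scikit-learn's IncrementalPCA in Python (the paper's own implementations are in the R package onlinePCA); the data and batch size are arbitrary stand-ins:

```python
# Minimal sketch of incremental (online) PCA: the model is updated one
# mini-batch at a time, so the full data never need to sit in memory.
import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.default_rng(0)
X = rng.standard_normal((10_000, 50))      # stand-in for streaming/massive data

ipca = IncrementalPCA(n_components=5)
for batch in np.array_split(X, 100):       # 100 mini-batches of 100 rows
    ipca.partial_fit(batch)                # recursive update, no full-data pass

scores = ipca.transform(X[:10])            # project new observations
print(ipca.explained_variance_ratio_)
```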

3.
Job analysis is an integral part of any human resource function. Recent advancements in technology and changing work environments have drastically altered the means by which job analysis data are collected and stored. These changes have led to an increase in the amount of data collected and in the potential for those data to inform complex decision making. However, owing to a lack of tools for configuring and analyzing the data, human resource professionals are often unable to keep abreast of changes in their workforce, make complex decisions using job data, or facilitate communication across jobs, job families, or departments in their organization. As a result, advanced methods for analyzing job data are needed. Metrics are quantitative algorithms applied to job data that aid decision making in areas such as recruitment, selection, transferability, promotion, training, and development. Metrics are a sophisticated, user-friendly approach to analyzing job data that has the potential to meet the needs of human resource professionals in today's dynamic workplace. The development of metrics, their application and benefit to human resource professionals, and their use of O*NET are discussed.
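The abstract does not specify any particular metric, but a hypothetical example of a "quantitative algorithm applied to job data" could be a transferability score computed as cosine similarity between two jobs' O*NET-style skill-importance vectors; all names and ratings below are invented:

```python
# Hypothetical job-analysis metric: transferability between two jobs as
# cosine similarity of their skill-importance profiles (O*NET-style 1-5
# ratings on a shared, ordered skill list). Values are illustrative only.
import numpy as np

def transferability(job_a: np.ndarray, job_b: np.ndarray) -> float:
    """Cosine similarity; in [0, 1] for non-negative importance ratings."""
    return float(job_a @ job_b / (np.linalg.norm(job_a) * np.linalg.norm(job_b)))

data_analyst  = np.array([5, 4, 2, 3, 1])
hr_specialist = np.array([3, 2, 4, 5, 2])

print(f"transferability: {transferability(data_analyst, hr_specialist):.2f}")
```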

4.
Three-dimensional data visualization in MATLAB and its applications   (Cited by 1: 0 self-citations, 1 by others)
Zhang Xiaoli (张晓利), Value Engineering (《价值工程》), 2011, 30(24): 143.
MATLAB is highly flexible in its application to three-dimensional data visualization, but the underlying data can be difficult to understand. Building on an introduction to MATLAB's 3-D plotting commands, this paper analyses in detail the meaning of the plotting data used by these commands and gives corresponding examples, with the aim of providing effective help in understanding data of diverse forms.
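The paper works in MATLAB; as an analogous sketch in Python, matplotlib mirrors the meshgrid/surf idiom it discusses, where the X, Y grids and Z heights are exactly the "plotting data" whose meaning the paper analyses:

```python
# Python/matplotlib analogue of MATLAB's meshgrid/surf 3-D plotting idiom.
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-2, 2, 60)
y = np.linspace(-2, 2, 60)
X, Y = np.meshgrid(x, y)              # MATLAB: [X, Y] = meshgrid(x, y)
Z = np.exp(-(X**2 + Y**2))            # surface height at each grid point

ax = plt.figure().add_subplot(projection="3d")
ax.plot_surface(X, Y, Z, cmap="viridis")   # MATLAB: surf(X, Y, Z)
ax.set(xlabel="x", ylabel="y", zlabel="z")
plt.show()
```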

5.
Using social media data for statistical analysis of a general population commonly faces two basic obstacles: first, social media data are collected on different objects than the population units of interest; second, the relevant measures are typically not available directly but must be extracted by algorithms or machine learning techniques. In this paper, we examine and summarise two existing approaches to statistical analysis based on social media data that can be discerned in the literature. In the first approach, the analysis is applied to the social media data organised around the objects directly observed in the data; in the second, a different analysis is applied to a constructed pseudo survey dataset, which aims to transform the observed social media data into a set of units from the target population. We systematically elaborate the relevant data quality frameworks, exemplify their applications and highlight some typical challenges associated with social media data.
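A hypothetical sketch of the second approach: object-level records (posts) are reorganised into a pseudo survey dataset with one row per target-population unit (user); fields and figures are invented:

```python
# Hypothetical construction of a pseudo survey dataset: post-level records,
# with a machine-extracted measure, are aggregated to user-level units.
import pandas as pd

posts = pd.DataFrame({
    "user_id":   [1, 1, 2, 3, 3, 3],
    "sentiment": [0.8, 0.4, -0.2, 0.1, 0.5, 0.3],  # extracted by an algorithm
})

pseudo_survey = posts.groupby("user_id").agg(
    n_posts=("sentiment", "size"),
    mean_sentiment=("sentiment", "mean"),
).reset_index()
print(pseudo_survey)
```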

6.
To verify whether data are missing at random (MAR), we need to observe the missing data. There are only two exceptions: when the relationship between the probability of responding and the missing variables is either imposed by introducing untestable assumptions or recovered using additional data sources. In this paper, we briefly review the estimation and test procedures for selectivity in panel data. Furthermore, by extending the MAR definition from a static setting to the case of dynamic panel data models, we prove that some tests for selectivity do not verify the MAR condition.
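For reference, the static MAR condition that the paper extends can be stated as follows (a standard formulation, not the paper's dynamic version): with data partitioned as observed and missing parts and R the response indicator,

```latex
% Static MAR condition (standard formulation; the paper extends this
% definition to dynamic panel data models):
P(R \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}) = P(R \mid Y_{\mathrm{obs}})
```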

7.
This paper surveys the empirical research on fiscal policy analysis based on real‐time data. This literature can be broadly divided into four groups that focus on: (1) the statistical properties of revisions in fiscal data; (2) the political and institutional determinants of projection errors by governments; (3) the reaction of fiscal policies to the business cycle; and (4) the use of real‐time fiscal data in structural vector autoregression (VAR) models. It emerges, first, that fiscal revisions are large and initial releases are biased estimates of final values. Secondly, strong fiscal rules and institutions lead to more accurate releases of fiscal data and smaller deviations of fiscal outcomes from government plans. Thirdly, the cyclical stance of fiscal policies is estimated to be more ‘counter‐cyclical’ when real‐time data are used instead of ex post data. Fourthly, real‐time data can be useful for the identification of fiscal shocks. Finally, it is shown that existing real‐time fiscal datasets cover only a limited number of countries and variables. For example, real‐time data for developing countries are generally unavailable, and real‐time data on European countries are often missing, especially with respect to government revenues and expenditures. Therefore, more work is needed in this field.

8.
Lynn, Peter. Quality and Quantity, 2003, 37(3): 239–261.
The effects of unit non-response on survey errors are of great concern to researchers. However, direct assessment of non-response bias in survey estimates is rarely possible. Attempts are often made to adjust for the effects of non-response by weighting, but this usually relies on the use of frame data or external population data, which are at best modestly correlated with the survey variables. This paper reports the development of a method to collect limited survey data from non-respondents to personal interview surveys and a large-scale field test of the method on the British Crime Survey (BCS). The method is shown to be acceptable and low cost, to provide valid data, and to have no detrimental effect on the main survey. The use of the resultant data to estimate non-response bias is illustrated and some substantive conclusions are drawn for the BCS.
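A textbook decomposition (not necessarily the estimator used in the paper) shows why even limited data from non-respondents permit bias estimation: the bias of the respondent mean depends only on the non-response rate and the respondent/non-respondent gap,

```latex
% With population mean \bar{Y} = W_r \bar{Y}_r + W_m \bar{Y}_m, where W_m is
% the non-response rate, the bias of the respondent mean \bar{y}_r is
\operatorname{Bias}(\bar{y}_r) = \bar{Y}_r - \bar{Y} = W_m \,(\bar{Y}_r - \bar{Y}_m),
% so estimating \bar{Y}_m from non-respondent data directly estimates the bias.
```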

9.
Social and economic scientists are tempted to use emerging data sources like big data to compile information about finite populations as an alternative to traditional survey samples. These data sources generally cover an unknown part of the population of interest, and simply assuming that analyses made on these data apply to larger populations is wrong: the mere volume of data provides no guarantee of valid inference. Tackling this problem with methods originally developed for probability sampling is possible but is shown here to be limited. A wider range of model‐based predictive inference methods proposed in the literature are reviewed and evaluated in a simulation study using real‐world data on annual mileages by vehicles. We propose to extend this predictive inference framework with machine learning methods for inference from samples that are generated through mechanisms other than random sampling from a target population. Describing economies and societies using sensor data, internet search data, social media and voluntary opt‐in panels is cost‐effective and timely compared with traditional surveys, but requires an extended inference framework as proposed in this article.
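A minimal sketch of model-based predictive inference with a machine learning method, in the spirit the article proposes: fit a learner on a non-probability sample, predict the outcome for the unobserved units, and sum to estimate the finite-population total. The data here are simulated stand-ins, not the article's mileage data:

```python
# Model-based prediction of a finite-population total from a self-selected
# (non-probability) sample, using auxiliary variables known for all units.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
N = 5_000                                    # finite population size
x = rng.uniform(0, 10, size=(N, 2))          # auxiliaries known for every unit
y = 3 * x[:, 0] + np.sin(x[:, 1]) + rng.normal(0, 1, N)

# Non-random inclusion: units with large x[:, 0] opt in more often.
p_incl = 1 / (1 + np.exp(-(x[:, 0] - 6)))
observed = rng.uniform(size=N) < p_incl

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(x[observed], y[observed])

total_hat = y[observed].sum() + model.predict(x[~observed]).sum()
print(f"estimated total: {total_hat:,.0f}   true total: {y.sum():,.0f}")
```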

10.
Yuan Man (袁满), Xiao Hongfeng (肖红凤), Value Engineering (《价值工程》), 2012, 31(19): 218–219.
With the establishment of enterprise data centers, enterprises need to migrate data from disparate sources into the data center. One approach is to use commercial migration tools; another is to develop dedicated migration software for each data source. As business and requirements evolve, the data models of the sources inevitably change, and neither approach can satisfy an enterprise's need for flexible data migration. Building on a study of data migration strategies, mapping patterns, ETL, and metadata techniques, this paper proposes a general metadata-driven data migration model. When a data source changes, only a new mapping pattern needs to be added to the mapping model to accommodate the change. The general applicability of the model has been verified in the multi-strategy data migration of the downhole operations data center of the Daqing Oilfield.
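A hypothetical sketch of the metadata-driven idea: the mapping pattern is stored as data rather than code, so adapting to a changed source schema means adding a mapping entry, not rewriting the migration program. All table and column names below are invented:

```python
# Metadata-driven migration sketch: mapping metadata (source field ->
# target field + transform) drives the row transformation.
from typing import Any, Callable

MAPPING: dict[str, tuple[str, Callable[[Any], Any]]] = {
    "well_no":  ("well_id",      str),
    "op_date":  ("operation_dt", lambda v: v.replace("/", "-")),
    "depth_ft": ("depth_m",      lambda v: round(float(v) * 0.3048, 2)),
}

def migrate(row: dict[str, Any]) -> dict[str, Any]:
    """Apply the mapping metadata to one source record."""
    return {tgt: fn(row[src]) for src, (tgt, fn) in MAPPING.items() if src in row}

src_row = {"well_no": 1042, "op_date": "2012/05/17", "depth_ft": "8200"}
print(migrate(src_row))
# {'well_id': '1042', 'operation_dt': '2012-05-17', 'depth_m': 2499.36}
```

When a source adds or renames a field, only a new entry in MAPPING is needed, which is the flexibility the proposed model aims for.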

11.
The very soul of statistics is data, but few students actually collect data as part of their statistical journey. The impediments to real data collection exercises are very real: they are logistically difficult to set up, expensive, and may not work because of extraneous events outside the control of the instructor. Computer‐aided laboratories are a way to bring many of the benefits of actual data collection to students at a fraction of the cost, and they can be easily controlled by the instructor. There are many computer‐aided modules available; indeed, a search on Google gave over 1 million hits. Some modules are good but many are mediocre. What separates the gems from the trash?

12.
This study examines whether the geographical segment data disclosed by UK companies can be used to generate forecasts of earnings that outperform forecasts based on past consolidated data. One-year-ahead forecasts of attributable earnings, or net income before extraordinary items, are generated both from geographical sales data combined with a consolidated attributable-earnings-to-sales margin and from segmental earnings data. The forecasts are based on forecast changes in the GNP of individual countries, both with and without the addition of forecast inflation rates. Models based on both geographical segment sales and segment earnings are found to outperform the random walk and random-walk-plus-drift consolidated models for the years 1981 to 1983. The differences between the errors generated by the segment-based models and the consolidated models are significant in the majority of cases, especially when the errors are truncated at 100%. However, there is no additional gain in forecast accuracy from using segment earnings data rather than segment sales data.
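One plausible reading of the segment-sales model described (a sketch, not the study's exact specification): grow each geographical segment's sales by its forecast GNP change plus, optionally, forecast inflation, then apply the consolidated margin. All figures are invented:

```python
# Segment-based earnings forecast sketch: segment sales grown by forecast
# GNP change (+ inflation), then scaled by the consolidated margin.
segments   = {"UK": 120.0, "US": 250.0, "Germany": 80.0}   # prior-year sales (m)
gnp_growth = {"UK": 0.021, "US": 0.034, "Germany": 0.018}  # forecast GNP change
inflation  = {"UK": 0.050, "US": 0.040, "Germany": 0.030}  # forecast inflation
margin = 0.08   # consolidated attributable-earnings-to-sales margin

forecast_sales = sum(
    s * (1 + gnp_growth[g] + inflation[g]) for g, s in segments.items()
)
print(f"forecast earnings: {margin * forecast_sales:.1f}m")
```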

13.
To ensure the safety of plasma-derived medicinal products, the Dutch Blood Supply Foundation (Sanquin) performs virus validation experiments. Data from these experiments are based on serial dilution assays. Regression analysis of assay data faces several problems: only a small number of data points are available, the data are censored, and they are subject to sampling error. Furthermore, the process variability inherent in the experiments is not evident. In this paper we address these problems by introducing a regression model for serial dilution data and by analyzing how validation experiments and simulation techniques can help elucidate the various sources of variability to which the experiments are subject. These are then incorporated into the regression model.
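The abstract does not give the paper's model; as a generic illustration of regression with censored data, the sketch below maximises a left-censored Gaussian (Tobit-type) likelihood, where observations below a detection limit contribute only a cumulative probability:

```python
# Generic censored-regression illustration (not the paper's model):
# left-censored Gaussian likelihood maximised with SciPy.
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(2)
x = rng.uniform(0, 5, 80)                     # e.g., dilution step
y_latent = 4.0 - 0.8 * x + rng.normal(0, 0.5, 80)
c = 1.0                                        # detection limit
censored = y_latent <= c
y = np.where(censored, c, y_latent)

def negloglik(theta):
    b0, b1, log_sigma = theta
    sigma = np.exp(log_sigma)                  # keep sigma positive
    mu = b0 + b1 * x
    ll_obs = stats.norm.logpdf(y[~censored], mu[~censored], sigma)
    ll_cen = stats.norm.logcdf(c, mu[censored], sigma)  # P(y* <= c)
    return -(ll_obs.sum() + ll_cen.sum())

res = optimize.minimize(negloglik, x0=np.zeros(3), method="Nelder-Mead")
b0, b1, log_sigma = res.x
print(f"intercept={b0:.2f} slope={b1:.2f} sigma={np.exp(log_sigma):.2f}")
```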

14.
With technological developments in cyber-physical systems and big data, there is huge potential to apply them to achieve personalization and improve resource efficiency in Industry 4.0. Because Industry 4.0 is a relatively new concept, originating from an advanced manufacturing vision supported by the German government in 2011, only a few existing surveys cover either cyber-physical systems or big data in Industry 4.0, and far fewer address the intersection between the two. Yet cyber-physical systems and big data are closely related in nature: for example, cyber-physical systems continuously generate large amounts of data, which require big data techniques to process and which can help improve system scalability, security, and efficiency. We therefore conduct this survey to bring more attention to this critical intersection and to highlight future research directions toward achieving full autonomy in Industry 4.0.

15.
Pan Xiaohui (潘晓辉), Value Engineering (《价值工程》), 2011, 30(24): 127–129.
With the globalization of the world economy, enterprises urgently need to share resources and exchange data on a larger scale. Because enterprises differ in their platforms and data schemas, traditional data exchange methods can no longer meet the needs of modern electronic data interchange; a new generation of data exchange systems based on the Extensible Markup Language (XML) and the Internet is the direction of future development. This paper applies XML, J2EE, and related technologies to the study of a data exchange system suitable for e-commerce applications. In the exchange, the sending enterprise first submits an XML document to the data exchange platform; the platform transforms it into an XML document conforming to the receiving enterprise's data schema and forwards it to the receiver, thereby realizing the exchange of heterogeneous data between enterprises.
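A hypothetical sketch of the platform's transformation step in Python (the paper's system uses J2EE): an XML document in the sender's schema is rewritten into the receiver's schema. Element names are invented for illustration:

```python
# Sender-schema XML rewritten into a (hypothetical) receiver schema.
import xml.etree.ElementTree as ET

sender_doc = "<order><custName>ACME</custName><qty>12</qty></order>"
src = ET.fromstring(sender_doc)

# Receiver schema uses different element names and nesting.
dst = ET.Element("PurchaseOrder")
ET.SubElement(ET.SubElement(dst, "Buyer"), "Name").text = src.findtext("custName")
ET.SubElement(dst, "Quantity").text = src.findtext("qty")

print(ET.tostring(dst, encoding="unicode"))
# <PurchaseOrder><Buyer><Name>ACME</Name></Buyer><Quantity>12</Quantity></PurchaseOrder>
```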

16.
Bayesian networks are a versatile and powerful tool for modelling complex phenomena and the interplay of their components in a probabilistically principled way. Moving beyond the comparatively simple case of completely observed, static data, which has received the most attention in the literature, this paper reviews how Bayesian networks can model dynamic data and data with incomplete observations. Such data are the norm at the forefront of research and in practical applications, and Bayesian networks are uniquely positioned to model them thanks to their explainability and interpretability.
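A minimal sketch of the basic machinery (invented conditional probability tables): a discrete network factorises the joint distribution, and an unobserved variable is handled by summing it out, which is how a Bayesian network accommodates an incompletely observed record:

```python
# Tiny discrete Bayesian network: P(A, B, C) = P(A) P(B|A) P(C|B).
# Inference by enumeration; the unobserved variable B is marginalised out.
P_A = {0: 0.7, 1: 0.3}
P_B_given_A = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}  # [a][b]
P_C_given_B = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}  # [b][c]

def joint(a, b, c):
    return P_A[a] * P_B_given_A[a][b] * P_C_given_B[b][c]

def posterior_a_given_c(c):
    """P(A | C=c) with B unobserved: sum B out, then normalise."""
    unnorm = {a: sum(joint(a, b, c) for b in (0, 1)) for a in (0, 1)}
    z = sum(unnorm.values())
    return {a: p / z for a, p in unnorm.items()}

print(posterior_a_given_c(1))
```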

17.
Liu Hao (刘皓), Value Engineering (《价值工程》), 2014, (34): 219–220.
With rapid socio-economic development, buildings of all kinds are being constructed in great numbers. To ensure their safety, settlement observation has become indispensable, and the subsequent processing of the observation data is critical, as it yields the key indicators of whether a building is safe. Manual processing used to be tedious, labour-intensive, time-consuming, and prone to calculation errors that compromised the analysis of the results. Although some computer software is now available for processing such data, it tends to be either insufficiently accurate or expensive. Taking the settlement observation of a specific building as an example, this paper uses VC++ and MATLAB 7.0 to process and analyse the data by computer.
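A hedged sketch of the kind of processing such a program automates (the paper's own implementation is in VC++ and MATLAB 7.0): fit a trend to recent cumulative settlement observations and report the settlement rate. Data and the acceptance threshold are illustrative, not normative:

```python
# Settlement-observation processing sketch: recent settlement rate from a
# linear fit over the last few epochs of cumulative settlement.
import numpy as np

days       = np.array([0, 30, 60, 90, 120, 150])        # observation epochs
settlement = np.array([0.0, 2.1, 3.6, 4.7, 5.4, 5.8])   # cumulative (mm)

slope, intercept = np.polyfit(days[-3:], settlement[-3:], deg=1)
rate = slope * 30                                        # mm per 30 days
print(f"recent settlement rate: {rate:.2f} mm / 30 days")
print("stable" if rate < 1.0 else "still settling")      # illustrative check
```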

18.
19.
Linkage errors can occur when probability‐based methods are used to link records from two distinct data sets corresponding to the same target population. Current approaches to modifying standard regression analysis to allow for these errors deal only with the case of two linked data sets and assume that the linkage process is complete, that is, that all records in the two data sets are linked. This study extends these ideas to accommodate the situation where more than two data sets are probabilistically linked and the linkage is incomplete.
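For orientation, a standard formulation of the two-file, complete-linkage case that this paper generalises (in the spirit of Lahiri and Larsen's work, not necessarily this paper's estimator):

```latex
% Linked responses as a noisy permutation of the true ones: y^{*} = A y,
% with A a random matrix of linkage indicators. Then
\operatorname{E}(y^{*} \mid X) = B X \beta, \qquad B = \operatorname{E}(A),
% which suggests the bias-adjusted least-squares estimator
\hat{\beta} = (X^{\top} B^{\top} B X)^{-1} X^{\top} B^{\top} y^{*}.
```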

20.
Through building and testing theory, the practice of research animates data for human sense-making about the world. The IS field began in an era when research data were scarce; in today's age of big data, they are abundant. Yet IS researchers often enact methodological assumptions developed in a time of data scarcity, and many remain uncertain how to systematically take advantage of the new opportunities afforded by big data. How should we adapt our research norms, traditions, and practices to reflect newfound data abundance? How can we leverage the availability of big data to generate cumulative and generalizable knowledge claims that are robust to threats to validity? To date, IS academics have largely welcomed the arrival of big data as an overwhelmingly positive development. A common refrain in the discipline is: more data is great, IS researchers know all about data, and we are a well-positioned discipline to leverage big data in research and teaching. In our opinion, many benefits of big data will be realized only with a thoughtful understanding of the implications of big data availability and, increasingly, a deliberate shift in IS research practices. We advocate revisiting and extending the traditional models that guide much of IS research. Based on our analysis, we propose a research approach that incorporates consideration of big data, and associated implications such as data abundance, into a classic approach to building and testing theory. We close our commentary by discussing the implications of this hybrid approach for the organization, execution, and evaluation of theory-informed research. Our recommendations on how to update one approach to IS research practice may have relevance to all theory-informed researchers who seek to leverage big data.
