Showing top 10 results for "AI training data"

Advances in private training for production on-device language models

… While training on-device models directly from user data effectively improves utility for applications such as next word prediction (NWP) and smart text selection, protecting the privacy of user data during model training is important. …

Feb 21, 2024

Croissant: a metadata format for ML-ready datasets

… However, these formats were designed for data discovery rather than for the specific needs of ML data, such as the ability to extract and combine data from structured and unstructured sources, to include metadata that would enable responsible use of the data, or to describe ML usage characteristics …

Mar 6, 2024

Intervening on early readouts for mitigating spurious features and simplicity bias

… We evaluated our approach on standard benchmark datasets known to contain spurious correlations (Waterbirds, CelebA, CivilComments, MNLI). Each of these datasets contains groupings of data that share an attribute potentially correlated with the label in a spurious manner. …

Feb 2, 2024

Using AI to expand global access to reliable flood forecasts

… Training data are daily streamflow values from the Global Runoff Data Center over the time period 1980–2023. A single streamflow forecast model is trained using data from 5,680 diverse watershed streamflow gauges shown below to improve accuracy. …

Mar 20, 2024

Cappy: Outperforming and boosting large multi-task language models with a small scorer

… This process involves creating a separate regression dataset specific to the downstream training data with the same data annotation process used to construct the pre-training data. …

Mar 14, 2024

Graph neural networks in TensorFlow

… Like most neural networks, a GNN is trained on a dataset of many labeled examples (~millions), but each training step consists only of a much smaller batch of training examples (say, hundreds). …

Feb 6, 2024

A decoder-only foundation model for time-series forecasting

… This helps TimesFM understand the bigger picture and generalize better when provided with domain-specific contexts not seen during training. Zero-shot evaluation results: We evaluate TimesFM zero-shot on data not seen during training using popular time-series benchmarks. …

May 8, 2024

Health-specific embedding tools for dermatology and pathology

… To evaluate this approach, we trained models on a downstream task using teledermatology data. Model training involved varying dataset sizes (12.5%, 25%, 50%, 100%) to compare embedding-based linear classifiers against fine-tuning. …

Mar 8, 2024

Talk like a graph: Encoding graphs for large language models

… Additionally, GraphQA includes random graphs generated using various algorithms like Erdős-Rényi, scale-free networks, the Barabási-Albert model, and the stochastic block model, as well as simpler graph structures like paths, complete graphs, and star graphs, providing a diverse set of data for training …

Mar 12, 2024

HEAL: A framework for health equity assessment of machine learning performance

… For example, the availability of health outcomes data in step 2 can inform the choice of demographic factors and brackets in step 1, and the framework can be applied again with new datasets, models and populations. …

Mar 15, 2024
